Issue
Suppose we have a 1 GB dataset (say, a .csv file) to analyse, and running the analysis is painfully slow because the whole file has to be loaded and processed again and again. What can we do to make the data manageable enough to analyse quickly?
Solution
I have faced this problem many times, and a simple solution that has worked for me is to load the dataset into a DataFrame, reduce it, and write the result out as a new dataset (say, a new .csv). Most significantly, in my experience the new dataset ends up at roughly 1/8th of the original size. Below is an example of how it can work.
import pandas as pd

# Read the original (large) dataset into a DataFrame
df = pd.read_csv('a1.csv')
Now, after any minor operations on the data (if required), you can write it back out and get a significantly smaller .csv file to analyse.
# Write the reduced dataset to a new, smaller .csv file
df.to_csv('a2.csv')
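As an illustration of what those minor operations might look like, here is a minimal sketch that keeps only the columns the analysis needs and downcasts numeric types before writing the smaller file. The column names ('id', 'price', 'quantity') and file names are assumptions made for the example, not part of the original question.

import pandas as pd

# Load the original dataset (file name assumed for illustration)
df = pd.read_csv('a1.csv')

# Keep only the columns the analysis actually needs (hypothetical column names)
df = df[['id', 'price', 'quantity']]

# Downcast numeric columns to the smallest dtypes that can hold their values
df['id'] = pd.to_numeric(df['id'], downcast='integer')
df['price'] = pd.to_numeric(df['price'], downcast='float')
df['quantity'] = pd.to_numeric(df['quantity'], downcast='integer')

# Write the reduced dataset; index=False avoids writing the row index as an extra column
df.to_csv('a2.csv', index=False)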
Please correct me if you have some other method for working with larger datasets using Pandas.
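One alternative that is often suggested for files too large to load comfortably is reading the .csv in chunks with read_csv's chunksize parameter, so the whole file never sits in memory at once. The sketch below only illustrates that idea (file name and chunk size are assumptions), and is not the method described above.

import pandas as pd

# Process the large file in chunks of 100,000 rows instead of loading it all at once
row_counts = []
for chunk in pd.read_csv('a1.csv', chunksize=100_000):
    # Replace this with whatever per-chunk computation the analysis needs
    row_counts.append(len(chunk))

print('total rows:', sum(row_counts))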
Answered By - Hari_pb