Issue
I have a pandas DataFrame covering the different states in America. I would like to group by the two columns year and state in order to run statistical tests on some things (e.g. cause of death, newborns) and also plot them.
I can only come up with the pandas groupby function, where I have to specify a statistical summary at the end, such as:
import pandas as pd
df = pd.read_csv(path + 'csvfile.csv')
grouped_df = df.groupby(['Year', 'State']).mean()
However, I would like to group by year and state alone, but doing so with groupby gives me this:
import pandas as pd
df = pd.read_csv(path + 'csvfile.csv')
grouped_df = df.groupby(['Year', 'State'])
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000025720134688>
How can I do this?
Solution
First, groupby on its own is lazy, much like an iterator, so what matters is what you specify after it: an aggregate function, a custom function, or something else?
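To illustrate the point about laziness, here is a minimal sketch. The sample data and the Births column are made up for the example; only Year and State come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the Year and State column names come from the question.
df = pd.DataFrame({
    "Year": [2019, 2019, 2020, 2020],
    "State": ["Ohio", "Texas", "Ohio", "Texas"],
    "Births": [100, 300, 110, 310],
})

# This only records how to split the frame; nothing is computed yet.
grouped = df.groupby(["Year", "State"])

# Iterating yields (key, sub-DataFrame) pairs, one per (Year, State) combination.
group_sizes = {key: len(sub_df) for key, sub_df in grouped}

# A single group can be pulled out by its key tuple.
ohio_2019 = grouped.get_group((2019, "Ohio"))

# Applying an aggregation is what produces a regular pandas result.
totals = grouped["Births"].sum()
```

The repr shown in the question is therefore expected: it is the grouping object waiting for the next step, not an error.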
I am not sure what "group by the year and state alone" means. If you need a MultiIndex built from the 2 columns, use:
grouped_df = df.set_index(['Year', 'State'])
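As a rough sketch of what the MultiIndex gives you, the example below selects rows by year and state. The data and the Births column are hypothetical; sorting the index is optional but keeps lookups tidy.

```python
import pandas as pd

# Hypothetical sample data; only the Year and State column names come from the question.
df = pd.DataFrame({
    "Year": [2020, 2019, 2020, 2019],
    "State": ["Texas", "Ohio", "Ohio", "Texas"],
    "Births": [310, 100, 110, 300],
})

# Move both columns into the index and sort it.
indexed = df.set_index(["Year", "State"]).sort_index()

# All rows for one year (the result is indexed by State).
rows_2020 = indexed.loc[2020]

# One (Year, State) pair; with a unique index this is a single row as a Series.
texas_2020 = indexed.loc[(2020, "Texas")]

# All years for one state, selecting on the inner index level.
ohio_all_years = indexed.xs("Ohio", level="State")
```

For plotting, something like indexed["Births"].unstack("State") should give one column per state with the years as the index.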
Answered By - jezrael