Issue
I would like to Groupby two columns and take average categorical count in Python
(We see that NY holds 3 apples, WA holds 4 cherries and TX holds 3 bbq for the date of 2/1/22 - now we take the average for those 3 containers.)
Data
CONTAINER CLOUDS DATE
NY apple 2/1/2022
NY apple 2/1/2022
NY apple 2/1/2022
WA cherries 2/1/2022
WA cherries 2/1/2022
WA cherries 2/1/2022
WA cherries 2/1/2022
TX bbq 2/1/2022
TX bbq 2/1/2022
TX bbq 2/1/2022
TX bbq 3/1/2022
TX bbq 3/1/2022
Desired
AVG CLOUDS PER CONTAINER: ~3.333
Doing
df = df.groupby('CONTAINER', 'DATE') \
.agg({'CLOUDS':'size', 'CLOUDS':'mean'}) \
.reset_index()
Any suggestion is appreciated
Solution
I think you can perform your first groupby on 'CONTAINER','DATE'
and get the size
of each category, then perform another groupby on the 'DATE'
alone and calculate the mean:
df.groupby(['CONTAINER','DATE']).agg({'CLOUDS':'size'}).reset_index().groupby('DATE').agg({'CLOUDS':'mean'}).reset_index()
Result:
DATE CLOUDS
0 2/1/2022 3.333333
1 3/1/2022 2.000000
Answered By - Derek O
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.