Issue
I am new to Pandas. I have data like this:
Category Sales Paid
Table 1 table Yes
Chair 3chairs Yes
Cushion 8 cushions Yes
Table 3Tables Yes
Chair 12 Chairs No
Mats 12Mats Yes
I have learnt how to apply a groupby on the category
column
However, now I have multiple rows per group. I want to sum the number of sales per group held in the Sales
column. But the sales number to add is written next to words e.g. 3Tables and there isn't a consistency on the way it's written as you can see above. How can I split the words and then capture the value and sum per group and print out?
My reading so far has signalled that I need to use an apply method, with lambdas.
Solution
You can use str.extract
to get digits:
# Count all items
>>> (df['Sales'].str.extract('^(\d+)', expand=False).astype(int)
.groupby(df['Category']).sum())
Category
Chair 15
Cushion 8
Mats 12
Table 4
Name: Sales, dtype: int64
# Count only paid items
>>> (df['Sales'].where(df['Paid'] == 'Yes', other='0')
.str.extract('^(\d+)', expand=False).astype(int)
.groupby(df['Category']).sum())
Category
Chair 3
Cushion 8
Mats 12
Table 4
Name: Sales, dtype: int64
Intermediate results:
>>> df['Sales'].str.extract('^(\d+)', expand=False).astype(int)
0 1
1 3
2 8
3 3
4 12
5 12
Name: Sales, dtype: int64
Answered By - Corralien
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.