Issue
Given a DataFrame df
:
id items
0 1 CRB Spot Index: Grease: First 2 Months: YoY
1 2 CRB Spot Index: Textiles
2 3 CRB Spot Index: Livestock
3 4 Korea: Export Value: First 10 Days: YoY
4 5 Korea: Export Value: First 20 Days: YoY
5 6 Korea: Import Value: First 20 Days: YoY
6 7 Korea: Import Value: First 10 Days: YoY
7 8 Vietnam: Import Value: First 2 Weeks: MoM
8 9 Vietnam: Import Value: First 3 Weeks: MoM
I want to filter items which match patterns: First + integer + Days
or First + integer + Weeks
, then create a new column to indicate them:
id items indicator
0 1 CRB Spot Index: Grease: First 2 Months: YoY NaN
1 2 CRB Spot Index: Textiles NaN
2 3 CRB Spot Index: Livestock NaN
3 4 Korea: Export Value: First 10 Days: YoY Y
4 5 Korea: Export Value: First 20 Days: YoY Y
5 6 Korea: Import Value: First 20 Days: YoY Y
6 7 Korea: Import Value: First 10 Days: YoY Y
7 8 Vietnam: Import Value: First 2 Weeks: MoM Y
8 9 Vietnam: Import Value: First 3 Weeks: MoM Y
How could I do that using Pandas?
I use df.loc[(df['items'].str.contains('First|Days', na=False)) & (df['items'].str.contains('First|Weeks', na=False)), 'indicators']='Y'
, it doesn't generate an expected result as shown above, also maybe there are other more concise answers.
Solution
Check regex for match First
, then \s+
for space(s), then \d+
for integers, \s+
for again space(s) and last Days|Weeks
for Days
or Weeks
:
df.loc[df['items'].str.contains('First\s+\d+\s+Days|Weeks', na=False), 'indicators']='Y'
print (df)
id items indicators
0 1 CRB Spot Index: Grease: First 2 Months: YoY NaN
1 2 CRB Spot Index: Textiles NaN
2 3 CRB Spot Index: Livestock NaN
3 4 Korea: Export Value: First 10 Days: YoY Y
4 5 Korea: Export Value: First 20 Days: YoY Y
5 6 Korea: Import Value: First 20 Days: YoY Y
6 7 Korea: Import Value: First 10 Days: YoY Y
7 8 Vietnam: Import Value: First 2 Weeks: MoM Y
8 9 Vietnam: Import Value: First 3 Weeks: MoM Y
Answered By - jezrael
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.