Issue
This is my dataframe:
import pandas as pd
df = pd.DataFrame({'a': [20, 21, 100, 4, 100, 20], 'b': [20, 20, 20, 20, 20, 20]})
I want to create column c by using a mask. This is my desired output:
     a   b    c
0   20  20  NaN
1   21  20  NaN
2  100  20  NaN
3    4  20  NaN
4  100  20    x
5   20  20  NaN
My mask is:
mask = (df.a > df.b)
Note that I want to start looking for matches only after the first three rows. That is, rows 0, 1 and 2 do not count. That is why the first row that satisfies a > b is the fifth row, whose index is 4.
This is what I have tried, but I don't know how to skip the first three rows:
df.loc[mask.cumsum().eq(1) & mask, 'c'] = 'x'
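To see why this attempt is not enough, here is a minimal check on the same data. It only inspects which row the expression selects; the variable names first_overall and attempt_rows are introduced here for illustration and are not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({'a': [20, 21, 100, 4, 100, 20], 'b': [20, 20, 20, 20, 20, 20]})
mask = df.a > df.b  # True at index 1, 2 and 4

# cumsum() counts the True values seen so far: 0, 1, 2, 2, 3, 3.
# eq(1) & mask therefore keeps the first True of the whole column,
# with no notion of skipping the first three rows.
first_overall = mask.cumsum().eq(1) & mask
attempt_rows = first_overall[first_overall].index.tolist()
print(attempt_rows)  # [1]
```

The selected row is index 1, not index 4, because the running count starts at the very first row.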
These are some additional examples. First three rows do not count.
Solution
A robust solution that works regardless of the index labels and also when nothing in the mask matches:
import numpy as np
# original mask
mask = (df.a > df.b)
# set the first 3 values (positions 0, 1 and 2) to False
mask &= np.arange(len(df)) > 2
# only keep the first True (if any)
mask &= ~mask.duplicated()
df.loc[mask, 'c'] = 'x'
Output:
     a   b    c
0   20  20  NaN
1   21  20  NaN
2  100  20  NaN
3    4  20  NaN
4  100  20    x
5   20  20  NaN
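As a rough check of the two robustness claims (non-default index labels, and no match at all), the same steps can be wrapped in a small function and run on variations of the data. This is only a sketch: the helper first_true_after, its skip parameter, and the test frames df_labels and df_none are assumptions made for illustration, not part of the answer. It uses >= skip with skip=3, which is equivalent to the > 2 above.

```python
import numpy as np
import pandas as pd

def first_true_after(mask, skip):
    # Hypothetical helper: keep only the first True at position >= skip.
    # np.arange works on positions, so the index labels do not matter.
    m = mask & (np.arange(len(mask)) >= skip)
    # duplicated() flags every repeat of a value already seen, so
    # ~duplicated() is True only at the first False and the first True;
    # combining with m leaves just the first True (or nothing).
    return m & ~m.duplicated()

df = pd.DataFrame({'a': [20, 21, 100, 4, 100, 20], 'b': [20] * 6})
hit = first_true_after(df.a > df.b, skip=3)
default_rows = hit[hit].index.tolist()

# Same data with string labels instead of 0..5
df_labels = df.copy()
df_labels.index = list('uvwxyz')
hit = first_true_after(df_labels.a > df_labels.b, skip=3)
label_rows = hit[hit].index.tolist()

# Data where a > b only holds inside the skipped rows
df_none = pd.DataFrame({'a': [20, 21, 100, 4, 5, 20], 'b': [20] * 6})
hit = first_true_after(df_none.a > df_none.b, skip=3)
none_rows = hit[hit].index.tolist()

print(default_rows, label_rows, none_rows)  # [4] ['y'] []
```

The resulting boolean Series can be passed to df.loc[..., 'c'] = 'x' exactly as in the solution.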
Answered By - mozway