Issue
My input is this dataframe:
import pandas as pd

df = pd.DataFrame({'col1': ['abc-01', 'fft-a', 'abc-02'],
                   'col2': ['xyz-1', 'pg-02', 'fft-b'],
                   'col3': ['pg-77', 'zzz-1', 'abc-03']})
print(df)
     col1   col2    col3
0  abc-01  xyz-1   pg-77
1   fft-a  pg-02   zzz-1
2  abc-02  fft-b  abc-03
I need to keep only the rows that contain at least one abc but, at the same time, don't contain any pg. That means we should end up with only the row at index 2.
For that, I wrote the code below, but I can't understand why the row at index 0 isn't dropped.
final = df.loc[df.apply(lambda x:(x.str.contains('abc')) & (~x.str.contains('pg'))).any(axis=1)]
print(final)
     col1   col2    col3
0  abc-01  xyz-1   pg-77
2  abc-02  fft-b  abc-03
Can you tell me what's wrong with my logic?
My expected output is this:
     col1   col2    col3
2  abc-02  fft-b  abc-03
Solution
A possible solution, which fixes the logic of the OP's code (we need to use all, because every element of a row must be True with respect to the absence of pg):
(df.loc[df.map(lambda x: ('abc' in x)).any(axis=1) &
df.map(lambda x: ('pg' not in x)).all(axis=1)])
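Explanation of why the OP's code does not work
The OP's code combines both conditions element by element and only then reduces each row with any. In the first row, the first element (abc-01) contains abc and does not contain pg, so it is True. The any for that row is therefore also True, which makes the first row appear in the result, even though pg-77 sits in the same row. The absence of pg has to hold for every element of the row, so all is indeed needed!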
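To make this concrete, here is a small sketch that prints the element-wise mask produced by the OP's expression before the any reduction. The names cell_mask and kept_by_original are only illustrative and do not appear in the original code.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['abc-01', 'fft-a', 'abc-02'],
                   'col2': ['xyz-1', 'pg-02', 'fft-b'],
                   'col3': ['pg-77', 'zzz-1', 'abc-03']})

# Element-wise mask from the original attempt: a cell is True when that
# single cell contains 'abc' and that same cell does not contain 'pg'.
cell_mask = df.apply(lambda col: col.str.contains('abc') & ~col.str.contains('pg'))
print(cell_mask.values.tolist())
# [[True, False, False], [False, False, False], [True, False, True]]

# any(axis=1) only asks whether at least one cell in the row is True,
# so row 0 passes because of 'abc-01', regardless of 'pg-77' in col3.
kept_by_original = cell_mask.any(axis=1)
print(kept_by_original.tolist())  # [True, False, True]
```

The 'pg' test never gets a chance to veto the whole row here, because it is evaluated per cell and then merged into the same any.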
Output
     col1   col2    col3
2  abc-02  fft-b  abc-03
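As far as I know, DataFrame.map is only available from pandas 2.1 onward (older versions use applymap for the same purpose). A possible variant that should not depend on it keeps the OP's str.contains approach but evaluates the two conditions separately. This is only a sketch; has_abc, has_pg and row_mask are illustrative names, and regex=False is an assumption that plain substring matching is what is wanted.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['abc-01', 'fft-a', 'abc-02'],
                   'col2': ['xyz-1', 'pg-02', 'fft-b'],
                   'col3': ['pg-77', 'zzz-1', 'abc-03']})

# Build one boolean frame per condition, column by column.
has_abc = df.apply(lambda col: col.str.contains('abc', regex=False))
has_pg = df.apply(lambda col: col.str.contains('pg', regex=False))

# Keep rows where some cell contains 'abc' and no cell contains 'pg'.
# "No cell contains 'pg'" is the same as all cells passing "'pg' not in x".
row_mask = has_abc.any(axis=1) & ~has_pg.any(axis=1)
final = df.loc[row_mask]
print(final.index.tolist())  # [2]
```

Negating any over the pg mask is equivalent to the all over the negated mask used in the answer, so both versions select the same rows.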
Answered By - PaulS