Issue
I'm working with string columns on a pandas dataframe and would like to check if I can check if one of 2 columns contains the string in the match column (& check both ways)
Column1 | Column2 | Match_Column
---------------------------------------------
Customer1 | Customer1 | Customer1 LLC
| Customer2 | Customer2 LLC
Customer3 | | Customer3 LLC
| Customer4 LLC | Customer4
Customer5 LLC | | Customer5
Customer6 LLC | | Customer8
| Customer9 LLC | Customer4
The expected output is as follows
Column1 | Column2 | Match_Column | is_Match
----------------------------------------------------------
Customer1 | Customer1 | Customer1 LLC | Yes
NaN | Customer2 | Customer2 LLC | Yes
Customer3 | NaN | Customer3 LLC | Yes
NaN | Customer4 LLC | Customer4 | Yes
Customer5 LLC | NaN | Customer5 | Yes
Customer6 LLC | NaN | Customer8 | No
NaN | Customer9 LLC | Customer4 | No
NaN | NaN | Customer4 | No
I tried this approach, but did not seem to work as expected
df.loc[(df['Column1'].isin(df['Match_Column'])) |
(df['Match_Column'].isin(df['Column1'])) |
(df['Match_Column'].isin(match_df['Column2'])) |
(df['Column2'].isin(df['Match_Column'])),'is_Match'] = 'Yes'
Solution
Use numpy.where
with in
statement for test per rows:
df['is_Match'] = np.where([(a in c) or (b in c) or (c in a) or (c in b) for a,b,c
in zip(df['Column1'].fillna('_'), df['Column2'].fillna('_'),
df['Match_Column'].fillna('nodata'))],
'Yes', 'No')
print (df)
Column1 Column2 Match_Column is_Match
0 Customer1 Customer1 Customer1 LLC Yes
1 NaN Customer2 Customer2 LLC Yes
2 Customer3 NaN Customer3 LLC Yes
3 NaN Customer4 LLC Customer4 Yes
4 Customer5 LLC NaN Customer5 Yes
5 Customer6 LLC NaN Customer8 No
6 NaN Customer9 LLC Customer4 No
7 NaN NaN Customer4 No
Answered By - jezrael
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.