Issue
I'm struggling to create a new column in a DataFrame based on an existing column using a condition. Essentially, if the Client Contract Number contains an underscore, I want the value in the new column to be all characters before the underscore; otherwise I want it to be the Client Contract Number with all dashes removed. I'm able to remove the dashes with the first line below, but the second line doesn't work:
raw_data_df['Search Text'] = raw_data_df['Client Contract Number'].str.replace('-','')
raw_data_df['Search Text'] = raw_data_df['Client Contract Number'].str.split('_')[0] if raw_data_df['Client Contract Number'].str.contains("_") else raw_data_df['Client Contract Number'].str.replace('-','')
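To see why the second line fails, here is a minimal reproduction on a hypothetical two-row frame (the sample values are invented for illustration). The condition is a whole Series of booleans, which a Python if/else cannot reduce to a single True or False, and .str.split('_')[0] picks out one row instead of the first piece of each string.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's raw_data_df.
raw_data_df = pd.DataFrame({'Client Contract Number': ['AB-12_X', 'CD-34']})
col = raw_data_df['Client Contract Number']

# str.contains returns one boolean per row, not a single boolean.
mask = col.str.contains('_')
print(mask.tolist())  # [True, False]

error_message = None
try:
    # Same shape as the failing line: the if/else calls bool() on a Series.
    result = col.str.split('_')[0] if mask else col.str.replace('-', '')
except ValueError as err:
    error_message = str(err)
    print(error_message)  # "The truth value of a Series is ambiguous..."

# [0] is label-based indexing on the Series: it returns the list for the
# row labelled 0, not the first piece of every string.
print(col.str.split('_')[0])  # ['AB-12', 'X']
```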
Solution
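Your second line fails because a Python if/else needs a single True or False, whereas .str.contains("_") returns a whole Series of booleans (hence the "truth value of a Series is ambiguous" error); in addition, .str.split('_')[0] selects the row labelled 0 rather than the first piece of each string.
You don't need to search for the _ explicitly; just extract the first part of the string (which is more efficient than split) with the (^[^_]+) pattern (all characters, anchored at the start of the string, that are not _):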
raw_data_df['Search Text'] = (raw_data_df['Client Contract Number']
                              .str.extract(r'(^[^_]+)', expand=False)
                              .str.replace('-', '')
                             )
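A self-contained run of the extract approach on three invented sample values (not from the post) shows what the chain produces. Note that this version strips dashes from the prefix as well; regex=False is added here only to make it explicit that '-' is matched literally.

```python
import pandas as pd

# Hypothetical sample values; the real column comes from the asker's data.
raw_data_df = pd.DataFrame(
    {'Client Contract Number': ['AB-12_X-9', 'CD-34-56', 'EF78_Y']}
)

raw_data_df['Search Text'] = (raw_data_df['Client Contract Number']
                              # keep everything before the first '_' (whole string if none)
                              .str.extract(r'(^[^_]+)', expand=False)
                              # then drop every dash, treating '-' as a literal
                              .str.replace('-', '', regex=False)
                              )
print(raw_data_df['Search Text'].tolist())  # ['AB12', 'CD3456', 'EF78']
```

One behavioural difference from split: a value that starts with _ has no match for ^[^_]+, so extract returns NaN for it, whereas split('_')[0] would give an empty string.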
Alternatively, here is a fix of your original approach using a list comprehension. Again, there is no need to test explicitly for the presence of _, since split returns the whole string (as the only element of the list) when _ is absent:
raw_data_df['Search Text'] = [s.split('_')[0].replace('-', '')
                              for s in raw_data_df['Client Contract Number']]
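The list comprehension relies on how str.split behaves when the separator is missing. A plain-Python sketch with hypothetical values:

```python
# Hypothetical sample values, not taken from the original post.
values = ['AB-12_X-9', 'CD-34-56']

# No '_' present: split returns a one-element list holding the whole string,
# so [0] is simply the original value.
print('CD-34-56'.split('_'))  # ['CD-34-56']

# Same expression as the list comprehension in the answer.
search_text = [s.split('_')[0].replace('-', '') for s in values]
print(search_text)  # ['AB12', 'CD3456']
```

This assumes every entry is a string: a missing value (NaN is a float) would raise AttributeError here, whereas the .str accessor approach simply propagates NaN.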
If you really need to test for the presence of _ to handle the string differently (e.g., to remove the - only when there is no underscore, as described in the question), you could do something like this (the assignment expression := requires Python 3.8+):
raw_data_df['Search Text'] = [x[0]
                              if len(x := s.split('_', maxsplit=1)) > 1
                              else x[0].replace('-', '')
                              for s in raw_data_df['Client Contract Number']]
And with an explicit check (which I believe might be less efficient):
raw_data_df['Search Text'] = [s.split('_', maxsplit=1)[0]
                              if '_' in s
                              else s.replace('-', '')
                              for s in raw_data_df['Client Contract Number']]
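If you'd rather keep the conditional logic vectorized, Series.where is one option. This is a different technique from the answer's list comprehensions, sketched here on invented sample data. It follows the question's rule literally: the part before the first _ (dashes kept) when an underscore is present, otherwise the whole value with dashes removed.

```python
import pandas as pd

# Hypothetical sample data; not from the original post.
raw_data_df = pd.DataFrame(
    {'Client Contract Number': ['AB-12_X-9', 'CD-34-56', 'EF78_Y']}
)
col = raw_data_df['Client Contract Number']

# One boolean per row: does the value contain an underscore?
has_underscore = col.str.contains('_', regex=False)

# Keep the prefix before the first '_' where the mask is True,
# otherwise fall back to the full value with every dash removed.
raw_data_df['Search Text'] = (col.str.split('_', n=1).str[0]
                              .where(has_underscore,
                                     col.str.replace('-', '', regex=False)))
print(raw_data_df['Search Text'].tolist())  # ['AB-12', 'CD3456', 'EF78']
```

Both candidate columns are computed for every row, and where picks between them row by row, so no Python-level if/else on a Series is needed.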
Answered By - mozway