Sunday, January 21, 2024

[FIXED] Question on conditional syntax for new col in pandas df


Issue

I'm struggling to create a new column in a DataFrame based on an existing column using a condition. Essentially, if the Client Contract Number contains an underscore, I want the value in the new column to be all characters before the underscore; otherwise I want it to be the Client Contract Number with all dashes removed. I'm able to remove the dashes with the first line below, but the second line doesn't work:

raw_data_df['Search Text'] = raw_data_df['Client Contract Number'].str.replace('-','')

raw_data_df['Search Text'] = raw_data_df['Client Contract Number'].str.split('_')[0] if raw_data_df['Client Contract Number'].str.contains("_") else raw_data_df['Client Contract Number'].str.replace('-','')

Solution

You don't need to search for the _ explicitly. Just extract the first part of the string (which is more efficient than split) with the (^[^_]+) pattern, which matches the run of characters at the start of the string that are not _:

raw_data_df['Search Text'] = (raw_data_df['Client Contract Number']
                              .str.extract(r'(^[^_]+)', expand=False)
                              .str.replace('-', '')
                              )
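To see what this chain produces, here is a small sketch on made-up data. The sample contract numbers are hypothetical, and regex=False is added only so that the dash is treated as a literal on every pandas version:

```python
import pandas as pd

# Hypothetical sample values, invented for illustration only
raw_data_df = pd.DataFrame(
    {'Client Contract Number': ['AB-12_rev2', 'CD-34-56', 'EF78']}
)

raw_data_df['Search Text'] = (
    raw_data_df['Client Contract Number']
    .str.extract(r'(^[^_]+)', expand=False)  # leading run of non-underscore characters
    .str.replace('-', '', regex=False)       # then drop every dash
)

print(raw_data_df['Search Text'].tolist())
# ['AB12', 'CD3456', 'EF78']
```

One difference worth knowing: a value that starts with _ has no match for this pattern, so extract returns NaN for it, whereas split('_')[0] would return an empty string.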

Alternatively, here is a fix of your original approach using a list comprehension. Again, there is no need to explicitly test for the presence of _, since split returns the whole string as its only piece when _ is absent.

raw_data_df['Search Text'] = [s.split('_')[0].replace('-', '') for s in
                              raw_data_df['Client Contract Number']]
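The claim about split can be checked without pandas at all. A minimal sketch on a plain list of hypothetical contract numbers:

```python
# Hypothetical sample values, invented for illustration only
contract_numbers = ['AB-12_rev2', 'CD-34-56', 'EF78']

# With no underscore present, split returns a one-element list
# holding the original string, so [0] is the string itself.
no_underscore_pieces = 'CD-34-56'.split('_')
print(no_underscore_pieces)  # ['CD-34-56']

# Same expression as in the list comprehension above
search_text = [s.split('_')[0].replace('-', '') for s in contract_numbers]
print(search_text)  # ['AB12', 'CD3456', 'EF78']
```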

If you really need to test for the presence of _ in order to handle the string differently (e.g., keep the - when there is no underscore), you could do something like:

# x has a single element when s contains no underscore
raw_data_df['Search Text'] = [x[0]
                              if len(x := s.split('_', maxsplit=1)) == 1
                              else x[0].replace('-', '')
                              for s in
                              raw_data_df['Client Contract Number']]

And with an explicit check (which I believe might be less efficient):

raw_data_df['Search Text'] = [s.split('_', maxsplit=1)[0].replace('-', '')
                              if '_' in s else s
                              for s in raw_data_df['Client Contract Number']]
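For completeness, the rule exactly as worded in the question (text before the underscore if there is one, otherwise the value with dashes removed) can also be written without a Python-level loop. This is a sketch using Series.where rather than anything from the answer above, and the sample data is hypothetical. It also hints at why the original one-liner fails: a plain if/else needs a single True or False, but str.contains returns a whole boolean Series, and pandas raises a ValueError when asked for the truth value of a Series.

```python
import pandas as pd

# Hypothetical sample values, invented for illustration only
raw_data_df = pd.DataFrame(
    {'Client Contract Number': ['AB-12_rev2', 'CD-34-56', 'EF78']}
)
col = raw_data_df['Client Contract Number']

has_underscore = col.str.contains('_', regex=False)  # one boolean per row
before_underscore = col.str.split('_', n=1).str[0]   # first piece of each row
no_dashes = col.str.replace('-', '', regex=False)    # dashes removed

# Keep before_underscore where the mask is True, otherwise use no_dashes
raw_data_df['Search Text'] = before_underscore.where(has_underscore, no_dashes)

print(raw_data_df['Search Text'].tolist())
# ['AB-12', 'CD3456', 'EF78']
```

Note that under this literal reading the dash in 'AB-12' is kept, because the question only asks for dashes to be removed when there is no underscore.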

Answered By - mozway

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
