Issue
I have a DataFrame column that looks like this:
import pandas as pd
df = pd.DataFrame([["f1S_a_Ea5S_b_c_E45S_Ezff"], ["15S_a_Ey2S_b_c_Ex3S_Ez4f"], ["c3S_a_EghS_b_c_EpoS_Eqp"]], columns=["text"])
For each string, I want to replace every part that runs from "S" through the next "E" (including the "S" and "E" themselves) with a space, so my expected output is:
text                      cleaned
f1S_a_Ea5S_b_c_E45S_Ezff  f1 a5 45 zff
15S_a_Ey2S_b_c_Ex3S_Ez4f  15 y2 x3 z4f
c3S_a_EghS_b_c_EpoS_Eqp   c3 gh po qp
I looked into other SO answers but they usually deal with replacing values in the whole cell.
I tried str.split, but since there may be other characters between "S" and "E", df['text'].str.split('S_E') doesn't work.
I think I should use regex here but I'm not good with regex. Can anyone help me here?
Solution
IIUC, you can use str.replace with the pattern S[^E]+E, which matches an "S", then one or more characters that are not "E", then the closing "E", and replace each match with " ":
df['cleaned'] = df['text'].str.replace(r"S[^E]+E", " ", regex=True)
Output:
                       text       cleaned
0  f1S_a_Ea5S_b_c_E45S_Ezff  f1 a5 45 zff
1  15S_a_Ey2S_b_c_Ex3S_Ez4f  15 y2 x3 z4f
2   c3S_a_EghS_b_c_EpoS_Eqp   c3 gh po qp
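To see what the pattern does without a DataFrame, here is a minimal sketch using Python's built-in re module on the same three strings. The helper name clean and the lazy variant S.*?E are assumptions added for illustration and are not part of the answer above; on this sample data both patterns give the same result.

```python
import re

# Pattern from the answer: "S", one or more non-"E" characters, then "E".
PATTERN = re.compile(r"S[^E]+E")

# Hypothetical alternative (not from the answer): lazy match up to the next "E".
LAZY_PATTERN = re.compile(r"S.*?E")


def clean(text, pattern=PATTERN):
    """Replace every S...E span (delimiters included) with a single space."""
    return pattern.sub(" ", text)


samples = [
    "f1S_a_Ea5S_b_c_E45S_Ezff",
    "15S_a_Ey2S_b_c_Ex3S_Ez4f",
    "c3S_a_EghS_b_c_EpoS_Eqp",
]

cleaned = [clean(s) for s in samples]
lazy_cleaned = [clean(s, LAZY_PATTERN) for s in samples]

for original, result in zip(samples, cleaned):
    print(original, "->", result)
```

One difference to keep in mind: S[^E]+E needs at least one character between "S" and "E", so a bare "SE" is left untouched, while S.*?E would also replace "SE".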
Answered By - enke