Issue
I have a string like this:
s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
and I am trying to use re.sub to replace every special character with a space, except for apostrophes that sit between two letters, so that 'gluten-free' becomes gluten free and i'm stays as i'm.
I have tried this:
import re
s = re.sub('[^[a-z]+\'?[a-z]+]', ' ', s)
With this pattern I am trying to say: replace with a space anything that does not follow the pattern of one or more letters, then zero or one apostrophe, then one or more letters.
This returns the same string:
i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread.
I would like to have:
i'm sorry sir but this is a gluten free restaurant we don't serve bread
Solution
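The attempted pattern leaves the string unchanged because square brackets do not nest in Python's re syntax. `[^[a-z]` is read as a complete negated character class (anything other than `[` or a lowercase letter), and the final `]` is a literal bracket that every match must end with. The input contains no `]`, so nothing ever matches. A small check of this, with the pattern rewritten in an equivalent form that escapes the two literal brackets (the second sample string is made up for illustration):

```python
import re

# Equivalent to the attempted pattern '[^[a-z]+\'?[a-z]+]', with the two
# literal brackets escaped so that the actual parse is explicit.
attempt = r"[^\[a-z]+'?[a-z]+\]"

s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."

# The input has no ']', so the pattern never matches and re.sub changes nothing.
unchanged = re.sub(attempt, ' ', s)
print(unchanged == s)

# With a ']' present (made-up input), the pattern matches a span ending at it.
match = re.search(attempt, "hello, it]")
print(match.group())
```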
You may use this regex, which nests a lookbehind inside a negative lookahead:
>>> import re
>>> s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
>>> print(re.sub(r"(?!(?<=[a-z])'[a-z])[^\w\s]", ' ', s, flags=re.I))
i'm sorry  sir  but this is a  gluten free  restaurant  we don't serve bread 
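Because each punctuation character is replaced by a space, punctuation that was already next to a space leaves a run of spaces in the raw result, and the final period leaves a trailing one. A minimal sketch of one way to get the single-spaced string asked for in the question is to collapse the whitespace afterwards; the helper name `strip_punctuation` is made up for this example, and the `split`/`join` step is an addition, not part of the original answer:

```python
import re

def strip_punctuation(text):
    """Replace punctuation with spaces, keeping apostrophes between letters."""
    # Same regex as in the answer: skip an apostrophe that has a letter on
    # both sides, otherwise match any non-word, non-whitespace character.
    spaced = re.sub(r"(?!(?<=[a-z])'[a-z])[^\w\s]", " ", text, flags=re.I)
    # Assumption for this sketch: collapse runs of whitespace and trim the ends.
    return " ".join(spaced.split())

s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
print(strip_punctuation(s))
```

Note that `split()` with no arguments also folds tabs and newlines into single spaces, which may or may not be wanted for multi-line input.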
RegEx Details:
(?! : Start negative lookahead
(?<=[a-z]) : Positive lookbehind to assert that the previous character is a letter
' : Match an apostrophe
[a-z] : Match a letter
) : End negative lookahead
[^\w\s] : Match a character that is neither whitespace nor a word character
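The same breakdown can be kept next to the pattern itself by compiling it with `re.VERBOSE`, which ignores whitespace in the pattern and allows `#` comments. This is only a sketch of an alternative way to write the regex above; the name `PUNCT` and the short sample string are chosen for illustration:

```python
import re

PUNCT = re.compile(
    r"""
    (?!              # start negative lookahead: refuse to match when ...
        (?<=[a-z])   #   the previous character is a letter,
        '            #   the current character is an apostrophe,
        [a-z]        #   and the next character is a letter
    )                # end negative lookahead
    [^\w\s]          # one character that is neither a word character nor whitespace
    """,
    flags=re.IGNORECASE | re.VERBOSE,
)

# The apostrophe in "don't" is kept; the quotes and the hyphen become spaces.
result = PUNCT.sub(" ", "don't 'x-y'")
print(repr(result))
```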
Answered By - anubhava