Issue
I have the following string (file):
s = r'''
\newcommand{\commandName1}{This is first command}
\newcommand{\commandName2}{This is second command with {} brackets inside
in multiple lines {} {}
}
\newcommand{\commandName3}{This is third, last command}
'''
Now I would like to use the Python re package to extract the data into a dictionary where the keys are the command names (\commandName1, \commandName2 and \commandName3) and the values are This is first command, This is second command with {} brackets inside in multiple lines {} {} and This is third, last command. I tried something like:
re.findall(r'\\newcommand{(.+)}{(.+)}', s)
but it doesn't work because the second command has {} inside. What is the easiest way to do that?
Solution
You may use this regex:
(?s)\\newcommand{([^}]+)}{(.+?)}(?=\s*(?:\\newcommand|$))
RegEx Breakdown:
(?s) : Enable DOTALL (single line) mode, so that . also matches newlines
\\newcommand : Match the literal text \newcommand
{ : Match a {
([^}]+) : Match 1+ of any characters that are not } in capture group #1
} : Match a }
{ : Match a {
(.+?) : Match 1+ of any characters, as few as possible, in capture group #2
} : Match a }
(?=\s*(?:\\newcommand|$)) : Lookahead to assert presence of 0 or more whitespace and \newcommand or else end of input.
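The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch of the same pattern in a different layout; the names COMMAND_RE and sample are illustrative and not part of the original answer, and the braces are escaped here purely for readability.

```python
import re

# Same pattern as in the breakdown, laid out with re.VERBOSE.
# COMMAND_RE and sample are hypothetical names chosen for this sketch.
COMMAND_RE = re.compile(
    r"""
    \\newcommand        # the literal command token
    \{ ( [^}]+ ) \}     # group 1: the name, everything up to the first closing brace
    \{ ( .+? ) \}       # group 2: the body, matched lazily; DOTALL lets it span lines
    (?= \s* (?: \\newcommand | $ ) )   # the closing brace must precede the next command or end of input
    """,
    re.DOTALL | re.VERBOSE,
)

sample = r"""
\newcommand{\commandName1}{This is first command}
\newcommand{\commandName2}{Second {} command
on two lines {}
}
"""

for name, body in COMMAND_RE.findall(sample):
    print(repr(name), repr(body))
```

Because of the lookahead, a } inside a body is only treated as the closing brace when nothing but whitespace separates it from the next \newcommand or the end of the input.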
Code:
import re
s = r'''
\newcommand{\commandName1}{This is first command}
\newcommand{\commandName2}{This is second command with {} brackets inside
in multiple lines {} {}
}
\newcommand{\commandName3}{This is third, last command}
'''
print(re.findall(r'(?s)\\newcommand{([^}]+)}{(.+?)}(?=\s*(?:\\newcommand|$))', s))
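Since the question asks for a dictionary, the list of (name, body) tuples returned by findall can be turned into one directly. The sketch below assumes that line breaks inside a body should collapse to single spaces, as in the values listed in the question; the names PATTERN and commands are illustrative.

```python
import re

# The regex from the answer, unchanged.
PATTERN = r'(?s)\\newcommand{([^}]+)}{(.+?)}(?=\s*(?:\\newcommand|$))'

s = r'''
\newcommand{\commandName1}{This is first command}
\newcommand{\commandName2}{This is second command with {} brackets inside
in multiple lines {} {}
}
\newcommand{\commandName3}{This is third, last command}
'''

# findall yields (name, body) tuples, which a dict comprehension can consume.
# Assumption: runs of whitespace (including newlines) in a body become one space.
commands = {name: ' '.join(body.split()) for name, body in re.findall(PATTERN, s)}

for name, body in commands.items():
    print(name, '->', body)
```

If the original line breaks should be kept, dict(re.findall(PATTERN, s)) gives the same mapping without the whitespace normalization.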
Answered By - anubhava