Issue
I am trying to parse this JavaScript element using BS4 (BeautifulSoup 4). I want to get the input array into a usable Python format.
<script type="text/javascript">
require.config.params['matchheader'] = {
input: [162,13,'Crystal Palace','Arsenal','05/08/2022 20:00:00','05/08/2022 00:00:00',6,'FT','0 : 1','0 : 2',,,'0 : 2','England','England']
,
matchId: 1640674
};
</script>
To get the text inside the global variable, I used the following regex:
re.search(r"input: \[.*?\]", script_element.string).group(0)
which returns:
"input: [162,13,'Crystal Palace','Arsenal','05/08/2022 20:00:00','05/08/2022 00:00:00',6,'FT','0 : 1','0 : 2',,,'0 : 2','England','England']"
I am having some trouble parsing this array because of the empty elements (literal_eval does not work).
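For reference, a minimal reproduction of the failure, using a shortened copy of the array (the names `raw` and `error` are only for illustration):

```python
from ast import literal_eval

# Shortened copy of the array text from the script (illustrative only).
raw = "[162,13,'0 : 2',,,'0 : 2','England']"

try:
    literal_eval(raw)
    error = None
except SyntaxError as exc:
    # Consecutive commas are allowed in a JavaScript array literal,
    # but Python's parser rejects them.
    error = exc

print(type(error).__name__)  # prints: SyntaxError
```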
Any idea on how to accomplish this? Is there an easier way to do it?
Regards
Solution
One solution is to insert None between the consecutive commas (the empty elements) and then parse the result:
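import re
from ast import literal_eval

# `s` is your string from the question; capture everything after "input:" up to the end of that line
data = re.search(r"input:\s*(.*)", s).group(1)
# put None into every empty slot between two commas
data = re.sub(r"(?<=,)\s*(?=,)", "None", data)
# the text is now a valid Python list literal
data = literal_eval(data)
print(data)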
Prints:
[162, 13, 'Crystal Palace', 'Arsenal', '05/08/2022 20:00:00', '05/08/2022 00:00:00', 6, 'FT', '0 : 1', '0 : 2', None, None, '0 : 2', 'England', 'England']
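The same steps can be wrapped in a small helper that works on the full script text. This is only a sketch: the function name `parse_input_array` is hypothetical, `script_text` stands in for `script_element.string` from the question, and it assumes that no string inside the array contains a `]` or two consecutive commas.

```python
import re
from ast import literal_eval


def parse_input_array(script_text):
    """Return the `input: [...]` array from the script text as a Python list."""
    # Non-greedy match up to the first "]" (assumes no "]" inside the strings).
    match = re.search(r"input:\s*(\[.*?\])", script_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no 'input: [...]' array found")
    array_text = match.group(1)
    # Fill each empty slot between two consecutive commas with None.
    array_text = re.sub(r"(?<=,)\s*(?=,)", "None", array_text)
    return literal_eval(array_text)


# Stand-in for script_element.string, copied from the question.
script_text = """
require.config.params['matchheader'] = {
input: [162,13,'Crystal Palace','Arsenal','05/08/2022 20:00:00','05/08/2022 00:00:00',6,'FT','0 : 1','0 : 2',,,'0 : 2','England','England']
,
matchId: 1640674
};
"""

data = parse_input_array(script_text)
print(data[2], data[3], data[10])  # prints: Crystal Palace Arsenal None
```

Raising a ValueError when nothing matches avoids the less obvious AttributeError that calling `.group()` on a failed `re.search` would produce.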
Answered By - Andrej Kesely