Issue
I've got some HTML code which looks like this:
<p>Blah blah blah...</p>
<video controls>
<source src="video.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
<p>Blah blah blah...</p>
<img src="img.png">
<p>Blah blah blah...</p>
What I've been trying to do is write a Python script that wraps each p element (and only each p element) in a span, then inserts an a element inside that span, just before the p, so the result looks like this:
<span class="anchors" id="p1">
<a class="anchor" href=".#p1">1</a>
<p>Blah blah blah...</p>
</span>
<video controls>
<source src="video.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
<span class="anchors" id="p2">
<a class="anchor" href=".#p2">2</a>
<p>Blah blah blah...</p>
</span>
<img src="img.png">
<span class="anchors" id="p3">
<a class="anchor" href=".#p3">3</a>
<p>Blah blah blah...</p>
</span>
I've been messing around with BeautifulSoup for a while now, just trying to get the span elements in there first. This is the relevant section of the code I've come up with, after many even less successful iterations:
anchor = 1
soup = bs4.BeautifulSoup(content5, "html.parser")
for sibling in soup.p.next_siblings:
    sibling.p.wrap(soup.new_tag("span", **{'class': 'anchors'}, id="p"+str(anchor)))
    anchor = anchor + 1
anchor = 1
Running this generates this error:
Traceback (most recent call last):
  File "C:/Users/James/PycharmProjects/Project Name/main.py", line 121, in <module>
    for sibling in soup.p.next_siblings:
AttributeError: 'NoneType' object has no attribute 'next_siblings'
What I'm assuming this means is that, for some reason, the code I've written isn't properly "locking on" to the p elements, so to speak. I'm at a loss for what to do here, as several different versions of this code have all produced the same error.
Has anyone else had this same problem, and if so, how did you get it resolved?
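For anyone hitting the same traceback: it means soup.p itself evaluated to None. soup.p is shorthand for soup.find("p"), which returns None when the parsed markup contains no p tag at all, so it is worth printing content5 to see what it actually holds. Separately, sibling.p searches inside each sibling rather than testing the sibling itself, and next_siblings also yields text nodes such as newlines. The sketch below illustrates both points on small made-up inputs (the strings are assumptions, not the asker's content5):

```python
from bs4 import BeautifulSoup

# Made-up input with no <p> tags at all (standing in for an unexpected content5)
no_paragraphs = BeautifulSoup("<div>No paragraphs here</div>", "html.parser")
print(no_paragraphs.p)              # None: soup.p is shorthand for soup.find("p")
print(no_paragraphs.find_all("p"))  # []: find_all() returns an empty list instead

# With <p> tags present, sibling.p looks for a <p> *inside* the sibling,
# not at the sibling itself, so it is None for a <p> sibling.
soup = BeautifulSoup("<p>one</p><p>two</p>", "html.parser")
second = soup.p.find_next_sibling("p")
print(second)    # <p>two</p>
print(second.p)  # None
```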
Solution
Instead of walking next_siblings, you can loop over soup.find_all("p"), wrap() each p in a span, and then use the insert_before() method to also add the a tag in front of it.
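To see what the two methods do in isolation, here is a minimal sketch on a single made-up paragraph: wrap() moves the tag into the new element and returns that wrapper, and insert_before() then places the link directly in front of the p, which by then sits inside the span.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Blah</p>", "html.parser")
p = soup.p

# wrap() replaces p with the new tag, moves p inside it, and returns the wrapper
span = p.wrap(soup.new_tag("span", id="p1"))

# insert_before() puts the new tag immediately before p, i.e. inside the span
a = soup.new_tag("a", href=".#p1")
a.string = "1"
p.insert_before(a)

print(soup)  # <span id="p1"><a href=".#p1">1</a><p>Blah</p></span>
```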
from bs4 import BeautifulSoup
html = """
<p>Blah blah blah...</p>
<video controls>
<source src="video.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
<p>Blah blah blah...</p>
<img src="img.png">
<p>Blah blah blah...</p>
"""
soup = BeautifulSoup(html, "html.parser")
anchor = 1
print("BEFORE")
print("-" * 30)
print(soup.prettify())
for tag in soup.find_all("p"):
    # Wrap the <p> in <span class="anchors" id="pN">
    new_span = soup.new_tag("span", **{"class": "anchors"}, id=f"p{anchor}")
    tag.wrap(new_span)
    # Add <a class="anchor" href=".#pN">N</a> just before the <p>, inside the span
    new_a = soup.new_tag("a", **{"class": "anchor"}, href=f".#p{anchor}")
    new_a.string = str(anchor)
    tag.insert_before(new_a)
    anchor += 1
print("-" * 30)
print(f"AFTER:\n\n {soup.prettify()} ")
Output:
BEFORE
------------------------------
<p>
Blah blah blah...
</p>
<video controls="">
<source src="video.mp4" type="video/mp4"/>
Your browser does not support the video tag.
</video>
<p>
Blah blah blah...
</p>
<img src="img.png"/>
<p>
Blah blah blah...
</p>
------------------------------
AFTER:
<span class="anchors" id="p1">
<a class="anchor" href=".#p1">
1
</a>
<p>
Blah blah blah...
</p>
</span>
<video controls="">
<source src="video.mp4" type="video/mp4"/>
Your browser does not support the video tag.
</video>
<span class="anchors" id="p2">
<a class="anchor" href=".#p2">
2
</a>
<p>
Blah blah blah...
</p>
</span>
<img src="img.png"/>
<span class="anchors" id="p3">
<a class="anchor" href=".#p3">
3
</a>
<p>
Blah blah blah...
</p>
</span>
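The same loop can also be packaged as a reusable function. The sketch below is one possible variation rather than part of the answer above: add_paragraph_anchors is a hypothetical name, enumerate(..., start=1) replaces the manual counter, new_tag's attrs argument sets all attributes in one dict, and the link uses class="anchor" to match the desired output in the question.

```python
from bs4 import BeautifulSoup


def add_paragraph_anchors(html: str) -> str:
    """Hypothetical helper: wrap every <p> in <span class="anchors" id="pN">
    and put a numbered <a class="anchor"> link in front of it."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() builds the full list of <p> tags before the loop starts
    for n, p in enumerate(soup.find_all("p"), start=1):
        p.wrap(soup.new_tag("span", attrs={"class": "anchors", "id": f"p{n}"}))
        link = soup.new_tag("a", attrs={"class": "anchor", "href": f".#p{n}"})
        link.string = str(n)
        # The link lands inside the span, directly before the <p>
        p.insert_before(link)
    return str(soup)


if __name__ == "__main__":
    print(add_paragraph_anchors('<p>Blah</p><img src="img.png"><p>Blah</p>'))
```

Because find_all() returns a complete list up front, wrapping the paragraphs one by one does not disturb the iteration, and non-p elements such as the video and img are left untouched.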
Answered By - MendelG