Issue
I'm trying to count the number of elements in a list using Beautiful Soup, but I'm getting an empty list back. I'm pretty sure this used to work for me, but it no longer does.
I'd appreciate any help or pointers from the gurus out there as I'm sure there is a better way. I'm completely new to this and feel a bit lost.
So if we take a nested list like below with 3 elements:
<div class="row">
...
<div class="style_details">
<ul data-id="list" class="listing_details">
<li data-id="listing-index-1"></li>
<li data-id="listing-index-2"></li>
<li data-id="listing-index-3"></li>
</ul>
</div>
and a snippet of code that tries to count the list elements using the attribute 'class="listing_details"':
browser.get(url)
c = browser.page_source
soup = BeautifulSoup(c, "html.parser")
dom = etree.HTML(str(soup))
data = soup.findAll('li',attrs={'class':'listing_details'})
links = len(data)
return links
Is the class being nested in an unordered list causing the issue? Any ideas how to overcome this or a better way to count items on the list?
Solution
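The class listing_details is on the <ul>, not on the <li> elements, so searching for <li> tags with that class matches nothing. If you want to select only the direct <li> children of that <ul>, you can use the following example: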
from bs4 import BeautifulSoup
html_text = """\
<div class="row">
<div class="style_details">
<ul data-id="list" class="listing_details">
<li data-id="listing-index-1"></li>
<li data-id="listing-index-2"></li>
<li data-id="listing-index-3"></li>
</ul>
</div>"""
soup = BeautifulSoup(html_text, "html.parser")
# print only direct <li> under <ul class="listing_details">
# note the " > " in the CSS selector
for li in soup.select("ul.listing_details > li"):
    print(li)
Prints:
<li data-id="listing-index-1"></li>
<li data-id="listing-index-2"></li>
<li data-id="listing-index-3"></li>
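Since the question asks for a count rather than a printout, the selector above can be wrapped in len(). The following is a minimal sketch that reuses the sample HTML from the question; the variable names original_matches, items and item_count are illustrative only. It also runs the question's original lookup to show why it came back empty.

```python
from bs4 import BeautifulSoup

html_text = """
<div class="row">
  <div class="style_details">
    <ul data-id="list" class="listing_details">
      <li data-id="listing-index-1"></li>
      <li data-id="listing-index-2"></li>
      <li data-id="listing-index-3"></li>
    </ul>
  </div>
</div>
"""

soup = BeautifulSoup(html_text, "html.parser")

# The question's lookup: no <li> carries class "listing_details"
# (the class is on the <ul>), so this returns an empty result.
original_matches = soup.find_all("li", attrs={"class": "listing_details"})

# Direct <li> children of the <ul> that actually has the class.
items = soup.select("ul.listing_details > li")
item_count = len(items)

print(len(original_matches), item_count)  # prints: 0 3
```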
Or, using the bs4 API:
ul = soup.find("ul", class_="listing_details")
for li in ul.find_all("li", recursive=False):
    print(li)
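The find / find_all approach can also be turned into a small counting function. This is only a sketch: count_list_items is a hypothetical helper name, not part of bs4, and the sample markup is invented to show that recursive=False skips nested <li> elements. It guards against find() returning None when the <ul> is missing from the page.

```python
from bs4 import BeautifulSoup


def count_list_items(html, list_class="listing_details"):
    """Count the direct <li> children of the first <ul> with the given class.

    Hypothetical helper for illustration. Returns 0 if no such <ul> exists.
    """
    soup = BeautifulSoup(html, "html.parser")
    ul = soup.find("ul", class_=list_class)
    if ul is None:
        # find() returns None when nothing matches; avoid an AttributeError.
        return 0
    # recursive=False limits the search to direct children of the <ul>.
    return len(ul.find_all("li", recursive=False))


# Invented sample: three top-level items, one of which holds a nested list.
sample = (
    '<ul class="listing_details">'
    "<li>a</li><li>b<ul><li>nested</li></ul></li><li>c</li>"
    "</ul>"
)

print(count_list_items(sample))                 # prints: 3
print(count_list_items("<p>no list here</p>"))  # prints: 0
```

In the question's Selenium flow, the call would presumably be count_list_items(browser.page_source). If the list is filled in by JavaScript after the page loads, page_source may not contain it yet, which could be one reason code that used to work now returns nothing.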
Answered By - Andrej Kesely