Issue
I started off by pulling the page with Selenium and I believe I passed the part of the page I needed to BeautifulSoup correctly using this code:
soup = BeautifulSoup(driver.find_element("xpath", '//*[@id="version_table"]/tbody').get_attribute('outerHTML'), 'html.parser')
Now I can parse it with BeautifulSoup:
query = soup.find_all("tr", class_=lambda x: x != "hidden*")
print (query)
My problem is that I need to filter on more than this one condition. For example, I would like to combine the first query with this one, so that a row is returned only if it matches both:
query2 = soup.find_all("tr", id = "version_new_*")
print (query2)
Logically speaking, this is what I'm trying to do (but I get SyntaxError: invalid syntax):
query = soup.find_all(("tr", class_=lambda x: x != "hidden*") and ("tr", id = "version_new_*"))
print (query)
How do I accomplish this?
Solution
As mentioned, without an example of the HTML it is hard to give a precise answer. However, you could use a CSS selector:
soup.select('tr[id^="version_new_"]:not(.hidden)')
Example
from bs4 import BeautifulSoup
html = '''
<tr id="version_new_1" class="hidden"></tr>
<tr id="version_new_2"></tr>
<tr id="version_new_3" class="hidden"></tr>
<tr id="version_new_4"></tr>
'''
soup = BeautifulSoup(html, 'html.parser')
soup.select('tr[id^="version_new_"]:not(.hidden)')
Output
The result is a ResultSet, which you can iterate over to scrape what you need:
[<tr id="version_new_2"></tr>, <tr id="version_new_4"></tr>]
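If you would rather stay with find_all, as in the question, one option is to pass a single function that checks both conditions at once. Note that find_all treats strings such as "version_new_*" literally, not as wildcards, which is why the original queries do not behave like patterns. The following is only a minimal sketch: the sample HTML, the cell contents and the helper name is_visible_new_version are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's table will differ.
html = '''
<table>
<tr id="version_new_1" class="hidden"><td>1.0</td></tr>
<tr id="version_new_2"><td>2.0</td></tr>
<tr id="version_old_3"><td>0.9</td></tr>
<tr id="version_new_4" class="row visible"><td>4.0</td></tr>
</table>
'''
soup = BeautifulSoup(html, "html.parser")


def is_visible_new_version(tag):
    """Hypothetical helper: True for <tr> tags whose id starts with
    'version_new_' and whose class list does not contain 'hidden'."""
    return (
        tag.name == "tr"
        and tag.get("id", "").startswith("version_new_")
        and "hidden" not in tag.get("class", [])
    )


# find_all accepts a function and keeps every tag for which it returns True.
rows = soup.find_all(is_visible_new_version)
ids = [row["id"] for row in rows]

# The CSS selector from the answer should pick the same rows.
selected_ids = [row["id"] for row in soup.select('tr[id^="version_new_"]:not(.hidden)')]

# Iterate over the ResultSet to pull out what you need.
for row in rows:
    print(row["id"], row.get_text(strip=True))
```

Both approaches should select the same rows here. The selector is shorter, while the function is easier to extend if the conditions become more involved.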
Answered By - HedgeHog