Issue
I'm trying to scrape words and their meanings from the word list on this page using bs4 and selenium, although I'm not sure how to loop through the <tr> and <td> tags after I get the table HTML from the bs4 find_all method:
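from selenium import webdriver
from bs4 import BeautifulSoup

root = "https://www.graduateshotline.com/gre-word-list.html"
# The snippet as posted never created the driver; Chrome is assumed here.
driver = webdriver.Chrome()
driver.get(root)
content = driver.page_source
soup = BeautifulSoup(content, 'html.parser')
# find_all returns a list of matches; take the first table.
table = soup.find_all('table', attrs={'class': 'tablex border1'})[0]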
Now the table variable holds the HTML for the whole table. Here's a snippet from the start and end:
<table class="tablex border1"> <tbody><tr><td><a href="https://gre.graduateshotline.com/a.pl?word=introspection" target="_blank">introspection</a></td>
<td>examining one's own thoughts and feelings</td></tr>
<tr><td><a href="https://gre.graduateshotline.com/a.pl?word=philanthropist" target="_blank">philanthropist</a></td>
.
.
.
<tr><td><a href="https://gre.graduateshotline.com/a.pl?word=refine" target="_blank">refine</a></td>
<td>make or become pure cultural </td></tr>
</tbody></table>
I'm not sure how I can access the words and their meanings using it. Any ideas?
Solution
You want to iterate through all the table rows and pull the text out of each row's pair of td elements.
for row in table.find_all("tr"):
    tds = row.find_all("td")
    print(f"{tds[0].text}: {tds[1].text}")
...
repel: refuse to accept/cause dislike
superimpose: put something on the top
centurion: leader of a unit of 100 soldiers
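If you'd rather keep the pairs than print them, the same loop can be wrapped in a small function. This is only a sketch: extract_pairs and SAMPLE_HTML are hypothetical names, not part of the question or of bs4, and the sample markup is modeled on the snippet in the question (with one made-up single-cell row added to show the guard). It assumes every useful row has at least two td cells and skips any that don't, so a stray row won't raise an IndexError.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the snippet in the question. The last row is a
# made-up single-cell row, included only to exercise the length check.
SAMPLE_HTML = """
<table class="tablex border1"><tbody>
<tr><td><a href="https://gre.graduateshotline.com/a.pl?word=introspection" target="_blank">introspection</a></td>
<td>examining one's own thoughts and feelings</td></tr>
<tr><td><a href="https://gre.graduateshotline.com/a.pl?word=refine" target="_blank">refine</a></td>
<td>make or become pure cultural </td></tr>
<tr><td>row with a single cell</td></tr>
</tbody></table>
"""


def extract_pairs(table):
    """Return (word, meaning) tuples, skipping rows without two cells."""
    pairs = []
    for row in table.find_all("tr"):
        tds = row.find_all("td")
        if len(tds) < 2:
            continue  # not a word/meaning row
        word = tds[0].get_text(strip=True)
        meaning = tds[1].get_text(strip=True)
        pairs.append((word, meaning))
    return pairs


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", class_="tablex")
pairs = extract_pairs(table)
for word, meaning in pairs:
    print(f"{word}: {meaning}")
```

On the real page you would pass in the table variable from the question instead of parsing SAMPLE_HTML. Calling dict(pairs) gives a word-to-meaning lookup if each word appears only once.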
For what it's worth, you can use python-requests to get a webpage's content without spinning up a browser:
import requests
content = requests.get(root).text
Answered By - TankorSmash