Issue
I have the following function that takes in a url and finds the first table and its rows (tr):
from urllib.request import urlopen
from bs4 import BeautifulSoup

def get_team_table(url):
    page = urlopen(url)
    soup = BeautifulSoup(page, 'lxml')
    data_rows = [row for row in soup.find("table", "datatable").find_all("tr")]
    return data_rows
The function is used below to pull the table rows from the tables present in each url. links_all is the list of urls to iterate over, and the output from the function is appended to team_data.
team_data = []
for link in links_all:
    team_data.append(get_team_table(link))
However, I would also like to pull the title (title tag) of the table. I would like each title to be a list item that comes immediately before each tr list item. This code doesn't work, but it might make my idea clearer.
def get_team_table(url):
    page = urlopen(url)
    soup = BeautifulSoup(page, 'lxml')
    table_title = [row for row in soup.find("title")]
    return table_title
    data_rows = [row for row in soup.find("table", "datatable").find_all("tr")]
    return data_rows
Once this is done, I'd like to append each table title to each row of table data. I have code that finds all td tags and appends them to a new list. Is there a way to extend it so that the table title and the table data are added as a single list item in table_data?
table_data = []
# A nested for loop because inner items of the list are BeautifulSoup elements
for rows in team_data:
    for row in rows:
        if soup.find_all("td", class_="right xs-hide") is not None:
            table_data.append(row.get_text())
An example of a single list item looks like this:
\nPeterson\nPatrick Peterson\n\nCB\n$12,050,000\n-\n$250,000\n-\n$250,000\n$634,588\n-\n($13,184,588)\n$13,184,588 \n6.71\n
But I'm hoping that, with the addition of the table title, the result would look like:
\nArizona Cardinals 2020 Salary Cap\nPeterson\nPatrick Peterson\n\nCB\n$12,050,000\n-\n$250,000\n-\n$250,000\n$634,588\n-\n($13,184,588)\n$13,184,588 \n6.71\n
Solution
If I understand you correctly, you can read the table title with the .find_previous() method:
import requests
from bs4 import BeautifulSoup

url = "https://www.spotrac.com/nfl/arizona-cardinals/cap/2020/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

def read_table(t):
    # collect the cell texts of every row that contains at least one <td>
    all_rows = []
    for row in t.select("tr:has(td)"):
        tds = [td.get_text(strip=True, separator=", ") for td in row.select("td")]
        all_rows.append(tds)
    return all_rows

for table in soup.select("table.datatable"):
    # the nearest <h1>/<h2> before the table is used as its title
    title = table.find_previous(["h1", "h2"])
    title = title.text if title else "-"
    rows = read_table(table)
    print(title)
    # print only first row
    print(*rows[0], sep="\t")
    print("-" * 80)
Prints:
Cardinals 2020 Salary Cap
Peterson, Patrick Peterson CB $12,050,000 - $250,000 - $250,000 $634,588 - ($13,184,588) $13,184,588 6.71
--------------------------------------------------------------------------------
2020 Exempt/Commissioner’s Permission List
Gilbert, Marcus Gilbert, (COVID-19) RT - - - - - - - - - 0.00
--------------------------------------------------------------------------------
2020 Injured Reserve Cap
Jones, Chandler Jones, (Biceps) OLB $16,000,000 $3,000,000 - - - $2,333,333 - ($26,666,667) $21,333,333 10.86
--------------------------------------------------------------------------------
2020 Practice Squad
Amukamara, Prince Amukamara CB $144,000 - - - - - - - $144,000 0.07
--------------------------------------------------------------------------------
2020 Dead Cap
Johnson, David Johnson RB - $6,000,000 - - - - - - $6,000,000 3.05
--------------------------------------------------------------------------------
2020 Cap Totals
2020 NFL Salary Cap $198,200,000
--------------------------------------------------------------------------------
Answered By - Andrej Kesely
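To get the single-list-item format asked for in the question, the title found with .find_previous() can be joined onto each row's cell text. The following is only a minimal sketch, not part of the original answer: the function name rows_with_title and the SAMPLE_HTML markup are made up for illustration, and the real page may be structured differently (for example, its heading may not contain the team name in the same form).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page; the real markup may differ.
SAMPLE_HTML = """
<h2>Arizona Cardinals 2020 Salary Cap</h2>
<table class="datatable">
  <tr><th>Player</th><th>Pos</th><th>Cap Hit</th></tr>
  <tr><td>Patrick Peterson</td><td>CB</td><td>$13,184,588</td></tr>
</table>
<h2>2020 Dead Cap</h2>
<table class="datatable">
  <tr><td>David Johnson</td><td>RB</td><td>$6,000,000</td></tr>
</table>
"""

def rows_with_title(html):
    """Return one string per data row, prefixed with its table's heading."""
    soup = BeautifulSoup(html, "html.parser")
    combined = []
    for table in soup.find_all("table", class_="datatable"):
        # nearest <h1>/<h2> that appears before the table in the document
        heading = table.find_previous(["h1", "h2"])
        title = heading.get_text(strip=True) if heading else "-"
        for row in table.find_all("tr"):
            cells = row.find_all("td")
            if not cells:  # skip header rows that only contain <th>
                continue
            values = [td.get_text(strip=True) for td in cells]
            # title first, then the cell texts, as a single list item
            combined.append("\n".join([title] + values))
    return combined

table_data = rows_with_title(SAMPLE_HTML)
```

To use this with the loop from the question, one option would be to download each page in links_all and call table_data.extend(rows_with_title(page_html)) for every link, where page_html is the downloaded markup.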