Issue
Using the Python requests module, I tried to send requests to capitoltrades_politician_site. There are multiple pages of featured politicians; however, when I request these pages via ?page=2 or using the params argument, I always receive the first page back. I also tried adding per_page=1000 (to show all politicians on one page), but this didn't work either.
import bs4 as bs
import requests
import time

payload = {'page': 2, 'per_page': 100}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
}
session = requests.Session()
r = session.get('https://www.capitoltrades.com/politicians', headers=headers, params=payload)
print(r.url)
soup = bs.BeautifulSoup(r.text, 'html.parser')
politicians = soup.find_all('a', class_='index-card-link')
print(len(politicians))
with open("test.html", "w") as file:
    file.write(soup.prettify())
I also tried requesting the link with ?per_page=1000 explicitly (this works fine in Chrome), but it still returned the HTML of page one with 12 politicians. Any help is appreciated. I would love to know if I'm doing something wrong with the requests library. Thanks!
Solution
The content is rendered dynamically and loaded from an API. Because requests only receives the static response and does not execute JavaScript, you have to either query the API directly or use something like selenium to mimic a browser.
Iterate the pages and use the JSON response to extract your data.
Example
import requests
import pandas as pd

page = 1
data = []
while True:
    # Query the JSON API that backs the politicians page, one page at a time.
    json_data = requests.get(f'https://bff.capitoltrades.com/politicians?per_page=96&page={page}&pageSize=96&metric=dateLastTraded&metric=countTrades&metric=countIssuers&metric=volume').json()
    data.extend(json_data.get('data'))
    # Stop once the last page reported by the API has been collected.
    if page >= json_data.get('meta').get('paging').get('totalPages'):
        break
    page += 1

pd.DataFrame(data)
| | _politicianId | _stateId | party | partyOther | district | firstName | lastName | nickname | middleName | fullName | dob | gender | socialFacebook | socialTwitter | socialYoutube | website | chamber | committees | stats |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | C001129 | ga | republican | | 10 | Michael | Collins | | Allen | Collins, Michael Allen Jr | 1967-07-02 | male | | @MikeCollinsGA | | https://collins.house.gov/ | house | ['hsii', 'hspw', 'hssy'] | {'dateLastTraded': '2024-01-08', 'countTrades': 6, 'countIssuers': 1, 'volume': 72500} |
| 1 | M001135 | nc | democrat | | 6 | Kathy | Manning | | Ellen | Manning, Kathy Ellen | 1956-12-03 | female | | @RepKManning | | https://manning.house.gov/ | house | ['hsed', 'hsfa'] | {'dateLastTraded': '2024-01-01', 'countTrades': 583, 'countIssuers': 168, 'volume': 17567122} |
| ... | | | | | | | | | | | | | | | | | | | |
| 210 | G000579 | wi | republican | | 8 | Michael | Gallagher | Mike | John | Gallagher, Michael John (Mike) | 1984-03-03 | male | RepMikeGallagher | @RepGallagher | | https://gallagher.house.gov | house | ['hlig', 'hsas', 'hszs'] | {'dateLastTraded': '2021-03-25', 'countTrades': 1, 'countIssuers': 1, 'volume': 8000} |
| 211 | L000273 | nm | democrat | | 3 | Teresa | Leger Fernandez | | Isabel | Leger Fernandez, Teresa Isabel | 1959-07-01 | female | | @RepTeresaLF | | https://fernandez.house.gov/ | house | ['hsed', 'hsii', 'hsru'] | {'dateLastTraded': '2021-01-20', 'countTrades': 1, 'countIssuers': 1, 'volume': 32500} |
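In the output above, the stats column still holds a nested dict per row, which is awkward to sort or filter on. A minimal sketch of one way to lift those values into their own columns follows. The helper name flatten_stats and the stats_ prefix are assumptions made for illustration, and the sample records are trimmed copies of two rows from the table rather than a live API response.

```python
# Sample records shaped like the API rows shown above
# (values copied from the table; most fields omitted for brevity).
records = [
    {"_politicianId": "C001129", "lastName": "Collins",
     "stats": {"dateLastTraded": "2024-01-08", "countTrades": 6,
               "countIssuers": 1, "volume": 72500}},
    {"_politicianId": "M001135", "lastName": "Manning",
     "stats": {"dateLastTraded": "2024-01-01", "countTrades": 583,
               "countIssuers": 168, "volume": 17567122}},
]


def flatten_stats(rows, key="stats"):
    """Hypothetical helper: return copies of rows with the dict under
    `key` lifted into top-level keys named '<key>_<name>'."""
    flat = []
    for row in rows:
        row = dict(row)                    # shallow copy, input stays untouched
        nested = row.pop(key, None) or {}  # tolerate a missing or null value
        for name, value in nested.items():
            row[f"{key}_{name}"] = value
        flat.append(row)
    return flat


flat_records = flatten_stats(records)
print(flat_records[0]["stats_countTrades"])
```

The flattened list can be passed to pd.DataFrame in place of data. If you prefer to stay within pandas, pd.json_normalize(data) should give a similar result, with dotted column names such as stats.countTrades.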
Answered By - HedgeHog