Issue
I am trying to learn how to scrape data. I am very new to Python, so bare with me. Upon searching YouTube, I found a tutorial and tried to scrape some data off of "https://www.pgatour.com/competition/2022/hero-world-challenge/leaderboard.html"
from bs4 import BeautifulSoup
import requests
SCRAPE = requests.get("https://www.pgatour.com/competition/2022/hero-world-challenge/leaderboard.html")
print(SCRAPE)
#Response [200] = Succesful...
#http response status codes
#Information Responses 100-199
#Successful 200-299
#Redirects 300-399
#Client Errors 400-499
#Server Errors 500-599
soup = BeautifulSoup(SCRAPE.content, 'html.parser')
#tells that the data is html and we need to parse it
table = soup.find_all('div', class_="leaderboard leaderboard-table large" )
#pick the large section that contains all the info you need
#then, pick each smaller section, find the type and class.
for list in table:
name = list.find('div', class_="player-name-col")
position = list.find('td', class_="position")
total = list.find('td', class_="total")
print(name, position, total)
Above is my code.. I also included pictures with the inspect open so I can show you what I was thinking when I tried to find the type and class within the leaderboard.
When I print, nothing happens. Any help is appreciated!
Solution
Data is loaded dynamically by JavaScript and bs4 can't render JS that's why your code is printing nothing but you can pull the required data from API
.
Example:
import pandas as pd
import requests
api_url= 'https://lbdata.pgatour.com/2022/r/478/leaderboard.json?userTrackingId=eyJhbGciOiJIUzI1NiJ9.eyJpYXQiOjE2Njg5OTEzNTcsIm5iZiI6MTY2ODk5MTM1NywiZXhwIjoxNjY4OTkzMTU3fQ.eTvZpdJgVp5yzSQz4J8n8ovzaBnKPmLhZm6gfitKJeU'
headers={
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}
data=[]
res=requests.get(api_url,headers=headers)
#print(res)
for item in res.json()['rows']:
startRanks = item['total']
data.append({'total':startRanks})
df= pd.DataFrame(data)
print(df)
Output:
total
0 -18
1 -17
2 -15
3 -15
4 -14
5 -14
6 -13
7 -13
8 -11
9 -11
10 -11
11 -10
12 -10
13 -8
14 -8
15 -8
16 -7
17 -6
18 +1
19 +6
Answered By - Fazlul
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.