Issue
I am using requests and Beautiful Soup to scrape some data from https://covid19.who.int/. Near the top of the website, there is a box containing numbers such as "new cases in last 24 hours", which is what I want to use. Upon inspecting the website, I found that it is stored in a div container with the class "sc-AxjAm sc-qQxXP hTCctY". However, when I try to get this element, it returns an empty list. Here is my code:
import requests
from bs4 import BeautifulSoup
r = requests.get(url='https://covid19.who.int')
soup = BeautifulSoup(r.text, 'lxml')
data = soup.find_all('div', class_='sc-AxjAm sc-qQxXP hTCctY')
print(data)
This code prints an empty list. Can someone help?
Solution
The page builds that box in the browser with JavaScript, using data it fetches in separate JSON requests. So the data is all available; it is just not in the HTML that requests.get() returns, which is why find_all() comes back empty.
Try the following:
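import requests
# Request the JSON endpoint the page loads its data from, rather than the HTML page
req = requests.get('https://covid19.who.int/page-data/index/page-data.json')
req.raise_for_status()  # stop here on an HTTP error instead of failing later on bad JSON
data = req.json()
# Take the last row of the per-day dataset
cases = data['result']['pageContext']['rawDataSets']['byDay']['rows'][-1]
# Each row is a list; positions 6, 7 and 2 hold the three figures printed below
print(f"New Cases in last 24hrs: {cases[6]:,}")
print(f"Cumulative cases: {cases[7]:,}")
print(f"Cumulative deaths: {cases[2]:,}")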
This should give you:
New Cases in last 24hrs: 3,321,782
Cumulative cases: 364,191,494
Cumulative deaths: 5,631,457
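The key path and row positions above are simply what the JSON contained when this was written, so a long chain of square brackets will raise an error if the site ever changes its layout. A minimal sketch of a more forgiving lookup is below; the helper name dig and the sample data are made up for illustration and are not part of requests or of the site's data:

```python
def dig(node, *path, default=None):
    """Follow keys/indexes through nested dicts and lists.

    Returns `default` as soon as any step is missing instead of raising.
    """
    for step in path:
        try:
            node = node[step]
        except (KeyError, IndexError, TypeError):
            return default
    return node


# Hypothetical stand-in for req.json(); only the nesting mirrors the real path
sample = {
    "result": {
        "pageContext": {
            "rawDataSets": {"byDay": {"rows": [[0, 0, 10, 0, 0, 0, 5, 100]]}}
        }
    }
}

latest = dig(sample, "result", "pageContext", "rawDataSets", "byDay", "rows", -1)
if latest is None:
    print("Expected data not found; the JSON layout may have changed")
else:
    print(latest[6], latest[7], latest[2])

# A missing key gives None rather than a KeyError
print(dig(sample, "result", "missing", "rows"))
```

With the real response you would pass data in place of sample.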
The amount of information returned in the JSON is HUGE, so finding what you want will be a challenge. I would recommend writing the contents of req.text to a text file and inspecting that.
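As a rough sketch of that kind of inspection, the snippet below prints an outline of the keys in a nested JSON structure down to a chosen depth, which can be easier to scan than the raw text. The function name outline and the sample data are assumptions for illustration; with the real response you would pass data (from req.json()) instead:

```python
import json


def outline(node, depth=2, prefix=""):
    """Return lines describing the keys and value types of nested JSON, `depth` levels deep."""
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append(f"{prefix}{key}: {type(value).__name__}")
            if depth > 1:
                lines.extend(outline(value, depth - 1, prefix + "  "))
    elif isinstance(node, list) and node:
        # For lists, describe the length and look inside the first item only
        lines.append(f"{prefix}[{len(node)} items, first is {type(node[0]).__name__}]")
        if depth > 1:
            lines.extend(outline(node[0], depth - 1, prefix + "  "))
    return lines


# Hypothetical stand-in for req.json(); only the nesting mirrors the real path
sample = {"result": {"pageContext": {"rawDataSets": {"byDay": {"rows": [[1, 2, 3]]}}}}}

for line in outline(sample, depth=5):
    print(line)

# Pretty-printed text is also easier to read in an editor than the raw response
pretty = json.dumps(sample, indent=2)
```

To save the pretty-printed version, write pretty to a file with open('page-data.json', 'w'), where the filename is whatever you like.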
Answered By - Martin Evans