Issue
I'm using BeautifulSoup to scrape game data from Metacritic. I'm trying to get the score and text for each reviewer. I thought everything was going well, but when I get the response back I see things like this:
class="c-siteReviewPlaceholder_header"
The site does not have the word placeholder in its classes. I know that I need to target the specific class:
class_="c-pageProductReviews_row"
So this is what my code looks like:
import requests
from bs4 import BeautifulSoup
URL = 'https://www.metacritic.com/game/alien-isolation/critic-reviews/?platform=playstation-4'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
'AppleWebKit/537.36 (KHTML, like Gecko) '\
'Chrome/75.0.3770.80 Safari/537.36'}
critic_review_page = requests.get(URL, headers=headers)
soup = BeautifulSoup(critic_review_page.content, "html.parser")
critic_review_rows = soup.find_all("div", class_="c-pageProductReviews_row")
print(critic_review_rows)
When I print critic_review_rows, I see that a lot of the classes have the word placeholder in them. I don't know whether Metacritic is blocking me from scraping the site or whether something else is going on. It's almost as if the data has not loaded by the time I scrape it.
Solution
The main issue here is that the content is rendered dynamically by JavaScript, something that requests does not handle: it does not act like a browser and only works with the initial static state of the response.
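One rough way to confirm this is to look at the raw HTML that requests receives and count the placeholder markers in it. The sketch below is only an illustration: the helper name count_placeholders and the sample string are made up, and only the class names come from the question.

```python
import re

def count_placeholders(html):
    """Count class attributes whose value mentions 'Placeholder'."""
    # Collect every class="..." value, then keep those containing 'Placeholder'.
    classes = re.findall(r'class="([^"]*)"', html)
    return sum("Placeholder" in value for value in classes)

# Tiny stand-in for critic_review_page.text (not real Metacritic markup).
sample_html = (
    '<div class="c-pageProductReviews_row">'
    '<div class="c-siteReviewPlaceholder_header"></div>'
    '</div>'
)
print(count_placeholders(sample_html))
```

If the count is non-zero for the real response, the review rows are most likely still in their pre-render state, so the scores and quotes are probably not in that HTML at all.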
The initial state is stored in a script at the end of the page source, so you could extract it from there, but a better approach is to call the API that the page itself uses:
import requests
url = 'https://fandom-prod.apigee.net/v1/xapi/reviews/metacritic/critic/games/alien-isolation/platform/playstation-4/web?apiKey=1MOZgmNFxvmljaQR1X9KAij9Mo4xAY3u&offset=0&limit=50&sort=score&componentType=ReviewList'
requests.get(url).json()
{'data': {'id': 1400262756,
'totalResults': 50,
'items': [{'quote': 'The permanent threat of death keeps you forced to the ground – we can’t remember the last game where we willingly snuck around so much – and it feels like the claustrophobic corridors and catwalks of the Sevastopol were built from that angle. Being crouched, looking up at everything… it really does give you that feeling that Creative Assembly wanted it all along – that ‘prey being hunted’ effect. It makes a refreshing change from the feeling of being overpowered and able to kill anything that appears.',
'score': 90,
'url': 'http://www.play-mag.co.uk/reviews/ps4-reviews/alien-isolation-review-2/',
'date': '2014-10-03',
'author': None,
'authorSlug': None,
'image': None,
'publicationName': 'Play UK',
'publicationSlug': 'play-uk',
'reviewedProduct': {'id': 1400262756,
'type': 'games',
'title': 'Alien: Isolation',
'url': '/game/alien-isolation/',
'criticScoreSummary': {'url': '/game/alien-isolation/critic-reviews/?platform=playstation-4',
'score': 79},
'platform': {'id': 1500000006, 'name': 'PlayStation 4'},
'gameTaxonomy': {'game': {'id': 1400262756, 'name': 'Alien: Isolation'},
'platform': {'id': 1500000006, 'name': 'PlayStation 4'}}},
'platform': 'PlayStation 4'},...
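To get from that response to the score and text for each reviewer, something along these lines could work. It is a minimal sketch that assumes the payload keeps the shape shown above ('data' -> 'items', each item with 'quote', 'score' and 'publicationName'). The function name extract_reviews is hypothetical, and the sample payload is a trimmed copy of the output above so that the example runs without a network call.

```python
def extract_reviews(payload):
    """Return one dict per review with publication, score and quote."""
    # .get() with defaults keeps this from raising if a key is missing.
    items = payload.get("data", {}).get("items", [])
    return [
        {
            "publication": item.get("publicationName"),
            "score": item.get("score"),
            "quote": item.get("quote"),
        }
        for item in items
    ]

# Trimmed-down payload shaped like the response shown above.
sample = {
    "data": {
        "items": [
            {
                "quote": "The permanent threat of death keeps you forced to the ground",
                "score": 90,
                "publicationName": "Play UK",
            }
        ]
    }
}

for review in extract_reviews(sample):
    print(review["publication"], review["score"], review["quote"])
```

With the real request, the same function would be called as extract_reviews(requests.get(url).json()).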
Answered By - HedgeHog