Issue
I'm using Scrapy to scrape this website. I want to grab all the div elements with class="data1", but neither CSS nor XPath selectors find them, even though I can see them in the HTML in the browser.
In the Scrapy shell, after fetching the URL:
In [6]: response.css('div#my_div')
Out[6]: [<Selector query="descendant-or-self::div[@id = 'my_div']" data='<div id="my_div">Results will be show...'>]
In [7]: response.css('div#my_div div')
Out[7]: []
In [8]: response.xpath('//div[@class="data1"]')
Out[8]: []
The HTML shown in the browser looks something like this:
<div id="my_div" style>
<div class="data1">...</div>
<div class="data1">...</div>
<div class="data1">...</div>
...
</div>
Solution
This happens because that portion of the page is rendered with JavaScript, so it is not in the HTML that Scrapy downloads. You can see this by calling .get() on the first query from your example:
In [1]: response.css('div#my_div').get()
Out[1]: '<div id="my_div">Results will be shown here.</div>'
If you look at the network tab of the browser developer tools, you can see that all of that information comes from an API call to 'https://data.crn.com/2023/wotc2023.php?st1=1&st2=a'.
Fetching that URL in the Scrapy shell returns a JSON array with one object per entry in that section:
In [3]: fetch('https://data.crn.com/2023/wotc2023.php?st1=1&st2=a')
2023-05-08 20:57:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://data.crn.com/2023/wotc2023.php?st1=1&st2=a> (referer: None)
In [4]: response.json()
Out[4]:
[{'Pkey': '617',
'Company': 'F5',
'Name_First': 'Barbara',
'Name_Last': 'Abboud',
'Image': 'f5-abboud-barbara.jpg'},
{'Pkey': '1208',
'Company': 'Samsung Electronics America',
'Name_First': 'Shpresa',
'Name_Last': 'Abdullaj',
'Image': 'samsung-electronics-america-abdullaj-shpresa.jpg'},
{'Pkey': '499',
'Company': 'Davenport Group',
'Name_First': 'Kim',
'Name_Last': 'Abrams',
'Image': 'davenport-group-abrams-kim.jpg'},
{'Pkey': '35',
'Company': 'Alteryx',
'Name_First': 'Daniella',
'Name_Last': 'Aburto Valle',
'Image': 'alteryx-aburto-valle-daniella.jpg'},
.......]
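Since the decoded response is just a list of dicts, turning it into items needs no selectors at all. The following is a minimal sketch of that step, not a definitive implementation: the helper name iter_people and the output keys (id, name, company, image) are hypothetical choices made for illustration, and only the input field names come from the response shown above.

```python
def iter_people(records):
    """Yield one flat item per record from the decoded JSON list."""
    for record in records:
        # Field names match the API response; missing values fall back to "".
        first = (record.get("Name_First") or "").strip()
        last = (record.get("Name_Last") or "").strip()
        yield {
            "id": record.get("Pkey"),
            "name": f"{first} {last}".strip(),
            "company": record.get("Company"),
            "image": record.get("Image"),  # file name only, as returned by the API
        }


# Two records copied from the response shown above.
sample = [
    {"Pkey": "617", "Company": "F5", "Name_First": "Barbara",
     "Name_Last": "Abboud", "Image": "f5-abboud-barbara.jpg"},
    {"Pkey": "35", "Company": "Alteryx", "Name_First": "Daniella",
     "Name_Last": "Aburto Valle",
     "Image": "alteryx-aburto-valle-daniella.jpg"},
]

for item in iter_people(sample):
    print(item["name"], "-", item["company"])
```

In a spider, you would presumably point start_urls at the API URL instead of the page URL and have the callback run `yield from iter_people(response.json())`, so the JavaScript-rendered HTML never needs to be parsed.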
Answered By - Alexander