Issue
I want to scrape the speakers' names from this page: https://websummit.com/speakers
Each name is in a div tag with class="speaker__content__inner".
I wrote a Scrapy spider; its code is below:
import scrapy


class Id01Spider(scrapy.Spider):
    name = 'ID01'
    allowed_domains = ['websummit.com']
    start_urls = ['https://websummit.com/speakers']

    def parse(self, response):
        # Collect the text nodes of every speaker block.
        names = response.xpath('//div[@class="speaker__content__inner"]/text()').extract()
        # Iterate over the strings directly; zip() would wrap each one in a tuple.
        for speaker_details in names:
            yield {'Speaker_Details': speaker_details.strip()}
When I run this spider, it completes but returns nothing. Log file: https://pastebin.com/JEfL2GBu
P.S.: This is my first question on Stack Overflow, so please correct any mistakes I made while asking.
Solution
If you check the page's source HTML (using Ctrl+U), you'll find that there is no speaker info in it. This content is loaded dynamically using JavaScript.
You need to request https://api.cilabs.com/conferences/ws19/lists/speakers?per_page=25 and parse the JSON response.
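As a rough illustration of the "parse JSON" step, here is a minimal sketch that uses only the standard library. The shape of the response is an assumption: the key names "data" and "name" are hypothetical placeholders, so inspect the real JSON first and adjust them. The function names extract_names and fetch_names are also made up for this example.

```python
import json
from urllib.request import urlopen

API_URL = "https://api.cilabs.com/conferences/ws19/lists/speakers?per_page=25"


def extract_names(payload, items_key="data", name_key="name"):
    """Return stripped speaker names from a decoded JSON payload.

    ASSUMPTION: the payload looks like {"data": [{"name": "..."}, ...]}.
    Both key names are hypothetical; check the real response and adjust.
    """
    # Accept either a bare list of records or a dict wrapping the list.
    items = payload.get(items_key, []) if isinstance(payload, dict) else payload
    names = []
    for item in items:
        name = item.get(name_key) if isinstance(item, dict) else None
        # Skip records without a usable, non-empty string name.
        if isinstance(name, str) and name.strip():
            names.append(name.strip())
    return names


def fetch_names(url=API_URL):
    """Download the JSON document at `url` and return the speaker names."""
    with urlopen(url) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return extract_names(payload)
```

Inside the spider from the question, the same helper could be reused by pointing start_urls at the API URL and yielding {'Speaker_Details': n} for each n in extract_names(json.loads(response.text)) from parse(). The per_page=25 parameter suggests the list is paginated, so further requests are probably needed to collect every speaker.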
Answered By - gangabass