Issue
I'm trying to crawl a webpage using Scrapy and XPath. Here are my code and logs. Can someone help me? Thanks in advance!
from scrapy import Spider
from scrapy.selector import Selector
from crawler.items import CrawlerItem


class CrawlerSpider(Spider):
    name = "crawler"
    allowed_domains = ["dayhoctienganh.net"]
    start_urls = [
        "https://dayhoctienganh.net/trac-nghiem-tieng-anh-trinh-do-b",
    ]

    def parse(self, response):
        questions = Selector(response).xpath('//ol[@class="questions"]/li')
        for question in questions:
            item = CrawlerItem()
            item['quest'] = question.xpath('/h3/text()').extract_first()
            item['sela'] = question.xpath('/ul[@class="answers"]/li[1]/label/text()').extract_first()
            item['selb'] = question.xpath('/ul[@class="answers"]/li[2]/label/text()').extract_first()
            item['selc'] = question.xpath('/ul[@class="answers"]/li[3]/label/text()').extract_first()
            item['seld'] = question.xpath('/ul[@class="answers"]/li[4]/label/text()').extract_first()
            item['key'] = question.xpath('/ul[@class="responses"]/li[2]/text()').extract_first()
            yield item
2019-11-16 23:53:53 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2019-11-16 23:53:53 [scrapy.core.engine] INFO: Spider opened
2019-11-16 23:53:53 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-11-16 23:53:53 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-11-16 23:53:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://dayhoctienganh.net/trac-nghiem-tieng-anh-trinh-do-b> (referer: None)
2019-11-16 23:53:55 [scrapy.core.engine] INFO: Closing spider (finished)
Solution
If you open the page source of the URL in start_urls (Ctrl/Cmd + U), you will not find the questions class. The questions list is therefore empty, so the for loop in the parse method is skipped and you do not get the results you expect. The answers class is not present in the page source either, so even if the loop ran, all fields of item would be empty.
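The same check can be scripted instead of done by eye. The following is only a minimal sketch using Python's standard library, not part of the original answer: the helper names (ClassCounter, count_in_source) and the two sample page sources are hypothetical, made up purely for illustration. It counts how many elements with a given tag and class appear in the raw HTML that a crawler receives.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags with a given name that carry a given CSS class."""

    def __init__(self, tag, class_name):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # A class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_in_source(html, tag, class_name):
    """Return how many <tag class="... class_name ..."> elements the raw HTML contains."""
    parser = ClassCounter(tag, class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical page sources, invented for this example only.
# In the first one the list is part of the HTML sent by the server.
STATIC_PAGE = '<html><body><ol class="questions"><li><h3>Q1</h3></li></ol></body></html>'
# In the second one the list is absent from the source (for instance because
# it would only be added later in the browser).
SCRIPTED_PAGE = '<html><body><div id="quiz"></div><script src="quiz.js"></script></body></html>'

print(count_in_source(STATIC_PAGE, "ol", "questions"))    # 1
print(count_in_source(SCRIPTED_PAGE, "ol", "questions"))  # 0
```

A count of 0 for the real page source would match what the answer describes: the XPath has nothing to select, so parse yields no items. The equivalent check inside Scrapy is to run scrapy shell with the start URL and evaluate response.xpath('//ol[@class="questions"]'), which returns an empty list in that case.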
Answered By - Ikram Khan Niazi