Issue
I tested the following code in the Scrapy shell and it works fine:
fetch('https://www.livescores.com/?tz=3')
response.css('div.dh')
gununMaclari = response.css('div.dh')
gununMaclari.css('span.hh span.ih span.kh::text').get()
gununMaclari.css('span.hh span.jh span.kh::text').get()
These commands show me the home and away teams. If I use getall(), I can get all the data for both home and away.
But when I run the code below, the output is empty. What is the problem? I could not solve it. Could someone help me find it? Thanks.
import scrapy
from scrapy.crawler import CrawlerRunner


class LivescoresTodayList(scrapy.Spider):
    name = 'todayMatcheslist'
    custom_settings = {'CONCURRENT_REQUESTS': '1'}

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/?tz=3')

    def parse(self, response):
        for gununMaclari in response.css('div.dh'):
            yield {
                'Home': gununMaclari.css('span.hh span.ih span.kh::text').get(),
                'Away': gununMaclari.css('span.hh span.jh span.kh::text').get()
            }


runnerTodayList = CrawlerRunner(settings = {
    "FEEDS": {
        "todayMatcheslist.json": {"format": "json", "overwrite": True},
    },
})
runnerTodayList.crawl(LivescoresTodayList)
Solution
See the "Run Scrapy from a script" section of the Scrapy documentation.
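The spider itself is fine. If you're using CrawlerRunner, you need to configure the logging and settings yourself and start the Twisted reactor; otherwise the crawl is only scheduled and never actually runs. CrawlerProcess takes care of these steps for you.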
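To see why nothing happens without the reactor, here is a loose analogy that uses Python's asyncio instead of Twisted (so it is not Scrapy code). The fake_crawl function and the team names are made up for illustration; the point is only that scheduled work does not run until the event loop is started.

```python
import asyncio

results = []


async def fake_crawl():
    # Hypothetical stand-in for a crawl: records a single made-up item.
    results.append({"Home": "Team A", "Away": "Team B"})


loop = asyncio.new_event_loop()

# Scheduling the work is roughly like calling runner.crawl(...):
# nothing has executed yet because the loop is not running.
task = loop.create_task(fake_crawl())
items_before_run = list(results)

# Running the loop is roughly like calling reactor.run().
loop.run_until_complete(task)
items_after_run = list(results)
loop.close()

print(items_before_run)
print(items_after_run)
```

In the original script the equivalent of the "run the loop" step is missing, which would explain the empty output.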
Example with CrawlerProcess:
import scrapy
from scrapy.crawler import CrawlerProcess


class LivescoresTodayList(scrapy.Spider):
    name = 'todayMatcheslist'
    custom_settings = {'CONCURRENT_REQUESTS': '1'}

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/?tz=3')

    def parse(self, response):
        # One item per match block on the page
        for gununMaclari in response.css('div.dh'):
            yield {
                'Home': gununMaclari.css('span.hh span.ih span.kh::text').get(),
                'Away': gununMaclari.css('span.hh span.jh span.kh::text').get()
            }


process = CrawlerProcess(settings={
    "FEEDS": {
        "todayMatcheslist.json": {"format": "json", "overwrite": True},
    },
})
process.crawl(LivescoresTodayList)
# start() runs the reactor and blocks until the crawl is finished
process.start()
Example with CrawlerRunner:
import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from twisted.internet import reactor


class LivescoresTodayList(scrapy.Spider):
    name = 'todayMatcheslist'
    custom_settings = {'CONCURRENT_REQUESTS': '1'}

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/?tz=3')

    def parse(self, response):
        # One item per match block on the page
        for gununMaclari in response.css('div.dh'):
            yield {
                'Home': gununMaclari.css('span.hh span.ih span.kh::text').get(),
                'Away': gununMaclari.css('span.hh span.jh span.kh::text').get()
            }


# CrawlerRunner does not set up logging on its own
configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
runnerTodayList = CrawlerRunner(settings={
    "FEEDS": {
        "todayMatcheslist.json": {"format": "json", "overwrite": True},
    },
})
# crawl() returns a Deferred; stop the reactor once it fires
d = runnerTodayList.crawl(LivescoresTodayList)
d.addBoth(lambda _: reactor.stop())
# Nothing is crawled until the reactor is running
reactor.run()
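Once either script has finished, a quick way to confirm that the feed is no longer empty is to load the JSON file. This is only a minimal sketch: the load_feed helper is a hypothetical name, and it assumes the feed is the JSON array of items that the "json" format produces in todayMatcheslist.json.

```python
import json
from pathlib import Path


def load_feed(path):
    """Return the items in a JSON feed file, or [] if it is missing or empty."""
    feed = Path(path)
    if not feed.exists() or feed.stat().st_size == 0:
        return []
    with feed.open(encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    # File name taken from the FEEDS setting used in the examples above.
    items = load_feed("todayMatcheslist.json")
    print(f"{len(items)} matches scraped")
```

If this reports zero items even though the reactor was started, the CSS selectors are the next thing to check, since class names such as div.dh may change on the site.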
Answered By - SuperUser