Friday, October 14, 2022

[FIXED] My Scrapy Shell Commands Work but Output is Empty

Tags: scrapy, web-scraping

Issue

I tested the following code in the Scrapy shell and it works fine.

fetch('https://www.livescores.com/?tz=3')
response.css('div.dh')
gununMaclari = response.css('div.dh')
gununMaclari.css('span.hh span.ih span.kh::text').get()
gununMaclari.css('span.hh span.jh span.kh::text').get()

These commands return the home and away teams. If I use getall(), I get all the data for both home and away. But when I run the code below, the output is empty. What is the problem? I could not solve it. Could someone help me find it? Thanks.

import scrapy
from scrapy.crawler import CrawlerRunner

class LivescoresTodayList(scrapy.Spider):

    name = 'todayMatcheslist'
    custom_settings = {'CONCURRENT_REQUESTS': '1'}

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/?tz=3')

    def parse(self, response):

        for gununMaclari in response.css('div.dh'):
            yield{
                'Home': gununMaclari.css('span.hh span.ih span.kh::text').get(),
                'Away': gununMaclari.css('span.hh span.jh span.kh::text').get()
            }

runnerTodayList = CrawlerRunner(settings = {
    "FEEDS": {
        "todayMatcheslist.json": {"format": "json", "overwrite": True},
    },
})
runnerTodayList.crawl(LivescoresTodayList)

Solution

See the Scrapy documentation on running Scrapy from a script.

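The spider itself is fine. The output is empty because CrawlerRunner.crawl() only schedules the crawl; nothing is fetched until the Twisted reactor is running, and the original script never starts it. If you're using CrawlerRunner you need to configure the logging and settings yourself and start the reactor. CrawlerProcess handles the reactor for you.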
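The "scheduled but never run" behaviour can be sketched with the standard library alone. This is only an analogy: it uses asyncio rather than Twisted, and fake_crawl is a hypothetical stand-in for a crawl, not Scrapy code. Creating the task plays the role of runner.crawl(...), and running the loop plays the role of reactor.run().

```python
import asyncio

results = []


async def fake_crawl():
    # Hypothetical stand-in for a crawl: records one item when it actually runs.
    results.append({"Home": "Team A", "Away": "Team B"})


loop = asyncio.new_event_loop()

# Scheduling the work does not execute it (compare runner.crawl(...)).
task = loop.create_task(fake_crawl())
scheduled_only = list(results)  # snapshot taken before the loop has run

# Only running the event loop executes the work (compare reactor.run()).
loop.run_until_complete(task)
loop.close()
after_run = list(results)

print("before running the loop:", scheduled_only)
print("after running the loop:", after_run)
```

A script that stops after the scheduling step, as the one in the question does, exits with nothing collected.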

Example with CrawlerProcess:

import scrapy
from scrapy.crawler import CrawlerProcess


class LivescoresTodayList(scrapy.Spider):
    name = 'todayMatcheslist'
    custom_settings = {'CONCURRENT_REQUESTS': '1'}

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/?tz=3')

    def parse(self, response):
        for gununMaclari in response.css('div.dh'):
            yield{
                'Home': gununMaclari.css('span.hh span.ih span.kh::text').get(),
                'Away': gununMaclari.css('span.hh span.jh span.kh::text').get()
            }


process = CrawlerProcess(settings={
    "FEEDS": {
        "todayMatcheslist.json": {"format": "json", "overwrite": True},
    },
})

process.crawl(LivescoresTodayList)
process.start()  # starts the Twisted reactor and blocks until the crawl finishes

Example with CrawlerRunner:

import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from twisted.internet import reactor


class LivescoresTodayList(scrapy.Spider):
    name = 'todayMatcheslist'
    custom_settings = {'CONCURRENT_REQUESTS': '1'}

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/?tz=3')

    def parse(self, response):
        for gununMaclari in response.css('div.dh'):
            yield{
                'Home': gununMaclari.css('span.hh span.ih span.kh::text').get(),
                'Away': gununMaclari.css('span.hh span.jh span.kh::text').get()
            }


configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
runnerTodayList = CrawlerRunner(settings={
    "FEEDS": {
        "todayMatcheslist.json": {"format": "json", "overwrite": True},
    },
})
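# crawl() only schedules the spider and returns a Deferred.
d = runnerTodayList.crawl(LivescoresTodayList)
# Stop the reactor once the crawl ends, whether it succeeded or failed.
d.addBoth(lambda _: reactor.stop())
# Blocks here until reactor.stop() is called.
reactor.run()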
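After either script has finished, one way to confirm that the feed is no longer empty is to load todayMatcheslist.json and count the items. The sketch below is a minimal check using only the standard library; the helper names load_matches and complete_matches are hypothetical, and it assumes the feed is the JSON list of {'Home': ..., 'Away': ...} items produced by the spider above.

```python
import json
from pathlib import Path


def load_matches(path="todayMatcheslist.json"):
    """Return the scraped items, or [] if the feed file is missing or blank."""
    feed = Path(path)
    if not feed.exists():
        return []
    text = feed.read_text(encoding="utf-8")
    if not text.strip():
        return []
    return json.loads(text)


def complete_matches(items):
    """Keep only the items where both the Home and Away selectors matched."""
    return [m for m in items if m.get("Home") and m.get("Away")]


if __name__ == "__main__":
    matches = load_matches()
    print(f"{len(matches)} items in feed, "
          f"{len(complete_matches(matches))} with both teams")
```

If the feed has items but most of them have None for Home or Away, the crawl ran and the selectors are the next thing to check.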

Answered By - SuperUser

This answer was collected from Stack Overflow, tested by PythonFixing community admins, and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
