Issue
import scrapy
from scrapy.crawler import CrawlerRunner


class Livescores2(scrapy.Spider):
    name = 'Home'

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/football/turkey/super-lig/?tz=3&table=league-home')

    def parse(self, response):
        for total in response.css('td'):
            yield {
                'total': total.css('::text').get()
            }


runner2 = CrawlerRunner()
runner2.crawl(Livescores2)
When I adjust the settings as shown below, I can save the data as JSON without a problem.
runner2 = CrawlerRunner(settings={
    "FEEDS": {
        "Home.json": {"format": "json", "overwrite": True},
    },
})
I want to assign the data that Scrapy returns to a variable so I can work with it. I don't want a JSON file!
I tried:
import scrapy
from scrapy.crawler import CrawlerRunner


class Livescores2(scrapy.Spider):
    name = 'Home'

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/football/turkey/super-lig/?tz=3&table=league-home')

    def parse(self, response):
        for total in response.css('td'):
            yield {
                'total': total.css('::text').get()
            }


runner2 = CrawlerRunner()
a = runner2.crawl(Livescores2)
print(a)
The result is: <Deferred at 0x65cbfb6d0>
How can I get the data into a variable? I am developing an Android app, so I don't need a JSON file. I don't know how to use "return" in this code.
Thanks very much
Solution
You can create a class attribute that stores the data and then read it once the spider has finished processing all of its requests. Note that crawl() only schedules the crawl and returns a Deferred, so the attribute is filled in only after the Twisted reactor has run the crawl to completion. This isn't really the workflow that the Scrapy framework targets, though, and there are likely other web-scraping tools that could handle this more intuitively.
For example:
import scrapy
from scrapy.crawler import CrawlerProcess


class Livescores2(scrapy.Spider):
    name = 'Home'
    data = []  # class attribute that collects every scraped item

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/football/turkey/super-lig/?tz=3&table=league-home')

    def parse(self, response):
        for total in response.css('td'):
            item = {'total': total.css('::text').get()}
            self.data.append(item)  # append item to data list
            yield item


# CrawlerProcess is used here instead of CrawlerRunner because it starts
# the Twisted reactor itself; with CrawlerRunner nothing runs the crawl.
process = CrawlerProcess()
process.crawl(Livescores2)
process.start()  # blocks until the crawl has finished
print(Livescores2.data)  # print the collected data
Answered By - Alexander