Issue
My Scrapy spider reads a CSV file and builds start_urls from the addresses in it, like so:
from csv import DictReader

with open('addresses.csv') as rows:
    start_urls = ['http://www.example.com/search/?where=' + row["Address"].replace(',', '').replace(' ', '+')
                  for row in DictReader(rows)]
But the CSV file also contains emails and other information. How can I pass this extra information into the parse callback so it ends up in the output file? Something like this:
import scrapy
from csv import DictReader

with open('addresses.csv') as rows:
    names = [row["Name"].replace(',', '') for row in DictReader(rows)]
    emails = [row["Email"].replace(',', '') for row in DictReader(rows)]
    start_urls = ['http://www.example.com/search/?where=' + row["Address"].replace(',', '').replace(' ', '+')
                  for row in DictReader(rows)]

def parse(self, response):
    yield {
        'name': FROM CSV,
        'email': FROM CSV,
        'address': FROM SCRAPING,
        'city': FROM SCRAPING,
    }
Solution
import scrapy
from csv import DictReader


class MySpider(scrapy.Spider):
    name = 'myspider'  # spider name required by Scrapy

    def start_requests(self):
        with open('addresses.csv') as rows:
            for row in DictReader(rows):
                name = row["Name"].replace(',', '')
                email = row["Email"].replace(',', '')
                link = 'http://www.example.com/search/?where=' + row["Address"].replace(',', '').replace(' ', '+')
                # Attach the per-row CSV data to the request via meta.
                yield scrapy.Request(url=link,
                                     callback=self.parse,
                                     method="GET",
                                     meta={'name': name, 'email': email})

    def parse(self, response):
        yield {
            'name': response.meta['name'],
            'email': response.meta['email'],
            'address': ...,  # extracted from the scraped page
            'city': ...,     # extracted from the scraped page
        }
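The code above assumes addresses.csv exposes Name, Email and Address columns. As a quick sanity check before a crawl, a small standalone sketch (file name and headers taken from the question) can confirm they are really there:

from csv import DictReader

# Standalone check: list the CSV's column names and peek at a few rows, so a
# missing or misspelled header shows up before starting a crawl.
with open('addresses.csv') as rows:
    reader = DictReader(rows)
    print("Columns:", reader.fieldnames)
    for i, row in enumerate(reader):
        print(row["Name"], row["Email"], row["Address"])
        if i >= 2:  # only the first three rows
            break

The dictionaries yielded from parse can then be written to a new file with Scrapy's feed exports, for example via the -o command-line option.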
- Open your CSV file.
- Iterate over it inside the start_requests method.
- To pass parameters to the callback function, use the meta argument; you can pass a Python dictionary in meta and read it back as response.meta in the callback (see the short sketch after this list).
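A minimal, self-contained sketch of that meta hand-off, stripped of the CSV details (the spider name, URL and keys here are placeholder values, not part of the original answer):

import scrapy

class MetaDemoSpider(scrapy.Spider):
    # Hypothetical spider used only to illustrate passing data through meta.
    name = 'meta_demo'

    def start_requests(self):
        # Any extra data can ride along with the request in the meta dict.
        extra = {'name': 'Alice', 'email': 'alice@example.com'}
        yield scrapy.Request('http://www.example.com/', callback=self.parse, meta=extra)

    def parse(self, response):
        # The same dict comes back on the response object in the callback.
        yield {'name': response.meta['name'], 'email': response.meta['email'], 'url': response.url}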
Note:
Remember that start_requests is not a custom method I defined; it is Scrapy's own built-in method, which the spider overrides. See https://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spiders.Spider.start_requests
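For context, a simplified sketch of roughly what the stock start_requests does with start_urls (an approximation, not Scrapy's exact source): each URL becomes a bare Request with no per-row data attached, which is why the answer overrides it.

import scrapy

class DefaultSketchSpider(scrapy.Spider):
    # Hypothetical spider showing (approximately) the default start_requests:
    # every entry in start_urls becomes a plain Request with the parse callback,
    # so there is no hook for attaching CSV data -- hence the override above.
    name = 'default_sketch'
    start_urls = ['http://www.example.com/']

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        yield {'url': response.url}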
Answered By - Umair Ayub