Issue
I want to get all bullet points from an Amazon product page with scrapy (e.g. Amazon link); however, their number varies. I end up using something like this:
def parse(self, response):
    t = response
    url = t.request.url
    yield {
        'bullets_no': len(t.xpath('//div[@id="feature-bullets"]//li/span/text()')),
        'bullet_1': t.xpath('//div[@id="feature-bullets"]//li/span/text()')[0].get().strip(),
        'bullet_2': t.xpath('//div[@id="feature-bullets"]//li/span/text()')[1].get().strip(),
        'bullet_3': t.xpath('//div[@id="feature-bullets"]//li/span/text()')[2].get().strip(),
        'bullet_4': t.xpath('//div[@id="feature-bullets"]//li/span/text()')[3].get().strip(),
        'bullet_5': t.xpath('//div[@id="feature-bullets"]//li/span/text()')[4].get().strip(),
        ...
    }
However, in plain Python I would simply do something like this, and it would adjust automatically:
bullets = t.xpath('//div[@id="feature-bullets"]//li/span/text()')
for i, bullet in enumerate(bullets):
    row[f'Bullet_{i+1}'] = bullet.strip()
Is it possible to create yielded fields like this in scrapy?
Solution
Yes, this is covered in detail in the scrapy tutorial, which I highly suggest reading.
The return type of both the response.css and response.xpath calls is a SelectorList object, which you can iterate just like a regular Python list. From the scrapy tutorial:
The result of running response.css('title') is a list-like object called SelectorList, which represents a list of Selector objects that wrap around XML/HTML elements and allow you to run further queries to fine-grain the selection or extract the data.
So using your example you could do something like this:
def parse(self, response):
    item = {'url': response.url}
    bullets = response.xpath('//div[@id="feature-bullets"]//li/span/text()')
    for i, bullet in enumerate(bullets, start=1):
        item[f'bullet_{i}'] = bullet.get().strip()
    item['bullet_no'] = len(bullets)
    yield item
As mentioned in a previous answer, there is also the getall method that you can call on a selector list:
The other thing is that the result of calling .getall() is a list: it is possible that a selector returns more than one result, so we extract them all.
I suggest giving the Extracting Data and Extracting Quotes and Authors sections of the scrapy docs tutorial a read to find out more.
Answered By - Alexander