Issue
Trying to pull the product name from a page:
https://www.v12outdoor.com/view-by-category/rock-climbing-gear/rock-climbing-shoes/mens.html
I can't find an XPath that returns a useful, specific result.
Apologies for my first post being such a beginner question :(
import scrapy

class V12Spider(scrapy.Spider):
    name = 'v12'
    start_urls = ['https://www.v12outdoor.com/view-by-category/rock-climbing-gear/rock-climbing-shoes/mens.html']

    def parse(self, response):
        yield {
            'price': response.xpath('//span[@id="product-price-26901"]/text()'),
            'name': response.xpath('//h3[@class="product-name"]/a/text()'),
        }
For name, I expected to get the name from the items in h3 tags with class product-name, but it generates multiple rows of data='\r\n
(Whilst we're at it: for price, is there any way to pull out only the numerical values?)
Solution
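xpath() returns a list of selector objects rather than strings, which is why you see rows of data='\r\n. Call the .get() method on the result to extract the first match as a string, then use the string's .strip() method to remove the surrounding whitespace. I tried something like this: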
name= response.xpath('//h3[@class="product-name"]/a/text()').get()
Gives
'\r\n RED CHILLI VOLTAGE '
Then using
name.strip()
gives
'RED CHILLI VOLTAGE'
So you can replace your name statement with
name= response.xpath('//h3[@class="product-name"]/a/text()').get().strip()
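One caveat with chaining: .get() returns None when the XPath matches nothing, so .get().strip() would raise an AttributeError on such a page. A minimal sketch of a guard, assuming a hypothetical helper called clean_text (it is not part of Scrapy):

```python
def clean_text(raw):
    """Strip surrounding whitespace; return None if nothing was matched.

    `raw` is whatever `.get()` returned: a string, or None when the
    XPath matched no node. `clean_text` is a hypothetical helper name.
    """
    if raw is None:
        return None
    return raw.strip()


# In the spider this would be used as, for example:
#   name = clean_text(response.xpath('//h3[@class="product-name"]/a/text()').get())
print(clean_text('\r\n RED CHILLI VOLTAGE '))  # RED CHILLI VOLTAGE
print(clean_text(None))  # None
```

.get() also accepts a default argument, so .get(default='').strip() is another way to avoid the error if an empty string is an acceptable fallback.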
The same approach works for price: just add .get().strip() at the end of your statement.
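As for pulling only the numerical value out of the price, stripping alone will most likely leave a currency symbol in place. A rough sketch using Python's re module; the sample string '£89.99' and the helper name parse_price are assumptions, since the actual price text on the page isn't shown here:

```python
import re


def parse_price(raw):
    """Return the first number found in `raw` as a float, or None.

    `parse_price` is a hypothetical helper; the price format
    (e.g. '£89.99') is assumed, not taken from the page.
    """
    if raw is None:
        return None
    # Drop thousands separators such as the comma in '1,299.00'.
    match = re.search(r'\d+(?:\.\d+)?', raw.replace(',', ''))
    return float(match.group()) if match else None


# In the spider, `raw` would be the string returned by
#   response.xpath('//span[@id="product-price-26901"]/text()').get()
print(parse_price('\r\n £89.99 '))  # 89.99
```

Scrapy selectors also offer .re_first(), which applies a regular expression directly to the selected text, so that may be worth a look as well.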
Hopefully this helps. Also read about the .getall() method in the Scrapy selectors documentation: https://docs.scrapy.org/en/latest/topics/selectors.html
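Since .get() only returns the first match and the page lists many products, .getall() is probably what you want in the end: it returns every match as a list of strings, each of which still needs stripping. A small sketch, with made-up sample data standing in for the real page (clean_names is a hypothetical helper and 'EXAMPLE SHOE' is invented):

```python
def clean_names(raw_names):
    """Strip each string and drop entries that are only whitespace."""
    return [name.strip() for name in raw_names if name.strip()]


# In the spider, `raw_names` would come from
#   response.xpath('//h3[@class="product-name"]/a/text()').getall()
# The list below is illustrative sample data, not real page output.
sample = ['\r\n RED CHILLI VOLTAGE ', '\r\n ', '\r\n EXAMPLE SHOE ']
print(clean_names(sample))  # ['RED CHILLI VOLTAGE', 'EXAMPLE SHOE']
```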
Answered By - glory9211