Issue
I am using Scrapy to collect data from a cinema webpage.
When I use an XPath selector with the extract() method, like this:
    def parse_with_extract(self, response):
        div = response.xpath("//div[@class='col-sm-7 col-md-9']/p[@class='movie__option']")
        # The loop variable `i` was undefined here; call xpath() on the selection itself.
        data = div.xpath("text()").extract()
        return data
It returns:
If I use the selector with the extract_first() method instead, like this:
    def parse_with_extract_first(self, response):
        div = response.xpath("//div[@class='col-sm-7 col-md-9']/p[@class='movie__option']")
        storage = []
        for i in div:
            data = i.xpath("text()").extract_first()
            storage.append(data)
        return storage
It returns:
Why is the extract() method returning all characters, including the "\xa0", and the extract_first() method returning an empty string instead?
Solution
If you look closer at the response, you'll see that the @class=movie__option element looks like this:
'<p class="movie__option" style="color: #000;">\n <strong>Thursday 3rd of May 2018:</strong>\n 11:20am\xa0 \xa0 \n </p>'
If you extract text() of this element, you basically get two strings: one before the strong tag and one after it (text() takes only first-level text):
['\n ',
'\n 11:20am\xa0 \xa0 \n ']
What extract_first does is simply take the first of these two strings:
'\n '
Answered By - stasdeep