Issue
My projects keep failing for the same reason: I get a syntax error. I'm using Anaconda and Visual Studio Code, and I think my environment is set up correctly.
The code I'm using is the following:
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class BestMoviesSpider(CrawlSpider):
    name = 'best_movies'
    allowed_domains = ['imdb.com']
    start_urls = ['https://www.imdb.com/chart/top']

    rules = (
        Rule(LinkExtractor(restrict_xpaths="//td[@class='titleColumn']/a"), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        yield {
            'title': response.xpath("//h1/text()").get(),
            'year': response.xpath("//li[@class="ipc-inline-list__item"]/span/text()").get(),
            'duration': response.xpath("(//li[@class="ipc-inline-list__item"])[3]/text()").get(),
            'genre': response.xpath("//span[@class="ipc-chip__text"]/text()").get(),
            'rating': response.xpath("//span[@class="AggregateRatingButton__RatingScore-sc-1ll29m0-1 iTLWoV"]/text()").get(),
            'movie_url': response.url,
        }
The error I'm getting is:
    line 18
        'year': response.xpath("//li[@class="ipc-inline-list__item"]/span/text()").get(),
                                             ^
    SyntaxError: invalid syntax
Also, VS Code shows two errors about { and ( not being closed, but I think that's because my code isn't running.
Thank you in advance!
Solution
The problem is that your XPath contains double quotes, and you are also using double quotes to surround the entire XPath, so the string ends at the first inner quote.
Neither the Python interpreter nor your VS Code linter can figure out where the string begins and where it ends, which is most likely also why VS Code reports the { and ( as not being closed.
If your XPath contains ", then use ' to surround the entire XPath, or the other way round.
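To make the quoting options concrete, here is a small sketch using the 'year' XPath from the question. The variable names are only for illustration, and nothing here needs Scrapy to run; it just compares plain Python strings.

```python
# Several ways to write an XPath that contains double quotes.
# Variable names are illustrative, not part of the original spider.

# 1. Single quotes outside, double quotes inside.
single_quoted = '//li[@class="ipc-inline-list__item"]/span/text()'

# 2. Double quotes outside, inner double quotes escaped with a backslash.
escaped = "//li[@class=\"ipc-inline-list__item\"]/span/text()"

# 3. Triple quotes outside, so plain double quotes are allowed inside.
triple_quoted = """//li[@class="ipc-inline-list__item"]/span/text()"""

# 4. "The other way round": double quotes outside, single quotes inside.
#    This is a different Python string, but XPath accepts either quote
#    style around a literal, so it selects the same nodes.
swapped = "//li[@class='ipc-inline-list__item']/span/text()"

# Options 1 to 3 produce exactly the same string.
print(single_quoted == escaped == triple_quoted)
```

Option 1 is usually the easiest to read, which is why the fix below uses it.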
Change this from:
'year': response.xpath("//li[@class="ipc-inline-list__item"]/span/text()").get(),
to:
'year': response.xpath('//li[@class="ipc-inline-list__item"]/span/text()').get(),
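If you want to check this without running the spider, the built-in compile() function can parse an expression without executing it. The snippet below is only a sketch: the compiles helper is a hypothetical name introduced here, and the two source strings are the right-hand sides of the 'year' line before and after the change.

```python
# Parse (but do not execute) the broken and the fixed expression.
# `compiles` is a hypothetical helper written for this illustration.

broken = '''response.xpath("//li[@class="ipc-inline-list__item"]/span/text()").get()'''
fixed = '''response.xpath('//li[@class="ipc-inline-list__item"]/span/text()').get()'''


def compiles(source):
    """Return True if `source` parses as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


print("broken parses:", compiles(broken))
print("fixed parses:", compiles(fixed))
```

In the broken version Python sees the string "//li[@class=" followed directly by the bare name ipc, which is not valid syntax, so parsing stops there.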
Here is your entire parse_item fixed:
def parse_item(self, response):
    yield {
        'title': response.xpath("//h1/text()").get(),
        'year': response.xpath('//li[@class="ipc-inline-list__item"]/span/text()').get(),
        'duration': response.xpath('(//li[@class="ipc-inline-list__item"])[3]/text()').get(),
        'genre': response.xpath('//span[@class="ipc-chip__text"]/text()').get(),
        'rating': response.xpath('//span[@class="AggregateRatingButton__RatingScore-sc-1ll29m0-1 iTLWoV"]/text()').get(),
        'movie_url': response.url,
    }
Answered By - Upendra