Issue
I am using Python Scrapy to scrape user review comments. The reviews may span multiple pages, and I need to click "See more" to load the additional comments.
This is the page I want to crawl: https://en.drivy.com/car-rental/berlin/dacia-dokker-218119
I noticed that when a car has more than 10 review comments, I need to click the "See more" link to get the subsequent comments. I also noticed that the "See more" link points to https://en.drivy.com/cars/218119/reviews?page=2&rel=next
However, if I request https://en.drivy.com/cars/218119/reviews?page=2&rel=next with Scrapy, the website redirects me back to https://en.drivy.com/car-rental/berlin/dacia-dokker-218119, so I cannot get the next ten comments. (I wonder whether the website uses a cookie or session ID and treats my Scrapy request as a new visit.)
I know I could use Python Selenium to open the web page and click "See more" to get the comments, but Selenium is very slow and I would prefer to use Scrapy instead.
Could anyone help me with this, or at least point me in the right direction? Thanks in advance.
Solution
You should set the request header "Accept: */*;q=0.5, text/javascript, application/javascript, application/ecmascript, application/x-ecmascript".
With this header, the response should be a JavaScript object containing the texts of the comments.
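# Inside a spider callback; requires "from scrapy import Request".
yield Request("https://en.drivy.com/cars/218119/reviews?page=2&rel=next",
              callback=...,  # your own parse method; add any other arguments you need
              headers={'Accept': "*/*;q=0.5, text/javascript, application/javascript, application/ecmascript, application/x-ecmascript"})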
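To request further pages the same way, the URL and header can be built in one place. The following is a minimal sketch, not part of the original answer: the helper name review_request_kwargs is hypothetical, and the URL pattern is assumed from the single "See more" link quoted in the question, so it may not hold for other cars or pages.

```python
from urllib.parse import urlencode

# Accept header taken from the answer above; it asks the server for a
# JavaScript response rather than the regular HTML page.
JS_ACCEPT = ("*/*;q=0.5, text/javascript, application/javascript, "
             "application/ecmascript, application/x-ecmascript")

# Assumption: URL pattern inferred from the "See more" link in the question.
REVIEWS_URL = "https://en.drivy.com/cars/{car_id}/reviews"


def review_request_kwargs(car_id, page):
    """Hypothetical helper: build the url/headers keyword arguments
    for a scrapy.Request that fetches one page of reviews."""
    query = urlencode({"page": page, "rel": "next"})
    return {
        "url": REVIEWS_URL.format(car_id=car_id) + "?" + query,
        "headers": {"Accept": JS_ACCEPT},
    }
```

Inside a spider callback this could be used as yield scrapy.Request(callback=self.parse_reviews, **review_request_kwargs(218119, 2)), where parse_reviews is your own method. How to extract the comments from the returned JavaScript depends on the actual response, which is not shown here.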
Answered By - Roman Mindlin