Issue
I am dealing with pagination here. How can I get the href value from the HTML element below? I can't use //a[@data-page-number ='2']/@href because the 2 changes to 3, and so on, with every page.
<a data-page-number="2" data-offset="30" href="/Restaurants-g297633-oa30-Kochi_Cochin_Ernakulam_District_Kerala.html#EATERY_LIST_CONTENTS" class="nav next rndBtn ui_button primary taLnk" onclick=" require('common/Radio')('restaurant-filters').emit('paginate', this.getAttribute('data-offset'));; ta.trackEventOnPage('STANDARD_PAGINATION', 'next', '2', 0); return false;
">
Next
</a>
Solution
You want to get the href attribute of the next button. As you can see, its onclick attribute contains the string next, so we can use that to filter out all the other a tags.
Example with Scrapy shell:
In [1]: url='https://www.tripadvisor.in/Restaurants-g297633-Kochi_Cochin_Ernakulam_District_Kerala.html#EATERY_LIST_CONTENTS'
In [2]: req = scrapy.Request(url=url)
In [3]: fetch(req)
[scrapy.core.engine] INFO: Spider opened
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.tripadvisor.in/Restaurants-g297633-Kochi_Cochin_Ernakulam_District_Kerala.html#EATERY_LIST_CONTENTS> (referer: None)
In [4]: response.xpath('//a[contains(@onclick, "next")]/@href').get()
Out[4]: '/Restaurants-g297633-oa30-Kochi_Cochin_Ernakulam_District_Kerala.html#EATERY_LIST_CONTENTS'
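The href returned above is relative, so it has to be joined with the page URL before it can be requested. The sketch below illustrates the same idea outside Scrapy, using only the Python standard library: it filters on a value that stays the same on every page (here the next token in the class attribute, which the element in the question also carries) instead of the changing page number, then builds the absolute URL. The names NextLinkParser and find_next_url, and the first a tag in SAMPLE, are made up for illustration. Inside a spider, something like response.css('a.next::attr(href)').get() followed by response.urljoin(...) should do the equivalent, assuming the live page still uses this markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Remember the href of the first <a> whose class list contains "next"."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        # Ignore non-anchor tags and stop looking once a match is stored.
        if tag != "a" or self.next_href is not None:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "next" in classes:
            self.next_href = attr_map.get("href")


def find_next_url(html, base_url):
    """Return the absolute URL of the next page, or None if there is no next link."""
    parser = NextLinkParser()
    parser.feed(html)
    if parser.next_href is None:
        return None
    return urljoin(base_url, parser.next_href)


# The first <a> is a hypothetical non-matching link; the second is a trimmed
# copy of the element from the question (onclick omitted for brevity).
SAMPLE = '''
<a data-page-number="1" href="/page-1.html" class="pageNum taLnk">1</a>
<a data-page-number="2" data-offset="30"
   href="/Restaurants-g297633-oa30-Kochi_Cochin_Ernakulam_District_Kerala.html#EATERY_LIST_CONTENTS"
   class="nav next rndBtn ui_button primary taLnk">Next</a>
'''

BASE = "https://www.tripadvisor.in/Restaurants-g297633-Kochi_Cochin_Ernakulam_District_Kerala.html"

print(find_next_url(SAMPLE, BASE))
```

On the last page there is presumably no next link, so the function returns None, which is a convenient signal to stop paginating.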
Answered By - SuperUser