Issue
As mentioned in the title, my spider goes to page2 and comes back to page1. So the order is 1-2-1. I have no idea where to look.
def parse(self, response):
for products in response.css("div.z7ntrt-0.cLlfW.s1a29zcm-11.ggOMjb"):
yield {
"name": products.css("a.link-detail::attr(title)").get(),
"link": products.css("a.link-detail").attrib["href"],
"source": products.css("div.tag::text").get()
}
next_page = response.css("a.s1pk8cwy-4.eSWEIV::attr(href)").get()
if next_page is not None:
next_page_link=response.urljoin(next_page)
yield scrapy.Request(url=next_page_link, callback= self.parse)
Solution
This is happening because the class name you are using is used as both the forward arrow and the back arrow in the pagination section of the webpage.
However, the current page element seems to have a unique value for it's
class attribute and it is consistently the same on each page, so using Xpath we can isolate the current page and then use the following-sibling
directive to get the page immediately after the current page.
I have tested this and can confirm that it does work as expected.
def parse(self, response):
for products in response.css("div.z7ntrt-0.cLlfW.s1a29zcm-11.ggOMjb"):
yield {
"name": products.css("a.link-detail::attr(title)").get(),
"link": products.css("a.link-detail").attrib["href"],
"source": products.css("div.tag::text").get()
}
next_page = response.xpath(
'//a[contains(@class,"ctiKSh")]/following-sibling::a/@href'
).get()
if next_page is not None:
next_page_link=response.urljoin(next_page)
yield scrapy.Request(url=next_page_link, callback= self.parse)
Answered By - Alexander
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.