Issue
I am writing a scrapy-splash program and need to click the Display button on the webpage to show the data for 10th Edition so that I can scrape it. The code I tried is below, but it does not work. The information I need is only accessible after clicking the Display button. UPDATE: I am still struggling with this, and I have to believe there is a way to do it. I do not want to scrape the JSON because that could be a red flag to the site owners.
import scrapy
from ..items import NameItem


class LoginSpider(scrapy.Spider):
    name = "LoginSpider"
    start_urls = ["http://www.starcitygames.com/buylist/"]

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formcss='#existing_users form',
            formdata={'ex_usr_email': '[email protected]', 'ex_usr_pass': 'password123'},
            callback=self.after_login
        )

    def after_login(self, response):
        item = NameItem()
        display_button = response.xpath('//a[contains(., "- Display>>")]/@href').get()
        response.follow(display_button, self.parse)
        item["Name"] = response.css("div.bl-result-title::text").get()
        return item
Solution
Your code can't work because there is no anchor element and no href attribute. Clicking the button sends an XMLHttpRequest to http://www.starcitygames.com/buylist/search?search-type=category&id=5061 and the data you want is found in the JSON response.
- To check the request URL and response, open Dev Tools -> Network -> XHR and click Display.
- In the Headers tab you will find the request URL, and in the Preview or Response tabs you can inspect the JSON.
- As you can see, you'll need a category id to build the request URL. You can find this by parsing the script element found with this XPath: //script[contains(., "categories")]
- Then you can send your request from the spider to http://www.starcitygames.com/buylist/search?search-type=category&id=5061 and get the data you want.
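The last step can be sketched with the standard library alone. This is a minimal sketch, assuming the endpoint and the two query parameters are exactly those in the URL above; `build_search_url` and `BASE_URL` are hypothetical names used only for illustration, and 5061 is the category id from the example URL.

```python
from urllib.parse import urlencode

# Endpoint taken from the request URL seen in Dev Tools (assumed unchanged).
BASE_URL = "http://www.starcitygames.com/buylist/search"


def build_search_url(category_id):
    """Build the category search URL for a given category id (hypothetical helper)."""
    query = urlencode({"search-type": "category", "id": category_id})
    return f"{BASE_URL}?{query}"


url = build_search_url(5061)
print(url)
```

Inside a spider, the resulting URL could be passed to `scrapy.Request(url, callback=...)` so that the callback receives the JSON response.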
$ curl 'http://www.starcitygames.com/buylist/search?search-type=category&id=5061'
{"ok":true,"search":"10th Edition","results":[[{"id":"46269","name":"Abundance","subtitle":null,"condition":"NM\/M","foil":true,"is_parent":false,"language":"English","price":"20.000","rarity":"Rare","image":"cardscans\/MTG\/10E\/en\/foil\/Abundance.jpg"},{"id":"176986","name":"Abundance","subtitle":null,"condition":"PL","foil":true,"is_parent":false,"language":"English","price":"12.000","rarity":"Rare","image":"cardscans\/MTG\/10E\/en\/foil\/Abundance.jpg"}....
As you can see, you don't even need to log in to the website or use Splash.
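Once the JSON is fetched, the card data can be pulled out with the `json` module. The following is only a sketch: the sample string is a shortened copy of the curl output above, the response is truncated there, so the assumption that `results` is always a list of lists of card objects is unverified, and `extract_cards` is a hypothetical helper name.

```python
import json

# Shortened sample in the shape of the curl output above (assumed representative).
SAMPLE = """
{"ok": true, "search": "10th Edition", "results": [[
  {"id": "46269", "name": "Abundance", "condition": "NM/M", "foil": true, "price": "20.000"},
  {"id": "176986", "name": "Abundance", "condition": "PL", "foil": true, "price": "12.000"}
]]}
"""


def extract_cards(payload_text):
    """Return name, condition and price for each card in the JSON payload."""
    payload = json.loads(payload_text)
    if not payload.get("ok"):
        return []
    cards = []
    # "results" is assumed to be a list of groups, each a list of card objects.
    for group in payload.get("results", []):
        for entry in group:
            cards.append({
                "name": entry["name"],
                "condition": entry["condition"],
                "price": float(entry["price"]),  # prices arrive as strings
            })
    return cards


cards = extract_cards(SAMPLE)
for card in cards:
    print(card)
```

In a Scrapy callback the same function could be called as `extract_cards(response.text)`, yielding one item per card instead of printing.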
Answered By - Ionut-Cezar Ciubotariu