Issue
I'm trying to use Scrapy to extract data from Steam about users' top 10 played games, ordered by play time. However, I am unable to output the name of each game because the CSS classes containing the name text have a trailing space.
I'm new to both Python and the Scrapy library, so apologies for any mistakes/poor formatting.
The HTML element and Python code are as follows:
Exact class code
<div class="gameListRowItemName ellipsis ">Counter-Strike: Global Offensive</div>
Scrapy parser code
def parse(self, response):
    # some other code...
    return {
        # some other code...
        'gamename': response.css("div.gameListRowItemName.ellipsis ::text").extract()
    }
I have made sure to include ".ellipsis" to account for this being a multi-class CSS definition, but I can't find out what a trailing space in the class attribute means.
I've tried several variations on "div.gameListRowItemName.ellipsis ::text" to access this text (such as ".gameListRowItemName ::text"), but the spider only ever returns an empty list.
I don't think there is an issue anywhere else in the spider affecting my output, as the spider also returns the username which works correctly.
Does anyone know how I can work around this issue?
Solution
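If you are using a CSS selector, you can simply pass the first class name. The class attribute is a whitespace-separated list of class names, so the trailing space has no effect on matching.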
from scrapy.selector import Selector

# Build a selector from the HTML snippet in the question
response = Selector(text='<div class="gameListRowItemName ellipsis ">Counter-Strike: Global Offensive</div>')

# with CSS selectors
print('Css:', response.css("div.gameListRowItemName::text").extract())

# with XPath selectors
print('Xpath:', response.xpath('//*[contains(@class,"gameListRowItemName")]/text()').extract())
Output
Css: ['Counter-Strike: Global Offensive']
Xpath: ['Counter-Strike: Global Offensive']
You can learn more about CSS and XPath selectors on w3schools.
Answered By - Amit