Issue
I am scraping this site: https://www.oddsportal.com/darts/europe/european-championship/results/
The site uses JavaScript to render the table data, so I am using the scrapy-splash plugin in a Docker container.
I want to filter out all rows with the class 'dark center' while iterating over the selector list 'tableRows'. However, while iterating, the XPath selector appears to query the entire SelectorList as opposed to each item at each iteration.
tableRows = response.xpath('//table[contains(@id, "tournamentTable")]/tbody/tr')
for row in tableRows:
    print(row)
    if row.xpath('//*[contains(@class, "dark center")]') is not None:
        print(True)
My output:
<Selector xpath='//table[contains(@id, "tournamentTable")]/tbody/tr' data='<tr class="dark center" xtid="39903"><th'>
True
<Selector xpath='//table[contains(@id, "tournamentTable")]/tbody/tr' data='<tr class="center nob-border"><th class='>
True
Why is the class 'center nob-border' returning True?
Solution
Your XPath is slightly wrong. Take a look at this answer. You've missed the leading dot in the second XPath expression. In short:
# Searches from the document root for the matching node.
row.xpath('//*[contains(@class, "dark center")]')
# In fact, it is the same as
response.xpath('//*[contains(@class, "dark center")]')
# Searching relative to the row element (what you really need) is
row.xpath('./*[contains(@class, "dark center")]')
# or .//*[contains(@class, "dark center")] possibly, depending on DOM structure
Large update here. In fact, it was really dumb of me to miss this: you actually had two mistakes in your code. The first one is the XPath expression that I've mentioned. The second one is the comparison operator.
row.xpath('any XPath here') is not None
Will always evaluate to True. The return type of xpath() is a list (a SelectorList), which can be empty but can never be None. I've also improved the XPath selector: the class sits on the tr element itself, so the expression tests the row with self::tr rather than its children. Finally, the code you need is:
tableRows = response.xpath('//table[contains(@id, "tournamentTable")]/tbody/tr')
for row in tableRows:
    print(row)
    if row.xpath('./self::tr[contains(@class, "dark center")]'):
        print(True)
Answered By - Michael Savchenko