Issue
I am trying to write an XPath expression which can return the URL associated with the next page of a search.
The URL that leads to the next page of the search is always the href of the <a> tag that follows the <span class="navCurrentPage"> tag.
I have been trying to use the following-sibling axis to pull the next URL. The expression I run in the Chrome console is:
$x('//span[@class="navCurrentPage"][1]/following-sibling::a/@href[1]')
I thought that by specifying @href[1] I would get back only one URL (thinking the [1] selects the first item in the list), but instead Chrome (and Scrapy) return four URLs. I don't understand why. Please help me understand how to select the one URL I am looking for.
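The four results can be reproduced with a small sketch using lxml (Scrapy's selectors are built on it, so the XPath semantics match). The pagination markup below is a hypothetical stand-in, since the page from the question is not shown; the point is that [1] in @href[1] filters the attribute axis of each <a> separately, and each <a> has only one href.

```python
from lxml import html

# Hypothetical pagination markup: the real page is not available, so this
# only mimics "a span.navCurrentPage followed by several links".
PAGE = """
<div class="pagination">
  <a href="/search?page=1">1</a>
  <span class="navCurrentPage">2</span>
  <a href="/search?page=3">3</a>
  <a href="/search?page=4">4</a>
  <a href="/search?page=5">5</a>
  <a href="/search?page=3">Next</a>
</div>
"""

doc = html.fromstring(PAGE)

# The expression from the question. following-sibling::a selects all four
# <a> elements after the span; the trailing [1] is then applied to the
# attribute axis of each <a> on its own, so every href survives.
original = [
    str(h)
    for h in doc.xpath('//span[@class="navCurrentPage"][1]/following-sibling::a/@href[1]')
]
print(original)
# ['/search?page=3', '/search?page=4', '/search?page=5', '/search?page=3']

# Moving [1] onto the <a> step keeps only the nearest following link.
nearest = [
    str(h)
    for h in doc.xpath('//span[@class="navCurrentPage"]/following-sibling::a[1]/@href')
]
print(nearest)  # ['/search?page=3']
```

In other words, a predicate always filters the step it is attached to, not the whole result list built up so far.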
Here is the URL where you can find the HTML giving me trouble:
Thank you for the help.
Solution
Operator precedence: //x[1] means /descendant-or-self::node()/child::x[1], which finds every descendant x that is the first x child of its parent. You want (//x)[1], which finds the first node, in document order, among all the descendants named x.
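A minimal sketch of the precedence rule above, again using lxml. It assumes a hypothetical page that renders the pagination bar twice (top and bottom), which is one way a "first" predicate can still return several nodes; the helper hrefs is illustrative only.

```python
from lxml import html

# Hypothetical page with the same pagination bar twice; the real page from
# the question is not shown, so this layout is an assumption.
PAGE = """
<html><body>
  <div class="nav-top">
    <span class="navCurrentPage">2</span>
    <a href="/search?page=3">3</a>
    <a href="/search?page=4">4</a>
  </div>
  <div class="nav-bottom">
    <span class="navCurrentPage">2</span>
    <a href="/search?page=3">3</a>
    <a href="/search?page=4">4</a>
  </div>
</body></html>
"""

doc = html.fromstring(PAGE)


def hrefs(expr):
    """Evaluate an XPath expression against doc and return plain strings."""
    return [str(h) for h in doc.xpath(expr)]


# //span[...][1]: [1] binds to the child::span step, so it picks the first
# matching span child of each parent, giving one span per bar.
per_parent = doc.xpath('//span[@class="navCurrentPage"][1]')
print(len(per_parent))  # 2

# (//span[...])[1]: collect every matching span first, then take the first
# one in document order.
first_overall = doc.xpath('(//span[@class="navCurrentPage"])[1]')
print(len(first_overall))  # 1

# The same rule applies to the link step: without parentheses you get the
# nearest following link of each span; with them, one link overall.
per_bar = hrefs('//span[@class="navCurrentPage"]/following-sibling::a[1]/@href')
print(per_bar)  # ['/search?page=3', '/search?page=3']
next_url = hrefs('(//span[@class="navCurrentPage"]/following-sibling::a)[1]/@href')
print(next_url)  # ['/search?page=3']
```

The parenthesized form is the safer choice when the page layout is unknown, because it yields at most one node no matter how many navCurrentPage spans exist.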
Answered By - Michael Kay