Issue
from bs4 import BeautifulSoup
import time

# Collect listing URLs from the first results page.
soup = BeautifulSoup(browser.page_source, "html.parser")
for h1 in soup.find_all('h2'):
    try:
        array.append("https://www.chamberofcommerce.com" + h1.find("a")['href'])
        print("https://www.chamberofcommerce.com" + h1.find("a")['href'])
    except:
        pass

# Click the Next button and scrape each following page.
input = browser.find_element_by_xpath('//a[@class="next"]')
while input:
    input.click()
    time.sleep(10)
    soup = BeautifulSoup(browser.page_source, "html.parser")
    for h1 in soup.find_all('h2'):
        try:
            array.append("https://www.chamberofcommerce.com" + h1.find("a")['href'])
            print("https://www.chamberofcommerce.com" + h1.find("a")['href'])
        except:
            pass
This part of the code scrapes the URLs of the listings on Yellow Pages. The code worked fine while I was only scraping URLs from the first page of the search results. Now I want it to click the Next button until the search pages run out. For example, if there are 20 pages of results, the Selenium bot should keep clicking the Next button and scraping URLs until it reaches the 20th page.
Please check the logic of the code. I am also getting the following error once the bot reaches page 2 (the actual number of pages is 15), and it crashes there:
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
Solution
while input
is not what you need. Note that once you click the Next button a new page is loaded, and all WebElements from the previous page are no longer valid: you have to re-locate them on each page. Try the approach below:
from selenium.common.exceptions import NoSuchElementException

while True:
    try:
        browser.find_element_by_xpath('//a[@class="next"]').click()
    except NoSuchElementException:
        break  # no Next button left, so the last page has been reached
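In the asker's case the URLs also have to be collected from every page, so the parsing belongs inside the same loop. Below is a minimal sketch of the combined approach; it assumes browser is the already-initialized WebDriver and array is the list from the question, and it re-parses page_source on every iteration so no element from a previous page is ever reused:

from bs4 import BeautifulSoup
from selenium.common.exceptions import NoSuchElementException
import time

BASE = "https://www.chamberofcommerce.com"

while True:
    # Parse the current page and collect the listing URLs.
    soup = BeautifulSoup(browser.page_source, "html.parser")
    for h2 in soup.find_all('h2'):
        link = h2.find("a")
        if link is not None and link.get('href'):
            array.append(BASE + link['href'])
    try:
        # Re-locate the Next button on the current page before clicking it.
        browser.find_element_by_xpath('//a[@class="next"]').click()
    except NoSuchElementException:
        break  # no Next button -> last page reached
    time.sleep(10)  # crude pause for the next page to load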
With the above approach you should be able to click the Next button on each page while it is available. You might also need to apply an explicit wait (WebDriverWait) for the Next button to become clickable:
wait.until(EC.element_to_be_clickable((By.XPATH, '//a[@class="next"]'))).click()
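Note that wait is not defined in the snippet above; it is assumed to be a WebDriverWait instance created once beforehand, for example (the 10-second timeout is an arbitrary choice):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(browser, 10)  # poll the DOM for up to 10 seconds
wait.until(EC.element_to_be_clickable((By.XPATH, '//a[@class="next"]'))).click()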
Answered By - Andersson