Issue
I am web scraping Sales Navigator. I was able to navigate to the first page, scroll eight times, and extract all the names and titles using Selenium and BeautifulSoup. Below is the code.
import time

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By

driver.get(dm)  # dm holds the Sales Navigator search URL
time.sleep(5)   # wait for the page to load

# locate the scrollable search-results container
section = driver.find_element(By.XPATH, "//*[@id='search-results-container']")
time.sleep(5)

counter = 0
while counter < 8:  # this will scroll 8 times
    driver.execute_script(
        'arguments[0].scrollTop = arguments[0].scrollTop + arguments[0].offsetHeight;',
        section)
    counter += 1
    # wait for the data to fully load once the section has been scrolled
    time.sleep(7)  # time is part of the Python standard library

src2 = driver.page_source

# now parse the rendered HTML with BeautifulSoup
soup = BeautifulSoup(src2, 'lxml')
name_soup = soup.find_all('span', {'data-anonymize': 'person-name'})
names = []
for name in name_soup:
    names.append(name.text.strip())
However, there are 8 more pages, and I need to extract the names from those as well and append them to the names list.
Please help.
Solution
Generally, the logic I use for pagination is:

while True:
    ## PAGE SCRAPING CODE [i.e., your current code]
    ## SEARCH FOR NEXT PAGE [button/link]
    ### IF NEXT PAGE --> click the button or go to the link
    ### IF NO NEXT PAGE --> BREAK
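As a rough, non-authoritative sketch of that logic applied to your code (it assumes the driver object from the question already exists; the "Next" button XPath is a guess about Sales Navigator's markup and should be verified against the actual page):

import time

from bs4 import BeautifulSoup
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

names = []

while True:
    ## PAGE SCRAPING CODE (same as your current code)
    section = driver.find_element(By.XPATH, "//*[@id='search-results-container']")
    for _ in range(8):  # scroll the results container 8 times
        driver.execute_script(
            'arguments[0].scrollTop = arguments[0].scrollTop + arguments[0].offsetHeight;',
            section)
        time.sleep(7)  # wait for the lazily loaded results

    soup = BeautifulSoup(driver.page_source, 'lxml')
    for name in soup.find_all('span', {'data-anonymize': 'person-name'}):
        names.append(name.text.strip())

    ## SEARCH FOR NEXT PAGE
    try:
        # hypothetical selector -- inspect the real page and adjust
        next_button = driver.find_element(By.XPATH, "//button[@aria-label='Next']")
    except NoSuchElementException:
        break  # no next-page button at all --> stop
    if not next_button.is_enabled():
        break  # button present but disabled --> last page reached
    next_button.click()  # IF NEXT PAGE --> click it
    time.sleep(5)        # give the next page time to load

Checking both cases (button missing or button disabled) covers either way the site might signal that you are on the last page.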
If you include the link you're trying to scrape, I might be able to give you a more specific answer. For example, this is a function I often use to scrape paginated data, although it isn't meant for scrollable pages....
Answered By - Driftr95