Issue
I am trying to scrape the size menu of this page: https://www.hayabusafightwear.co.uk/hayabusa-lightweight-jiu-jitsu-gi-blue If I copy the HTML and parse it with Beautiful Soup it works, but the same code does not work on the live page. I think this is because the HTML is generated dynamically?
What is the best way to go forward? Is it even worth trying?
Thank you very much for your help.
url="https://www.hayabusafightwear.co.uk/hayabusa-lightweight-jiu-jitsu-gi-blue"
page_html = get_page_html(url)
soup = BeautifulSoup(page_html, 'html.parser')
attrs = soup.find("select", {"class":"required-entry super-attribute-select"}).find_all("option")
print(attrs)
Solution
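The size options appear to be filled in by JavaScript after the page loads, so load the page in a real browser with Selenium and then parse the rendered HTML with bs4: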
import time
from bs4 import BeautifulSoup
from selenium import webdriver

# Configure a headless Chrome session
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
# `options=` replaces the deprecated `chrome_options=` keyword
driver = webdriver.Chrome(options=options)

url = 'https://www.hayabusafightwear.co.uk/hayabusa-lightweight-jiu-jitsu-gi-blue'
driver.get(url)  # get() returns None, so there is nothing to assign
time.sleep(1)  # give the page's JavaScript a moment to populate the menu
pg_html = driver.page_source  # HTML after the browser has rendered the page
driver.quit()  # close the browser once the HTML has been captured

soup = BeautifulSoup(pg_html, 'html.parser')
selection = soup.find("select", attrs={"class": "required-entry super-attribute-select"})
sizes = selection.find_all('option')
for size in sizes:
    print(size.text)
'''
Result:
Choose an Option...
A1
A4
'''
You can find more about it here: https://medium.com/ymedialabs-innovation/web-scraping-using-beautiful-soup-and-selenium-for-dynamic-page-2f8ad15efe25
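A fixed one-second sleep can be too short on a slow connection and wasteful on a fast one. As a rough sketch only, the same idea can be written with Selenium's explicit wait (WebDriverWait), which polls until a condition holds. The helper names fetch_rendered_html and extract_option_texts, the CSS selector, and the "more than one option" condition are my own assumptions for illustration, not part of the original answer; the condition assumes the placeholder option may already be present before the sizes are loaded.

```python
from bs4 import BeautifulSoup

# Assumed selector: matches the <option> tags of the size <select> from the question
OPTION_CSS = "select.super-attribute-select option"


def extract_option_texts(html, css=OPTION_CSS):
    """Return the visible text of every matching <option> in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return [opt.get_text(strip=True) for opt in soup.select(css)]


def fetch_rendered_html(url, css=OPTION_CSS, timeout=10):
    """Load url in headless Chrome and return the HTML once the menu is populated."""
    # Imported here so extract_option_texts can be used without a browser installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Poll until more than one <option> exists (assumption: a single
        # placeholder option may be present before JavaScript adds the sizes).
        # Raises TimeoutException if that does not happen within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            lambda d: len(d.find_elements(By.CSS_SELECTOR, css)) > 1
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on timeout


if __name__ == "__main__":
    url = "https://www.hayabusafightwear.co.uk/hayabusa-lightweight-jiu-jitsu-gi-blue"
    for text in extract_option_texts(fetch_rendered_html(url)):
        print(text)
```

Keeping the parsing in its own function also makes it easy to check against a saved copy of the page, which is the case that already worked in the question.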
Regards...
Answered By - d r