Issue
I am trying to web-scrape text from a web page using Python and the BeautifulSoup library. The text I am trying to get is contained in a tag that is visible in the browser's inspector (developer tools) as
<h2 data-v-a4f7566c="" class="name">Racio lky sýrové</h2>
The web page: https://www.kosik.cz/c1026-pekarna-a-cukrarna
I suspect this is the reason why the following code
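import requests
from bs4 import BeautifulSoup

# Fetch the raw HTML; no JavaScript is executed at this point.
response = requests.get("https://www.kosik.cz/c1026-pekarna-a-cukrarna")
soup = BeautifulSoup(response.text, "html.parser")

def get_product_names(soup):
    product_names = []
    for h2 in soup.find_all("h2"):
        product_names.append(h2.text)
    return product_names

product_names = get_product_names(soup)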
returns an empty list.
However, when I look at the page source, the tag is not there.
This brings me to the conclusion that the tag is generated by JavaScript.
Question: Is there a way to web-scrape dynamically generated content using BeautifulSoup?
Solution
I can't reproduce your problem. Loading the page in a Selenium-driven Chrome and parsing the rendered page source worked for me:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

def get_product_names(soup):
    product_names = []
    for h2 in soup.find_all("h2"):  # or find_all("h2", {"class": "name"})
        product_names.append(h2.text)
    return product_names

URL = "https://www.kosik.cz/c1026-pekarna-a-cukrarna"

service = Service()
options = Options()
options.add_argument("--disable-extensions")
options.add_argument("--disable-blink-features=AutomationControlled")

driver = webdriver.Chrome(service=service, options=options)
driver.get(URL)
# page_source is the DOM after the browser has run the page's JavaScript
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()  # close the browser once the HTML has been captured

product_names = get_product_names(soup)
I got the following list in product_names:
['Racio lky sýrové', 'Racio Cornies Kukuřičné chlebíčky se lněným semínkem', 'Biopekárna Zemanka BIO Dýňové krekry se špaldou a česnekem', 'Bauli Croisaant Bauli vanilkový', 'PAC Hořovice Loupák sypaný (2ks) ', 'Hradecká pekárna Low carb Proteinové pečivo, 2x65g', 'Racio Knäckebrot žitný s vysokým obsahem vlákniny', 'Nový Věk Kukuřičné lupínky - Mořská sůl', 'Nový Věk Rýžové chlebíčky kakaové', 'Racio Free Style Rýžové chlebíčky s příchutí rajče a bazalka', 'Merhautovo pekařství Veka krájená balená', 'Nutrifree Bezlepkový Domácí krájený chléb (4x75g)', 'Bauli Croissant Bauli čokoládový', 'Biopekárna Zemanka BIO Medové perníčky', 'Biopekárna Zemanka BIO Bezlepkové pohankové perníčky s medem', 'Nutrifree Bezlepkové Hamburger housky 180g']
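If the list ever comes back empty with the code above, one possible cause is that page_source was read before the page's JavaScript had inserted the product tags. The sketch below is one way to guard against that, not a tested recipe for this site: it waits explicitly for the first matching element before reading the DOM. The "h2.name" selector is taken from the tag shown in the question; the helper name fetch_rendered_html, the 10-second timeout, and the STATIC_HTML stand-in for the raw page source are assumptions made for illustration. The two print calls at the end run offline and only show why parsing the raw source finds nothing while parsing the rendered DOM does.

```python
from bs4 import BeautifulSoup


def get_product_names(html):
    """Return the text of every <h2 class="name"> element in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.select("h2.name")]


def fetch_rendered_html(url, timeout=10):
    """Load the URL in Chrome and return the DOM once a product name exists.

    Hypothetical helper: assumes Chrome and a matching driver are available.
    """
    # Selenium is imported lazily so get_product_names works without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one h2.name is present, or raise TimeoutException.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "h2.name"))
        )
        return driver.page_source
    finally:
        driver.quit()


# Offline illustration with made-up stand-ins for the two kinds of HTML.
STATIC_HTML = '<div id="app"></div>'  # assumed shape of the raw page source
RENDERED_HTML = (
    '<div id="app">'
    '<h2 data-v-a4f7566c="" class="name">Racio lky sýrové</h2>'
    "</div>"
)

print(get_product_names(STATIC_HTML))    # []
print(get_product_names(RENDERED_HTML))  # ['Racio lky sýrové']
```

Against the live site, the two helpers would be combined as get_product_names(fetch_rendered_html("https://www.kosik.cz/c1026-pekarna-a-cukrarna")).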
Answered By - Saint-malo