Issue
I am trying to get text from a website using the following code:
import time
import os

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

if __name__ == '__main__':
    os.environ['WDM_LOG'] = '0'
    options = Options()
    # options.add_argument('--headless=new')
    options.add_argument("start-maximized")
    options.add_experimental_option("prefs", {"profile.default_content_setting_values.notifications": 1})
    # Both switches go in one call; a second call with the same key overwrites the first.
    options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--disable-blink-features=AutomationControlled')
    srv = Service()
    driver = webdriver.Chrome(service=srv, options=options)

    wLink = "https://www.medimops.de/agatha-christie-agatha-christie-ein-schritt-ins-leere-why-didn-t-they-ask-evans-der-komplette-vierteiler-mit-starbesetzung-blu-ray-blu-ray-M0B0BW28MKKR.html"
    driver.get(wLink)
    time.sleep(3)  # give the page time to render

    soup = BeautifulSoup(driver.page_source, 'lxml')
    wTitle = soup.find("div", {"class": "detail-page__title"}).text.strip()
    worker = soup.find("dl", {"class": "product-attributes__table"})
    worker = worker.find("template")
    wDT = worker.find("dt")
    print(wDT)
    print(wDT.text)
    print(list(wDT.stripped_strings))
    driver.quit()
I get this as output for the print statements:
<dt class="product-attributes__definition">EAN / ISBN-<!-- -->:</dt>
[]
Why is the text ("EAN / ISBN") from the dt tag not printed?
Solution
If the text before the comment is all you need, access the tag's child nodes through .contents.
Replace print(wDT.text) with print(wDT.contents[0]).
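The following is a minimal sketch of that approach, not the asker's full script. It parses only the dt markup shown in the question's output, and it assumes the built-in "html.parser" backend instead of lxml so that it runs without a browser or extra parser. The names html, children and label are illustrative.

```python
from bs4 import BeautifulSoup

# Markup copied from the question's output; parsed standalone for illustration.
html = '<dt class="product-attributes__definition">EAN / ISBN-<!-- -->:</dt>'
dt = BeautifulSoup(html, "html.parser").find("dt")

# .contents lists the direct children: text node, comment node, text node.
children = dt.contents

# The first child is the text that precedes the comment.
label = str(children[0]).strip()
print(len(children), repr(label))
```

Note that contents[0] returns only the part before the comment, so the trailing ":" is not included.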
If a tag has only one child, and that child is a NavigableString, the child is made available as .string. Here the comment gives the dt tag more than one child, so .string returns None and does not work.
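To illustrate, here is a small sketch that checks .string on the same dt markup and then joins the text children while skipping the comment. The helper visible_text is hypothetical (it is not part of Beautiful Soup), and the snippet again assumes the "html.parser" backend. Because it tests with isinstance, it should also cover NavigableString subclasses, which some Beautiful Soup versions appear to use for text nested inside a template tag; treat that last point as an assumption about the installed version.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Markup copied from the question's output; parsed standalone for illustration.
html = '<dt class="product-attributes__definition">EAN / ISBN-<!-- -->:</dt>'
dt = BeautifulSoup(html, "html.parser").find("dt")

# .string is None because the tag has three children, not exactly one.
single = dt.string


def visible_text(tag):
    """Hypothetical helper: join direct text children, skipping comment nodes."""
    return "".join(
        str(child)
        for child in tag.contents
        if isinstance(child, NavigableString) and not isinstance(child, Comment)
    )


full_label = visible_text(dt)
print(single, repr(full_label))
```

Unlike contents[0], this keeps the text on both sides of the comment.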
Answered By - Сергей Кох