Issue
Hi all, I have written a Python program to retrieve the title of a web page. It works fine, but with some pages it also returns some unwanted text. How can I avoid that?
Here is my program:
# importing the modules
import requests
from bs4 import BeautifulSoup
# target url
url = 'https://atlasobscura.com'
# send an HTTP GET request for the page
reqs = requests.get(url)
# parse the HTML with BeautifulSoup
soup = BeautifulSoup(reqs.text, 'html.parser')
# displaying the title
print("Title of the website is : ")
for title in soup.find_all('title'):
    title_data = title.get_text().lower().strip()
    print(title_data)
Here is my output:
atlas obscura - curious and wondrous travel destinations
aoc-full-screen
aoc-heart-solid
aoc-compass
aoc-flipboard
aoc-globe
aoc-pocket
aoc-share
aoc-cancel
aoc-video
aoc-building
aoc-clock
aoc-clipboard
aoc-help
aoc-arrow-right
aoc-arrow-left
aoc-ticket
aoc-place-entry
aoc-facebook
aoc-instagram
aoc-reddit
aoc-rss
aoc-twitter
aoc-accommodation
aoc-activity-level
aoc-add-a-photo
aoc-add-box
aoc-add-shape
aoc-arrow-forward
aoc-been-here
aoc-chat-bubbles
aoc-close
aoc-expand-more
aoc-expand-less
aoc-forum-flag
aoc-group-size
aoc-heart-outline
aoc-heart-solid
aoc-home
aoc-important
aoc-knife-fork
aoc-library-books
aoc-link
aoc-list-circle-bullets
aoc-list
aoc-location-add
aoc-location
aoc-mail
aoc-map
aoc-menu
aoc-more-horizontal
aoc-my-location
aoc-near-me
aoc-notifications-alert
aoc-notifications-mentions
aoc-notifications-muted
aoc-notifications-tracking
aoc-open-in-new
aoc-pencil
aoc-person
aoc-pinned
aoc-plane-takeoff
aoc-plane
aoc-print
aoc-reply
aoc-search
aoc-shuffle
aoc-star
aoc-subject
aoc-trip-style
aoc-unpinned
aoc-send
aoc-phone
aoc-apps
aoc-lock
aoc-verified
Instead of this, I expected to receive only this line:
"atlas obscura - curious and wondrous travel destinations"
Please help me with some ideas. All other websites work; only some websites give this problem.
Solution
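Your problem is that you're finding all the occurrences of "title" in the page: find_all('title') returns every <title> element in the document, not just the one inside <head>. The extra aoc-* lines appear to come from <title> elements elsewhere in the page (icon names like these are commonly stored in <title> tags inside inline SVG icons). Beautiful Soup has a title attribute specifically for what you're trying to do: soup.title returns the first <title> element in the document, which is normally the page title in <head>. Here's your modified code: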
# importing the modules
import requests
from bs4 import BeautifulSoup
# target url
url = 'https://atlasobscura.com'
# send an HTTP GET request for the page
reqs = requests.get(url)
# parse the HTML with BeautifulSoup
soup = BeautifulSoup(reqs.text, 'html.parser')
title_data = soup.title.text.lower()
# displaying the title
print("Title of the website is : ")
print(title_data)
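To see the difference without hitting the live site, here is a small sketch that runs against a hypothetical piece of markup. It assumes, based only on the aoc-* names in the output above, that the extra <title> elements sit inside inline SVG icons; the SAMPLE_HTML string and the page_title helper are illustrative names, not part of the original answer. The helper prefers select_one("head > title") (CSS selectors are provided by soupsieve, which is installed as a dependency of beautifulsoup4) and falls back to soup.title, returning None if the page has no title at all.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <title> in <head>, plus <title> elements inside
# inline SVG icons (an assumption about how the real page is structured).
SAMPLE_HTML = """
<html>
  <head><title>Atlas Obscura - Curious and Wondrous Travel Destinations</title></head>
  <body>
    <svg><symbol id="icon-compass"><title>aoc-compass</title></symbol></svg>
    <svg><symbol id="icon-globe"><title>aoc-globe</title></symbol></svg>
  </body>
</html>
"""


def page_title(html):
    """Return the document title (stripped, lowercased), or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Prefer the <title> that is a direct child of <head>; otherwise fall back
    # to the first <title> anywhere in the document (what soup.title returns).
    tag = soup.select_one("head > title") or soup.title
    if tag is None:
        return None
    return tag.get_text().strip().lower()


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() returns every <title> element, including the ones in the SVGs.
all_titles = [t.get_text().strip() for t in soup.find_all("title")]
print(all_titles)
# ['Atlas Obscura - Curious and Wondrous Travel Destinations', 'aoc-compass', 'aoc-globe']

# Each tag's parent shows where in the tree that <title> lives.
print([t.parent.name for t in soup.find_all("title")])
# ['head', 'symbol', 'symbol']

print(page_title(SAMPLE_HTML))
# atlas obscura - curious and wondrous travel destinations
```

Printing t.parent.name for each match, as above, is a quick way to check where the unwanted titles come from on any page that shows the same problem.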
Answered By - ratchek