Issue
I'm using bs4 (Beautiful Soup) to try to print out the following structure:
Talent Build Name Staggering Blow Build
Category Recommended
Level 1 On The Prowl
Level 4 Hogger's Joggers
Level 7 Seeing Red
Level 10 Shockwave
Level 13 Pummel
Level 16 Headbanger
Level 20 No Control
Talent Build Name Ez-Thro Dynamite Build
Category Situational
Level 1 ...
Specs: VS Code, Windows 10, Python 3.12.1, BS4 version 4.12.3, requests version 2.31.0
Scraped Website: https://www.icy-veins.com/heroes/hogger-talents
bs4 resource: https://blog.logrocket.com/build-python-web-scraper-beautiful-soup/
Python below:
from bs4 import BeautifulSoup
import requests

#heroname = input("Enter hero name:")

def fetch_talent_html():
    # make a request to the target website
    r = requests.get("https://www.icy-veins.com/heroes/hogger-talents")
    if r.status_code == 200:
        # if the request is successful return the HTML content
        return r.text
    else:
        # throw an exception if an error occurred
        raise Exception("an error occurred while fetching icyveins html")

def extract_talents_info(html):
    # Create a BeautifulSoup object to parse the HTML content
    soup = BeautifulSoup(html, 'html.parser')
    # talentdiv = soup.find(class_='heroes_builds')
    # print(talentdiv.prettify())
    heroes_builds_collection = soup.find(class_='heroes_builds')
    heroes_builds = heroes_builds_collection.find_all("heroes_build")[1:]
    print(heroes_builds)
    # iterate through our builds
    builds = []
    for builds in heroes_builds:
        talent_collection = builds.find("div", {"class": "heroes_build_talents"})
    return builds
I ran the code in VS Code and expected to see the list of talent builds, levels, and abilities.
When I modify the heroes_builds variable, I get an error later on saying there is no h3 tag, so I have a feeling I'm on the right track. I'm just not there yet. Any insight is appreciated!
Solution
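There are a few different issues. For one, find_all("heroes_build") searches for a tag named heroes_build rather than for elements with that class, so it returns an empty list; the loop also reuses the name builds as its loop variable, which overwrites the result list. Rather than patching each of these, focus on what matters: iterate over the HTML tree the way you would read the page as a human, and pick out the information you need with the correct selectors: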
def extract_talents_info(html):
    soup = BeautifulSoup(html, 'html.parser')
    builds = []
    for b in soup.select('.heroes_build'):
        builds.append({
            'build_name': b.h3.get_text(),
            'category': b.span.text.strip(),
            'talents': [
                {
                    'level': t.span.get_text(),
                    'ability': t.img.get('alt')
                }
                for t in b.select('.heroes_build_talent_tier')
            ]
        })
    return builds
This results in:
[{'build_name': 'Staggering Blow Build',
  'category': 'Recommended',
  'talents': [{'level': 'Level 1', 'ability': 'On The Prowl Icon'},
              {'level': 'Level 4', 'ability': "Hogger's Joggers Icon"},
              {'level': 'Level 7', 'ability': 'Seeing Red Icon'},
              {'level': 'Level 10', 'ability': 'Shockwave Icon'},
              {'level': 'Level 13', 'ability': 'Pummel Icon'},
              {'level': 'Level 16', 'ability': 'Headbanger Icon'},
              {'level': 'Level 20', 'ability': 'No Control Icon'}]},
 {'build_name': 'Ez-Thro Dynamite Build',
  'category': 'Situational',
  'talents': [{'level': 'Level 1', 'ability': 'Journeyman Cooking Icon'},
              {'level': 'Level 4', 'ability': "Hogger's Joggers Icon"},
              {'level': 'Level 7', 'ability': 'Dense Blasting Powder Icon'},
              {'level': 'Level 10', 'ability': 'Shockwave Icon'},
              {'level': 'Level 13', 'ability': 'Pummel Icon'},
              {'level': 'Level 16', 'ability': 'Kablooie! Icon'},
              {'level': 'Level 20', 'ability': 'No Control Icon'}]},
 {'build_name': 'ARAM Build',
  'category': 'ARAM',
  'talents': [{'level': 'Level 1', 'ability': 'Journeyman Cooking Icon'},
              {'level': 'Level 4', 'ability': "Hogger's Joggers Icon"},
              {'level': 'Level 7', 'ability': 'Dense Blasting Powder Icon'},
              {'level': 'Level 10', 'ability': 'Shockwave Icon'},
              {'level': 'Level 13', 'ability': 'Pummel Icon'},
              {'level': 'Level 16', 'ability': 'Kablooie! Icon'},
              {'level': 'Level 20', 'ability': 'No Control Icon'}]}]
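To get from this list of dicts to the plain-text layout asked for in the question, one option is a small formatting helper. The following is only a sketch: the name format_builds and the sample data are made up for illustration, and it assumes every ability string ends with the " Icon" suffix that comes from the image alt text (strings without it are left unchanged).

```python
def format_builds(builds):
    """Render a list of build dicts as plain-text lines (hypothetical helper)."""
    lines = []
    for build in builds:
        lines.append(f"Talent Build Name {build['build_name']}")
        lines.append(f"Category {build['category']}")
        for talent in build['talents']:
            # The alt text may be missing (None), so fall back to an empty string,
            # then drop the trailing " Icon" if it is present (Python 3.9+).
            ability = (talent['ability'] or '').removesuffix(' Icon')
            lines.append(f"{talent['level']} {ability}")
    return "\n".join(lines)


# Shortened sample in the same shape as the result above.
sample = [{
    'build_name': 'Staggering Blow Build',
    'category': 'Recommended',
    'talents': [
        {'level': 'Level 1', 'ability': 'On The Prowl Icon'},
        {'level': 'Level 4', 'ability': "Hogger's Joggers Icon"},
    ],
}]

print(format_builds(sample))
```

With the real data you would pass in the return value of extract_talents_info(fetch_talent_html()) instead of sample.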
Answered By - HedgeHog
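As a footnote on why the code in the question printed an empty list: a string passed as the first argument to find_all is treated as a tag name, not a class. The snippet below illustrates the difference on a made-up, heavily simplified piece of markup (the real page is more complex, so treat this as a sketch rather than a reproduction of it).

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup for illustration only.
html = """
<div class="heroes_builds">
  <div class="heroes_build"><h3>Build A</h3></div>
  <div class="heroes_build"><h3>Build B</h3></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Looks for <heroes_build> tags, which do not exist in the markup.
by_tag = soup.find_all("heroes_build")
# Matches elements whose class attribute contains "heroes_build".
by_class = soup.find_all(class_="heroes_build")
# CSS selector equivalent, as used in the answer.
by_css = soup.select(".heroes_build")

print(len(by_tag), len(by_class), len(by_css))
print([build.h3.get_text() for build in by_css])
```

Note also that slicing the result with [1:], as the question does, would skip the first build even once the selector matches.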