Issue
I want to scrape the daily top 200 songs from the Spotify Charts website. I am parsing the page's HTML to get each song's artist, name, and stream count, but the following code returns nothing. How can I get this information with the following approach?
for a in soup.find("div",{"class":"Container-c1ixcy-0 krZEp encore-base-set"}):
    for b in a.findAll("main",{"class":"Main-tbtyrr-0 flXzSu"}):
        for c in b.findAll("div",{"class":"Content-sc-1n5ckz4-0 jyvkLv"}):
            for d in c.findAll("div",{"class":"TableContainer__Container-sc-86p3fa-0 fRKUEz"}):
                print(d)
For example, this is the song list that I want to scrape: https://charts.spotify.com/charts/view/regional-tr-daily/2022-09-14
This is the HTML code of the page.
Solution
In the example link you provided, there aren't 200 songs, but only 50. The following is one way to get those songs:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support import expected_conditions as EC
import time as t
import pandas as pd

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--window-size=1920,1080")

webdriver_service = Service("chromedriver/chromedriver")  # path to where you saved the chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)

url = 'https://charts.spotify.com/charts/view/regional-tr-daily/2022-09-14'
browser.get(url)
wait = WebDriverWait(browser, 5)

# Accept the cookie banner if it shows up within the wait timeout
try:
    wait.until(EC.element_to_be_clickable((By.ID, "onetrust-accept-btn-handler"))).click()
    print("accepted cookies")
except Exception:
    print('no cookie button')

# Remove the page header from the DOM, presumably so it cannot overlap the elements we scroll to
header_to_be_removed = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'header[data-testid="charts-header"]')))
browser.execute_script("""
var element = arguments[0];
element.parentNode.removeChild(element);
""", header_to_be_removed)

# Keep clicking "show more" until the button no longer appears (the wait times out)
while True:
    try:
        show_more_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@data-testid="load-more-entries"]//button')))
        show_more_button.location_once_scrolled_into_view  # accessing this property scrolls the button into view
        t.sleep(5)
        show_more_button.click()
        print('clicked to show more')
        t.sleep(3)
    except TimeoutException:
        print('all done')
        break

songs = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'li[data-testid="charts-entry-item"]')))
print('we have', len(songs), 'songs')

song_list = []
for song in songs:
    song.location_once_scrolled_into_view  # scroll each entry into view before reading its text
    t.sleep(1)
    title = song.find_element(By.CSS_SELECTOR, 'p[class^="Type__TypeElement-"]')
    artist = song.find_element(By.CSS_SELECTOR, 'span[data-testid="artists-names"]')
    song_list.append((artist.text, title.text))

# Column order must match the (artist, title) tuples appended above
df = pd.DataFrame(song_list, columns=['Artist', 'Title'])
print(df)
This prints the following in the terminal:
no cookie button
clicked to show more
clicked to show more
clicked to show more
clicked to show more
all done
we have 50 songs
Index | Artist | Title |
---|---|---|
0 | Bizarrap, | Quevedo: Bzrp Music Sessions, Vol. 52 |
1 | Harry Styles | As It Was |
2 | Bad Bunny, | Me Porto Bonito |
3 | Bad Bunny | Tití Me Preguntó |
4 | Manuel Turizo | La Bachata |
5 | ROSALÍA | DESPECHÁ |
6 | BLACKPINK | Pink Venom |
7 | David Guetta, | I'm Good (Blue) |
8 | OneRepublic | I Ain't Worried |
9 | Bad Bunny | Efecto |
10 | Chris Brown | Under The Influence |
11 | Steve Lacy | Bad Habit |
12 | Bad Bunny, | Ojitos Lindos |
13 | Kate Bush | Running Up That Hill (A Deal With God) - 2018 Remaster |
14 | Joji | Glimpse of Us |
15 | Nicki Minaj | Super Freaky Girl |
16 | Bad Bunny | Moscow Mule |
17 | Rosa Linn | SNAP |
18 | Glass Animals | Heat Waves |
19 | KAROL G | PROVENZA |
20 | Charlie Puth, | Left and Right (Feat. Jung Kook of BTS) |
21 | Harry Styles | Late Night Talking |
22 | The Kid LAROI, | STAY (with Justin Bieber) |
23 | Tom Odell | Another Love |
24 | Central Cee | Doja |
25 | Stephen Sanchez | Until I Found You |
26 | Bad Bunny | Neverita |
27 | Post Malone, | I Like You (A Happier Song) (with Doja Cat) |
28 | Lizzo | About Damn Time |
29 | Nicky Youre, | Sunroof |
30 | Elton John, | Hold Me Closer |
31 | Luar La L | Caile |
32 | KAROL G, | GATÚBELA |
33 | The Weeknd | Die For You |
34 | Bad Bunny, | Tarot |
35 | James Hype, | Ferrari |
36 | Imagine Dragons | Bones |
37 | Elton John, | Cold Heart - PNAU Remix |
38 | The Neighbourhood | Sweater Weather |
39 | Ghost | Mary On A Cross |
40 | Shakira, | Te Felicito |
41 | Justin Bieber | Ghost |
42 | Bad Bunny, | Party |
43 | Drake, | Jimmy Cooks (feat. 21 Savage) |
44 | Doja Cat | Vegas (From the Original Motion Picture Soundtrack ELVIS) |
45 | Camila Cabello, | Bam Bam (feat. Ed Sheeran) |
46 | Rauw Alejandro, | LOKERA |
47 | Rels B | cómo dormiste? |
48 | The Weeknd | Blinding Lights |
49 | Arctic Monkeys | 505 |
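The "clicked to show more" lines in the output above come from a click-until-timeout loop. As a minimal sketch, assuming you want to reuse that pattern, the loop can be wrapped in a small helper with an upper bound on clicks so it cannot run forever. The names `click_until_gone`, `find_button`, `stop_on`, and `max_clicks` are hypothetical and not part of Selenium.

```python
def click_until_gone(find_button, stop_on, max_clicks=20):
    """Click the element returned by find_button() until it stops appearing.

    find_button: zero-argument callable returning an object with .click()
    stop_on:     exception type (or tuple of types) meaning "no more button"
    max_clicks:  safety cap (hypothetical default) to avoid an endless loop
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        try:
            button = find_button()
        except stop_on:
            break  # the button did not show up again, so we are done
        button.click()
        clicks += 1
    return clicks


# Possible use with the objects defined in the answer's code (not run here):
# clicks = click_until_gone(
#     lambda: wait.until(EC.element_to_be_clickable(
#         (By.XPATH, '//div[@data-testid="load-more-entries"]//button'))),
#     stop_on=TimeoutException,
# )
```

The cap is a design choice, not something the original answer does: it trades completeness for a guaranteed exit if the page keeps offering a button.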
You can of course extract other information as well, such as the chart ranking or all artists when a song has more than one.
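As a rough sketch of that idea, the helpers below join several artist fragments into one clean string (the printed output shows trailing commas such as "Bizarrap,") and derive a rank from list position. Everything here is an assumption for illustration: the helper names are hypothetical, the `a` links inside `span[data-testid="artists-names"]` are a guess about the page markup, and the rank assumes the entries are returned in chart order.

```python
def clean_artists(raw_names):
    """Join artist name fragments, dropping trailing commas and blanks."""
    names = [name.strip().rstrip(",").strip() for name in raw_names]
    return ", ".join(name for name in names if name)


def build_rows(entries):
    """Turn (artist_fragments, title) pairs, in chart order, into ranked rows."""
    return [
        (rank, clean_artists(fragments), title)
        for rank, (fragments, title) in enumerate(entries, start=1)
    ]


def artist_fragments(song):
    """Collect artist texts from one chart entry (a Selenium WebElement).

    ASSUMPTION: each artist is an <a> inside the artists-names span.
    "css selector" is the string value of By.CSS_SELECTOR.
    """
    links = song.find_elements("css selector", 'span[data-testid="artists-names"] a')
    return [link.get_attribute("textContent") for link in links]


# Sample fragments for illustration only, shaped like the output above
rows = build_rows([
    (["Bizarrap,", "Quevedo"], "Quevedo: Bzrp Music Sessions, Vol. 52"),
    (["Harry Styles"], "As It Was"),
])
print(rows)
```

With a live page, `build_rows` could be fed `(artist_fragments(song), title.text)` pairs collected in the loop over `songs`.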
The Selenium Chrome/chromedriver setup shown here is for Linux. To adapt it to your own setup, keep the imports and all the code after the line that defines the browser, and change only the driver configuration.
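One way to isolate the setup-specific part is sketched below, assuming Selenium 4.6 or newer, where Selenium Manager can locate a driver when no `Service` path is given. The function names `chrome_arguments` and `make_browser` and the optional `headless` flag are hypothetical additions, not part of the original answer.

```python
def chrome_arguments(headless=False):
    """Return the Chrome command-line arguments used for the browser."""
    args = ["--no-sandbox", "--window-size=1920,1080"]
    if headless:
        args.append("--headless=new")  # run without a visible window
    return args


def make_browser(driver_path=None, headless=False):
    """Create a Chrome browser; only this function needs per-machine changes."""
    # Imported here so the helper above works even without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)

    if driver_path:
        # Explicit chromedriver binary, as in the Linux setup above
        return webdriver.Chrome(service=Service(driver_path), options=options)
    # No path given: rely on Selenium Manager to find a matching driver
    return webdriver.Chrome(options=options)


# Possible use (not run here):
# browser = make_browser("chromedriver/chromedriver")
```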
Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/index.html
Selenium documentation: https://www.selenium.dev/documentation/
Answered By - Barry the Platipus