Issue
I would scraper urls of player of all pages from this website https://www.transfermarkt.it/detailsuche/spielerdetail/suche/27564780 but I can scrape only the first one, why? I write a cicle for with range()
import pandas as pd
import requests
from bs4 import BeautifulSoup
list_url=[]
def get_player_urls(page):
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0"
}
link = 'https://www.transfermarkt.it/detailsuche/spielerdetail/suche/27564780/page/{page}'
content = requests.get(link, headers=headers)
soup = BeautifulSoup(content.text, 'html.parser')
for urls in soup.find_all('a', class_='spielprofil_tooltip'):
url = 'https://www.transfermarkt.it' + urls.get('href')
print(url)
list_url.append(url)
return
for page in range(1,11,1):
get_player_urls(page)
df_url = pd.DataFrame(list_url)
df_url.to_csv('df_url.csv', index=False, header=False)
Solution
You're not actually imputing the page into the url. Also, no need to put return on your function. You aren't returning anything:
import pandas as pd
import requests
from bs4 import BeautifulSoup
list_url=[]
def get_player_urls(page):
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0"
}
link = 'https://www.transfermarkt.it/detailsuche/spielerdetail/suche/27564780/page/{page}'.format(page=page) #<-- Add this
content = requests.get(link, headers=headers)
soup = BeautifulSoup(content.text, 'html.parser')
for urls in soup.find_all('a', class_='spielprofil_tooltip'):
url = 'https://www.transfermarkt.it' + urls.get('href')
print(url)
list_url.append(url)
for page in range(1,11,1):
get_player_urls(page)
df_url = pd.DataFrame(list_url)
df_url.to_csv('df_url.csv', index=False, header=False)
Answered By - chitown88
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.