Issue
I am trying to read this file in Python:
https://www.europarl.europa.eu/meps/en/full-list/xml/a
I have used this code:
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
url = 'https://www.europarl.europa.eu/meps/en/full-list/xml/a'
soup = bs(requests.get(url, headers=headers).text, 'lxml')
df = pd.read_xml(str(soup))
print(df)
But the result looks wrong:
meps
0 NaN
Can anyone help me please?
Solution
There is no need for intermediate libraries; read_xml can handle a URL directly:
df = pd.read_xml('https://www.europarl.europa.eu/meps/en/full-list/xml/a')
If you need to pass a custom header, use storage_options:
url = 'https://www.europarl.europa.eu/meps/en/full-list/xml/a'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
df = pd.read_xml(url, storage_options=headers)
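For HTTP(S) URLs, the pandas documentation describes storage_options as key-value pairs forwarded to urllib.request.Request as header options. If that parameter is not available in your pandas version, a rough manual equivalent might look like the sketch below. The helper names parse_meps and fetch_meps are hypothetical, the 30-second timeout is an arbitrary choice, and parser="etree" is used only to avoid depending on lxml.

```python
from io import BytesIO
from urllib.request import Request, urlopen

import pandas as pd


def parse_meps(xml_bytes: bytes) -> pd.DataFrame:
    """Parse raw XML bytes; each child of the root element becomes a row."""
    return pd.read_xml(BytesIO(xml_bytes), parser="etree")


def fetch_meps(url: str, headers: dict) -> pd.DataFrame:
    """Download the XML with custom request headers, then parse it."""
    request = Request(url, headers=headers)
    with urlopen(request, timeout=30) as response:
        return parse_meps(response.read())


# Example call (needs network access), reusing url and headers from above:
# df = fetch_meps(url, headers)
```

Keeping the download and the parsing in separate functions makes the parsing step easy to check against a saved copy of the file.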
Output:
   fullName             country   politicalGroup                                     id      nationalPoliticalGroup
0  Magdalena ADAMOWICZ  Poland    Group of the European People's Party (Christia...  197490  Independent
1  Asim ADEMOV          Bulgaria  Group of the European People's Party (Christia...  189525  Citizens for European Development of Bulgaria
2  Isabella ADINOLFI    Italy     Group of the European People's Party (Christia...  124831  Forza Italia
3  Matteo ADINOLFI      Italy     Identity and Democracy Group                       197826  Lega
4  Alex AGIUS SALIBA    Malta     Group of the Progressive Alliance of Socialist...  197403  Partit Laburista
...
Answered By - mozway