Issue
Want to iterate all pages from this url ""url = "https://www.iata.org/en/about/members/airline-list/"" and dump the results in a .csv file.
How could implementing a piece of code to iterate through the pages be included in the current code below?
import requests
import pandas as pd
from bs4 import BeautifulSoup
from urllib.request import Request
url = 'https://www.iata.org/en/about/members/airline-list/'
req = Request(url , headers = {
'accept':'*/*',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36'})
data = []
while True:
print(url)
html = requests.get(url)
soup = BeautifulSoup(html.text, 'html.parser')
data.append(pd.read_html(soup.select_one('table.datatable').prettify())[0])
if soup.select_one('span.pagination-link.is-active + div a[href]'):
url = soup.select_one('span.pagination-link.is-active + div a')['href']
else:
break
df = pd.concat(data)
df.to_csv('airline-list.csv',encoding='utf-8-sig',index=False)
Solution
Try this approach:
for i in range(1, 30):
url = f'https://www.iata.org/en/about/members/airline-list/?page={i}&search=&ordering=Alphabetical'
html = requests.get(url)
soup = BeautifulSoup(html.text, 'html.parser')
data.append(pd.read_html(soup.select_one('table.datatable').prettify())[0])
Answered By - Virtuoz
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.