Issue
I am trying to locate the URL that supplies the data to a webpage.
I tried scraping https://fortune.com/ranking/global500/2023/search/ using Bs4, but because the table isn't populated from data in the HTML, Bs4 alone would not suffice (as noted in a previous post, "Scraping of fortune 500 company list for 2021 using Python").
I tried looking for the data URL using the browser devtools but had no luck. Instructions on how to do this would be appreciated.
Solution
The data you see on the page is stored in a <script> element in JSON form. To load it into a pandas DataFrame, you can use the following example:
import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://fortune.com/ranking/global500/2023/search/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# The page embeds its data as JSON in the <script id="__NEXT_DATA__"> element.
data = soup.select_one("#__NEXT_DATA__")
data = json.loads(data.text)

# Each item keeps its table fields in a nested "data" dict; merge them with the top-level keys.
items = data["props"]["pageProps"]["franchiseList"]["items"]
df = pd.DataFrame([{**i.pop("data"), **i} for i in items])
print(df)
Prints:
Newcomer to the Global 500 Dropped in Rank Gained in Rank Sector Industry Country / Territory Headquarters City Headquarters State Profitable World's Most Admired Companies Female CEO Growth in Jobs Change the World Fastest Growing Companies Fortune 500 Best Companies Non-U.S. Companies Rank Revenues ($M) Revenue Percent Change Profits ($M) Profits Percent Change Assets ($M) Employees Change in Rank Years on Global 500 List name rank slug
0 no no no Retailing General Merchandisers U.S. Bentonville Arkansas yes yes no no yes no yes no no 1 $611,289 6.7% $11,680 -14.6% $243,197 2,100,000 29 Walmart 1 /company/walmart/global500
1 no no yes Energy Mining, Crude-Oil Production Saudi Arabia Dhahran yes no no yes no no no no yes 2 $603,651 50.8% $159,069 51% $663,541 70,496 4 5 Saudi Aramco 2 /company/saudi-aramco/global500
2 no no no Energy Utilities China Beijing yes no no no no no no no yes 3 $530,009 15.1% $8,192 14.8% $710,763 870,287 23 State Grid 3 /company/state-grid/global500
3 no yes no Retailing Internet Services and Retailing U.S. Seattle Washington no yes no no no no yes no no 4 $513,983 9.4% $-2,722 -108.2% $462,675 1,541,000 -2 15 Amazon 4 /company/amazon-com/global500
4 no yes no Energy Petroleum Refining China Beijing yes no no no no no no no yes 5 $483,019 17.3% $21,080 118.7% $637,223 1,087,049 -1 23 China National Petroleum 5 /company/china-national-petroleum/global500
5 no yes no Energy Petroleum Refining China Beijing yes no no no no no no no yes 6 $471,154 17.4% $9,657 16.1% $368,751 527,487 -1 25 Sinopec Group 6 /company/sinopec-group/global500
...
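The values printed above are all strings, such as "$611,289", "$-2,722" or "-108.2%", so they cannot be sorted or summed as numbers directly. A minimal sketch of one way to convert them follows. The helper name to_number is hypothetical and not part of the original answer, and it assumes the only decorations are "$", "," and "%", which is what the printed rows suggest.

```python
from typing import Optional


def to_number(text: Optional[str]) -> Optional[float]:
    """Convert strings like "$611,289" or "6.7%" to float.

    Hypothetical helper: assumes the only decorations are "$", "," and "%".
    Returns None for missing, blank or otherwise unparseable values.
    """
    if text is None:
        return None
    # Strip the currency sign, thousands separators and the percent sign.
    cleaned = str(text).strip().replace("$", "").replace(",", "").replace("%", "")
    if cleaned in ("", "-"):
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


if __name__ == "__main__":
    for sample in ["$611,289", "$-2,722", "-108.2%", "2,100,000", ""]:
        print(repr(sample), "->", to_number(sample))
```

With the DataFrame from the answer, this could be applied per column, for example df["Revenues ($M)"].map(to_number). Percent values are returned as plain numbers (6.7, not 0.067).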
Answered By - Andrej Kesely