Issue
For educational purposes, I want to scrape the population of Germany per postal (PIN) code. This information is available at https://postal-codes.cybo.com/germany/
I tried web scraping with requests and a few other tools I found on Stack Overflow.
import requests
from bs4 import BeautifulSoup
# Send a browser-like User-Agent with the request
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
url = 'https://postal-codes.cybo.com/germany'
r = requests.get(url, headers=headers)
# Print the HTTP status code, then the title of the returned page
print(r.status_code)
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.title.text)
Can someone guide me on how to approach this? Is there any other way I can get the population of Germany per postal/PIN code?
Thank you in advance!
https://postal-codes.cybo.com/germany/#listcodes
Basically, I want to scrape this data: postal code, city, population, and area. Opening an individual postal code also shows the male/female population.
Solution
The site is protected by Cloudflare, but it's possible to bypass the challenge with the cloudscraper module.
Here's how:
from io import StringIO

import cloudscraper
import pandas as pd
from bs4 import BeautifulSoup
from tabulate import tabulate

# cloudscraper handles the Cloudflare challenge
scraper = cloudscraper.create_scraper()
# Take the last table with class "paleblue" on the first listing page
source_html = (
    BeautifulSoup(
        scraper.get("https://postal-codes.cybo.com/germany/?p=1").text,
        "lxml",
    ).select("table.paleblue")[-1]
)
columns = [
    "Postal Code", "City", "Administrative Region", "Population", "Area",
]
# pandas 2.1+ deprecates passing a raw HTML string, so wrap it in StringIO
df = pd.concat(pd.read_html(StringIO(str(source_html).replace("%", ""))))
df = df.reindex(columns=columns)
# Drop the last row of the parsed table
df.drop(df.tail(1).index, inplace=True)
print(tabulate(df, headers="keys", tablefmt="psql", showindex=False))
df.to_csv("postal_codes_germany.csv", index=False)
This should print the following (and also dump the data from the first page to a .csv file):
+---------------+--------------------+-------------------------+--------------+-----------+
| Postal Code | City | Administrative Region | Population | Area |
|---------------+--------------------+-------------------------+--------------+-----------|
| 01067 | Dresden | Saxony | 26664 | 6.7 km² |
| 01069 | Dresden | Saxony | 24682 | 5.3 km² |
| 01097 | Dresden | Saxony | 16468 | 3.417 km² |
| 01099 | Dresden | Saxony | 26147 | 52.8 km² |
| 01108 | — | Saxony | — | — |
| 01109 | Klotzsche | Saxony | 33677 | 27.2 km² |
| 01127 | Dresden | Saxony | 12998 | 2.827 km² |
| 01129 | Dresden | Saxony | 17585 | 8.3 km² |
| 01139 | Dresden | Saxony | 29363 | 8.5 km² |
| 01156 | Dresden | Saxony | — | — |
| 01157 | Dresden | Saxony | 24767 | 8.5 km² |
| 01159 | Dresden | Saxony | 27573 | 5.7 km² |
| 01169 | Dresden | Saxony | 15136 | 5.1 km² |
| 01187 | Dresden | Saxony | 14150 | 5 km² |
| 01189 | Dresden | Saxony | 12040 | 5.3 km² |
| 01217 | Dresden | Saxony | 12810 | 5.2 km² |
| 01219 | Dresden | Saxony | 20713 | 7.5 km² |
| 01237 | Dresden | Saxony | 17916 | 4.314 km² |
| 01239 | Dresden | Saxony | 11957 | 3.634 km² |
| 01257 | Dresden | Saxony | 24753 | 8.4 km² |
| 01259 | Dresden | Saxony | 18304 | 10.4 km² |
| 01277 | Dresden | Saxony | 20606 | 4.616 km² |
| 01279 | Dresden | Saxony | 12092 | 5.4 km² |
| 01307 | Dresden | Saxony | 14597 | 3.381 km² |
| 01309 | Dresden | Saxony | 18589 | 4.863 km² |
| 01324 | Dresden | Saxony | 7757 | 9.1 km² |
| 01326 | Dresden | Saxony | 10859 | 15.3 km² |
| 01328 | Dresden | Saxony | — | — |
| 01445 | Radebeul | Saxony | 32518 | 26 km² |
| 01454 | Wachau, Saxony | Saxony | 20523 | 59 km² |
| 01458 | Ottendorf-Okrilla | Saxony | 10242 | 36 km² |
| 01462 | Dresden | Saxony | 26184 | 31.1 km² |
| 01465 | Dresden | Saxony | 4274 | 12.1 km² |
| 01468 | Moritzburg, Saxony | Saxony | 10414 | 68 km² |
| 01471 | Radeburg | Saxony | 5126 | 33.1 km² |
| 01474 | Dresden | Saxony | 15710 | 40.9 km² |
| 01477 | Arnsdorf | Saxony | 4503 | 35.6 km² |
| 01478 | Dresden | Saxony | 9106 | 16.3 km² |
| 01558 | Großenhain | Saxony | 12856 | 37.9 km² |
| 01561 | Schönfeld, Saxony | Saxony | 21540 | 418.4 km² |
| 01587 | Riesa | Saxony | 11165 | 7.5 km² |
| 01589 | Riesa | Saxony | 7550 | 15.3 km² |
| 01591 | Riesa | Saxony | 11368 | 16.9 km² |
| 01594 | — | Saxony | 7355 | 86.5 km² |
| 01609 | Nauwalde | Saxony | 12121 | 85.1 km² |
| 01612 | Nünchritz | Saxony | 7115 | 45 km² |
| 01616 | Strehla | Saxony | 3888 | 30.1 km² |
| 01619 | Zeithain | Saxony | 5680 | 81.5 km² |
| 01623 | Lommatzsch | Saxony | 9148 | 139.8 km² |
| 01640 | Coswig, Saxony | Saxony | 21213 | 26.1 km² |
+---------------+--------------------+-------------------------+--------------+-----------+
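The Population and Area columns in this output are still text: missing values show up as a dash and areas carry a "km²" suffix. If you want numbers for further analysis, a minimal sketch of the conversion could look like the following. The helper names `parse_population` and `parse_area_km2` are hypothetical, and the code assumes cells look exactly like the ones printed above.

```python
import re
from typing import Optional

# The table prints an em dash (U+2014) where a value is missing
MISSING = "\u2014"


def parse_population(cell: str) -> Optional[int]:
    """Return the population as an int, or None for a missing value."""
    cell = cell.strip()
    if not cell or cell == MISSING:
        return None
    return int(cell)


def parse_area_km2(cell: str) -> Optional[float]:
    """Return the area in square kilometres as a float, or None if missing."""
    cell = cell.strip()
    if not cell or cell == MISSING:
        return None
    match = re.match(r"([\d.]+)\s*km", cell)
    if match is None:
        raise ValueError(f"unexpected area format: {cell!r}")
    return float(match.group(1))


# Three rows copied from the output above: (postal code, population, area)
rows = [
    ("01067", "26664", "6.7 km²"),
    ("01108", MISSING, MISSING),
    ("01561", "21540", "418.4 km²"),
]
for code, population, area in rows:
    print(code, parse_population(population), parse_area_km2(area))
```

The postal code is deliberately left as a string so that leading zeros such as the one in 01067 are not lost.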
Answered By - baduker
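The script in the answer requests only `?p=1`. To cover more of the listing, one option is to loop over the page number. The sketch below is not part of the original answer: it assumes the remaining pages follow the same `?p=N` pattern, and `last_page` is a hypothetical value you would have to read off the site's own pager. The function takes any `get_html` callable, so it can be tried without network access.

```python
import time
from typing import Callable, Iterator

BASE_URL = "https://postal-codes.cybo.com/germany/"


def page_url(page: int) -> str:
    """Build a listing URL in the same `?p=N` form the answer uses."""
    return f"{BASE_URL}?p={page}"


def fetch_pages(
    get_html: Callable[[str], str],
    last_page: int,  # assumption: taken from the site's pager
    delay: float = 1.0,
) -> Iterator[str]:
    """Yield the HTML of pages 1..last_page, pausing between requests."""
    for page in range(1, last_page + 1):
        yield get_html(page_url(page))
        if page < last_page:
            time.sleep(delay)  # be polite to the server
```

With cloudscraper, `get_html` could be `lambda url: scraper.get(url).text`, and each yielded page can then go through the same BeautifulSoup and `pd.read_html` steps as the first page before the resulting frames are concatenated.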