Issue
I'm trying to scrape the NBN plans from the Tangerine website as scraping practice. I'm using BeautifulSoup, and I can scrape the data and see it in the terminal, but once I save it to a CSV file, it doesn't work and the file shows garbled characters.
I used BeautifulSoup, but I also know Scrapy and have used it before. Before I try it, I'd like to know whether Scrapy can scrape this data and save it to a CSV file, and if not, what else I could use.
There are also some sites I tried to scrape with Scrapy where it didn't work. I know there's nothing wrong with my code, because it worked when I scraped other sites.
import requests
from bs4 import BeautifulSoup
import pandas

url = requests.get('https://www.tangerinetelecom.com.au/nbn/nbn-broadband')
soup = BeautifulSoup(url.content, 'html.parser')
plans = soup.find_all('div', class_="large-3 columns text-center")
data = []
for plan in plans:
    d = {}
    info = plan.find_all('p')
    title = info[0].text
    speed = info[1].text[0:-2]
    d['Speed'] = title + '\n' + speed
    d['Data'] = info[2].text
    d['Trial'] = info[3].text
    d['Contract'] = info[4].text
    d['Setup Fee'] = info[5].text
    d['Promo Price'] = info[6].text
    d['Price'] = info[7].text[0:-1]
    d['Price Details'] = info[8].text.replace('(', '').replace(')', '')
    data.append(d)
print(data)
df = pandas.DataFrame(data)
df.to_csv("tangerine.csv")
The expected result would be this data in a csv file:
[ {'Speed': 'Basic Speed \n10Mbps Typical Evening Speed ', 'Data': 'UNLIMITED DATA', 'Trial': 'RISK FREE TRIAL', 'Contract': 'NO CONTRACT', 'Setup Fee': '$0 SETUP FEE', 'Promo Price': 'SPECIAL PROMO PRICE', 'Price': '$49.90/mth', 'Price Details': '$49.90 for 6 months, then $59.90 ongoing'},
{'Speed': 'Speed Boost \n21Mbps Typical Evening Speed ', 'Data': 'UNLIMITED DATA', 'Trial': 'RISK FREE TRIAL', 'Contract': 'NO CONTRACT', 'Setup Fee': '$0 SETUP FEE', 'Promo Price': 'SPECIAL PROMO PRICE', 'Price': '$58.90/mth', 'Price Details': '$58.90 for 6 months, then $68.90 ongoing'},
{'Speed': 'XL Speed Boost \n42Mbps Typical Evening Speed ', 'Data': 'UNLIMITED DATA', 'Trial': 'RISK FREE TRIAL', 'Contract': 'NO CONTRACT', 'Setup Fee': '$0 SETUP FEE', 'Promo Price': 'SPECIAL PROMO PRICE', 'Price': '$64.90/mth', 'Price Details': '64.90 for 6 months, then $74.90 ongoing'},
{'Speed': "XXL Speed B'st \n83Mbps Typical Evening Speed ", 'Data': 'UNLIMITED DATA', 'Trial': 'RISK FREE TRIAL', 'Contract': 'NO CONTRACT', 'Setup Fee': '$0 SETUP FEE', 'Promo Price': 'SPECIAL PROMO PRICE', 'Price': '$69.90/mth', 'Price Details': '$69.90 for 6 months, then $79.90 ongoing'} ]
But the CSV file shows garbled characters instead.
Solution
The problem is not your code but the character encoding selected when the file is opened in LibreOffice.
Use these steps to change the encoding from UTF-16 to UTF-8: File > New > Spreadsheet, then Insert > Sheet from file. Choose your file and click OK. You should get the text import window. At the top, check the "Character set" setting; my guess is that it's not set properly. If it's not already, change it to UTF-8.
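As a complement to the import steps above, the encoding can also be set explicitly when the file is written. The following is a minimal sketch, not part of the original answer: it uses Python's standard csv module with the "utf-8-sig" codec, which prepends a UTF-8 byte order mark that some spreadsheet applications use to detect the encoding. Whether that is enough for LibreOffice to preselect UTF-8 is an assumption you would need to check. The file name tangerine_demo.csv and the helper names write_csv and read_csv are hypothetical, and the two sample rows are shortened from the expected output in the question.

```python
import csv
import os
import tempfile

# Sample rows standing in for the scraped plans (values shortened from the question).
rows = [
    {"Speed": "Basic Speed \n10Mbps Typical Evening Speed ", "Price": "$49.90/mth"},
    {"Speed": "Speed Boost \n21Mbps Typical Evening Speed ", "Price": "$58.90/mth"},
]


def write_csv(path, rows, encoding="utf-8-sig"):
    """Write a list of dicts to a CSV file with an explicit encoding."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding=encoding) as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def read_csv(path, encoding="utf-8-sig"):
    """Read the file back as a list of dicts, stripping the BOM if present."""
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.DictReader(f))


# Hypothetical demo path in the system temp directory.
path = os.path.join(tempfile.gettempdir(), "tangerine_demo.csv")
write_csv(path, rows)

# Check that the file starts with the UTF-8 byte order mark.
with open(path, "rb") as f:
    starts_with_bom = f.read(3) == b"\xef\xbb\xbf"

# Check that the data survives a write/read round trip unchanged.
round_trip = read_csv(path)
print(starts_with_bom, round_trip == rows)
```

With pandas, the same idea would be to pass an explicit encoding to the existing call, for example df.to_csv("tangerine.csv", index=False, encoding="utf-8-sig"). If the spreadsheet still shows garbled text, the character set chosen in the import dialog is still the thing to check.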
Answered By - Edeki Okoh