Issue
I'm trying to scrape the prices of all available listings for an item so I can compute the average price. I've tried the code below, but it only outputs the first value in the list, and it creates the CSV with the header but no data.
from urllib.request import Request, urlopen
from csv import writer
import requests
from bs4 import BeautifulSoup

# Open URL
link3 = "https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=p2334524.m570.l2632&_nkw=naruto+shippuden+ultimate+ninja+storm+4+ps4&_sacat=139973&LH_TitleDesc=0&rt=nc&_odkw=Naruto+Shippuden%3A+Ultimate+Ninja+Storm+4&_osacat=0&LH_BIN=1&LH_PrefLoc=1"
req = Request(link3, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()

# Create CSV file with headers
with open('yellowPage.csv', 'w', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    header = ['link', 'prices', 'avg']
    thewriter.writerow(header)

    # Loop over the listings to scrape data
    with requests.Session() as c:
        soup = BeautifulSoup(webpage, 'html5lib')
        lists = soup.find_all('li', class_='s-item s-item__pl-on-bottom')
        prices = []
        for list in lists:
            prices.append(float(list.find('span', class_="s-item__price").text.replace('£', '').replace(',', '').replace('$', '')))
        avg = sum(prices) / len(prices)
        print(avg)
        print(prices)
        print(len(prices))
        info = [link3, prices, avg]
        thewriter.writerow(info)
I need help identifying the best way to get every item's price from all of the available pages, and to send the scraped data to a CSV file.
Solution
This should do what you want. I found the last page number, i.e. 9, and then scraped each page until the last page was scraped.
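As a side note, the hardcoded last-page href below only works for this exact query. A more general sketch (my own addition, with hypothetical pagination markup; the real page structure may differ) would read the `_pgn` query parameter out of every pagination link and take the largest value:

```python
from urllib.parse import urlparse, parse_qs
from bs4 import BeautifulSoup

# Hypothetical pagination markup standing in for a real results page
html = """
<a href="https://www.ebay.co.uk/sch/i.html?_nkw=mario&_pgn=2">2</a>
<a href="https://www.ebay.co.uk/sch/i.html?_nkw=mario&_pgn=9&rt=nc">9</a>
"""
soup = BeautifulSoup(html, 'html.parser')

# collect every _pgn value found in a link and keep the largest
page_numbers = []
for a in soup.find_all('a', href=True):
    qs = parse_qs(urlparse(a['href']).query)
    if '_pgn' in qs:
        page_numbers.append(int(qs['_pgn'][0]))
end_page = max(page_numbers)
print(end_page)  # 9
```

This avoids pinning the scraper to one search term and one known page count.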
There is, however, an issue with gathering all of the products: there are 9 pages and each page displays 60 products (by default), but I was only able to get 265 prices. The discrepancy is likely caused by the product li tags having different class attributes. For example, some of them only had the classes s-item s-item__pl-on-bottom and not s-item--watch-at-corner.
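One way around that mismatch (a sketch of my own, not part of the original answer): pass a single class name such as s-item to class_, since BeautifulSoup matches it against each element's full class list, so both li variants are caught regardless of extra classes:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the two li variants seen on the results page
html = """
<ul>
  <li class="s-item s-item__pl-on-bottom s-item--watch-at-corner">
    <span class="s-item__price">£12.99</span>
  </li>
  <li class="s-item s-item__pl-on-bottom">
    <span class="s-item__price">£9.50</span>
  </li>
</ul>
"""
soup = BeautifulSoup(html, 'html.parser')

# class_ with a single class name matches any element whose class list
# contains it, so both li variants are found
items = soup.find_all('li', class_='s-item')
prices = [float(i.find('span', class_='s-item__price').text.lstrip('£')) for i in items]
print(prices)  # [12.99, 9.5]
```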
import requests
from bs4 import BeautifulSoup

# get the html of the first page to find the total number of pages
page = requests.get('https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=mario&_sacat=0&LH_TitleDesc=0&_pgn=1').text
soup = BeautifulSoup(page, 'html.parser')

# find the last page number
end_page = soup.find('a', href='https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=mario&_sacat=0&LH_TitleDesc=0&_pgn=9&rt=nc').text

prices = []
page_num = 0

# get the html of each page until the last page is reached
while page_num < int(end_page):
    page_num += 1
    page = requests.get(f'https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=mario&_sacat=0&LH_TitleDesc=0&_pgn={page_num}').text
    soup = BeautifulSoup(page, 'html.parser')
    # list of all li tags on the page
    items = soup.find_all('li', class_="s-item s-item__pl-on-bottom s-item--watch-at-corner")
    # iterate over the page's li tags and append each product price to the list
    for item in items:
        prices.append(float(item.find('span', class_="s-item__price").text.replace('£', '').replace(',', '')))

# average price of the scraped product prices
print(sum(prices) / len(prices))
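Two caveats worth hedging: the bare float(...) conversion will raise on listings whose price renders as a range (for example "£9.99 to £14.99") or as non-numeric text, and the original question also asked for the results to land in a CSV file. A small sketch (the parse_price helper and the choice to average a range are my own, not from the answer above) covers both:

```python
import csv
import re

def parse_price(text):
    """Extract numeric values from a price string; average an 'X to Y' range.

    Returns None when no number can be found.
    """
    numbers = [float(n.replace(',', '')) for n in re.findall(r'\d[\d,]*\.?\d*', text)]
    if not numbers:
        return None
    return sum(numbers) / len(numbers)

# Example strings of the kinds a results page can produce
raw = ['£12.99', '£1,050.00', '£9.99 to £14.99', 'Free postage']
prices = [p for p in (parse_price(t) for t in raw) if p is not None]
avg = sum(prices) / len(prices)

# Write one row per price plus the overall average, as the question intended
with open('prices.csv', 'w', encoding='utf8', newline='') as f:
    w = csv.writer(f)
    w.writerow(['price', 'avg'])
    for p in prices:
        w.writerow([p, avg])
```

Skipping unparseable prices (rather than crashing) also explains gaps like the 265-of-540 count without losing the rest of the run.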
Answered By - Übermensch