Wednesday, September 14, 2022

[FIXED] NoneType object BeautifulSoup


Issue

I have a script whose goal is to get the number of comments from this URL: https://www.dealabs.com/discussions/suivi-erreurs-de-prix-1063390?page=9999. The script should print "190330 commentaires", but after a few lines it runs into a NoneType object. I look up each tag by its exact tag name and its class or id.

Here is my script:

from bs4 import BeautifulSoup
import requests  # needed for requests.get() below
import time
import re

###########################SEARCH##################
while(True):
    sent = 0
    headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}
    cookies={"cookie_policy_agreement" :"3"}
    url = 'https://www.dealabs.com/discussions/suivi-erreurs-de-prix-1063390#comments'
    response = requests.get(url, headers=headers, cookies=cookies)
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')  # otherwise html5lib

    #Whole comments
    comments = soup.find("div", id="comments")
    comments = comments.find("section", class_="bg--main overflow--hidden bRad--fromW4-a")
    comments = comments.find("div", class_="space--h-3 space--v-3")  # this returns None?
    comments = comments.find("h2", class_="flex--inline boxAlign-ai--all-c")
    comments = comments.find("span", class_="size--all-m size--fromW3-l text--b overflow--wrap-off").text
    print(comments)
    time.sleep(30)

Solution

When a tag's find method returns None, it means that the tag has no descendant element that satisfies the provided criteria. In this case, the <section> element you found has no <div> inside it with the classes space--h-3 space--v-3. Looking at the page source at the link you provided, that is indeed the case. There is no such <div>.
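As a minimal sketch of that behaviour, the snippet below runs find on a made-up HTML fragment (not the actual Dealabs markup). The helper find_chain is a hypothetical name introduced only for illustration; it stops at the first lookup that fails instead of raising an AttributeError on None.

```python
from bs4 import BeautifulSoup

# Made-up fragment for illustration only; not the real page markup.
html = """
<div id="comments">
  <section class="bg--main">
    <h2><span>42 commentaires</span></h2>
  </section>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

section = soup.find("div", id="comments").find("section", class_="bg--main")
missing = section.find("div", class_="space--h-3 space--v-3")
print(missing)  # None: the section contains no such <div>


def find_chain(root, *steps):
    """Hypothetical helper: apply find() step by step, stopping at the first miss."""
    node = root
    for name, attrs in steps:
        node = node.find(name, attrs=attrs)
        if node is None:
            return None
    return node


found = find_chain(soup, ("div", {"id": "comments"}), ("h2", {}), ("span", {}))
print(found.get_text(strip=True) if found else "not found")
```

Checking each result for None before calling find on it again is what keeps a long chain of lookups from failing halfway through.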

Either way, it seems you are doing a lot of unnecessary find operations.

When an element on a page has an id attribute, that usually means no other element on the page has the same id. Since you are looking for the number of "commentaires", I would start with the closest parent element that has an id attribute.

In this case, the closest one seems to be <div id="thread-comments" ...>. The line you are interested in also seems to be inside the only <h2> tag below that <div>, or at least the first one. Thus I would suggest the following optimization:

import re
...
soup = BeautifulSoup(html, 'html.parser')

comments_div = soup.find("div", id="thread-comments")
num_comments_line = comments_div.h2.get_text(strip=True)
# This is optional, if you actually want just the number itself:
match = re.search(r'^(\d+)\s+\w+', num_comments_line)
num_comments = int(match.group(1))

print(num_comments)  # output: 189010

Note that these two are equivalent (see the Beautiful Soup docs):

comments_div.h2
comments_div.find("h2")
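
A small check of that equivalence, again on an invented fragment rather than the real page:

```python
from bs4 import BeautifulSoup

# Invented fragment with two <h2> tags, for illustration only.
soup = BeautifulSoup(
    "<div id='x'><h2>first</h2><h2>second</h2></div>", "html.parser"
)
div = soup.find("div", id="x")

# Attribute access and find() both return the first matching descendant.
print(div.h2.get_text())
print(div.find("h2").get_text())

# A tag name with no match gives None in both forms.
print(div.h3)
```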

The last bit is just a regular expression to grab the number from the string that looks like 189010 commentaires.
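
Because re.search returns None when the pattern does not match, a guarded version may be safer if the heading text ever changes. A sketch, where parse_comment_count is a hypothetical helper name and not part of the original answer:

```python
import re


def parse_comment_count(line):
    """Return the leading integer of a line like '189010 commentaires', or None."""
    match = re.search(r'^(\d+)\s+\w+', line)
    return int(match.group(1)) if match else None


print(parse_comment_count("189010 commentaires"))
print(parse_comment_count("commentaires"))
```

The first call yields the number as an int; the second yields None instead of raising an AttributeError on match.group.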

Answered By - Daniil Fajnberg

This answer was collected from Stack Overflow and tested by the PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
