Issue
I am trying to get data from this website: https://nftcalendar.io/. My approach is to fetch the HTML and then work with the elements via Beautiful Soup. But every time I try, I get a "Checking Connection" page and the controlled tab stays stuck on it.
This is the code that I am using:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd
from bs4 import BeautifulSoup
from selenium.webdriver import DesiredCapabilities
options = webdriver.ChromeOptions()
options.add_argument('--allow-insecure-localhost') # differ on driver version. can ignore.
options.add_experimental_option('excludeSwitches', ['enable-logging'])
caps = options.to_capabilities()
caps["acceptInsecureCerts"] = True
driver = webdriver.Chrome('C:/Users/Kaiwalya/Desktop/chromedriver.exe',desired_capabilities=caps, options = options)
driver.get("https://nftcalendar.io/")
driver.maximize_window()
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning
Any ideas on how to go about it? Thank you!
Solution
It appears that Cloudflare's anti-bot protection has identified your requests as coming from an automated browser and is blocking access to the site. The "protected by Cloudflare" message at the bottom of the page indicates this.
In this scenario, initializing the Chrome browsing context with undetected-chromedriver is a good option.
The project describes itself as an "optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io".
I tested the code below and it works fine for the website mentioned.
Setup
pip install undetected-chromedriver
Code to run
import undetected_chromedriver as uc
driver = uc.Chrome()
driver.get("https://nftcalendar.io/")
print(driver.current_url)
print(driver.title)
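To get from there to the Beautiful Soup step in the question, one option is to wait until the challenge page has cleared before reading driver.page_source. This is only a sketch and was not part of what I tested above: wait_for_challenge and fetch_soup are hypothetical helper names, and the title markers ("just a moment", "checking") are assumptions about what the interstitial page shows, so adjust them to what you actually see in the tab.

```python
import time


def wait_for_challenge(get_title, markers=("just a moment", "checking"),
                       timeout=30.0, poll=0.5):
    """Poll get_title() until none of the markers appear in the title.

    markers must be lowercase (assumed challenge-page titles, not confirmed).
    Returns True once the title looks like a normal page, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        title = (get_title() or "").lower()
        if not any(marker in title for marker in markers):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def fetch_soup(url):
    """Open url with undetected-chromedriver and return the parsed page."""
    # Third-party imports stay local so wait_for_challenge works without them.
    import undetected_chromedriver as uc
    from bs4 import BeautifulSoup

    driver = uc.Chrome()
    try:
        driver.get(url)
        if not wait_for_challenge(lambda: driver.title):
            raise TimeoutError("Challenge page did not clear in time")
        return BeautifulSoup(driver.page_source, "html.parser")
    finally:
        driver.quit()
```

Calling fetch_soup("https://nftcalendar.io/") should then give you a soup object to work with, provided the challenge clears within the timeout.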
You can read more on the Undetected ChromeDriver project page.
It's important to note that this approach may not always be effective, as Cloudflare's anti-bot measures can evolve and become more sophisticated over time.
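Because of that, it may be worth checking that you really got the site's HTML before parsing it. A minimal sketch, assuming the two phrases mentioned in this thread ("Checking Connection" and "protected by Cloudflare") only appear on the challenge page; looks_blocked and require_real_page are hypothetical helper names, and a genuine page that happens to quote those phrases would be flagged too.

```python
# Phrases taken from this thread; assumed to appear only on the challenge page.
CHALLENGE_MARKERS = ("checking connection", "protected by cloudflare")


def looks_blocked(html, markers=CHALLENGE_MARKERS):
    """Return True if the HTML contains any marker (case-insensitive)."""
    text = html.lower()
    return any(marker in text for marker in markers)


def require_real_page(html):
    """Return html unchanged, or raise if it still looks like the challenge page."""
    if looks_blocked(html):
        raise RuntimeError("Still on the Cloudflare challenge page")
    return html
```

Passing driver.page_source through require_real_page before handing it to Beautiful Soup makes the script fail loudly instead of quietly parsing the challenge page.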
Answered By - Abhay Chaudhary