Issue
I'm trying to scrape several links containing information about events. I am rotating my paid proxies and the user agents generated by the UserAgent library. Imperva, which requires a US IP, is so sensitive that it blocks even my own browser when I use a free US proxy!
I asked this question in a scraping-related Discord channel. Someone contacted me and said it is possible to bypass Imperva, but he can't tell me how because he doesn't want me as a competitor in the ticket scraping market :(
In addition to user agents and proxies, I tried to imitate the headers of the browser's successful request, but it didn't work. I just get 405s and 403s. I will try to scrape the event section, but I couldn't even get a 200 response for any of the 27 links I have (I added some below).
How do you think Imperva could be bypassed with Scrapy or Requests? It's also fine to recommend an academic resource that I can study to develop my Scrapy skills.
Some of the links I'm trying to scrape
https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=
https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=FOR&linkID=tktldr&shopperContext=&caller=appList&appCode=
https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=
https://budweisergardens.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode=
https://pplcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-allentown&shopperContext=&caller=&appCode=
https://ynottix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=global-odu&shopperContext=&caller=&appCode=
My spider code consists of a class that imports my proxies from a file and the spider proper. I pass the proxy as a meta value, as described in the Scrapy documentation. I use download delays:
import scrapy
from scrapy import Request
from random_user_agent.user_agent import UserAgent
import random
import pandas as pd


class ProxyFunctions:
    (...)


class AlexSpider(scrapy.Spider):
    name = 'alex'
    s = ProxyFunctions()
    s.prox_list_fixer()  # cleaned up the txt file containing the proxies and wrote a new txt file
    proxies = s.imp_proxies()

    def __init__(self):
        self.root = "https://partnercarrier.com"
        self.start_url = "https://partnercarrier.com/PA/"
        # self.initial_links = self.imp_links()  # to be used once all links are added from the file
        user_agent_rotator = UserAgent(software_names=['chrome'], operating_systems=['windows', 'linux'])
        self.user_agents = user_agent_rotator.get_user_agents()
        # self.root_link = "https://www.google.com"
        self.UA_rand = random.choice(self.user_agents)['user_agent']  # User Agent set
        # self.UA_LIST = open("/home/draco/docs/scraping/scrapyyy/thomas/USER_AGENTS.txt","r")  # manual UA importation from text

    # picks a random proxy from the proxy list in the file
    def imp_randp(self, path="/home/draco/docs/scraping/scrapyyy/thomas/proxies.txt"):
        with open(path) as PROXIES:
            lines = PROXIES.readlines()
        return random.choice(lines).strip()

    # reads the links from the file
    def imp_links(self, path="/home/draco/docs/scraping/Selenium/inputs.csv"):
        x = pd.read_csv(path)
        links = x['Url']
        links = [i for i in links]
        return links

    def start_requests(self):
        print("INITIAL REQUEST")
        links = self.imp_links()
        for link in links:
            print(f"---INFO: Requesting page=> {link}")
            proxy = random.choice(self.proxies)
            # print("---INFO: Using proxy => ", proxy)
            h = {
                'User-Agent': random.choice(self.user_agents)['user_agent'],
                'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
                'Accept-Encoding': 'gzip, deflate, br',
                'Accept-Language': 'tr-TR,tr;q=0.9,en-US;q=0.8,en;q=0.7',
                'Cache-Control': 'max-age=0',
                'Connection': 'keep-alive',
                'Host': link.split("/")[2],
                'Sec-Fetch-Dest': 'document',
                'Upgrade-Insecure-Requests': '1',
                'Sec-Fetch-Mode': 'navigate',
                'sec-ch-ua-platform': '"Linux"',
                'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
            }
            b = 'groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode='
            yield Request(
                url=link,
                callback=self.parse_gen,
                headers={"user-agent": random.choice(self.user_agents)['user_agent']},
                meta={"proxy": proxy},
                body=b,
                dont_filter=True
            )

    def parse_gen(self, response):
        print("---INFO: General parser opened. PARSER1")
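One detail in the spider above: the `h` dict is built for every link, but the `Request` is given a separate one-key `headers` dict, so the browser-like headers are never sent. The sketch below shows one way to keep header construction in a single place. `build_headers` is a hypothetical helper name, the header values are a trimmed copy of the ones in the question, and nothing here is claimed to change how Imperva responds.

```python
from urllib.parse import urlsplit


def build_headers(url, user_agent):
    """Return a single header dict for a request to `url`.

    Hypothetical helper: the values are copied from the question's `h` dict.
    """
    return {
        "User-Agent": user_agent,
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "tr-TR,tr;q=0.9,en-US;q=0.8,en;q=0.7",
        "Upgrade-Insecure-Requests": "1",
        # urlsplit extracts the host part of the URL, which is what
        # link.split("/")[2] does in the spider for these links.
        "Host": urlsplit(url).netloc,
    }


example_url = (
    "https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList"
    "?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode="
)
headers = build_headers(example_url, "example-agent/1.0")
print(headers["Host"])
```

In the spider this dict would be passed as `headers=build_headers(link, ua)` in the `Request` call, where `ua` is the user agent string chosen for that request.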
My terminal output:
draco@draco:~/docs/scraping/scrapyyy/upwork$ scrapy crawl alex
https://umasstix.evenue.net
2022-03-20 20:23:01 [scrapy.utils.log] INFO: Scrapy 2.5.1 started (bot: upwork)
2022-03-20 20:23:01 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.7.0, Python 3.8.10 (default, Nov 26 2021, 20:14:08) - [GCC 9.3.0], pyOpenSSL 22.0.0 (OpenSSL 1.1.1m 14 Dec 2021), cryptography 36.0.1, Platform Linux-5.13.0-35-generic-x86_64-with-glibc2.29
2022-03-20 20:23:01 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2022-03-20 20:23:01 [scrapy.crawler] INFO: Overridden settings:
{'AUTOTHROTTLE_ENABLED': True,
'BOT_NAME': 'upwork',
'CONCURRENT_REQUESTS_PER_DOMAIN': 14,
'HTTPCACHE_ENABLED': True,
'NEWSPIDER_MODULE': 'upwork.spiders',
'SPIDER_MODULES': ['upwork.spiders']}
2022-03-20 20:23:01 [scrapy.extensions.telnet] INFO: Telnet Password: 7f185fdb1347847f
2022-03-20 20:23:01 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.throttle.AutoThrottle']
2022-03-20 20:23:05 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats',
'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware']
2022-03-20 20:23:05 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-03-20 20:23:05 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2022-03-20 20:23:05 [scrapy.core.engine] INFO: Spider opened
2022-03-20 20:23:05 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-20 20:23:05 [scrapy.extensions.httpcache] DEBUG: Using filesystem cache storage in /home/draco/docs/scraping/scrapyyy/upwork/.scrapy/httpcache
2022-03-20 20:23:05 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
INITIAL REQUEST
---INFO: Requesting page=> https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=FOR&linkID=tktldr&shopperContext=&caller=appList&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=FOR&linkID=tktldr&shopperContext=&caller=appList&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://budweisergardens.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://budweisergardens.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://pplcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-allentown&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://pplcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-allentown&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ynottix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=global-odu&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ynottix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=global-odu&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://csutickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=WC&linkID=csuwc&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://csutickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=WC&linkID=csuwc&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://tsongascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=C&linkID=global-lowell&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://tsongascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=C&linkID=global-lowell&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://wellsfargocenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-wachovia&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://wellsfargocenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-wachovia&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://stridebankcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EGS&linkID=global-enid&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://stridebankcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EGS&linkID=global-enid&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://cureinsurancearena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-sovereign&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://cureinsurancearena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-sovereign&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketstar.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=RCCO&linkID=pmi&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://ticketstar.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=RCCO&linkID=pmi&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://hyveetix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-iowa&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://hyveetix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-iowa&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://portland5.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=pcpa&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://portland5.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=pcpa&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://selectyourtickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=PP&linkID=rgp&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://selectyourtickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=PP&linkID=rgp&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ictickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=ICI&linkID=nampa&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://ictickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=ICI&linkID=nampa&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://umasstix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=MCCON&linkID=umass&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://umasstix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=MCCON&linkID=umass&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://xlcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=XL&linkID=global-hartford&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://xlcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=XL&linkID=global-hartford&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://tdplace.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=ottawa67&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://tdplace.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=ottawa67&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://liacourascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-temple&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://liacourascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-temple&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://libertyfirstcreditunionarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS.1&linkID=global-ralston&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://libertyfirstcreditunionarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS.1&linkID=global-ralston&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://semo.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=twsemo&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://semo.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=twsemo&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://treventscomplex.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-bud&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://treventscomplex.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-bud&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://xtreamarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=coralville-multi&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://xtreamarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=coralville-multi&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://enmaxcentre.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EC&linkID=lethbridge-multi&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://enmaxcentre.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EC&linkID=lethbridge-multi&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=> (referer: None) ['cached']
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=FOR&linkID=tktldr&shopperContext=&caller=appList&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://budweisergardens.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://pplcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-allentown&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ynottix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=global-odu&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://csutickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=WC&linkID=csuwc&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://tsongascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=C&linkID=global-lowell&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://wellsfargocenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-wachovia&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://stridebankcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EGS&linkID=global-enid&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://cureinsurancearena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-sovereign&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://ticketstar.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=RCCO&linkID=pmi&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://hyveetix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-iowa&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://portland5.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=pcpa&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://selectyourtickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=PP&linkID=rgp&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://ictickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=ICI&linkID=nampa&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
---INFO: General parser opened. PARSER1
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://xlcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=XL&linkID=global-hartford&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://tdplace.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=ottawa67&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://liacourascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-temple&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://libertyfirstcreditunionarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS.1&linkID=global-ralston&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://semo.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=twsemo&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://treventscomplex.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-bud&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://xtreamarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=coralville-multi&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://enmaxcentre.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EC&linkID=lethbridge-multi&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:06 [scrapy.core.engine] INFO: Closing spider (finished)
2022-03-20 20:23:06 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 15189,
'downloader/request_count': 27,
'downloader/request_method_count/GET': 27,
'downloader/response_bytes': 304575,
'downloader/response_count': 27,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/403': 16,
'downloader/response_status_count/405': 10,
'elapsed_time_seconds': 0.444587,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2022, 3, 20, 17, 23, 6, 67887),
'httpcache/hit': 27,
'httperror/response_ignored_count': 26,
'httperror/response_ignored_status_count/403': 16,
'httperror/response_ignored_status_count/405': 10,
'log_count/DEBUG': 28,
'log_count/INFO': 36,
'memusage/max': 126562304,
'memusage/startup': 126562304,
'response_received_count': 27,
'scheduler/dequeued': 27,
'scheduler/dequeued/memory': 27,
'scheduler/enqueued': 27,
'scheduler/enqueued/memory': 27,
'start_time': datetime.datetime(2022, 3, 20, 17, 23, 5, 623300)}
2022-03-20 20:23:06 [scrapy.core.engine] INFO: Spider closed (finished)
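A note on reading this output: every `Crawled` line ends with `['cached']`, the stats report `'httpcache/hit': 27`, and the whole run took under half a second. That suggests the responses in this run were replayed from Scrapy's HTTP cache (`HTTPCACHE_ENABLED` is `True` in the overridden settings) rather than fetched through the proxies. The sketch below tallies status codes and cached responses from such a log. `summarize` and `CRAWLED` are illustrative names, and the sample lines are copied from the output above.

```python
import re
from collections import Counter

# Matches Scrapy's "Crawled (STATUS) <METHOD URL>" debug lines.
CRAWLED = re.compile(r"Crawled \((\d{3})\) <(\w+) (\S+)>")


def summarize(log_lines):
    """Count responses per status code and how many carry the 'cached' flag."""
    statuses = Counter()
    cached = 0
    for line in log_lines:
        match = CRAWLED.search(line)
        if match is None:
            continue  # not a "Crawled" line
        statuses[int(match.group(1))] += 1
        if line.rstrip().endswith("['cached']"):
            cached += 1
    return statuses, cached


sample = [
    "2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=> (referer: None) ['cached']",
    "2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=> (referer: None) ['cached']",
    "2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://umasstix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=MCCON&linkID=umass&shopperContext=&caller=&appCode=> (referer: None) ['cached']",
    "2022-03-20 20:23:06 [scrapy.core.engine] INFO: Closing spider (finished)",
]

statuses, cached = summarize(sample)
print(dict(statuses), "cached:", cached)
```

To test a change to proxies or headers against the live site, the cache has to be out of the way: either set `HTTPCACHE_ENABLED` to `False` or list 403 and 405 in `HTTPCACHE_IGNORE_HTTP_CODES`. Setting `handle_httpstatus_list = [403, 405]` on the spider lets the callback see those responses instead of having them dropped with "HTTP status code is not handled or not allowed".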
Solution
I bypass Imperva by using a real Chrome browser, automated with a browser extension, together with a USA mobile proxy. Imperva checks the following:
- IP address (most important)
- screen resolution, window sizing parameters, document sizing parameters (important)
- user agent (less important)
Answered By - pars