Issue
I want to scrape https://health.usnews.com/doctors/specialists-index, but when I send a request to this site from a Scrapy spider, the response status code is 403. I added a user_agent to my request, but it still doesn't work.
I referred to these two answers, "Python Doesn't Have Permission To Access On This Server / Return City/State from ZIP" and "403:You don't have permission to access /index.php on this server", but neither worked for me.
My user_agent is Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36.
Can someone help me scrape the site mentioned above?
Solution
Try adding 'authority' to the headers as well. The following works for me in scrapy shell:
from scrapy import Request
# Send an 'authority' header alongside a browser-like user-agent
headers = {
    'authority': 'health.usnews.com',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
}
url = "https://health.usnews.com/doctors/specialists-index"
req = Request(url, headers=headers)
# fetch() is a shortcut that is only available inside scrapy shell
fetch(req)
Answered By - Wim Hermans
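The question is about a spider, not the shell, so here is one way the same headers could be carried over to a Scrapy project. This is only a sketch, not part of the original answer: it assumes a standard project with a settings.py file and uses Scrapy's USER_AGENT and DEFAULT_REQUEST_HEADERS settings. The Accept and Accept-Language values are illustrative, and whether the site still accepts these headers is not guaranteed.

```python
# settings.py (sketch; the header values are assumptions taken from the answer above)

# Browser-like user agent, same string as in the scrapy shell example
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/76.0.3809.132 Safari/537.36"
)

# Headers applied to every request the project's spiders send,
# unless a request sets the same header itself
DEFAULT_REQUEST_HEADERS = {
    "authority": "health.usnews.com",
    # Illustrative values; adjust to what a real browser sends for this site
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}
```

With these settings in place, a spider whose start_urls contains https://health.usnews.com/doctors/specialists-index should send the headers without any per-request code. If only one spider needs them, the same two keys can go into that spider's custom_settings instead.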