Issue
I want to request this URL and get its information:
It's a simple page, as you will see if you open it. The code I use is:
import requests
req = requests.Session()
link12month = "https://search.codal.ir/api/search/v2/q?&Audited=true&AuditorRef=-1&Category=1&Childs=false&CompanyState=0&CompanyType=-1&Consolidatable=true&IsNotAudited=false&Isic=232007&Length=12&LetterCode=ن-10&LetterType=-1&Mains=true&NotAudited=false&NotConsolidatable=true&PageNumber=1&Publisher=false&Symbol=شپنا&TracingNo=-1&search=true"
response = req.get(link12month, verify=False, headers={'Content-Type': 'application/xml; charset=utf-8'})
print(response.status_code, response.text)
But requests.get does not return anything (no error is raised and no response is received). I have already retrieved this page with Selenium, but Selenium is much slower than requests, and I need to request about 1000 pages (the same URL with minor changes).
If you know a way to make this work, I would appreciate your guidance. Is there another fast way to make these requests (for example, can Scrapy be used)? If so, please tell me.
Solution
Found the issue: you need to send a User-Agent header:
import requests
if __name__ == '__main__':
    req = requests.Session()
    link12month = "https://search.codal.ir/api/search/v2/q?&Audited=true&AuditorRef=-1&Category=1&Childs=false&CompanyState=0&CompanyType=-1&Consolidatable=true&IsNotAudited=false&Isic=232007&Length=12&LetterCode=ن-10&LetterType=-1&Mains=true&NotAudited=false&NotConsolidatable=true&PageNumber=1&Publisher=false&Symbol=شپنا&TracingNo=-1&search=true"
    response = req.get(link12month, headers={'Accept': 'application/xml; charset=utf-8', 'User-Agent': 'foo'})
    print(response.status_code, response.text)
You can also simplify your code a bit by calling requests.get directly instead of creating a Session:
import requests
if __name__ == '__main__':
    link12month = "https://search.codal.ir/api/search/v2/q?&Audited=true&AuditorRef=-1&Category=1&Childs=false&CompanyState=0&CompanyType=-1&Consolidatable=true&IsNotAudited=false&Isic=232007&Length=12&LetterCode=ن-10&LetterType=-1&Mains=true&NotAudited=false&NotConsolidatable=true&PageNumber=1&Publisher=false&Symbol=شپنا&TracingNo=-1&search=true"
    response = requests.get(link12month, headers={'Accept': 'application/xml; charset=utf-8', 'User-Agent': 'foo'})
    print(response.status_code, response.text)
Keep certificate verification enabled if you can (that is, don't pass verify=False). If you want JSON, remove the 'Accept' header.
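Since the question mentions roughly 1000 similar requests, here is a minimal sketch of fetching several URLs and parsing each response as JSON. It assumes the endpoint returns JSON when no 'Accept' header is sent, as noted above. The function name fetch_all, the 10-second timeout and the 'foo' User-Agent are illustrative choices, not something the API requires. A Session is used here because it reuses the underlying connection across requests to the same host.

```python
import requests


def fetch_all(urls, session=None):
    """Fetch each URL and return the parsed JSON bodies in order.

    Hypothetical helper: assumes every URL answers with JSON.
    """
    if session is None:
        session = requests.Session()
    # Send a User-Agent with every request made through this session.
    session.headers.update({"User-Agent": "foo"})

    results = []
    for url in urls:
        response = session.get(url, timeout=10)
        # Raise an exception on 4xx/5xx instead of silently continuing.
        response.raise_for_status()
        results.append(response.json())
    return results
```

Calling fetch_all([link12month]) with the URL from the snippets above would return a one-element list holding the decoded response, provided the server does respond with JSON.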
You can also simplify the URL a lot by passing the query parameters as a dictionary. See the docs for the params argument: https://www.w3schools.com/python/ref_requests_get.asp
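As a rough sketch of the params approach, the query string from the URL above can be moved into a dictionary so that only PageNumber changes between requests. The names BASE_URL, build_params, prepare_url and fetch_page are made up for this example; the parameter names and values are copied from the original URL.

```python
import requests

BASE_URL = "https://search.codal.ir/api/search/v2/q"


def build_params(page_number=1):
    """Return the query parameters from the original URL as a dict."""
    return {
        "Audited": "true",
        "AuditorRef": -1,
        "Category": 1,
        "Childs": "false",
        "CompanyState": 0,
        "CompanyType": -1,
        "Consolidatable": "true",
        "IsNotAudited": "false",
        "Isic": 232007,
        "Length": 12,
        "LetterCode": "ن-10",
        "LetterType": -1,
        "Mains": "true",
        "NotAudited": "false",
        "NotConsolidatable": "true",
        "PageNumber": page_number,
        "Publisher": "false",
        "Symbol": "شپنا",
        "TracingNo": -1,
        "search": "true",
    }


def prepare_url(page_number=1):
    """Build the final URL without sending a request (handy for checking)."""
    request = requests.Request("GET", BASE_URL, params=build_params(page_number))
    return request.prepare().url


def fetch_page(page_number=1):
    """Request one page; requests URL-encodes the parameters itself."""
    return requests.get(
        BASE_URL,
        params=build_params(page_number),
        headers={"User-Agent": "foo"},
        timeout=10,
    )
```

With this in place, fetch_page(2) requests the second page, and prepare_url(2) shows the exact URL that would be sent, including the percent-encoded Persian values.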
Answered By - Ash Oldershaw