Issue
I want to scrape all the posts containing a given #hashtag from Instagram.
I tried this URL: https://www.instagram.com/explore/tags/perfume/?__a=1
But it only returns some of the posts, not all of them.
Solution
Look carefully at the JSON you receive.
Navigate to graphql -> hashtag -> edge_hashtag_to_media -> page_info -> end_cursor
That's the identifier you use to request the next batch of media. Pass it as the max_id query parameter, like this:
https://www.instagram.com/explore/tags/perfume/?__a=1&max_id=QVFDNWJDZnpGbElpdEV5Q19aaldYWUsxZnc1YUd0Z21yNUZsOWw4V2NxX05ZWnZjT2pRb3lrY29ocDJnM0VNallUWGZVeDIxVURnUzltdHpBR1A1a0VRNw==
You can repeat this process to get more media for the requested hashtag.
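To make the cursor step concrete, here is a minimal sketch that pulls end_cursor out of a parsed response and builds the next URL. The helper names (extract_end_cursor, next_page_url) and the sample payload are made up for illustration, and percent-encoding the cursor is a defensive choice of mine; the URL above passes it raw.

```python
from typing import Optional
from urllib.parse import quote

BASE_URL = "https://www.instagram.com/explore/tags/perfume/?__a=1"


def extract_end_cursor(payload: dict) -> Optional[str]:
    """Return the cursor for the next batch, or None if there is none."""
    try:
        page_info = payload["graphql"]["hashtag"]["edge_hashtag_to_media"]["page_info"]
    except (KeyError, TypeError):
        return None
    if not isinstance(page_info, dict):
        return None
    return page_info.get("end_cursor") or None


def next_page_url(base_url: str, cursor: Optional[str]) -> str:
    """Append the cursor as max_id; it is percent-encoded because it may end in '='."""
    if not cursor:
        return base_url
    return f"{base_url}&max_id={quote(cursor, safe='')}"


# Hand-written sample payload (hypothetical), not a real Instagram response.
sample = {"graphql": {"hashtag": {"edge_hashtag_to_media": {
    "page_info": {"has_next_page": True, "end_cursor": "QVFDabc=="}}}}}

print(next_page_url(BASE_URL, extract_end_cursor(sample)))
```

Returning None for a missing or empty cursor gives the caller a single condition to stop on.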
A naive example with requests (Python 3) that fetches the first 10 batches:
import requests
import json
from time import sleep

max_id = ''
base_url = "https://www.instagram.com/explore/tags/perfume/?__a=1"

for i in range(0, 10):
    sleep(2)  # Be polite.
    # Append the cursor from the previous batch, if there is one.
    if max_id:
        url = base_url + f"&max_id={max_id}"
    else:
        url = base_url
    print(f"Requesting {url}")
    response = requests.get(url)
    response = json.loads(response.text)
    try:
        max_id = response['graphql']['hashtag']['edge_hashtag_to_media']['page_info']['end_cursor']
    except KeyError:
        print("There's no next page!")
        break
    # Stop if the cursor is empty; otherwise the next request would repeat the first page.
    if not max_id:
        print("There's no next page!")
        break
    print(f"New cursor is {max_id}")
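The loop above only follows the cursor; it never collects the posts themselves. A rough sketch of that step follows, assuming each batch lists its posts under edge_hashtag_to_media -> edges -> node and that each node carries a shortcode. Those key names are an assumption on my part (they are not shown above), so check them against the JSON you actually receive. The function name and the sample payload are hypothetical too.

```python
from typing import List


def extract_shortcodes(payload: dict) -> List[str]:
    """Collect post shortcodes from one batch; returns [] if the layout differs."""
    try:
        edges = payload["graphql"]["hashtag"]["edge_hashtag_to_media"]["edges"]
    except (KeyError, TypeError):
        return []
    codes = []
    for edge in edges:
        node = edge.get("node", {})
        if "shortcode" in node:  # assumed key name
            codes.append(node["shortcode"])
    return codes


# Hand-written sample payload (hypothetical), not a real Instagram response.
sample = {"graphql": {"hashtag": {"edge_hashtag_to_media": {
    "edges": [{"node": {"shortcode": "abc"}}, {"node": {"shortcode": "def"}}],
    "page_info": {"has_next_page": False, "end_cursor": None}}}}}

seen = set()  # A set drops duplicates if consecutive batches overlap.
seen.update(extract_shortcodes(sample))
print(sorted(seen))
```

Calling this on each parsed response inside the loop, before reading the next cursor, would accumulate the posts across batches.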
As noted in the code comment, be polite: Instagram will block you if you send too many requests per second.
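If a fixed sleep turns out not to be enough, one option is to back off when the server pushes back. This is only a sketch: it assumes a throttled request comes back with HTTP 429 (the standard "Too Many Requests" status), which I have not verified for Instagram. fetch_politely and its parameters are hypothetical names, and the getter is passed in so the logic can be tried without touching the network.

```python
import time
from typing import Callable, Optional


def fetch_politely(get: Callable[[str], object], url: str, retries: int = 3,
                   base_delay: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep) -> Optional[object]:
    """Call get(url); on HTTP 429 wait base_delay * 2**attempt, then try again.

    Returns the first non-429 response, or None once the retries are used up.
    """
    for attempt in range(retries + 1):
        response = get(url)
        if response.status_code != 429:
            return response
        if attempt < retries:
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ... with the defaults
    return None
```

With requests you would call it as fetch_politely(requests.get, url) and treat a None result as a signal to stop for a while.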
Answered By - Manuel Fedele