Issue
I am looking into how I can scrape my correct LeetCode submissions and upload them to GitHub. Being a beginner to web scraping, I read through a few blogs and understand that we can use Python libraries like BeautifulSoup, Scrapy, Selenium, etc., to perform scraping. But is it true that we can only scrape the routes which aren't disallowed in the robots.txt of the website? I ask because LeetCode's robots.txt disallows the submissions route.
(Image: the robots.txt page of LeetCode)
If it is true that disallowed pages cannot be scraped, is there any other way I can scrape my correct submissions? Any advice is welcome, as I am an absolute beginner here :)
P.S. An outline of the process is more than enough; I do not need the exact code. Thank you.
Solution
But is it true that we can only scrape the routes which aren't disallowed in the robots.txt of the website?
Technically, robots.txt is aimed at massive indexer bots like Google, Yandex, Majestic12, etc. You're also not obligated to follow robots.txt; the server does not enforce it, honoring it is simply the polite thing to do.
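If you'd like to see what a site's robots.txt actually permits, Python's standard library can parse it for you. A minimal sketch (the URL is taken from the question; the result is purely advisory):

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt
parser = RobotFileParser("https://leetcode.com/robots.txt")
parser.read()

# Check whether a generic crawler may fetch a given path.
# This is advisory only -- the server does not enforce it.
print(parser.can_fetch("*", "https://leetcode.com/submissions/"))
```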
Since you're not doing a massive scrape and just want your own submissions, you should be fine, unless you code it wrong and effectively start DDoSing the website. Throttle your requests, as in the sketch below.
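For your own submissions specifically, you can stay logged in and page through them at a gentle rate. Here is a minimal sketch; the `/api/submissions/` endpoint, the `LEETCODE_SESSION` cookie name, and the `submissions_dump`/`has_next` response fields are assumptions based on what community exporter tools use, so verify them in your browser's developer tools before relying on them:

```python
import time
import requests

# Session cookie copied from your logged-in browser (assumption:
# LeetCode stores the login session under this cookie name)
session = requests.Session()
session.cookies.set("LEETCODE_SESSION", "<your session cookie value>")

submissions = []
offset, limit = 0, 20
while True:
    # Assumed endpoint used by several community exporters; confirm it
    # in your browser's network tab before relying on it
    resp = session.get(
        "https://leetcode.com/api/submissions/",
        params={"offset": offset, "limit": limit},
    )
    resp.raise_for_status()
    data = resp.json()
    submissions.extend(data.get("submissions_dump", []))
    if not data.get("has_next"):
        break
    offset += limit
    time.sleep(2)  # be polite: one request every couple of seconds

print(f"Fetched {len(submissions)} submissions")
```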
If you don't want to code it yourself, you can check GitHub for other people's code, like https://github.com/world177/Leetcode-Downloader-for-Submissions, which has 75 stars, so it is probably safe. That said, I can't guarantee its safety or suitability for your specific needs, so make sure to review the repository yourself.
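If you do roll your own, the upload half is simple: keep only the accepted submissions, write each one to a file inside a local clone of your GitHub repo, then commit and push with ordinary git. A hedged sketch, continuing from the `submissions` list above (the `status_display`, `title_slug`, `lang`, and `code` field names are assumptions matching what community exporters report; check them against your own data):

```python
from pathlib import Path

submissions: list[dict] = []  # the list fetched in the previous sketch

# Map LeetCode's language names to file extensions (extend as needed)
EXTENSIONS = {"python3": "py", "python": "py", "cpp": "cpp", "java": "java"}

out_dir = Path("my-leetcode-solutions")
out_dir.mkdir(exist_ok=True)

for sub in submissions:
    if sub.get("status_display") != "Accepted":
        continue  # keep only correct submissions
    ext = EXTENSIONS.get(sub.get("lang", ""), "txt")
    path = out_dir / f"{sub['title_slug']}.{ext}"
    path.write_text(sub["code"])

# Then, from the shell:
#   git add . && git commit -m "Add LeetCode solutions" && git push
```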
Answered By - Daviid