Issue
I created a GCP VM (Ubuntu) and installed Python and Scrapy.
I would like to run my spider from there with scrapy crawl test -o test1.csv.
I opened the terminal from GCP and ran the spider; it worked, but it will take at least 3 hours.
How can I make sure the script keeps running when I exit the terminal (browser)?
Solution
You can use nohup to make sure the crawling continues:
nohup scrapy crawl test -o test1.csv &
When you log off, the crawler will continue until it finishes. The & at the end makes the process run in the background. To redirect both standard output and standard error to a log file, you can execute it as follows:
nohup scrapy crawl test -o test1.csv &> test.log &
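To confirm the crawl is still running after you log back in, you can look for the process and follow the log. A minimal check, assuming the log file name used above:
ps aux | grep "scrapy crawl"
tail -f test.log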
For a better way to run and deploy spiders on a server, you can check out scrapyd.
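As a rough sketch of the scrapyd route (assuming your scrapy.cfg has a [deploy] section pointing at the server and a project named myproject, which are placeholders here), you would install and start scrapyd, deploy the project with scrapyd-client, and schedule the spider over scrapyd's HTTP API on its default port 6800:
pip install scrapyd scrapyd-client
scrapyd &
scrapyd-deploy -p myproject
curl http://localhost:6800/schedule.json -d project=myproject -d spider=test
Because scrapyd runs the spider as its own managed job, the crawl no longer depends on your terminal session at all, and you can monitor jobs from the web UI at http://localhost:6800.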
Answered By - Wim Hermans