A Spider Runner for Scrapy
Project description
Scrapy Do is a daemon that provides a convenient way to run Scrapy spiders. It can run them once, immediately, or periodically at specified time intervals. It was inspired by scrapyd but written from scratch. For the time being, it comes with a REST API and a command line client only; version 0.3.0 will add an interactive web interface.
Homepage: https://jany.st/scrapy-do.html
Documentation: https://scrapy-do.readthedocs.io/en/latest/
Quick Start
Install scrapy-do using pip:
$ pip install scrapy-do
Start the daemon in the foreground:
$ scrapy-do -n scrapy-do
Open another terminal window and store the server’s URL in the client’s configuration file so that you don’t have to type it all the time:
$ cat > ~/.scrapy-do.cfg << EOF
[scrapy-do]
url=http://localhost:7654
EOF
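The file above is plain INI syntax, so it can be read with Python's standard `configparser`. A minimal sketch of such a lookup (the section and key names match the file above; the helper function name is hypothetical, not part of scrapy-do):

```python
import configparser

# Hypothetical helper: extract the server URL from a scrapy-do
# client configuration in the INI format shown above.
def read_server_url(config_text):
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get("scrapy-do", "url")

config_text = """
[scrapy-do]
url=http://localhost:7654
"""

print(read_server_url(config_text))  # http://localhost:7654
```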
Download Scrapy's quotesbot example and push the code to the server:
$ git clone https://github.com/scrapy/quotesbot.git
$ cd quotesbot
$ scrapy-do-cl push-project
+----------------+
| spiders        |
|----------------|
| toscrape-css   |
| toscrape-xpath |
+----------------+
Schedule some jobs:
$ scrapy-do-cl schedule-job --project quotesbot \
    --spider toscrape-css --when 'every 5 to 15 minutes'
+--------------------------------------+
| identifier                           |
|--------------------------------------|
| 0a3db618-d8e1-48dc-a557-4e8d705d599c |
+--------------------------------------+

$ scrapy-do-cl schedule-job --project quotesbot --spider toscrape-css
+--------------------------------------+
| identifier                           |
|--------------------------------------|
| b3a61347-92ef-4095-bb68-0702270a52b8 |
+--------------------------------------+
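Under the hood the client talks to the daemon's REST API. A sketch of the form payload such a request might carry, built with the standard library; the `/schedule-job.json` endpoint name and the field names are assumptions inferred from the command-line flags above, not verified against the server:

```python
import urllib.parse

# Build the URL and URL-encoded body for a (hypothetical) schedule-job
# request; field names mirror the --project/--spider/--when CLI flags.
def build_schedule_request(base_url, project, spider, when="now"):
    payload = {"project": project, "spider": spider, "when": when}
    return base_url + "/schedule-job.json", urllib.parse.urlencode(payload)

url, body = build_schedule_request(
    "http://localhost:7654", "quotesbot", "toscrape-css",
    when="every 5 to 15 minutes")
print(url)
print(body)
```

The request itself could then be POSTed with `urllib.request.urlopen(url, body.encode())` against a running daemon.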
See what’s going on:
$ scrapy-do-cl list-jobs
+--------------------------------------+-----------+--------------+-----------+-----------------------+---------+----------------------------+------------+
| identifier                           | project   | spider       | status    | schedule              | actor   | timestamp                  | duration   |
|--------------------------------------+-----------+--------------+-----------+-----------------------+---------+----------------------------+------------|
| b3a61347-92ef-4095-bb68-0702270a52b8 | quotesbot | toscrape-css | RUNNING   | now                   | USER    | 2018-01-27 08:32:19.781720 |            |
| 0a3db618-d8e1-48dc-a557-4e8d705d599c | quotesbot | toscrape-css | SCHEDULED | every 5 to 15 minutes | USER    | 2018-01-27 08:29:24.749770 |            |
+--------------------------------------+-----------+--------------+-----------+-----------------------+---------+----------------------------+------------+
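If you consume the job list programmatically rather than through the CLI, filtering by status is straightforward. A minimal sketch, assuming the server returns job records with the fields shown in the table above (the record shape here is illustrative, trimmed to the fields used):

```python
# Return the identifiers of jobs in a given state, assuming each job
# record carries at least "identifier" and "status" keys.
def jobs_with_status(jobs, status):
    return [job["identifier"] for job in jobs if job["status"] == status]

jobs = [
    {"identifier": "b3a61347-92ef-4095-bb68-0702270a52b8",
     "spider": "toscrape-css", "status": "RUNNING"},
    {"identifier": "0a3db618-d8e1-48dc-a557-4e8d705d599c",
     "spider": "toscrape-css", "status": "SCHEDULED"},
]

print(jobs_with_status(jobs, "RUNNING"))
```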
Hashes for scrapy_do-0.2.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8580bb7421a777ee3b6103e77aa808ada9c67c222b9f9e1d7bc6b0e307ac5fb8 |
| MD5 | 5e180cd2a352befea28275d61d3a84ef |
| BLAKE2b-256 | 5b78bd7f2adcdabddc939218d1cc8fa54b081697f3826048743b698fb2481496 |