A daemon for scheduling Scrapy spiders
Scrapy Do is a daemon that provides a convenient way to run Scrapy spiders. It can run them once, immediately, or periodically at specified time intervals. It was inspired by scrapyd but written from scratch. It comes with a REST API, a command line client, and an interactive web interface.
Homepage: https://jany.st/scrapy-do.html
Documentation: https://scrapy-do.readthedocs.io/en/latest/
Quick Start
Install scrapy-do using pip:
$ pip install scrapy-do
Start the daemon in the foreground:
$ scrapy-do -n scrapy-do
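To check that the daemon is up, you can query its REST API. The port number (7654) and the status.json endpoint below are the documented defaults, so adjust them if you have changed the configuration:

$ curl http://localhost:7654/status.json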
Open another terminal window, download Scrapy's quotesbot example, and push the code to the server:
$ git clone https://github.com/scrapy/quotesbot.git
$ cd quotesbot
$ scrapy-do-cl push-project
+----------------+
| quotesbot      |
|----------------|
| toscrape-css   |
| toscrape-xpath |
+----------------+
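The same push can also be done over the REST API instead of the command line client. A minimal sketch, assuming the push-project.json endpoint accepts a zip of the project directory in an archive form field (see the REST API chapter of the documentation for the authoritative interface):

$ # run from the directory containing quotesbot/
$ zip -r quotesbot.zip quotesbot
$ curl http://localhost:7654/push-project.json -F archive=@quotesbot.zip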
Schedule some jobs:
$ scrapy-do-cl schedule-job --project quotesbot \
    --spider toscrape-css --when 'every 5 to 15 minutes'
+--------------------------------------+
| identifier                           |
|--------------------------------------|
| 0a3db618-d8e1-48dc-a557-4e8d705d599c |
+--------------------------------------+

$ scrapy-do-cl schedule-job --project quotesbot --spider toscrape-css
+--------------------------------------+
| identifier                           |
|--------------------------------------|
| b3a61347-92ef-4095-bb68-0702270a52b8 |
+--------------------------------------+
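The --when parameter accepts a human-readable scheduling spec and defaults to running the job immediately. The full grammar is described in the documentation; the spec below is one more illustrative variant, not an exhaustive list:

$ scrapy-do-cl schedule-job --project quotesbot \
    --spider toscrape-xpath --when 'every monday at 12:30'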
See what's going on. The web interface shows the projects, the scheduled and completed jobs, and their logs; the same information is available from the command line client.
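A quick sketch of the command line route, assuming the client exposes status and list-jobs subcommands (run scrapy-do-cl --help to confirm):

$ scrapy-do-cl status
$ scrapy-do-cl list-jobs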
Building from source
Both of the steps below require Node.js to be installed.
Check if things work fine:
$ pip install -r requirements-dev.txt
$ tox
Build the wheel:
$ python setup.py bdist_wheel
ChangeLog
Version 0.4.0
Migration to the Bootstrap 4 UI
Make it possible to add a short description to jobs
Make it possible to specify a user-defined payload for each job, passed on to the Python crawler as a parameter (see the sketch below the list)
UI updates to support the above
New log viewers in the web UI
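As a sketch of the payload feature mentioned in the list above: assuming the client exposes it as a --payload flag taking a JSON string (check scrapy-do-cl schedule-job --help for the actual interface), scheduling a parameterized job might look like this:

$ scrapy-do-cl schedule-job --project quotesbot \
    --spider toscrape-css --when 'now' \
    --payload '{"category": "humor"}'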
Hashes for scrapy_do_heroku-0.4.1-py2.py3-none-any.whl

Algorithm   | Hash digest
------------|-----------------------------------------------------------------
SHA256      | 56c829eb9d0cfcf580cd4c453e3f8779de136991215ad0debcbd9366cbaaa2ee
MD5         | 757e537a363c68580327e717202c3ea6
BLAKE2b-256 | ba4a56e782bfa76dacaf906367d1a42744da3c0dba22e17c6c754a14de674a91