=========
Scrapy Do
=========

.. image:: https://api.travis-ci.org/ljanyst/scrapy-do.svg?branch=master
   :target: https://travis-ci.org/ljanyst/scrapy-do

.. image:: https://coveralls.io/repos/github/ljanyst/scrapy-do/badge.svg?branch=master
   :target: https://coveralls.io/github/ljanyst/scrapy-do?branch=master

.. image:: https://img.shields.io/pypi/v/scrapy-do.svg
   :target: https://pypi.python.org/pypi/scrapy-do
   :alt: PyPI Version

Scrapy Do is a daemon that provides a convenient way to run `Scrapy
<https://scrapy.org/>`_ spiders. It can run them either once, immediately, or
periodically, at specified time intervals. It was inspired by `scrapyd
<https://github.com/scrapy/scrapyd>`_ but written from scratch. It comes with
a REST API, a command line client, and an interactive web interface.

* Homepage: `https://jany.st/scrapy-do.html <https://jany.st/scrapy-do.html>`_
* Documentation: `https://scrapy-do.readthedocs.io/en/latest/ <https://scrapy-do.readthedocs.io/en/latest/>`_
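
Since the daemon's functionality is exposed over the REST API, it can also be
scripted without the command line client. As a minimal sketch, assuming the
``status.json`` endpoint and the default port described in the documentation
linked above, the daemon's state can be queried with plain ``curl``:

.. code-block:: console

   $ curl http://localhost:7654/status.json
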
-----------
Quick Start
-----------

* Install ``scrapy-do`` using ``pip``:

  .. code-block:: console

     $ pip install scrapy-do

* Start the daemon in the foreground:

  .. code-block:: console

     $ scrapy-do -n scrapy-do

* Open another terminal window, download Scrapy's quotesbot example, and
  push the code to the server:

  .. code-block:: console

     $ git clone https://github.com/scrapy/quotesbot.git
     $ cd quotesbot
     $ scrapy-do-cl push-project
     +----------------+
     | quotesbot      |
     |----------------|
     | toscrape-css   |
     | toscrape-xpath |
     +----------------+

* Schedule some jobs; a REST equivalent is sketched after this list:

  .. code-block:: console

     $ scrapy-do-cl schedule-job --project quotesbot \
         --spider toscrape-css --when 'every 5 to 15 minutes'
     +--------------------------------------+
     | identifier                           |
     |--------------------------------------|
     | 0a3db618-d8e1-48dc-a557-4e8d705d599c |
     +--------------------------------------+

     $ scrapy-do-cl schedule-job --project quotesbot --spider toscrape-css
     +--------------------------------------+
     | identifier                           |
     |--------------------------------------|
     | b3a61347-92ef-4095-bb68-0702270a52b8 |
     +--------------------------------------+

* See what's going on:

  .. figure:: https://github.com/ljanyst/scrapy-do/raw/master/docs/_static/jobs-active.png
     :scale: 50 %
     :alt: Active Jobs

  The web interface is available at http://localhost:7654 by default.
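
The scheduling step above can also be performed over the REST API. A minimal
sketch, assuming the ``schedule-job.json`` endpoint and parameter names
described in the project documentation (verify them against your installed
version):

.. code-block:: console

   $ curl http://localhost:7654/schedule-job.json \
         -d project=quotesbot -d spider=toscrape-css \
         --data-urlencode 'when=every 5 to 15 minutes'
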
--------------------
Building from source
--------------------

Both of the steps below require ``nodejs`` to be installed.

* Check that things work fine:

  .. code-block:: console

     $ pip install -r requirements-dev.txt
     $ tox

* Build the wheel:

  .. code-block:: console

     $ python setup.py bdist_wheel

---------
ChangeLog
---------

Version 0.4.0
-------------

* Migration to the Bootstrap 4 UI
* Make it possible to add a short description to jobs
* Make it possible to specify a user-defined payload for each job, passed on
  as a parameter to the Python crawler (see the sketch after this list)
* UI updates to support the above
* New log viewers in the web UI
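
For example, a job carrying extra data for its spider might be scheduled as
follows. This is a sketch that assumes the option is spelled ``--payload``
and accepts a JSON string; check ``scrapy-do-cl schedule-job --help`` for
the exact form:

.. code-block:: console

   $ scrapy-do-cl schedule-job --project quotesbot \
       --spider toscrape-css --payload '{"category": "humor"}'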