=========
Scrapy Do
=========

.. image:: https://api.travis-ci.org/ljanyst/scrapy-do.svg?branch=master
   :target: https://travis-ci.org/ljanyst/scrapy-do

.. image:: https://coveralls.io/repos/github/ljanyst/scrapy-do/badge.svg?branch=master
   :target: https://coveralls.io/github/ljanyst/scrapy-do?branch=master

.. image:: https://img.shields.io/pypi/v/scrapy-do.svg
   :target: https://pypi.python.org/pypi/scrapy-do
   :alt: PyPI Version


Scrapy Do is a daemon that provides a convenient way to run `Scrapy
<https://scrapy.org/>`_ spiders. It can run them once, immediately, or
periodically at specified time intervals. It was inspired by
`scrapyd <https://github.com/scrapy/scrapyd>`_ but written from scratch. It
comes with a REST API, a command-line client, and an interactive web interface.

* Homepage: `https://jany.st/scrapy-do.html <https://jany.st/scrapy-do.html>`_
* Documentation: `https://scrapy-do.readthedocs.io/en/latest/ <https://scrapy-do.readthedocs.io/en/latest/>`_
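
The REST API mentioned above can be exercised from any HTTP client. The sketch
below only *builds* a request without sending it, so it works with no daemon
running; the ``schedule-job.json`` endpoint name and form fields are assumptions
based on the project documentation, so verify them against your installed
version:

.. code-block:: python

   from urllib.parse import urlencode
   from urllib.request import Request

   # Endpoint name and form fields are assumptions based on the scrapy-do
   # documentation -- verify against your installed version.
   params = {
       "project": "quotesbot",
       "spider": "toscrape-css",
       "when": "every 5 to 15 minutes",
   }
   req = Request(
       "http://localhost:7654/schedule-job.json",
       data=urlencode(params).encode(),
       method="POST",
   )
   print(req.get_method(), req.full_url)

Sending the prepared request (e.g. with ``urllib.request.urlopen``) would
return a JSON document containing the job identifier, as shown in the console
examples below.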

-----------
Quick Start
-----------

* Install ``scrapy-do`` using ``pip``:

  .. code-block:: console

     $ pip install scrapy-do

* Start the daemon in the foreground:

  .. code-block:: console

     $ scrapy-do -n scrapy-do

* Open another terminal window, download Scrapy's quotesbot example, and
  push the code to the server:

  .. code-block:: console

     $ git clone https://github.com/scrapy/quotesbot.git
     $ cd quotesbot
     $ scrapy-do-cl push-project
     +----------------+
     | quotesbot      |
     |----------------|
     | toscrape-css   |
     | toscrape-xpath |
     +----------------+
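
Under the hood, ``push-project`` has to ship the project code to the daemon as
an archive. The helper below is a rough, purely illustrative sketch of packaging
a project directory into an in-memory zip; it is not scrapy-do's actual
implementation:

.. code-block:: python

   import io
   import os
   import pathlib
   import tempfile
   import zipfile

   def pack_project(project_dir):
       """Zip a project directory into an in-memory archive.

       Illustrative only -- not scrapy-do's actual packaging code.
       """
       buf = io.BytesIO()
       with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
           for root, _dirs, files in os.walk(project_dir):
               for name in files:
                   path = os.path.join(root, name)
                   # store paths relative to the project root
                   zf.write(path, os.path.relpath(path, project_dir))
       return buf.getvalue()

   # Demo: pack a throwaway directory containing a minimal scrapy.cfg
   with tempfile.TemporaryDirectory() as d:
       pathlib.Path(d, "scrapy.cfg").write_text(
           "[settings]\ndefault = quotesbot.settings\n"
       )
       data = pack_project(d)

   print(zipfile.ZipFile(io.BytesIO(data)).namelist())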

* Schedule some jobs:

  .. code-block:: console

     $ scrapy-do-cl schedule-job --project quotesbot \
         --spider toscrape-css --when 'every 5 to 15 minutes'
     +--------------------------------------+
     | identifier                           |
     |--------------------------------------|
     | 0a3db618-d8e1-48dc-a557-4e8d705d599c |
     +--------------------------------------+

     $ scrapy-do-cl schedule-job --project quotesbot --spider toscrape-css
     +--------------------------------------+
     | identifier                           |
     |--------------------------------------|
     | b3a61347-92ef-4095-bb68-0702270a52b8 |
     +--------------------------------------+
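
The ``--when`` directive takes a human-readable scheduling spec, which the
daemon parses server-side. A toy parser for just the ``every N to M <unit>``
form, purely illustrative and much simpler than scrapy-do's real scheduling
grammar, might look like:

.. code-block:: python

   import re

   def parse_interval(spec):
       """Parse 'every N to M <unit>' or 'every N <unit>' into (low, high, unit).

       A toy illustration only -- scrapy-do's real scheduling grammar is richer.
       """
       m = re.fullmatch(
           r"every (\d+)(?: to (\d+))? (second|minute|hour|day)s?", spec.strip()
       )
       if not m:
           raise ValueError(f"unrecognized spec: {spec!r}")
       low = int(m.group(1))
       high = int(m.group(2)) if m.group(2) else low
       return low, high, m.group(3)

   print(parse_interval("every 5 to 15 minutes"))  # (5, 15, 'minute')

A randomized interval like ``every 5 to 15 minutes`` spreads runs out over
time, which is friendlier to the sites being crawled than a fixed cadence.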

* See what's going on:

  .. figure:: https://github.com/ljanyst/scrapy-do/raw/master/docs/_static/jobs-active.png
     :scale: 50 %
     :alt: Active Jobs

The web interface is available at http://localhost:7654 by default.

--------------------
Building from source
--------------------

Both of the steps below require ``nodejs`` to be installed.

* Check if things work fine:

  .. code-block:: console

     $ pip install -r requirements-dev.txt
     $ tox

* Build the wheel:

  .. code-block:: console

     $ python setup.py bdist_wheel

---------
ChangeLog
---------

Version 0.4.0
-------------

* Migration to the Bootstrap 4 UI
* Make it possible to add a short description to jobs
* Make it possible to specify a user-defined payload in each job that is
  passed on as a parameter to the Python crawler
* UI updates to support the above
* New log viewers in the web UI
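
The user-defined payload mentioned above reaches the crawler as a job
parameter. The sketch below shows one way a spider might decode a JSON payload
passed as a string argument; the class and argument names are hypothetical, and
a real spider would subclass ``scrapy.Spider`` (the plain class keeps the
sketch dependency-free):

.. code-block:: python

   import json

   class PayloadSpider:
       """Illustrative stand-in for a Scrapy spider that accepts a payload.

       Hypothetical names throughout -- scrapy-do's actual parameter
       passing may differ.
       """

       name = "toscrape-css"

       def __init__(self, payload="{}", **kwargs):
           # Spider arguments arrive as strings; decode the JSON here.
           self.payload = json.loads(payload)

   spider = PayloadSpider(payload='{"category": "humor", "pages": 3}')
   print(spider.payload["category"])  # humor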

