
A daemon for scheduling Scrapy spiders



=========
Scrapy Do
=========

.. image:: https://api.travis-ci.org/ljanyst/scrapy-do.svg?branch=master
   :target: https://travis-ci.org/ljanyst/scrapy-do

.. image:: https://coveralls.io/repos/github/ljanyst/scrapy-do/badge.svg?branch=master
   :target: https://coveralls.io/github/ljanyst/scrapy-do?branch=master

.. image:: https://img.shields.io/pypi/v/scrapy-do.svg
   :target: https://pypi.python.org/pypi/scrapy-do
   :alt: PyPI Version


Scrapy Do is a daemon that provides a convenient way to run `Scrapy
<https://scrapy.org/>`_ spiders. It can run them once, immediately, or
periodically at specified time intervals. It was inspired by `scrapyd
<https://github.com/scrapy/scrapyd>`_ but written from scratch. It comes
with a REST API, a command line client, and an interactive web interface.

* Homepage: `https://jany.st/scrapy-do.html <https://jany.st/scrapy-do.html>`_
* Documentation: `https://scrapy-do.readthedocs.io/en/latest/ <https://scrapy-do.readthedocs.io/en/latest/>`_

-----------
Quick Start
-----------

* Install ``scrapy-do`` using ``pip``:

  .. code-block:: console

     $ pip install scrapy-do

* Start the daemon in the foreground:

  .. code-block:: console

     $ scrapy-do -n scrapy-do
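The daemon can also be configured with an INI-style file instead of relying
on the built-in defaults. Below is a minimal sketch; the option names
(``interfaces``, ``project-store``, ``job-slots``) are assumptions, so check
the configuration chapter of the documentation for the authoritative list:

.. code-block:: console

   $ # option names below are assumptions -- see the configuration docs
   $ cat scrapy-do.cfg
   [scrapy-do]
   interfaces = 127.0.0.1:7654
   project-store = projects
   job-slots = 3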

* Open another terminal window, download Scrapy's ``quotesbot`` example, and
  push the code to the server:

  .. code-block:: console

     $ git clone https://github.com/scrapy/quotesbot.git
     $ cd quotesbot
     $ scrapy-do-cl push-project
     +----------------+
     | quotesbot      |
     |----------------|
     | toscrape-css   |
     | toscrape-xpath |
     +----------------+

* Schedule some jobs:

  .. code-block:: console

     $ scrapy-do-cl schedule-job --project quotesbot \
           --spider toscrape-css --when 'every 5 to 15 minutes'
     +--------------------------------------+
     | identifier                           |
     |--------------------------------------|
     | 0a3db618-d8e1-48dc-a557-4e8d705d599c |
     +--------------------------------------+

     $ scrapy-do-cl schedule-job --project quotesbot --spider toscrape-css
     +--------------------------------------+
     | identifier                           |
     |--------------------------------------|
     | b3a61347-92ef-4095-bb68-0702270a52b8 |
     +--------------------------------------+
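The second invocation above omits ``--when``, which runs the job right away.
The scheduling strings are plain English; their exact grammar is described in
the documentation, and the variants below are illustrative assumptions rather
than an exhaustive list:

.. code-block:: console

   $ # assumed spec variants -- verify against the scheduling docs
   $ scrapy-do-cl schedule-job --project quotesbot \
         --spider toscrape-xpath --when 'every monday at 12:30'
   $ scrapy-do-cl schedule-job --project quotesbot \
         --spider toscrape-xpath --when 'every 2 hours'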

* See what's going on:

  .. figure:: https://github.com/ljanyst/scrapy-do/raw/master/docs/_static/jobs-active.png
     :scale: 50 %
     :alt: Active Jobs

  The web interface is available at http://localhost:7654 by default.
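Everything the command line client does goes through the daemon's REST API,
so jobs can be managed from scripts as well. A minimal sketch with ``curl``,
assuming the ``status.json`` and ``schedule-job.json`` endpoints and their
form parameters match the REST API chapter of the documentation:

.. code-block:: console

   $ # endpoint names are assumptions -- see the REST API docs
   $ curl http://localhost:7654/status.json
   $ curl http://localhost:7654/schedule-job.json \
         -F project=quotesbot -F spider=toscrape-css -F when=now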

--------------------
Building from source
--------------------

Both of the steps below require ``nodejs`` to be installed.

* Check if things work fine:

  .. code-block:: console

     $ pip install -r requirements-dev.txt
     $ tox

* Build the wheel:

  .. code-block:: console

     $ python setup.py bdist_wheel
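The wheel ends up in ``dist/`` and can be installed directly with ``pip``:

.. code-block:: console

   $ pip install dist/scrapy_do-0.5.0-py3-none-any.whl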

---------
ChangeLog
---------

Version 0.5.0
-------------

* Rewrite the log handling functionality to resolve duplication issues
* Bump the JavaScript dependencies to resolve browser caching issues
* Make the error message on failed spider listing more descriptive (Bug #28)
* Make sure that the spider descriptions and payloads get handled properly on
  restart (Bug #24)
* Clarify the documentation on passing arguments to spiders (Bugs #23 and #27)

Version 0.4.0
-------------

* Migration to the Bootstrap 4 UI
* Make it possible to add a short description to jobs
* Make it possible to specify a user-defined payload in each job that is
  passed on as a parameter to the Python crawler (see the sketch below)
* UI updates to support the above
* New log viewers in the web UI
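A sketch of the payload feature mentioned above, assuming the client exposes
it as a ``--payload`` option accepting a JSON document (the ``max_pages`` key
is purely hypothetical); check ``scrapy-do-cl schedule-job --help`` for the
actual switch name:

.. code-block:: console

   $ # --payload is an assumed option name -- check the client's help
   $ scrapy-do-cl schedule-job --project quotesbot \
         --spider toscrape-css --payload '{"max_pages": 10}'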

