A daemon for scheduling Scrapy spiders

Scrapy Do is a daemon that provides a convenient way to run Scrapy spiders. It can run them once, immediately, or periodically at specified time intervals. It was inspired by scrapyd but written from scratch. It comes with a REST API, a command line client, and an interactive web interface.

Quick Start

  • Install scrapy-do using pip:

    $ pip install scrapy-do
  • Start the daemon in the foreground:

    $ scrapy-do -n scrapy-do
  • Open another terminal window, download Scrapy’s Quotesbot example, and push the code to the server:

    $ git clone https://github.com/scrapy/quotesbot.git
    $ cd quotesbot
    $ scrapy-do-cl push-project
    +----------------+
    | quotesbot      |
    |----------------|
    | toscrape-css   |
    | toscrape-xpath |
    +----------------+
  • Schedule some jobs:

    $ scrapy-do-cl schedule-job --project quotesbot \
        --spider toscrape-css --when 'every 5 to 15 minutes'
    +--------------------------------------+
    | identifier                           |
    |--------------------------------------|
    | 0a3db618-d8e1-48dc-a557-4e8d705d599c |
    +--------------------------------------+
    
    $ scrapy-do-cl schedule-job --project quotesbot --spider toscrape-css
    +--------------------------------------+
    | identifier                           |
    |--------------------------------------|
    | b3a61347-92ef-4095-bb68-0702270a52b8 |
    +--------------------------------------+
  • See what’s going on:

    The web interface is available at http://localhost:7654 by default.
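
The scheduling commands above can also be driven from a script. The following is a minimal sketch, assuming only the `scrapy-do-cl schedule-job` flags shown in this Quick Start (`--project`, `--spider`, `--when`); the helper names `schedule_job_command` and `schedule_job` are hypothetical and not part of scrapy-do.

```python
import shlex
import subprocess


def schedule_job_command(project, spider, when=None):
    """Build the argv list for `scrapy-do-cl schedule-job` (hypothetical helper)."""
    cmd = ["scrapy-do-cl", "schedule-job", "--project", project, "--spider", spider]
    if when is not None:
        # Omitting --when mirrors the second Quick Start example (run immediately).
        cmd += ["--when", when]
    return cmd


def schedule_job(project, spider, when=None):
    """Run the client and return whatever it prints (the identifier table)."""
    result = subprocess.run(
        schedule_job_command(project, spider, when),
        check=True,           # raise if the client exits with an error
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the commands rather than running them, so this works without a daemon.
    print(shlex.join(schedule_job_command(
        "quotesbot", "toscrape-css", "every 5 to 15 minutes")))
    print(shlex.join(schedule_job_command("quotesbot", "toscrape-css")))
```

Calling `schedule_job(...)` instead of printing requires the daemon to be running and the project to have been pushed, as in the steps above.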

Building from source

Both steps below require Node.js to be installed.

  • Check if things work fine:

    $ pip install -r requirements-dev.txt
    $ tox
  • Build the wheel:

    $ python setup.py bdist_wheel

ChangeLog

Version 0.4.0

  • Migration to the Bootstrap 4 UI

  • Make it possible to add a short description to jobs

  • Make it possible to specify a user-defined payload for each job, which is passed on as a parameter to the Python crawler

  • UI updates to support the above

  • New log viewers in the web UI
