A Python wrapper for working with the Scrapyd API

A Python wrapper for working with Scrapyd’s API.

Current version: 2.0.0

Allows a Python application to talk to, and therefore control, the Scrapy daemon: Scrapyd.

Install

Easiest installation is via pip:

pip install python-scrapyd-api

Quick Usage

Please refer to the full documentation for more detailed usage, but to get you started:

>>> from scrapyd_api import ScrapydAPI
>>> scrapyd = ScrapydAPI('http://localhost:6800')

Add a project egg as a new version:

>>> egg = open('some_egg.egg', 'rb')
>>> scrapyd.add_version('project_name', 'version_name', egg)
# Returns the number of spiders in the project.
3
>>> egg.close()
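
An egg is a zip archive, so it is generally safer to open it in binary mode and let a `with` block close the handle even if the upload raises. The following is a minimal sketch under that assumption; `upload_egg` is a hypothetical helper name, not part of the library, and `client` is assumed to be a `ScrapydAPI` instance (or any object with the same `add_version` method):

```python
def upload_egg(client, project, version, egg_path):
    """Upload the egg at egg_path as a new version of project.

    `client` is assumed to expose add_version(project, version, egg),
    as ScrapydAPI does in the example above. `upload_egg` itself is a
    hypothetical helper, not part of python-scrapyd-api.
    """
    # Binary mode avoids newline translation and decoding errors on the archive.
    with open(egg_path, 'rb') as egg:
        # The file is closed when the block exits, even if the call raises.
        return client.add_version(project, version, egg)
```

Called as `upload_egg(scrapyd, 'project_name', 'version_name', 'some_egg.egg')`, this would return whatever `add_version` returns, which the example above describes as the number of spiders in the project.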

Cancel a scheduled job:

>>> scrapyd.cancel('project_name', '14a6599ef67111e38a0e080027880ca6')
# Returns the "previous state" of the job before it was cancelled: 'running' or 'pending'.
'running'
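
The release notes for 2.0.0 mention that `cancel` also accepts an optional `signal` keyword. As a rough sketch of how that could be combined with `list_jobs`, the hypothetical helper below cancels every pending and running job in a project. It assumes `client` behaves like `ScrapydAPI`, it ignores the return value of `cancel`, and it only forwards `signal` when one is given so that Scrapyd keeps control of the default:

```python
def cancel_active_jobs(client, project, signal=None):
    """Cancel all pending and running jobs of project; return their ids.

    Hypothetical helper: `client` is assumed to expose list_jobs(project)
    and cancel(project, job, signal=...) like ScrapydAPI.
    """
    jobs = client.list_jobs(project)
    # Only send `signal` when the caller chose one explicitly.
    extra = {} if signal is None else {'signal': signal}
    cancelled = []
    for state in ('pending', 'running'):
        for job in jobs.get(state, []):
            client.cancel(project, job['id'], **extra)
            cancelled.append(job['id'])
    return cancelled
```

Which signal names are accepted depends on the Scrapyd host, so any value passed as `signal` should be checked against the Scrapyd documentation first.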

Delete a project and all sibling versions:

>>> scrapyd.delete_project('project_name')
# Returns True if the request was met with an OK response.
True

Delete a version of a project:

>>> scrapyd.delete_version('project_name', 'version_name')
# Returns True if the request was met with an OK response.
True

Request status of a job:

>>> scrapyd.job_status('project_name', '14a6599ef67111e38a0e080027880ca6')
# Returns 'running', 'pending', 'finished' or '' for unknown state.
'running'
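
Since `job_status` reports a single point in time, waiting for a job to complete means polling it. The sketch below shows one possible loop, assuming Python 3 and a `client` that behaves like `ScrapydAPI`; the helper name `wait_for_job` and its `poll_interval` and `timeout` parameters are illustrative and not part of the library:

```python
import time


def wait_for_job(client, project, job_id, poll_interval=5.0, timeout=600.0):
    """Poll job_status until the job is finished or unknown.

    Hypothetical helper. Returns the last status seen and raises
    TimeoutError if the job is still active after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = client.job_status(project, job_id)
        # 'finished' is terminal; '' means the job is in an unknown state.
        if status in ('finished', ''):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                'job %s still %r after %s seconds' % (job_id, status, timeout)
            )
        time.sleep(poll_interval)
```

Treating the empty string as terminal is a design choice here: a job in an unknown state will not become known by waiting longer, so the caller is left to decide what to do with it.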

List all jobs registered:

>>> scrapyd.list_jobs('project_name')
# Returns a dict of running, finished and pending job lists.
{
    'pending': [
        {
            u'id': u'24c35...f12ae',
            u'spider': u'spider_name'
        },
    ],
    'running': [
        {
            u'id': u'14a65...b27ce',
            u'spider': u'spider_name',
            u'start_time': u'2014-06-17 22:45:31.975358'
        },
    ],
    'finished': [
        {
            u'id': u'34c23...b21ba',
            u'spider': u'spider_name',
            u'start_time': u'2014-06-17 22:45:31.975358',
            u'end_time': u'2014-06-23 14:01:18.209680'
        }
    ]
}
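
The returned dict is plain data, so it can be summarised without any further API calls. A small sketch, assuming the shape shown above (state names as keys, lists of job dicts with a `spider` entry as values); both function names are hypothetical:

```python
def summarise_jobs(jobs):
    """Return {state: number_of_jobs} for a list_jobs-style dict."""
    return {state: len(entries) for state, entries in jobs.items()}


def running_spiders(jobs):
    """Return the sorted, de-duplicated names of spiders that are running."""
    return sorted({job['spider'] for job in jobs.get('running', [])})
```

For the example response above, `summarise_jobs` would report one job in each of the three states.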

List all projects registered:

>>> scrapyd.list_projects()
[u'ecom_project', u'estate_agent_project', u'car_project']

List all spiders available to a given project:

>>> scrapyd.list_spiders('project_name')
[u'raw_spider', u'js_enhanced_spider', u'selenium_spider']

List all versions registered to a given project:

>>> scrapyd.list_versions('project_name')
[u'345', u'346', u'347', u'348']

Schedule a job to run with a specific spider:

>>> scrapyd.schedule('project_name', 'spider_name')
# Returns the Scrapyd job id.
u'14a6599ef67111e38a0e080027880ca6'

Schedule a job to run while passing override settings:

>>> settings = {'DOWNLOAD_DELAY': 2}
>>> scrapyd.schedule('project_name', 'spider_name', settings=settings)
u'25b6588ef67333e38a0e080027880de7'

Schedule a job to run while passing extra attributes to spider initialisation:

>>> scrapyd.schedule('project_name', 'spider_name', extra_attribute='value')
# NB: 'project', 'spider' and 'settings' are reserved kwargs for this
# method and therefore these names should be avoided when trying to pass
# extra attributes to the spider init.
u'25b6588ef67333e38a0e080027880de7'
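
When the extra attributes come from configuration or user input, a clash with the reserved names is easy to miss. One way to guard against it is sketched below, assuming `client` behaves like `ScrapydAPI`; the helper `schedule_with_attributes` and the `RESERVED_KWARGS` constant are hypothetical names introduced only for this illustration:

```python
# Names the schedule method reserves for itself, per the note above.
RESERVED_KWARGS = {'project', 'spider', 'settings'}


def schedule_with_attributes(client, project, spider, attributes, settings=None):
    """Schedule a job, rejecting attribute names that clash with reserved kwargs.

    Hypothetical helper: `attributes` is a dict of extra spider attributes.
    """
    clashes = RESERVED_KWARGS.intersection(attributes)
    if clashes:
        raise ValueError(
            'reserved names cannot be spider attributes: %s'
            % ', '.join(sorted(clashes))
        )
    kwargs = dict(attributes)
    # Only forward settings when the caller supplied some.
    if settings is not None:
        kwargs['settings'] = settings
    return client.schedule(project, spider, **kwargs)
```

Failing early with a `ValueError` keeps a mistyped attribute from being silently interpreted as the project, spider or settings argument.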

Setting up the project to contribute code

Please see CONTRIBUTING.rst which is also mirrored in the full documentation. This will guide you through our pull request guidelines, project setup and testing requirements.

License

2-clause BSD. See the full LICENSE.

History

2.0.0 (2016-02-27)

Why Version 2? This package has been production ready and stable in use for over a year now, so it’s ready to commit to a stable API via semver. We skip version 1 because I want it to be clear that upgrading involves a breaking change.

Breaking changes:

  • The cancel job endpoint now returns True on hearing a successful reply from the Scrapyd API. Previously it returned True only if the cancelled job had been running, which meant it incorrectly reported False when a pending job was actually cancelled.

Other changes:

  • The cancel job endpoint now accepts a signal keyword argument, which is the termination signal Scrapyd uses to cancel the spider job. If it is not specified, no value is sent to the Scrapyd endpoint at all, which leaves Scrapyd in control of the default signal (currently TERM).

0.2.0 (2015-01-14)

  • Added the new job_status method which can retrieve the job status of a specific job from a project. See docs for usage.
  • Increased and improved test coverage.

0.1.0 (2014-09-16)

  • First release on PyPI.
