A Python wrapper for working with the Scrapyd API

Project description

python-scrapyd-api

A Python wrapper for working with Scrapyd's API.

Current released version: 2.1.2 (see history).

Allows a Python application to talk to, and therefore control, the Scrapy daemon: Scrapyd.

Install

The easiest way to install is via pip:

pip install python-scrapyd-api

Quick Usage

Please refer to the full documentation for more detailed usage, but to get you started:

>>> from scrapyd_api import ScrapydAPI
>>> scrapyd = ScrapydAPI('http://localhost:6800')

Add a project egg as a new version:

>>> egg = open('some_egg.egg', 'rb')
>>> scrapyd.add_version('project_name', 'version_name', egg)
# Returns the number of spiders in the project.
3
>>> egg.close()
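The same call also works with a context manager, which guarantees the egg file is closed even if the request fails (a minimal sketch using the same placeholder file name as above):

>>> with open('some_egg.egg', 'rb') as egg:
...     scrapyd.add_version('project_name', 'version_name', egg)
3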

Cancel a scheduled job:

>>> scrapyd.cancel('project_name', '14a6599ef67111e38a0e080027880ca6')
# Returns the "previous state" of the job before it was cancelled: 'running' or 'pending'.
'running'

Delete a project and all of its versions:

>>> scrapyd.delete_project('project_name')
# Returns True if the request was met with an OK response.
True

Delete a version of a project:

>>> scrapyd.delete_version('project_name', 'version_name')
# Returns True if the request was met with an OK response.
True

Request status of a job:

>>> scrapyd.job_status('project_name', '14a6599ef67111e38a0e080027880ca6')
# Returns 'running', 'pending', 'finished' or '' for unknown state.
'running'

List all jobs registered:

>>> scrapyd.list_jobs('project_name')
# Returns a dict of running, finished and pending job lists.
{
    'pending': [
        {
            u'id': u'24c35...f12ae',
            u'spider': u'spider_name'
        },
    ],
    'running': [
        {
            u'id': u'14a65...b27ce',
            u'spider': u'spider_name',
            u'start_time': u'2014-06-17 22:45:31.975358'
        },
    ],
    'finished': [
        {
            u'id': u'34c23...b21ba',
            u'spider': u'spider_name',
            u'start_time': u'2014-06-17 22:45:31.975358',
            u'end_time': u'2014-06-23 14:01:18.209680'
        }
    ]
}

List all projects registered:

>>> scrapyd.list_projects()
[u'ecom_project', u'estate_agent_project', u'car_project']

List all spiders available to a given project:

>>> scrapyd.list_spiders('project_name')
[u'raw_spider', u'js_enhanced_spider', u'selenium_spider']

List all versions registered to a given project:

>>> scrapyd.list_versions('project_name')
[u'345', u'346', u'347', u'348']

Schedule a job to run with a specific spider:

>>> scrapyd.schedule('project_name', 'spider_name')
# Returns the Scrapyd job id.
u'14a6599ef67111e38a0e080027880ca6'

Schedule a job to run while passing override settings:

>>> settings = {'DOWNLOAD_DELAY': 2}
>>> scrapyd.schedule('project_name', 'spider_name', settings=settings)
u'25b6588ef67333e38a0e080027880de7'

Schedule a job to run while passing extra attributes to spider initialisation:

>>> scrapyd.schedule('project_name', 'spider_name', extra_attribute='value')
# NB: 'project', 'spider' and 'settings' are reserved kwargs for this
# method and therefore these names should be avoided when trying to pass
# extra attributes to the spider init.
u'25b6588ef67333e38a0e080027880de7'
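On the Scrapy side, extra attributes passed this way arrive as keyword arguments to the spider's constructor. A minimal sketch of a spider that picks one up (the names spider_name and extra_attribute simply mirror the call above; this spider is an illustration, not part of this library):

import scrapy

class ExampleSpider(scrapy.Spider):
    name = 'spider_name'

    def __init__(self, extra_attribute=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Will be 'value' when the job is scheduled as shown above.
        self.extra_attribute = extra_attribute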

Setting up the project to contribute code

Please see CONTRIBUTING.md. This will guide you through our pull request guidelines, project setup and testing requirements.

License

2-clause BSD. See the full LICENSE.

History

2.1.1 (2018-04-01)

  • Base set of docs converted to markdown (README, AUTHORS, CONTRIBUTING, HISTORY)

2.1.0 (2018-03-31)

  • Introduces the timeout keyword argument, which allows the caller to specify a timeout after which requests to the Scrapyd server give up. This works as per the underlying requests library and raises requests.exceptions.Timeout when the timeout is exceeded. See the docs for usage, and the sketch below.
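A minimal sketch of how this might look, assuming the keyword is passed to the ScrapydAPI constructor and using an arbitrary five-second value:

>>> from requests.exceptions import Timeout
>>> scrapyd = ScrapydAPI('http://localhost:6800', timeout=5)
>>> try:
...     scrapyd.list_projects()
... except Timeout:
...     print('Scrapyd did not respond within 5 seconds')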

2.0.1 (2016-02-27)

v2.0.0 shipped with docs that were slightly out of date for the cancel endpoint; this release corrects that.

2.0.0 (2016-02-27)

Why Version 2? This package has been production ready and stable in use for over a year now, so it is ready to commit to a stable API with semver. Version 1 has deliberately been skipped to make it absolutely clear that this release contains a breaking change:

Breaking changes:

  • The cancel job endpoint now returns the previous state of the successfully cancelled job rather than a simple boolean True/False. This change was made because: a) the boolean return was relatively useless and hid data that the Scrapyd API passes back as part of the cancel endpoint response; and b) before this change the method returned True only if the cancelled job had been running, which meant we incorrectly reported False when a pending job was cancelled. This may require no changes to your codebase, but it is nevertheless a change to a public API, hence the major version bump.

Other changes:

  • The cancel job endpoint now accepts a signal keyword argument, which is the termination signal Scrapyd uses to cancel the spider job. If not specified, the value is not sent to the Scrapyd endpoint at all, leaving Scrapyd in control of which default signal gets used (currently TERM). See the sketch below.
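A minimal sketch of passing an explicit signal (the job id is the same placeholder used in the usage examples above):

>>> scrapyd.cancel('project_name', '14a6599ef67111e38a0e080027880ca6', signal='KILL')
'running'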

0.2.0 (2015-01-14)

  • Added the new job_status method which can retrieve the job status of a specific job from a project. See docs for usage.
  • Increased and improved test coverage.

0.1.0 (2014-09-16)

  • First release on PyPI.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

python-scrapyd-api-2.1.2.tar.gz (24.7 kB)

Uploaded: source

Built Distribution

python_scrapyd_api-2.1.2-py2.py3-none-any.whl (12.1 kB)

Uploaded: Python 2, Python 3

File details

Details for the file python-scrapyd-api-2.1.2.tar.gz.

File hashes

Hashes for python-scrapyd-api-2.1.2.tar.gz:

SHA256: e64f7caab23e541967fdf78926af5e6be6a863227f78d16b7ebb3a35acfc2099
MD5: cd62272e031d944d657e97ceebb837c6
BLAKE2b-256: dbfb7ba0c7242741c8352dab11a26106b23d8740ce98323633defd4656909bd0

File details

Details for the file python_scrapyd_api-2.1.2-py2.py3-none-any.whl.

File hashes

Hashes for python_scrapyd_api-2.1.2-py2.py3-none-any.whl:

SHA256: ab92d3461a81f46aaa6d82cc3de610642892c5454ddebb74d2b81fbc55e9c807
MD5: f5bc6b0c39b578f576be4705cf20f9ce
BLAKE2b-256: 1313cf8bbd7a6462a805c26bfd8eb92a0fc1f5e69a13fa2e5fbd87360943cacc
