
Put Scrapy spiders behind an HTTP API

Project description

https://raw.githubusercontent.com/scrapinghub/scrapyrt/master/artwork/logo.gif

ScrapyRT (Scrapy realtime)

[Badges: CI status, supported Python versions, PyPI version, license, downloads count, documentation status]

Add HTTP API for your Scrapy project in minutes.

You send a request to ScrapyRT with spider name and URL, and in response, you get items collected by a spider visiting this URL.

  • All Scrapy project components (e.g. middleware, pipelines, extensions) are supported

  • You run ScrapyRT in a Scrapy project directory. It starts an HTTP server that lets you schedule spiders and get spider output as JSON.

Quickstart

1. install

> pip install scrapyrt

2. switch to a Scrapy project (e.g. the quotesbot project)

> cd my/project_path/is/quotesbot

3. launch ScrapyRT

> scrapyrt

4. run your spiders

> curl "localhost:9080/crawl.json?spider_name=toscrape-css&url=http://quotes.toscrape.com/"

5. run a more complex query, e.g. specify a callback for the Scrapy request and a zipcode argument for the spider

> curl --data '{"request": {"url": "http://quotes.toscrape.com/page/2/", "callback":"some_callback"}, "spider_name": "toscrape-css", "crawl_args": {"zipcode":"14000"}}' http://localhost:9080/crawl.json -v

ScrapyRT looks for a scrapy.cfg file to determine your project settings and raises an error if it cannot find one. Note that you need to have all your project requirements installed.
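As a rough sketch, the same requests can be issued from Python using only the standard library. The helper names (crawl_url, crawl_body, run_crawl) are ours, not part of ScrapyRT; the spider name and URLs come from the quickstart above, and a running scrapyrt instance is assumed for the actual request.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ScrapyRT's default endpoint (it listens on port 9080).
API = "http://localhost:9080/crawl.json"

def crawl_url(spider_name, url):
    """Build the GET URL for a simple crawl (step 4 above)."""
    return API + "?" + urlencode({"spider_name": spider_name, "url": url})

def crawl_body(spider_name, url, callback=None, crawl_args=None):
    """Build the JSON body for a POST crawl (step 5 above)."""
    request = {"url": url}
    if callback is not None:
        request["callback"] = callback
    payload = {"request": request, "spider_name": spider_name}
    if crawl_args is not None:
        payload["crawl_args"] = crawl_args
    return json.dumps(payload)

def run_crawl(spider_name, url):
    """Send the GET request and return the items collected by the spider.

    Requires a running `scrapyrt` instance inside a Scrapy project.
    """
    with urlopen(crawl_url(spider_name, url)) as resp:
        return json.load(resp).get("items", [])
```

For example, run_crawl("toscrape-css", "http://quotes.toscrape.com/") would mirror the curl command in step 4.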

Note

  • The project is not a replacement for Scrapyd, Scrapy Cloud, or other infrastructure for running long-running crawls

  • It is not suitable for long-running spiders; it is a good fit for spiders that fetch one response from a website and return items quickly

Documentation

Documentation is available on Read the Docs.

Support

Open source support is provided here on GitHub. Please create a question issue (i.e. an issue with the “question” label).

Commercial support is also available from Zyte.

License

ScrapyRT is offered under BSD 3-Clause license.

Development

Development takes place on GitHub.

Project details


Download files

Download the file for your platform.

Source Distribution

scrapyrt-0.18.0.tar.gz (68.8 kB)

Uploaded Source

Built Distribution


scrapyrt-0.18.0-py3-none-any.whl (16.5 kB)

Uploaded Python 3

File details

Details for the file scrapyrt-0.18.0.tar.gz.

File metadata

  • Download URL: scrapyrt-0.18.0.tar.gz
  • Upload date:
  • Size: 68.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for scrapyrt-0.18.0.tar.gz

  • SHA256: 2cfd4732f83b2a45a907dc843c49ab79227cc9cdba2cfe0a8d9b7aae927a2a6e
  • MD5: 8d42e0c6fe6a2f6b52dca78f33a2bfac
  • BLAKE2b-256: f0bcbff0f63764775970aab79add3639b731b19c3792613df2ef7896471ec0cf


File details

Details for the file scrapyrt-0.18.0-py3-none-any.whl.

File metadata

  • Download URL: scrapyrt-0.18.0-py3-none-any.whl
  • Upload date:
  • Size: 16.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for scrapyrt-0.18.0-py3-none-any.whl

  • SHA256: a1a1b7d0ed96c9b1c611b77211b3a6843a65e9f138f04a155f4260a27f67faa4
  • MD5: bcca5d1f86472ad4666f0192c087d527
  • BLAKE2b-256: 2d9340a1999de09f0f16de9fa336d1b776c3bee2c769a74a5cd2b416664ee1a7

