Put Scrapy spiders behind an HTTP API

ScrapyRT (Scrapy realtime)

Add an HTTP API to your Scrapy project in minutes.

You send a request to ScrapyRT with a spider name and a URL, and in response you get the items collected by that spider when it visits the URL.

  • All Scrapy project components (e.g. middlewares, pipelines, extensions) are supported.

  • You run ScrapyRT in the Scrapy project directory. It starts an HTTP server that lets you schedule spiders and get their output as JSON.
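
As a rough illustration of consuming that JSON output, the sketch below pulls the scraped items out of a response body. The `status` and `items` field names and the sample body are assumptions made for illustration and are not stated in this README; `extract_items` is a hypothetical helper. Check the ScrapyRT documentation for the actual response schema.

```python
import json

# Illustrative response body only. The "status" and "items" keys are
# assumptions about the response shape, not taken from this README.
SAMPLE_BODY = json.dumps(
    {
        "status": "ok",
        "spider_name": "toscrape-css",
        "items": [{"text": "example quote", "author": "Example Author"}],
    }
)


def extract_items(body):
    """Return the list of items from a JSON response body (hypothetical helper)."""
    data = json.loads(body)
    # Assumption: a successful crawl is reported with status == "ok".
    if data.get("status") != "ok":
        raise RuntimeError(f"crawl did not succeed: {data}")
    return data.get("items", [])


items = extract_items(SAMPLE_BODY)
```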

Quickstart

1. install

> pip install scrapyrt

2. switch to a Scrapy project directory (e.g. the quotesbot project)

> cd my/project_path/is/quotesbot

3. launch ScrapyRT

> scrapyrt

4. run your spiders

> curl "localhost:9080/crawl.json?spider_name=toscrape-css&url=http://quotes.toscrape.com/"

5. run a more complex query, e.g. specify a callback for the Scrapy request and a zipcode argument for the spider

> curl --data '{"request": {"url": "http://quotes.toscrape.com/page/2/", "callback":"some_callback"}, "spider_name": "toscrape-css", "crawl_args": {"zipcode":"14000"}}' http://localhost:9080/crawl.json -v
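
The two curl calls in steps 4 and 5 can also be issued from Python. The sketch below uses only the standard library and mirrors the endpoint, port, and parameter names shown above. The helper names (`build_crawl_url`, `build_crawl_payload`, `post_crawl`) are hypothetical and not part of ScrapyRT, and the `Content-Type` header is an assumption (the curl example sends curl's default).

```python
import json
import urllib.request
from urllib.parse import urlencode

# Default address taken from the curl examples above.
BASE_URL = "http://localhost:9080/crawl.json"


def build_crawl_url(spider_name, url, base_url=BASE_URL):
    """Build the GET URL from step 4 (hypothetical helper)."""
    # urlencode percent-encodes the target URL so it survives as one parameter.
    query = urlencode({"spider_name": spider_name, "url": url})
    return f"{base_url}?{query}"


def build_crawl_payload(spider_name, url, callback=None, crawl_args=None):
    """Build the JSON body from step 5 (hypothetical helper)."""
    request = {"url": url}
    if callback is not None:
        request["callback"] = callback
    payload = {"request": request, "spider_name": spider_name}
    if crawl_args:
        payload["crawl_args"] = crawl_args
    return payload


def post_crawl(payload, base_url=BASE_URL, timeout=30):
    """POST the payload to a running ScrapyRT instance and decode the JSON reply."""
    req = urllib.request.Request(
        base_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, see lead-in
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With ScrapyRT running in the project directory, `post_crawl(build_crawl_payload("toscrape-css", "http://quotes.toscrape.com/page/2/", callback="some_callback", crawl_args={"zipcode": "14000"}))` would send the same body as the curl command in step 5.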

ScrapyRT looks for a scrapy.cfg file to determine your project settings and raises an error if it cannot find one. Note that all of your project's requirements must be installed.
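
To check for a scrapy.cfg before launching, a small pre-flight helper along these lines may be useful. It is a minimal sketch that walks from a starting directory up through its parents; `find_scrapy_cfg` is a hypothetical name, and ScrapyRT's own lookup may behave differently (for example, it may only inspect the current directory).

```python
from pathlib import Path


def find_scrapy_cfg(start="."):
    """Return the nearest scrapy.cfg at or above `start`, or None (hypothetical helper)."""
    current = Path(start).resolve()
    # Check the starting directory first, then each parent up to the root.
    for directory in (current, *current.parents):
        candidate = directory / "scrapy.cfg"
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_scrapy_cfg()` from the directory where you intend to run `scrapyrt` returns the path of the config file it found, or `None` if there is none.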

Note

  • ScrapyRT is not a replacement for Scrapyd, Scrapy Cloud, or other infrastructure for running long-running crawls.

  • It is not suitable for long-running spiders. It works well for spiders that fetch a single response from a website and return items quickly.

Documentation

Documentation is available on Read the Docs.

Support

Open source support is provided on GitHub. Please create a question issue (i.e. an issue with the “question” label).

Commercial support is also available from Zyte.

License

ScrapyRT is offered under the BSD 3-Clause license.

Development

Development takes place on GitHub.
