Put Scrapy spiders behind an HTTP API

ScrapyRT (Scrapy realtime)

Add an HTTP API to your Scrapy project in minutes.

You send a request to ScrapyRT with a spider name and a URL, and in response you get the items collected by the spider when it visits that URL.

  • All Scrapy project components (e.g. middleware, pipelines, extensions) are supported.
  • You run ScrapyRT in a Scrapy project directory. It starts an HTTP server that lets you schedule spiders and get their output as JSON.
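
As a rough illustration of consuming that JSON output, the sketch below parses a made-up response body and pulls out the scraped items. The "status" and "items" field names and the sample data are assumptions for illustration only; check them against the response your ScrapyRT version actually returns.

```python
import json

# Hypothetical response body. The "status" and "items" field names are
# assumptions, not taken from this README; verify them against a real response.
SAMPLE_RESPONSE = """
{
  "status": "ok",
  "spider_name": "toscrape-css",
  "items": [
    {"text": "A sample quote.", "author": "Someone"},
    {"text": "Another sample quote.", "author": "Someone Else"}
  ]
}
"""


def extract_items(body: str) -> list:
    """Return the scraped items from a JSON response body, or raise on failure."""
    payload = json.loads(body)
    if payload.get("status") != "ok":
        raise RuntimeError(f"crawl did not succeed: {payload}")
    return payload.get("items", [])


items = extract_items(SAMPLE_RESPONSE)
for item in items:
    print(item["author"], "-", item["text"])
```

Raising on a non-"ok" status keeps failures visible instead of silently returning an empty list.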

Quickstart

1. Install ScrapyRT

> pip install scrapyrt

2. Switch to your Scrapy project directory (e.g. the quotesbot project)

> cd my/project_path/is/quotesbot

3. Launch ScrapyRT

> scrapyrt

4. Run your spiders

> curl "localhost:9080/crawl.json?spider_name=toscrape-css&url=http://quotes.toscrape.com/"

5. Run a more complex query, e.g. specify a callback for the Scrapy request and a zipcode argument for the spider

> curl --data '{"request": {"url": "http://quotes.toscrape.com/page/2/", "callback":"some_callback"}, "spider_name": "toscrape-css", "crawl_args": {"zipcode":"14000"}}' http://localhost:9080/crawl.json -v
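
The two curl calls in steps 4 and 5 can also be built from Python. The sketch below only constructs the requests, using the standard library and the same address, spider name, and arguments as the quickstart. The helper names `build_get_url` and `build_post_request` are hypothetical, and setting a JSON `Content-Type` header is a choice made here (the curl example does not set one).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Address taken from the quickstart commands above.
BASE = "http://localhost:9080/crawl.json"


def build_get_url(spider_name: str, url: str) -> str:
    """Build the query-string form of the request used in step 4."""
    query = urlencode({"spider_name": spider_name, "url": url})
    return f"{BASE}?{query}"


def build_post_request(spider_name: str, request: dict, crawl_args=None) -> Request:
    """Build the JSON-body form of the request used in step 5 (hypothetical helper)."""
    body = {"spider_name": spider_name, "request": request}
    if crawl_args:
        body["crawl_args"] = crawl_args
    return Request(
        BASE,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, see lead-in
        method="POST",
    )


get_url = build_get_url("toscrape-css", "http://quotes.toscrape.com/")
post_req = build_post_request(
    "toscrape-css",
    {"url": "http://quotes.toscrape.com/page/2/", "callback": "some_callback"},
    crawl_args={"zipcode": "14000"},
)
print(get_url)
print(post_req.get_method(), post_req.data.decode("utf-8"))
```

With ScrapyRT running, passing `get_url` or `post_req` to `urllib.request.urlopen` would send the request; that call is left out here so the sketch runs without a server.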

ScrapyRT looks for a scrapy.cfg file to determine your project settings and raises an error if it cannot find one. Note that all your project requirements must be installed.
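
If you want to check for scrapy.cfg before launching, a pre-flight lookup might look like the sketch below. It is not ScrapyRT's own code: the helper `find_scrapy_cfg` is hypothetical, and searching parent directories is an assumption about how the lookup behaves. The demo builds a throwaway project layout in a temporary directory.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_scrapy_cfg(start: Path) -> Optional[Path]:
    """Return the nearest scrapy.cfg at or above `start`, or None (hypothetical helper)."""
    start = start.resolve()
    for directory in [start, *start.parents]:
        candidate = directory / "scrapy.cfg"
        if candidate.is_file():
            return candidate
    return None


# Demo on a throwaway layout: quotesbot/scrapy.cfg and quotesbot/quotesbot/spiders/
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "quotesbot"
    nested = project / "quotesbot" / "spiders"
    nested.mkdir(parents=True)
    # Illustrative file contents only.
    (project / "scrapy.cfg").write_text("[settings]\ndefault = quotesbot.settings\n")

    found = find_scrapy_cfg(nested)
    found_name = found.name if found else None
    found_in_project = found is not None and found.parent == project.resolve()
    print(found_name, found_in_project)
```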

Note

  • ScrapyRT is not a replacement for Scrapyd, Scrapy Cloud, or other infrastructure for running long-running crawls.
  • It is not suitable for long-running spiders; it works well for spiders that fetch one response from a website and return items quickly.

Support

Open source support is provided on GitHub. Please create a question issue (i.e. an issue with the “question” label).

Commercial support is also available from Zyte.

License

ScrapyRT is offered under the BSD 3-Clause license.

Development

Development takes place on GitHub.

Download files

Files for scrapyrt, version 0.13.0

  • scrapyrt-0.13.0.tar.gz (29.4 kB): source distribution
  • scrapyrt-0.13.0-py2.py3-none-any.whl (36.3 kB): wheel, Python version py2.py3
