Asynchronous HTTP API for Running Scrapy Spiders

Project description

ScrapyRTA

ScrapyRTA is an asynchronous HTTP API for running Scrapy spiders, built with FastAPI. It's a modern rewrite of the legacy ScrapyRT project, focusing on asynchronous operation and scalability.

Features

  • Run Scrapy spiders via an asynchronous HTTP API
  • Configurable request parameters

API Endpoints

POST /crawl.json

Run a spider with specified parameters.

Example request:

curl -v -H 'Content-Type: application/json' --data '{"request": {"url": "http://quotes.toscrape.com/page/2/"}, "spider_name": "toscrape-css", "crawl_args": {"zipcode":"14000"}}' http://localhost:9080/crawl

With the server running, see the interactive API documentation at http://127.0.0.1:9080/docs for more details and examples.
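For callers that prefer Python over curl, the same request can be sketched with the standard library alone. This is a minimal illustration, not part of ScrapyRTA: the helper names `build_crawl_request` and `run_crawl` are hypothetical, the URL is copied from the curl example (the heading above lists the path as /crawl.json, so adjust it to whatever your server actually exposes), and the response is assumed to be JSON.

```python
import json
import urllib.request

# Assumption: same host, port, and path as the curl example above.
CRAWL_URL = "http://localhost:9080/crawl"


def build_crawl_request(spider_name, url, crawl_args=None, endpoint=CRAWL_URL):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "request": {"url": url},
        "spider_name": spider_name,
        "crawl_args": crawl_args or {},
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_crawl(request, timeout=60):
    """Send the request and decode the body (assumed here to be JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with a ScrapyRTA server running locally:
#   req = build_crawl_request(
#       "toscrape-css",
#       "http://quotes.toscrape.com/page/2/",
#       crawl_args={"zipcode": "14000"},
#   )
#   print(run_crawl(req))
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.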

You can also create a .env file with the following content to change ScrapyRTA's behavior:

SCRAPYRTA_DEBUG=False
SCRAPYRTA_LOG_LEVEL=INFO
SCRAPYRTA_ENABLE_OPEN_API=False

SCRAPYRTA_TIMEOUT_LIMIT=30 # seconds
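To make the intended types of these values concrete, the sketch below parses the same text into a boolean, a string, and an integer number of seconds. It is only an illustration: `parse_env` and `as_bool` are hypothetical helpers, the parser is deliberately simplified (it does not handle quoted values containing `#`), and how ScrapyRTA itself loads the file is not shown in this document.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        # Drop inline comments such as "# seconds" (simplified handling).
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def as_bool(value):
    """Interpret common truthy spellings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


ENV_TEXT = """\
SCRAPYRTA_DEBUG=False
SCRAPYRTA_LOG_LEVEL=INFO
SCRAPYRTA_ENABLE_OPEN_API=False

SCRAPYRTA_TIMEOUT_LIMIT=30 # seconds
"""

settings = parse_env(ENV_TEXT)
debug = as_bool(settings["SCRAPYRTA_DEBUG"])
log_level = settings["SCRAPYRTA_LOG_LEVEL"]
open_api_enabled = as_bool(settings["SCRAPYRTA_ENABLE_OPEN_API"])
timeout_seconds = int(settings["SCRAPYRTA_TIMEOUT_LIMIT"])
```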

Notes

  • Requires a scrapy.cfg file in the project directory; an error is raised if it is missing.
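A deployment script can check for this file before starting the server. The following is a minimal sketch under the assumption that the project directory is the directory the server is launched from; `has_scrapy_cfg` and `require_scrapy_cfg` are hypothetical helpers, not ScrapyRTA functions, and the error they raise is not necessarily the one ScrapyRTA itself reports.

```python
from pathlib import Path


def has_scrapy_cfg(project_dir="."):
    """Return True if project_dir contains a scrapy.cfg file."""
    return (Path(project_dir) / "scrapy.cfg").is_file()


def require_scrapy_cfg(project_dir="."):
    """Raise FileNotFoundError early, with a readable message."""
    if not has_scrapy_cfg(project_dir):
        raise FileNotFoundError(
            f"scrapy.cfg not found in {Path(project_dir).resolve()}"
        )
```

Calling `require_scrapy_cfg()` at the top of a launch script surfaces the problem before the server process starts.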
