Async framework for building modular and scalable web scrapers.

Project description

aioscraper

Beta notice: APIs and behavior may change; expect sharp edges while things settle.

Key Features

  • Async-first core with pluggable HTTP backends (aiohttp/httpx) and aiojobs scheduling
  • Declarative flow: requests → callbacks → pipelines, with middleware hooks at each stage
  • Priority queueing plus configurable concurrency limits per group
  • Adaptive rate limiting (EWMA + AIMD) that automatically backs off when the server shows signs of overload
  • Small, explicit API that is easy to test and compose with existing async applications
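The EWMA + AIMD idea can be pictured as a smoothed latency tracker driving additive-increase/multiplicative-decrease rate control. The sketch below is a standalone illustration of the algorithm, not aioscraper's actual implementation; all class and parameter names are hypothetical.

```python
class AIMDRateLimiter:
    """Illustrative EWMA + AIMD limiter: raise the request rate additively on
    success, cut it multiplicatively when the server looks overloaded."""

    def __init__(self, rate=10.0, alpha=0.3, increase=0.5, decrease=0.5,
                 min_rate=1.0, max_rate=100.0):
        self.rate = rate            # current requests/second
        self.alpha = alpha          # EWMA smoothing factor
        self.increase = increase    # additive increase step (req/s)
        self.decrease = decrease    # multiplicative decrease factor
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.ewma_latency = None    # smoothed response latency (seconds)

    def record(self, latency, overloaded=False):
        """Feed one response's latency; set overloaded=True on 429/503 etc."""
        # EWMA: blend the new sample with the running average.
        if self.ewma_latency is None:
            self.ewma_latency = latency
        else:
            self.ewma_latency = (self.alpha * latency
                                 + (1 - self.alpha) * self.ewma_latency)
        if overloaded or latency > 2 * self.ewma_latency:
            # Multiplicative decrease: back off sharply on overload.
            self.rate = max(self.min_rate, self.rate * self.decrease)
        else:
            # Additive increase: gently probe for more capacity.
            self.rate = min(self.max_rate, self.rate + self.increase)

    def delay(self):
        """Seconds to wait before the next request at the current rate."""
        return 1.0 / self.rate
```

A scraper loop would call `record()` after each response and sleep for `delay()` before the next request; on a burst of 429s the rate halves repeatedly, then creeps back up once responses are healthy again.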

Installation

Choose your HTTP backend:

# Option 1: Use aiohttp (recommended for most cases)
pip install "aioscraper[aiohttp]"

# Option 2: Use httpx (if you prefer httpx ecosystem)
pip install "aioscraper[httpx]"

# Option 3: Install both backends for flexibility
pip install "aioscraper[aiohttp,httpx]"
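When a deployment may have either backend installed, a quick stdlib check can detect what is available at startup. This helper is illustrative, not an aioscraper API:

```python
from importlib.util import find_spec


def available_backends():
    """Return the installed HTTP backends, in aioscraper's supported order."""
    return [name for name in ("aiohttp", "httpx") if find_spec(name) is not None]
```

Calling `available_backends()` returns, e.g., `["aiohttp"]` in an environment where only the aiohttp extra was installed.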

Quick Start

Create scraper.py:

from aioscraper import AIOScraper, Request, Response, SendRequest

scraper = AIOScraper()

@scraper
async def scrape(send_request: SendRequest):
    await send_request(Request(url="https://example.com", callback=handle_response))


async def handle_response(response: Response):
    print(f"Fetched {response.url} with status {response.status}")

Run it:

aioscraper scraper

What's happening?

  1. @scraper decorator registers the scrape() function as the entry point
  2. send_request() schedules a request with a callback; requests are queued and executed with rate limiting
  3. callback=handle_response is called when the response arrives; you can parse data, send new requests, or push items to pipelines
  4. The aioscraper command finds scraper.py, starts the async runtime, and runs your scraper
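The request → callback flow in steps 2–3 can be mimicked with plain asyncio to see the control flow without the framework. The `Request`/`Response` classes below are stand-ins that mirror the shape used in the Quick Start, not aioscraper's actual types:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Request:           # stand-in, mirrors the Quick Start's Request shape
    url: str
    callback: object = None


@dataclass
class Response:          # stand-in with the fields the callback reads
    url: str
    status: int


async def run(requests):
    """Drain a queue of requests, invoking each callback with a fake Response."""
    queue = asyncio.Queue()
    for req in requests:
        queue.put_nowait(req)
    results = []
    while not queue.empty():
        req = await queue.get()
        response = Response(url=req.url, status=200)  # pretend fetch
        if req.callback is not None:
            results.append(await req.callback(response))
    return results


async def handle_response(response):
    return f"Fetched {response.url} with status {response.status}"


out = asyncio.run(run([Request("https://example.com", handle_response)]))
```

The real framework adds what this toy omits: concurrent workers, priorities, rate limiting, and middleware around each stage.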

Examples

See the examples/ directory for fully commented example code.

Why aioscraper?

  • Scrapy is mature but tied to Twisted and a heavier, older stack. aioscraper is plain asyncio with modern typing and explicit control flow.
  • Less magic: declarative Request → callback → pipeline without opaque spider classes; each piece is a normal function or typed class, simple to test and mock.
  • Light footprint: pluggable HTTP backend (aiohttp/httpx), no global settings or hidden state, no vendor lock-in.
  • Built for modern workloads: high-volume API/JSON crawling, fanning out to microservice endpoints, quick data collection jobs where you want async throughput without a large framework.
  • Easy to embed: runs inside existing async apps (FastAPI, workers, cron jobs) without adapting to a separate runtime.

Use Cases

  • High-volume API/JSON crawling - Collecting data from many REST endpoints concurrently with automatic rate limiting
  • Microservice integration - Fan-out calls inside async apps to hydrate/cache data from external services
  • Lightweight scraping jobs - Quick data collection tasks without heavy framework overhead
  • Embedded scraping - Runs inside existing async apps (FastAPI, workers, cron jobs) without separate runtime
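The fan-out cases above reduce to bounded concurrency over many endpoints. A framework-free sketch of that pattern with asyncio.Semaphore (here `fetch` is a stub; a real job would issue an HTTP request):

```python
import asyncio


async def fetch(url):
    """Stub fetch; a real scraper would perform an HTTP request here."""
    await asyncio.sleep(0)  # yield to the event loop
    return {"url": url, "status": 200}


async def fan_out(urls, limit=5):
    """Fetch many URLs concurrently, never more than `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def bounded(url):
        async with sem:
            return await fetch(url)

    return await asyncio.gather(*(bounded(u) for u in urls))


results = asyncio.run(
    fan_out([f"https://api.example.com/item/{i}" for i in range(10)])
)
```

aioscraper's queueing, per-group concurrency limits, and adaptive rate limiting replace the bare semaphore in production use.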

Performance: Benchmarks show stable throughput across CPython 3.11–3.14 (see benchmarks)

Documentation

Full documentation at aioscraper.readthedocs.io

Changelog

See CHANGELOG.md for version history and release notes.

Contributing

Please see the Contributing guide for workflow, tooling, and review expectations.

License

MIT License

Copyright (c) 2025 darkstussy

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

aioscraper-0.10.0.tar.gz (47.4 kB)

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

aioscraper-0.10.0-py3-none-any.whl (42.9 kB)

File details

Details for the file aioscraper-0.10.0.tar.gz.

File metadata

  • Download URL: aioscraper-0.10.0.tar.gz
  • Upload date:
  • Size: 47.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for aioscraper-0.10.0.tar.gz:

  • SHA256: c1d1090ef6f4ecd87dfc68f0986953399da1d6f56aaa2deb7166692101e77f4a
  • MD5: 7b3b18b7b14e093dbdaba9c2933ec0ee
  • BLAKE2b-256: eb2afc4116f44125b9d5146250687bffe8d9d9bcb840bbd8113dff8c3e2fdf0f

See more details on using hashes here.
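To verify a downloaded sdist against the published digest, the stdlib hashlib module suffices. The helper below is generic; the commented usage assumes the file sits in the current directory:

```python
import hashlib


def sha256_of(path, chunk_size=8192):
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# expected = "c1d1090ef6f4ecd87dfc68f0986953399da1d6f56aaa2deb7166692101e77f4a"
# assert sha256_of("aioscraper-0.10.0.tar.gz") == expected
```

Reading in chunks keeps memory flat regardless of archive size; the same helper verifies the wheel against its own digest below.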

Provenance

The following attestation bundles were made for aioscraper-0.10.0.tar.gz:

Publisher: release.yml on DarkStussy/aioscraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file aioscraper-0.10.0-py3-none-any.whl.

File metadata

  • Download URL: aioscraper-0.10.0-py3-none-any.whl
  • Upload date:
  • Size: 42.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for aioscraper-0.10.0-py3-none-any.whl:

  • SHA256: e5754a81744f0ffc98c51dce67daf4228abf22e7b7c51e9a37d3e723af17652e
  • MD5: 754221c369221274d974f6677422faab
  • BLAKE2b-256: 24a43985998b19b7ee74708dda84f9f624383bb278aa3a72865eea223c009d6d

See more details on using hashes here.

Provenance

The following attestation bundles were made for aioscraper-0.10.0-py3-none-any.whl:

Publisher: release.yml on DarkStussy/aioscraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
