
aioscraper


Asynchronous framework for building modular and scalable web scrapers.

Beta notice: APIs and behavior may change; expect sharp edges while things settle.

Key Features

  • Async-first core with pluggable HTTP backends (aiohttp/httpx) and aiojobs scheduling
  • Declarative flow: requests → callbacks → pipelines, with middleware hooks at each stage
  • Priority queueing plus configurable concurrency limits per group
  • Adaptive rate limiting using an EWMA + AIMD algorithm that automatically backs off on server overload
  • Small, explicit API that is easy to test and compose with existing async applications
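The adaptive rate limiting feature combines two classic ideas: an exponentially weighted moving average (EWMA) to smooth the latency signal, and additive-increase/multiplicative-decrease (AIMD) to adjust the request rate. The sketch below is a generic illustration of that combination, not aioscraper's actual implementation; the class name, parameters, and thresholds are all hypothetical.

```python
class AdaptiveRateLimiter:
    """Generic EWMA + AIMD sketch: smooth observed latency with an EWMA,
    then additively raise or multiplicatively cut the allowed rate."""

    def __init__(self, rate: float = 10.0, alpha: float = 0.3,
                 latency_threshold: float = 1.0) -> None:
        self.rate = rate                        # allowed requests per second
        self.alpha = alpha                      # EWMA smoothing factor
        self.latency_threshold = latency_threshold
        self.ewma_latency = 0.0

    def record(self, latency: float, overloaded: bool) -> None:
        # EWMA: blend the new latency sample into the running estimate.
        self.ewma_latency = (self.alpha * latency
                             + (1 - self.alpha) * self.ewma_latency)
        if overloaded or self.ewma_latency > self.latency_threshold:
            # Multiplicative decrease: halve the rate on overload signals.
            self.rate = max(1.0, self.rate / 2)
        else:
            # Additive increase: slowly probe for more headroom.
            self.rate += 0.5
```

The AIMD shape guarantees the rate converges back toward capacity after a backoff, the same dynamic TCP congestion control uses.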

Installation

Choose your HTTP backend:

# Option 1: Use aiohttp (recommended for most cases)
pip install "aioscraper[aiohttp]"

# Option 2: Use httpx (if you prefer httpx ecosystem)
pip install "aioscraper[httpx]"

# Option 3: Install both backends for flexibility
pip install "aioscraper[aiohttp,httpx]"
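If you install only one extra, you can confirm which backend is importable at runtime with a small standard-library probe. This is plain Python, not an aioscraper API:

```python
import importlib.util

def available_backends() -> list[str]:
    """Return which of the optional HTTP backends can be imported
    in the current environment."""
    return [name for name in ("aiohttp", "httpx")
            if importlib.util.find_spec(name) is not None]

print("installed backends:", available_backends())
```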

Quick Start

Create scraper.py:

from aioscraper import AIOScraper, Request, Response, SendRequest

scraper = AIOScraper()

@scraper
async def scrape(send_request: SendRequest):
    await send_request(Request(url="https://example.com", callback=handle_response))


async def handle_response(response: Response):
    print(f"Fetched {response.url} with status {response.status}")

Run it:

aioscraper scraper

What's happening?

  1. @scraper decorator registers the scrape() function as the entry point
  2. send_request() schedules a request with a callback; requests are queued and executed with rate limiting
  3. callback=handle_response is called when the response arrives; you can parse data, send new requests, or push items to pipelines
  4. The aioscraper command finds scraper.py, starts the async runtime, and runs your scraper
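Steps 2–3 describe a priority queue draining into callbacks. The toy model below shows that flow with plain asyncio; it is a conceptual sketch, not aioscraper internals, and the dict standing in for a Response is hypothetical.

```python
import asyncio

async def drain(queue: asyncio.PriorityQueue, results: list) -> None:
    """Pull (priority, url, callback) entries in priority order and
    invoke each callback with a stand-in response payload."""
    while not queue.empty():
        _priority, url, callback = await queue.get()
        # In the real framework the callback would receive a fetched
        # Response object; here a dict stands in for it.
        await callback({"url": url, "status": 200}, results)

async def handle(response: dict, results: list) -> None:
    results.append(response["url"])

async def main() -> list:
    queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
    await queue.put((1, "https://example.com/low", handle))
    await queue.put((0, "https://example.com/high", handle))
    results: list = []
    await drain(queue, results)
    return results

order = asyncio.run(main())
```

Lower priority numbers are served first, so the `/high` entry (priority 0) is handled before `/low` despite being queued second.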

Examples

See the examples/ directory for fully commented example scrapers.

Why aioscraper?

  • Scrapy is mature but tied to Twisted and a heavier, older stack. aioscraper is plain asyncio with modern typing and explicit control flow.
  • Less magic: declarative Request → callback → pipeline without opaque spider classes; each piece is a normal function or typed class, simple to test and mock.
  • Light footprint: pluggable HTTP backend (aiohttp/httpx), no global settings or hidden state, no vendor lock-in.
  • Built for modern workloads: high-volume API/JSON crawling, fanning out to microservice endpoints, quick data collection jobs where you want async throughput without a large framework.
  • Easy to embed: runs inside existing async apps (FastAPI, workers, cron jobs) without adapting to a separate runtime.

Use Cases

  • High-volume API/JSON crawling - Collecting data from many REST endpoints concurrently with automatic rate limiting
  • Microservice integration - Fan-out calls inside async apps to hydrate/cache data from external services
  • Lightweight scraping jobs - Quick data collection tasks without heavy framework overhead
  • Embedded scraping - Runs inside existing async apps (FastAPI, workers, cron jobs) without separate runtime
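The fan-out use case boils down to running one async fetch over many URLs with a concurrency cap. A minimal generic pattern in plain asyncio (the `fetch` callable here is a stand-in, not an aioscraper function):

```python
import asyncio

async def fan_out(urls, fetch, limit: int = 5):
    """Run `fetch` over all URLs concurrently, at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def bounded(url):
        async with sem:
            return await fetch(url)

    # gather preserves input order in its results.
    return await asyncio.gather(*(bounded(u) for u in urls))

async def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP call.
    await asyncio.sleep(0)
    return f"ok:{url}"

results = asyncio.run(fan_out(["a", "b", "c"], fake_fetch, limit=2))
```

A framework like aioscraper adds queueing, rate limiting, and retries on top of this basic shape.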

Performance: Benchmarks show stable throughput across CPython 3.11–3.14 (see benchmarks)

Documentation

Full documentation at aioscraper.readthedocs.io

Changelog

See CHANGELOG.md for version history and release notes.

Contributing

Please see the Contributing guide for workflow, tooling, and review expectations.

License

MIT License

Copyright (c) 2025 darkstussy

Download files

Download the file for your platform.

Source Distribution

aioscraper-0.10.1.tar.gz (52.1 kB)

Uploaded Source

Built Distribution


aioscraper-0.10.1-py3-none-any.whl (48.3 kB)

Uploaded Python 3

File details

Details for the file aioscraper-0.10.1.tar.gz.

File metadata

  • Download URL: aioscraper-0.10.1.tar.gz
  • Upload date:
  • Size: 52.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for aioscraper-0.10.1.tar.gz
Algorithm Hash digest
SHA256 99e275ecc4dca5ad612caf8410eb60b013d35cef3b13543ebdc0aa227c942d32
MD5 899be7b5a71d730a004e12232d8cbcaa
BLAKE2b-256 b1b4463e3b49d0ee8dca1953678d038149918169fc2aa78d9201b952d544896e
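To check a downloaded archive against the published SHA256 digest, stream it through `hashlib` (standard library; the file path is whatever you saved locally):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 in chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()
```

Compare the returned digest against the SHA256 value in the table above; any mismatch means the download is corrupt or tampered with.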


Provenance

The following attestation bundles were made for aioscraper-0.10.1.tar.gz:

Publisher: release.yml on DarkStussy/aioscraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file aioscraper-0.10.1-py3-none-any.whl.

File metadata

  • Download URL: aioscraper-0.10.1-py3-none-any.whl
  • Upload date:
  • Size: 48.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for aioscraper-0.10.1-py3-none-any.whl
Algorithm Hash digest
SHA256 d931d57c73bf0ea4ad1f7c831996b80546b1f22e556e09883b87ff4d7335cd97
MD5 3f21c7b1a8702b7ef1e3d1614f3b74f8
BLAKE2b-256 41caa06f99fa4004c17ba585301761d76f7540d9875cb996021aaf30e849083a


Provenance

The following attestation bundles were made for aioscraper-0.10.1-py3-none-any.whl:

Publisher: release.yml on DarkStussy/aioscraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
