aioscraper

Asynchronous framework for building modular and scalable web scrapers.

Features

  • Fully asynchronous architecture powered by aiohttp and aiojobs
  • Modular system with middleware support
  • Pipeline data processing
  • Flexible configuration
  • Priority-based request queue management
  • Built-in error handling
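
To make the idea of a priority-based request queue concrete, here is a minimal sketch built only on the standard library's `asyncio.PriorityQueue`. It is not aioscraper's implementation or API: the `QueuedRequest` class, its fields, and `drain_in_priority_order` are hypothetical names chosen for illustration, and the sketch assumes that a lower number means higher priority.

```python
import asyncio
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedRequest:
    """Hypothetical queue entry; not an aioscraper type."""

    priority: int                    # assumption: lower value is served first
    seq: int                         # tie-breaker that keeps insertion order
    url: str = field(compare=False)  # payload, excluded from ordering


async def drain_in_priority_order(items: list[tuple[int, str]]) -> list[str]:
    """Enqueue (priority, url) pairs and return the URLs in dequeue order."""
    queue = asyncio.PriorityQueue()
    counter = itertools.count()
    for priority, url in items:
        queue.put_nowait(QueuedRequest(priority, next(counter), url))

    order = []
    while not queue.empty():
        request = await queue.get()
        order.append(request.url)  # a real worker would fetch the URL here
        queue.task_done()
    return order


if __name__ == "__main__":
    pending = [(5, "/archive"), (1, "/index"), (5, "/tags"), (2, "/latest")]
    print(asyncio.run(drain_in_priority_order(pending)))
```

The sequence counter is a common design choice: it stops two entries with equal priority from being compared by payload and keeps them in first-in, first-out order.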

📓 Documentation

Basic usage

Install

pip install aioscraper

The following example fetches a single page and prints its URL and HTTP status:

import asyncio

from aioscraper import AIOScraper
from aioscraper.types import Request, SendRequest, Response


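# Entry point: schedules the initial request through the send_request callable.
async def scraper(send_request: SendRequest) -> None:
    await send_request(Request(url="https://example.com", callback=handle_response))


# Callback named in the Request above; it receives the fetched response.
async def handle_response(response: Response) -> None:
    print(f"Fetched {response.url} with status {response.status}")


async def main():
    # Run the scraper inside the AIOScraper context manager.
    async with AIOScraper(scraper) as s:
        await s.start()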


if __name__ == "__main__":
    asyncio.run(main())

Benchmarks

Below are benchmarks comparing aioscraper and scrapy on a local JSON server.

The scripts used for these tests are available in this Gist.

Benchmark 1

  • Path: /json?size=10
  • Total requests: 10,000
  • Total items: 100,000

| Library | Elapsed time | Requests per second | Items per second |
|---|---|---|---|
| aioscraper | 1.9 sec | 5,263.2 | 52,631.6 |
| scrapy | 26.8 sec | 373.1 | 3,731.3 |

Benchmark 2

  • Path: /json?size=100
  • Total requests: 10,000
  • Total items: 1,000,000

| Library | Elapsed time | Requests per second | Items per second |
|---|---|---|---|
| aioscraper | 3.1 sec | 3,225.8 | 322,580.6 |
| scrapy | 205.8 sec | 48.6 | 4,859.1 |

Benchmark 3

  • Path: /json?size=10&t=0.1
  • Total requests: 10,000
  • Total items: 100,000

| Library | Elapsed time | Requests per second | Items per second |
|---|---|---|---|
| aioscraper | 16.1 sec | 621.1 | 6,211.2 |
| scrapy | 129.9 sec | 77.0 | 769.8 |
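
The throughput columns above follow directly from the totals and the elapsed times. The sketch below recomputes them and derives the relative speedup for each benchmark. The `Run` class and `speedup` function are hypothetical helpers written for this illustration; only the numbers are taken from the tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark row; a hypothetical helper, not part of either library."""

    library: str
    elapsed_s: float
    total_requests: int
    total_items: int

    @property
    def requests_per_second(self) -> float:
        return self.total_requests / self.elapsed_s

    @property
    def items_per_second(self) -> float:
        return self.total_items / self.elapsed_s


def speedup(fast: Run, slow: Run) -> float:
    """Ratio of elapsed times: how many times sooner `fast` finished."""
    return slow.elapsed_s / fast.elapsed_s


# (aioscraper, scrapy) pairs, with values copied from the three tables.
BENCHMARKS = {
    "Benchmark 1": (Run("aioscraper", 1.9, 10_000, 100_000),
                    Run("scrapy", 26.8, 10_000, 100_000)),
    "Benchmark 2": (Run("aioscraper", 3.1, 10_000, 1_000_000),
                    Run("scrapy", 205.8, 10_000, 1_000_000)),
    "Benchmark 3": (Run("aioscraper", 16.1, 10_000, 100_000),
                    Run("scrapy", 129.9, 10_000, 100_000)),
}

if __name__ == "__main__":
    for name, (fast, slow) in BENCHMARKS.items():
        print(f"{name}: {fast.library} {fast.requests_per_second:,.1f} req/s, "
              f"{slow.library} {slow.requests_per_second:,.1f} req/s, "
              f"speedup {speedup(fast, slow):.1f}x")
```

By this arithmetic, the elapsed-time ratios come out to roughly 14x, 66x, and 8x for the three benchmarks respectively.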

License

MIT License

Copyright (c) 2025 darkstussy
