The compact web crawling toolkit

Project description

antyr

This project focuses on core crawling primitives: making HTTP requests, consuming responses as streams, and persisting streamed content with explicit, cancellation-safe lifetimes.

Unlike full-featured frameworks, antyr does not implement an end-to-end scraping pipeline. Parsing, extraction logic, data modeling, retries, scheduling, and storage are left to the caller.
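Since retries are left to the caller, a small wrapper is one way to add them. The sketch below is not part of antyr: `with_retries` and all of its parameters are hypothetical names, and the sleep function is injected so the helper does not depend on a particular async runtime.

```python
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def with_retries(
    operation: Callable[[], Awaitable[T]],
    *,
    sleep: Callable[[float], Awaitable[None]],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[Exception], ...] = (Exception,),
) -> T:
    """Run `operation`, retrying with exponential backoff plus jitter.

    Hypothetical helper for illustration; not an antyr API.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay doubles on each attempt; jitter spreads out
            # retries that would otherwise fire at the same moment.
            delay = base_delay * (2 ** attempt)
            await sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Under Trio, passing `sleep=trio.sleep` would be the natural choice. Because only `Exception` subclasses are caught, Trio's cancellation signal (`trio.Cancelled`, a `BaseException`) propagates untouched.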

If you want a batteries-included scraping framework, consider Scrapy.

Installation

Install via pip:

pip install antyr

Or using uv:

uv add antyr

The following dependencies are installed alongside antyr:

  • trio – structured concurrency runtime
  • httpx[socks] – HTTP client with SOCKS proxy support
  • stem – Tor control port integration

Quickstart

The examples below show how to fetch a resource and either process its contents as a stream or persist it to disk.

Fetch and process a response as a stream

Instead of buffering the entire response in memory, the response can be processed incrementally as it is received.

import trio
from antyr import HttpCrawler

async def main() -> None:
    async with HttpCrawler("https://httpbin.org") as crawler:
        stream = await crawler.fetch("/json").content_stream()

        async for chunk in stream:
            ...  # process chunk

trio.run(main)
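As an illustration of incremental processing, the sketch below computes a SHA-256 digest and a byte count without ever holding the full body in memory. `StreamDigest` is a hypothetical helper, not part of antyr, and it assumes that each chunk yielded by the stream is a `bytes` object.

```python
import hashlib


class StreamDigest:
    """Accumulates a SHA-256 digest and a byte count, one chunk at a time."""

    def __init__(self) -> None:
        self._hash = hashlib.sha256()
        self.size = 0

    def feed(self, chunk: bytes) -> None:
        # hashlib objects accept data piecewise, so memory use stays
        # bounded by the size of a single chunk.
        self._hash.update(chunk)
        self.size += len(chunk)

    def hexdigest(self) -> str:
        return self._hash.hexdigest()
```

Inside the loop, `digest.feed(chunk)` would take the place of the `# process chunk` placeholder; `digest.hexdigest()` and `digest.size` are then available once the stream is exhausted.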

If the response body is an archive, it can be extracted before processing by calling extract(). The extracted content is exposed through the same streaming interface.

import trio
from antyr import HttpCrawler

async def main() -> None:
    async with HttpCrawler("https://example.com") as crawler:
        stream = await crawler.fetch("/archive.zip").extract()

        async for chunk in stream:
            ...  # process chunk

trio.run(main)

Stream to disk

To persist a response, call save() with a target directory. The body is streamed directly to disk.

import trio
from antyr import HttpCrawler

async def main() -> None:
    async with HttpCrawler("https://httpbin.org") as crawler:
        await crawler.fetch("/image/png").save("downloads")

trio.run(main)

The target filename is derived from the response headers or URL and normalized for filesystem safety.
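How antyr performs this derivation is not documented here. The sketch below shows one plausible approach using only the standard library: prefer the `Content-Disposition` filename, fall back to the last URL path segment, then strip directory components and unsafe characters. The function name, the `"download"` fallback, and the exact set of replaced characters are all assumptions.

```python
import re
from email.message import Message
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import unquote, urlsplit

# Characters that are invalid or awkward in filenames on common filesystems.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def filename_from_response(url: str, content_disposition: Optional[str] = None) -> str:
    """Derive a filesystem-safe filename (illustrative, not antyr's code)."""
    name = None
    if content_disposition:
        # Let the stdlib parse header parameters such as filename="...".
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
    if not name:
        # Fall back to the last segment of the URL path, percent-decoded.
        name = unquote(PurePosixPath(urlsplit(url).path).name)
    # Drop any directory components so the name cannot escape the target dir.
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    # Replace unsafe characters and trim leading/trailing dots and spaces.
    name = _UNSAFE.sub("_", name).strip(" .")
    return name or "download"
```

For example, a header value of `attachment; filename="../../etc/passwd"` reduces to `passwd`, and a URL with an empty path falls back to `download`.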
