
The compact web crawling toolkit

Project description


antyr

This project focuses on core crawling primitives: making HTTP requests, consuming responses as streams, and persisting streamed content with explicit, cancellation-safe lifetimes.

Unlike full-featured frameworks, antyr does not implement an end-to-end scraping pipeline. Parsing, extraction logic, data modeling, retries, scheduling, and storage are left to the caller.

If you want a batteries-included scraping framework, consider Scrapy.

Installation

Install via pip:

pip install antyr

Or using uv:

uv add antyr

The following dependencies are installed alongside antyr:

  • trio – structured concurrency runtime
  • httpx[socks] – HTTP client with SOCKS proxy support
  • stem – Tor control port integration

Quickstart

The examples below show how to fetch a resource and either process its contents as a stream or persist it to disk.

Fetch and process a response as a stream

Instead of buffering the entire response in memory, the response can be processed incrementally as it is received.

import trio
from antyr import HttpCrawler

async def main() -> None:
    async with HttpCrawler("https://httpbin.org") as crawler:
        stream = await crawler.fetch("/json").content_stream()

        async for chunk in stream:
            ...  # process each chunk here

trio.run(main)
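What happens inside the loop is left to the caller. As a hedged illustration, assuming the stream yields `bytes` chunks (the chunk type is not documented here), the sketch below computes the body size and a SHA-256 digest while holding only one chunk at a time. `digest_stream` and `fake_body` are hypothetical names, not part of antyr; `fake_body` stands in for the real response stream so the example runs offline.

```python
import asyncio
import hashlib
from collections.abc import AsyncIterable


async def digest_stream(stream: AsyncIterable[bytes]) -> tuple[int, str]:
    """Consume a chunk stream once and return (total_bytes, sha256_hex).

    Each chunk is hashed and then dropped, so memory use is bounded by the
    chunk size rather than by the size of the whole body.
    """
    hasher = hashlib.sha256()
    total = 0
    async for chunk in stream:
        hasher.update(chunk)
        total += len(chunk)
    return total, hasher.hexdigest()


async def fake_body(data: bytes, size: int):
    """Hypothetical stand-in for a response stream: yields `data` in pieces."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


# asyncio is used here only so the sketch runs with the standard library;
# digest_stream itself depends on neither asyncio nor trio.
payload = b'{"slideshow": {"title": "Sample Slide Show"}}'
total, hexdigest = asyncio.run(digest_stream(fake_body(payload, 8)))
print(total, hexdigest)
```

Inside the trio quickstart, `size, digest = await digest_stream(stream)` could take the place of the `async for` loop, since the helper only relies on the async-iterator protocol.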

If the response body is an archive, call extract() instead of content_stream() to unpack it before processing. The extracted content is exposed through the same streaming interface.

import trio
from antyr import HttpCrawler

async def main() -> None:
    async with HttpCrawler("https://example.com") as crawler:
        stream = await crawler.fetch("/archive.zip").extract()

        async for chunk in stream:
            ...  # process each chunk of the extracted content here

trio.run(main)
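How extract() unpacks archives is not described here. To illustrate the general idea of decompressing while streaming, the hypothetical sketch below handles gzip only: `zlib` can decompress gzip data chunk by chunk, whereas Python's `zipfile` module needs a seekable file. It assumes the body is an async iterator of `bytes`; `gunzip_stream`, `fake_body` and `collect` are illustrative names, not antyr's implementation.

```python
import asyncio
import gzip
import zlib
from collections.abc import AsyncIterable, AsyncIterator


async def gunzip_stream(stream: AsyncIterable[bytes]) -> AsyncIterator[bytes]:
    """Decompress a gzip-encoded chunk stream incrementally.

    The result is again an async iterator of bytes, so it is consumed with
    the same `async for` loop as the raw body.
    """
    # wbits = 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    async for chunk in stream:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail
    if not decompressor.eof:
        raise ValueError("gzip stream ended before its end-of-stream marker")


async def fake_body(data: bytes, size: int):
    """Hypothetical stand-in for a response stream."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


async def collect(stream: AsyncIterable[bytes]) -> bytes:
    """Join a stream into one bytes object (for the demo only)."""
    return b"".join([piece async for piece in stream])


original = b"line of text\n" * 1000
compressed = gzip.compress(original)
restored = asyncio.run(collect(gunzip_stream(fake_body(compressed, 64))))
print(len(compressed), len(restored))
```

Because the decompressor wraps one async iterator in another, the consuming code does not need to know whether the body was compressed, which matches the "same streaming interface" idea above.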

Stream to disk

Stream the response body directly to disk.

import trio
from antyr import HttpCrawler

async def main() -> None:
    async with HttpCrawler("https://httpbin.org") as crawler:
        await crawler.fetch("/image/png").save("downloads")

trio.run(main)

The target filename is derived from the response headers or URL and normalized for filesystem safety.
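antyr's save() picks the filename and writes the file itself; its exact rules are not documented here. The sketch below is a hypothetical approximation of the two behaviours this section describes, assuming the body arrives as an async iterator of `bytes`. `derive_filename` prefers the `Content-Disposition` filename, falls back to the last URL path segment, and keeps a conservative character set. `save_stream` writes through a temporary `.part` file and renames it only once the stream is exhausted, deleting the partial file on error or cancellation. Both function names, the character allow-list and the `.part` convention are assumptions, not antyr's API.

```python
import asyncio
import os
import posixpath
import re
import tempfile
from collections.abc import AsyncIterable
from email.message import Message
from pathlib import Path
from typing import Optional
from urllib.parse import unquote, urlsplit

# Conservative allow-list for filename characters (an assumption, not antyr's rule).
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def derive_filename(content_disposition: Optional[str], url: str,
                    default: str = "download") -> str:
    """Prefer the Content-Disposition filename, else the last URL path segment."""
    name = None
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()  # unquotes and decodes RFC 2231 filename*
    if not name:
        name = unquote(posixpath.basename(urlsplit(url).path))
    # Keep only the last path component (treating "\" as a separator too)
    # so names like "../../etc/passwd" cannot escape the target directory.
    name = posixpath.basename(name.replace("\\", "/"))
    name = _UNSAFE.sub("_", name).lstrip(".")
    return name or default


async def save_stream(stream: AsyncIterable[bytes], directory: Path,
                      filename: str) -> Path:
    """Write chunks to directory/filename through a temporary ".part" file."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / filename
    part = target.with_name(target.name + ".part")
    try:
        # Blocking file I/O keeps the sketch runtime-agnostic; under trio,
        # trio.open_file() would avoid blocking the event loop.
        with open(part, "wb") as fh:
            async for chunk in stream:
                fh.write(chunk)
        os.replace(part, target)  # the final name appears only when complete
    except BaseException:
        # BaseException also covers trio.Cancelled and asyncio.CancelledError,
        # so a cancelled or failed download leaves no partial file behind.
        part.unlink(missing_ok=True)
        raise
    return target


async def _chunks(data: bytes, size: int):
    """Hypothetical stand-in for a response stream."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


async def _dropped_connection():
    """Hypothetical stream that fails after the first chunk."""
    yield b"partial"
    raise ConnectionError("simulated network failure")


with tempfile.TemporaryDirectory() as tmp:
    downloads = Path(tmp) / "downloads"
    name = derive_filename(None, "https://httpbin.org/image/png")
    body = b"\x89PNG" + bytes(64)
    saved = asyncio.run(save_stream(_chunks(body, 16), downloads, name))
    saved_bytes = saved.read_bytes()
    try:
        asyncio.run(save_stream(_dropped_connection(), downloads, "broken.bin"))
    except ConnectionError:
        pass
    leftovers = sorted(p.name for p in downloads.iterdir())
```

The temporary file sits next to the target because `os.replace` is only atomic within a single filesystem. Note that a URL such as `/image/png` yields a name without an extension under these rules; mapping the `Content-Type` to an extension would be one possible refinement.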



Download files

Download the file for your platform.

Source Distribution

antyr-1.0.1.tar.gz (17.2 kB)

Built Distribution

antyr-1.0.1-py3-none-any.whl (16.9 kB)

File details

Details for the file antyr-1.0.1.tar.gz.

File metadata

  • Download URL: antyr-1.0.1.tar.gz
  • Size: 17.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.4.18

File hashes

Hashes for antyr-1.0.1.tar.gz

Algorithm     Hash digest
SHA256        7ef9479395a476b1eaaf0c25fa7c5496a952f78d1673cb0f1bdc65f7a6ab826d
MD5           ed5a3b4e9381db0132d98e0d51f25831
BLAKE2b-256   dbd00b965a6297059eb61ebde74d061e03dc521cb67c7cc6e85d054682a09037


File details

Details for the file antyr-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: antyr-1.0.1-py3-none-any.whl
  • Size: 16.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.4.18

File hashes

Hashes for antyr-1.0.1-py3-none-any.whl

Algorithm     Hash digest
SHA256        98221270e7d62ed37f9a660da933421bd30554a69eb85aaab257f63810d047b5
MD5           b2ce07f03af48511d71dca93c0813864
BLAKE2b-256   7eab9aaca2ddbfd881887b9e2686731bb29faaf47f0d5b89ae4d6e1996729516
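The SHA256 digests listed for the two 1.0.1 files can be checked locally after a manual download. The sketch below is one minimal way to do it with `hashlib`, hashing the file in fixed-size chunks rather than reading it whole; `EXPECTED_SHA256`, `sha256_of_chunks` and `verify` are hypothetical names. For installs, pip can enforce the same digests itself in hash-checking mode (`--require-hashes` together with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
from collections.abc import Iterable
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables for antyr 1.0.1.
EXPECTED_SHA256 = {
    "antyr-1.0.1.tar.gz":
        "7ef9479395a476b1eaaf0c25fa7c5496a952f78d1673cb0f1bdc65f7a6ab826d",
    "antyr-1.0.1-py3-none-any.whl":
        "98221270e7d62ed37f9a660da933421bd30554a69eb85aaab257f63810d047b5",
}


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash an iterable of byte chunks without joining them in memory."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


def verify(path: Path, chunk_size: int = 64 * 1024) -> bool:
    """Compare a downloaded file against its published SHA256 digest."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name!r}")
    with open(path, "rb") as fh:
        # iter(callable, sentinel) keeps reading until read() returns b"".
        actual = sha256_of_chunks(iter(lambda: fh.read(chunk_size), b""))
    return actual == expected
```

For example, `verify(Path("antyr-1.0.1.tar.gz"))` returns `True` only if the local file matches the published digest; MD5 is omitted because it is not collision-resistant.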

