A compact web crawling toolkit

antyr

This project focuses on core crawling primitives: making HTTP requests, consuming responses as streams, and persisting streamed content with explicit, cancellation-safe lifetimes.

Unlike full-featured frameworks, antyr does not implement an end-to-end scraping pipeline. Parsing, extraction logic, data modeling, retries, scheduling, and storage are left to the caller.

If you want a batteries-included scraping framework, consider Scrapy.
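
Because retries are left to the caller, a small wrapper around any awaitable operation is usually enough. The sketch below is not part of antyr: with_retries, its parameters, and the doubling backoff are illustrative assumptions. The sleep function is passed in so the helper stays independent of the async runtime; under trio you would pass trio.sleep.

```python
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    op: Callable[[], Awaitable[T]],
    sleep: Callable[[float], Awaitable[None]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Hypothetical helper: call op() up to `attempts` times with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await op()
        except Exception:
            # Only Exception subclasses are retried. trio.Cancelled derives from
            # BaseException, so cancellation passes through untouched.
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            await sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")
```

Inside the crawler context from the Quickstart examples, this could be used as with_retries(lambda: crawler.fetch("/image/png").save("downloads"), trio.sleep). In real code, narrow the except clause to the transient errors you actually expect.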

Installation

Install via pip:

pip install antyr

Or using uv:

uv add antyr

The following dependencies are installed alongside antyr:

  • trio – structured concurrency runtime
  • httpx[socks] – HTTP client with SOCKS proxy support
  • stem – Tor control port integration

Quickstart

The examples below show how to fetch a resource and either process its contents as a stream or persist it to disk.

Fetch and process a response as a stream

Instead of buffering the entire response in memory, the response can be processed incrementally as it is received.

import trio
from antyr import HttpCrawler

async def main() -> None:
    async with HttpCrawler("https://httpbin.org") as crawler:
        stream = await crawler.fetch("/json").content_stream()

        async for chunk in stream:
            ...  # process chunk

trio.run(main)
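
As an illustration of incremental processing, the chunk-handling step in the loop above could feed each chunk into a small accumulator that keeps only constant-size state. This is a sketch, assuming chunks arrive as bytes; StreamDigest is a hypothetical helper and not part of antyr.

```python
import hashlib


class StreamDigest:
    """Hypothetical helper: tracks a SHA-256 digest and byte count chunk by chunk."""

    def __init__(self) -> None:
        self._hash = hashlib.sha256()
        self.size = 0

    def feed(self, chunk: bytes) -> None:
        # Update the running hash and total size; the chunk itself is not stored.
        self._hash.update(chunk)
        self.size += len(chunk)

    def hexdigest(self) -> str:
        return self._hash.hexdigest()
```

Inside the loop, the processing step becomes digest.feed(chunk), and digest.hexdigest() is read once the stream is exhausted. Memory use stays flat regardless of the response size.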

If the response body is an archive, it can be extracted before processing by calling extract(). The extracted content is exposed through the same streaming interface.

import trio
from antyr import HttpCrawler

async def main() -> None:
    async with HttpCrawler("https://example.com") as crawler:
        stream = await crawler.fetch("/archive.zip").extract()

        async for chunk in stream:
            ...  # process chunk

trio.run(main)

Stream to disk

Stream the response body directly to disk.

import trio
from antyr import HttpCrawler

async def main() -> None:
    async with HttpCrawler("https://httpbin.org") as crawler:
        await crawler.fetch("/image/png").save("downloads")

trio.run(main)

The target filename is derived from the response headers or URL and normalized for filesystem safety.
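
To make the idea concrete, here is one way such a derivation could work using only the standard library. It is a sketch under stated assumptions and not antyr's actual implementation: derive_filename, the allowed character set, and the "download" fallback are all hypothetical.

```python
import re
from email.message import Message
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import unquote, urlsplit


def derive_filename(
    url: str,
    content_disposition: Optional[str] = None,
    default: str = "download",
) -> str:
    """Hypothetical helper: pick a filename from headers or URL, then sanitize it."""
    name = None
    if content_disposition:
        # Let the stdlib header parser extract the filename parameter.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
    if not name:
        # Fall back to the last path segment of the URL, percent-decoded.
        name = unquote(PurePosixPath(urlsplit(url).path).name)
    # Keep only the final component so "../" sequences cannot escape the target directory.
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    # Replace runs of characters outside a conservative set with "_".
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", name)
    # Trim leading/trailing dots and underscores (avoids hidden files and "..").
    name = name.strip("._")
    return name or default
```

The header value is preferred because servers often set it explicitly for downloads; the URL path is only a fallback.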

Download files

Download the file for your platform.

Source Distribution

antyr-1.0.3.tar.gz (17.4 kB)

Uploaded Source

Built Distribution

antyr-1.0.3-py3-none-any.whl (17.2 kB)

Uploaded Python 3

File details

Details for the file antyr-1.0.3.tar.gz.

File metadata

  • Download URL: antyr-1.0.3.tar.gz
  • Size: 17.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.3

File hashes

Hashes for antyr-1.0.3.tar.gz
Algorithm    Hash digest
SHA256       edf78fccbb7340c3e043aa2bc1c67bf74d52c278cc35848909c099d4c72981f6
MD5          9b28ffed7f66ca717066bf079246a5f2
BLAKE2b-256  efc09d1aea8810aa9b1e516492e66361043c0f0cab7515e56406e0db29d700af

File details

Details for the file antyr-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: antyr-1.0.3-py3-none-any.whl
  • Size: 17.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.3

File hashes

Hashes for antyr-1.0.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       1309ce6b299d07b7f636bf4941f69c1320e2bad7d892ef2abb2bf465f404d51a
MD5          4865693a472f667a923323bbcb2b197a
BLAKE2b-256  d9a420c134e496150c6ea26109e4eb9b7513bf64924e015f6cca2af0554b43d0
