
A compact web crawling toolkit


antyr

This project focuses on core crawling primitives: making HTTP requests, consuming responses as streams, and persisting streamed content with explicit, cancellation-safe lifetimes.

Unlike full-featured frameworks, antyr does not implement an end-to-end scraping pipeline. Parsing, extraction logic, data modeling, retries, scheduling, and storage are left to the caller.

If you want a batteries-included scraping framework, consider Scrapy.
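Because retries are left to the caller, a retry policy has to live in your own code. The following is a minimal sketch of one common choice, capped exponential backoff. The function name `backoff_delays` and its parameters are hypothetical and are not part of antyr.

```python
import random
from typing import List


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0,
                   jitter: float = 0.0) -> List[float]:
    """Return the number of seconds to wait before each retry.

    The i-th delay is min(cap, base * 2**i). If `jitter` is non-zero, each
    delay is stretched by a random fraction in [0, jitter) so that many
    clients do not retry in lockstep.
    """
    delays = []
    for i in range(attempts):
        delay = min(cap, base * 2 ** i)
        delay += delay * jitter * random.random()
        delays.append(delay)
    return delays


# Hypothetical helper, not an antyr API: five retries, capped at 4 seconds.
print(backoff_delays(5, base=0.5, cap=4.0))
```

In a trio program, the caller would typically loop over these delays and call `await trio.sleep(delay)` between failed attempts.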

Installation

Install via pip:

pip install antyr

Or using uv:

uv add antyr

The following dependencies are installed alongside antyr:

  • trio – structured concurrency runtime
  • httpx[socks] – HTTP client with SOCKS proxy support
  • stem – Tor control port integration

Quickstart

The examples below show how to fetch a resource and either process its contents as a stream or persist it to disk.

Fetch and process a response as a stream

Instead of buffering the entire response in memory, the response can be processed incrementally as it is received.

import trio
from antyr import HttpCrawler

async def main() -> None:
    async with HttpCrawler("https://httpbin.org") as crawler:
        stream = await crawler.fetch("/json").content_stream()

        async for chunk in stream:
            ...  # process chunk

trio.run(main)
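As an illustration of what "process chunk" might look like, the sketch below computes a SHA-256 digest and a byte count without ever holding the full body in memory. It assumes the stream yields `bytes` objects, which this page does not state explicitly. `digest_stream` and `fake_stream` are hypothetical names, and `fake_stream` is only a stand-in for a real response stream.

```python
import hashlib
from typing import AsyncIterable, Tuple


async def digest_stream(stream: AsyncIterable[bytes]) -> Tuple[int, str]:
    """Consume an async byte stream once; return (total_bytes, sha256_hex)."""
    hasher = hashlib.sha256()
    total = 0
    async for chunk in stream:
        # Each chunk is folded into the running hash and then discarded.
        hasher.update(chunk)
        total += len(chunk)
    return total, hasher.hexdigest()


async def fake_stream():
    """Hypothetical stand-in for a response stream (assumes bytes chunks)."""
    for chunk in (b"hello ", b"wor", b"ld"):
        yield chunk
```

Inside the `main()` coroutine above, this would be called as `await digest_stream(stream)`. The helper only relies on `async for`, so it is independent of the async runtime in use.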

If the response body is an archive, it can be extracted before processing by calling extract(). The extracted content is exposed through the same streaming interface.

import trio
from antyr import HttpCrawler

async def main() -> None:
    async with HttpCrawler("https://example.com") as crawler:
        stream = await crawler.fetch("/archive.zip").extract()

        async for chunk in stream:
            ...  # process chunk

trio.run(main)

Stream to disk

Stream the response body directly to disk.

import trio
from antyr import HttpCrawler

async def main() -> None:
    async with HttpCrawler("https://httpbin.org") as crawler:
        await crawler.fetch("/image/png").save("downloads")

trio.run(main)

The target filename is derived from the response headers or URL and normalized for filesystem safety.
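To make the idea concrete, here is a rough sketch of how a filename could be derived from a `Content-Disposition` header or a URL and then sanitized. It is not antyr's actual implementation. The function `derive_filename`, its fallback name, and the set of allowed characters are all assumptions chosen for illustration.

```python
import re
from email.message import Message
from typing import Optional
from urllib.parse import unquote, urlsplit


def derive_filename(url: str, content_disposition: Optional[str] = None,
                    default: str = "download") -> str:
    """Pick a filesystem-safe name from a header value or, failing that, the URL."""
    name = None
    if content_disposition:
        # Let the stdlib parse the header's filename parameter.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
    if not name:
        # Fall back to the last path segment of the URL (query string excluded).
        name = unquote(urlsplit(url).path.rsplit("/", 1)[-1])
    # Drop any directory components to prevent path traversal.
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    # Replace anything outside a conservative character set, then trim
    # leading/trailing dots and underscores.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).strip("._")
    return name or default


print(derive_filename("https://httpbin.org/image/png"))
print(derive_filename("https://example.com/x",
                      'attachment; filename="../../etc/pass wd.txt"'))
```

The header takes precedence over the URL in this sketch because a server-supplied name is usually more descriptive; either source is treated as untrusted input and sanitized the same way.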
