A compact web crawling toolkit

antyr

Pipeline-like web crawling

A minimalistic web crawling toolkit for Python

This project focuses on core crawling primitives: making HTTP requests, consuming responses as streams, and persisting streamed content with explicit, cancellation-safe lifetimes.

Unlike full-featured frameworks, antyr does not implement an end-to-end scraping pipeline. Parsing, extraction logic, data modeling, retries, scheduling, and storage are left to the caller.

If you want a batteries-included scraping framework, consider Scrapy.

Installation

Install via pip:

pip install antyr

Or using uv:

uv add antyr

The following dependencies are installed alongside antyr:

  • trio – structured concurrency runtime
  • httpx[socks] – HTTP client with SOCKS proxy support
  • stem – Tor control port integration

Quickstart

The examples below show how to fetch a resource and either process its contents as a stream or persist it to disk.

Fetch and process a response as a stream

Instead of buffering the entire response in memory, the response can be processed incrementally as it is received.

import trio
from antyr import HttpCrawler

async def main() -> None:
    async with HttpCrawler("https://httpbin.org") as crawler:
        stream = await crawler.fetch("/json").content_stream()

        async for chunk in stream:
            ...  # process chunk

trio.run(main)
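
As an illustration of incremental processing, the sketch below hashes a stream chunk by chunk, so only one chunk is held in memory at a time. It assumes each chunk is a bytes object. The names sha256_of_stream and fake_stream are hypothetical helpers for this example and are not part of antyr.

```python
import hashlib
from collections.abc import AsyncIterator


async def sha256_of_stream(stream: AsyncIterator[bytes]) -> tuple[str, int]:
    """Return the SHA-256 hex digest and total byte count of a chunk stream."""
    hasher = hashlib.sha256()
    total = 0
    async for chunk in stream:
        # Feed each chunk to the hasher as it arrives; nothing is buffered.
        hasher.update(chunk)
        total += len(chunk)
    return hasher.hexdigest(), total


async def fake_stream() -> AsyncIterator[bytes]:
    """Stand-in for a real response stream (assumption: chunks are bytes)."""
    for chunk in (b"hello ", b"world"):
        yield chunk
```

The helper only relies on `async for`, so it should work with any async iterator of bytes; inside main() above, it would presumably be called as `await sha256_of_stream(stream)`.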

If the response body is an archive, it can be extracted before processing by calling extract(). The extracted content is exposed through the same streaming interface.

import trio
from antyr import HttpCrawler

async def main() -> None:
    async with HttpCrawler("https://example.com") as crawler:
        stream = await crawler.fetch("/archive.zip").extract()

        async for chunk in stream:
            ...  # process chunk

trio.run(main)
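
Chunk boundaries generally do not line up with record boundaries, so line-oriented content usually has to be reassembled by the caller. The sketch below shows one way to do that, assuming chunks are bytes and lines end with "\n". The name iter_lines is a hypothetical helper and is not part of antyr.

```python
from collections.abc import AsyncIterator


async def iter_lines(stream: AsyncIterator[bytes]) -> AsyncIterator[bytes]:
    """Yield complete lines from a stream of arbitrarily sized byte chunks."""
    buffer = b""
    async for chunk in stream:
        buffer += chunk
        # Everything before the last newline is complete; the remainder
        # stays in the buffer until more data arrives.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            yield line
    if buffer:
        # Final line without a trailing newline.
        yield buffer
```

This sketch does not strip "\r" from Windows-style line endings and places no limit on line length; both are left to the caller.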

Stream to disk

Stream the response body directly to disk.

import trio
from antyr import HttpCrawler

async def main() -> None:
    async with HttpCrawler("https://httpbin.org") as crawler:
        await crawler.fetch("/image/png").save("downloads")

trio.run(main)

The target filename is derived from the response headers or URL and normalized for filesystem safety.
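
To make the idea concrete, the sketch below shows one way such a derivation could work, using only the standard library. It is not antyr's actual implementation: derive_filename, its parameters, and the allowed character set are all assumptions made for illustration.

```python
import re
from email.message import Message
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import unquote, urlsplit


def derive_filename(
    url: str,
    content_disposition: Optional[str] = None,
    default: str = "download",
) -> str:
    """Pick a filename from Content-Disposition or the URL, then sanitize it."""
    name = None
    if content_disposition:
        # Let the stdlib parse the header's "filename" parameter.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
    if not name:
        # Fall back to the last component of the URL path.
        name = PurePosixPath(unquote(urlsplit(url).path)).name
    # Keep only the final path component so "../" cannot escape the target dir.
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    # Replace anything outside a conservative character set (an assumption).
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).strip("._")
    return name or default
```

Under these assumptions, a URL such as https://httpbin.org/image/png with no Content-Disposition header would yield the name png, and a URL with an empty path falls back to the default.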
