A synchronous wrapper for AIOHTTP

TinyRetriever: HTTP Requests Made Easy

TinyRetriever is a lightweight synchronous wrapper for AIOHTTP that abstracts away the complexities of making asynchronous HTTP requests. It is designed to be simple, easy to use, and efficient. TinyRetriever is built on top of AIOHTTP and AIOFiles, which are popular asynchronous HTTP client and file management libraries for Python.

📚 Full documentation is available here.

Features

TinyRetriever provides the following features:

  • Concurrent Downloads: Efficiently download multiple files simultaneously
  • Flexible Response Types: Get responses as text, JSON, or binary data
  • Rate Limiting: Built-in per-host connection limiting to respect server constraints
  • Streaming Support: Stream large files efficiently with customizable chunk sizes
  • Unique Filenames: Generate unique filenames based on query parameters
  • Works in Jupyter Notebooks: Easily use TinyRetriever in Jupyter notebooks without any additional setup or dependencies
  • Automatic Retries: Exponential backoff with jitter for transient errors (5xx, DNS failures, timeouts)
  • Robust Error Handling: Optional status raising and comprehensive error messages
  • Performance Optimized: Uses orjson when available for faster JSON serialization

TinyRetriever does not use nest-asyncio. Instead, it creates and manages a dedicated thread that runs the event loop. This lets you use TinyRetriever in Jupyter notebooks and other environments where an event loop is already running.
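
The dedicated-thread approach can be illustrated with the standard library alone. The following is a minimal sketch of the general pattern, not TinyRetriever's actual implementation; the `BackgroundLoop` class and the `add` coroutine are hypothetical names introduced only for this example.

```python
import asyncio
import threading


class BackgroundLoop:
    """Run an asyncio event loop in a dedicated daemon thread (illustrative only)."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # This loop belongs to the worker thread, so it never collides
        # with a loop that is already running in the calling thread.
        asyncio.set_event_loop(self._loop)
        self._loop.run_forever()

    def run(self, coro):
        """Submit a coroutine from any thread and block until it finishes."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result()

    def close(self) -> None:
        """Stop the loop, wait for the thread to exit, and release the loop."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def add(a: int, b: int) -> int:
    await asyncio.sleep(0)  # stand-in for real asynchronous I/O
    return a + b


background = BackgroundLoop()
result = background.run(add(2, 3))
background.close()
```

Because the caller only blocks on a `concurrent.futures.Future`, the same synchronous call works whether or not the calling thread already has a running event loop, which is the situation inside a notebook.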

There are four main functions in TinyRetriever:

  • download: Download files concurrently;
  • check_downloads: Validate existing downloaded files against remote file sizes;
  • fetch: Fetch queries concurrently and return responses as text, JSON, or binary;
  • unique_filename: Generate unique filenames based on query parameters.

Installation

Choose your preferred installation method:

Using pip

pip install tiny-retriever

Using micromamba

micromamba install -c conda-forge tiny-retriever

Alternatively, you can use conda or mamba.

Quick Start Guide

Please refer to the documentation for detailed usage instructions and more elaborate examples.

Downloading Files

from pathlib import Path
import tiny_retriever as terry

urls = ["https://example.com/file1.pdf", "https://example.com/file2.pdf"]
paths = [Path("downloads/file1.pdf"), Path("downloads/file2.pdf")]
# or generate unique filenames
paths = (terry.unique_filename(u) for u in urls)
paths = [Path("downloads", p) for p in paths]

# Download files concurrently
terry.download(urls, paths)

Fetching Data

urls = ["https://api.example.com/data1", "https://api.example.com/data2"]

# Get JSON responses
json_responses = terry.fetch(urls, "json")

# Get text responses
text_responses = terry.fetch(urls, "text")

# Get binary responses
binary_responses = terry.fetch(urls, "binary")

Validating Downloads

# Check if previously downloaded files match remote sizes
invalid = terry.check_downloads(urls, paths)
if invalid:
    for path, expected_size in invalid.items():
        print(f"{path}: local={path.stat().st_size}, expected={expected_size}")
else:
    print("All files are valid!")
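
To make the idea of a size check concrete, here is a small standard-library sketch. It assumes the expected sizes are already known (for example, from a server's reported content length) and hard-codes them; `find_size_mismatches` is a hypothetical helper, not a TinyRetriever function, and it does not show how the library obtains remote sizes.

```python
import tempfile
from pathlib import Path


def find_size_mismatches(expected_sizes: dict[Path, int]) -> dict[Path, int]:
    """Return the paths whose local size differs from the expected size.

    Missing files count as mismatches. Hypothetical helper for illustration.
    """
    invalid = {}
    for path, expected in expected_sizes.items():
        local = path.stat().st_size if path.exists() else -1
        if local != expected:
            invalid[path] = expected
    return invalid


with tempfile.TemporaryDirectory() as tmp:
    complete = Path(tmp, "complete.bin")
    truncated = Path(tmp, "truncated.bin")
    complete.write_bytes(b"x" * 10)
    truncated.write_bytes(b"x" * 4)  # simulates an interrupted download

    # Expected sizes would normally come from the server; hard-coded here.
    mismatches = find_size_mismatches({complete: 10, truncated: 10})
    mismatch_names = sorted(p.name for p in mismatches)
```

A size comparison is cheap but only catches truncated or padded files; it cannot detect corruption that leaves the length unchanged.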

Generate Unique Filenames

url = "https://api.example.com/data"
params = {"key": "value"}

# Generate unique filename based on URL and parameters
filename = terry.unique_filename(url, params=params, file_extension=".json")
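
One common way to derive a deterministic filename from a URL and its parameters is to hash them together. The sketch below shows that general technique only; the hashing scheme, the choice of SHA-256, and the `hashed_filename` helper are assumptions for illustration and may differ from what `unique_filename` does internally.

```python
import hashlib
import json
from typing import Optional


def hashed_filename(url: str, params: Optional[dict] = None, file_extension: str = "") -> str:
    """Derive a deterministic filename from a URL and its query parameters.

    Hypothetical helper; the hashing scheme here is an assumption.
    """
    # Sort keys so that parameter order does not change the digest.
    payload = url + json.dumps(params or {}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{digest}{file_extension}"


name_a = hashed_filename("https://api.example.com/data", {"key": "value"}, ".json")
name_b = hashed_filename("https://api.example.com/data", {"key": "other"}, ".json")
```

The same URL and parameters always map to the same name, so a repeated request can reuse an existing file, while different parameters yield different names.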

Advanced Usage

Custom Request Parameters

You can also pass a single URL and a dictionary of request parameters to the fetch function. The default network-related parameters are conservative and can be adjusted as needed.

urls = "https://api.example.com/data"
kwargs = {"headers": {"Authorization": "Bearer token"}}

responses = terry.fetch(
    urls,
    return_type="json",
    request_method="post",
    request_kwargs=kwargs,
    limit_per_host=2,
    timeout=30,
)

Error Handling

from tiny_retriever import fetch, ServiceError

try:
    responses = fetch(urls, return_type="json", raise_status=True)
except ServiceError as e:
    print(f"Request failed: {e}")

Retry Configuration

All functions retry transient errors (5xx, DNS failures, timeouts) automatically with exponential backoff and jitter. You can control the number of attempts:

# Retry up to 5 times on transient errors
terry.fetch(urls, "json", retries=5)

# Disable retries (a single attempt)
terry.download(urls, paths, retries=1)
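
For readers unfamiliar with exponential backoff and jitter, the sketch below computes a delay schedule using the "full jitter" variant. The `base` and `cap` values and the `backoff_delays` helper are hypothetical; the exact delays TinyRetriever uses are not stated here. As in the examples above, `retries` is treated as the total number of attempts, so `retries=1` means no waiting at all.

```python
import random
from typing import Optional


def backoff_delays(
    retries: int, base: float = 0.5, cap: float = 30.0, seed: Optional[int] = None
) -> list[float]:
    """Delays to wait between attempts, using full-jitter exponential backoff.

    `retries` counts total attempts, so there are `retries - 1` waits.
    """
    rng = random.Random(seed)
    delays = []
    for attempt in range(max(retries - 1, 0)):
        # The ceiling doubles on every attempt until it reaches the cap.
        ceiling = min(cap, base * 2**attempt)
        # Jitter: pick a random delay below the ceiling so that many
        # clients retrying at once do not hit the server in lockstep.
        delays.append(rng.uniform(0, ceiling))
    return delays


delays = backoff_delays(retries=5, seed=0)
```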

Configuration

TinyRetriever can be configured through environment variables and function parameters:

  • MAX_CONCURRENT_CALLS: Maximum number of concurrent requests (default: 10)
  • Default chunk size for downloads: 1 MB
  • Default timeout: 2 minutes for fetch, 10 minutes for download
  • Default connections per host: 4
  • Default retry attempts: 3
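
As a rough illustration of how a concurrency cap of this kind can work, the sketch below reads a limit from a mapping of environment variables and enforces it with a semaphore. It assumes `MAX_CONCURRENT_CALLS` is read as an integer with 10 as the fallback; `read_max_concurrent_calls` and `run_capped` are hypothetical helpers, and this is not TinyRetriever's own code.

```python
import asyncio
import os


def read_max_concurrent_calls(environ=os.environ, default: int = 10) -> int:
    """Read MAX_CONCURRENT_CALLS, falling back to the documented default of 10."""
    return int(environ.get("MAX_CONCURRENT_CALLS", default))


async def run_capped(n_tasks: int, limit: int) -> int:
    """Run n_tasks dummy jobs and report the peak number running at once."""
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def job() -> None:
        nonlocal running, peak
        async with semaphore:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for a network request
            running -= 1

    await asyncio.gather(*(job() for _ in range(n_tasks)))
    return peak


# A plain dict stands in for os.environ so the example is reproducible.
limit = read_max_concurrent_calls({"MAX_CONCURRENT_CALLS": "3"})
peak = asyncio.run(run_capped(n_tasks=12, limit=limit))
```

Twelve jobs are submitted, but the semaphore never lets more than `limit` of them run at the same time.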

Contributing

We welcome contributions! Please see the contributing section for guidelines and instructions.

License

This project is licensed under the terms of the MIT license.
