
Resumable, cursor-based, CDN-safe HTTP downloads for Python


pyhaul

License: MIT

Resumable HTTP downloads for Python. Bring your own client: pyhaul borrows your existing session and handles byte-range negotiation, crash-safe checkpointing, and validation.

Supported clients: httpx, niquests, aiohttp, requests, urllib3

pip install "pyhaul[httpx]"   # or: niquests, requests, urllib3, aiohttp

import httpx
from pathlib import Path
from pyhaul import haul, PartialHaulError

dest = Path("big.zip")
with httpx.Client() as client:
    for _ in range(10):  # retry budget: each haul() call makes one request
        try:
            result = haul("https://example.com/big.zip", client, dest=dest)
            break
        except PartialHaulError:
            pass  # only retryable error; others propagate
    else:
        # Every attempt ended in PartialHaulError, so dest does not exist yet.
        raise SystemExit("download still incomplete after 10 attempts")

print(f"done: {dest.stat().st_size:,} bytes")

What is it?

A small, pure-Python library that makes HTTP downloads resumable. To download a file, call haul() with a URL, your existing HTTP client, and a destination path. pyhaul handles byte-range negotiation for resume, ETag validation, crash-safe checkpointing, and atomic file completion. It supports both sync and async use across multiple HTTP client libraries.
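To make "byte-range negotiation" and "ETag validation" concrete, here is a minimal sketch of the standard HTTP mechanism (Range plus If-Range) that a resume request can use. It illustrates the protocol only; it is not pyhaul's code, and the helper name resume_headers is hypothetical.

```python
from typing import Optional


def resume_headers(bytes_on_disk: int, etag: Optional[str]) -> dict[str, str]:
    """Build headers for a resume attempt (hypothetical helper, not pyhaul API).

    Returns an empty dict when a plain full GET is the safe choice.
    """
    if bytes_on_disk <= 0:
        return {}  # nothing saved yet: start from byte 0
    if etag is None or etag.startswith("W/"):
        # If-Range needs a strong validator. Without one there is no way to
        # prove the saved bytes belong to the current resource, so this
        # sketch conservatively restarts instead of sending a bare Range.
        return {}
    return {
        # Ask only for the bytes we do not have yet.
        "Range": f"bytes={bytes_on_disk}-",
        # The server honors Range only if the resource still matches this
        # ETag; otherwise it ignores Range and replies 200 with the full body.
        "If-Range": etag,
    }
```

With these headers, a 206 response carries the remaining bytes and can be appended to the saved data, while a 200 response means the resource changed and the saved data should be discarded.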

Each call to haul() upholds these guarantees:

  • One haul() makes one request. You are responsible for retry loops, but a retry is just another call to haul().
  • The destination file will not exist until the download is complete. There is no state where a partially written file sits at the final path. Incomplete data lives in a temporary .part file; on completion it is atomically moved into place.
  • Interrupted downloads resume when possible. Checkpoint state lives on disk, not in memory. Kill the process, lose the network, get a 503 — the next haul() picks up from the last durable byte. Zero re-downloaded data if the resource hasn't changed.
  • If the remote resource changes, a retry will not corrupt the file. When the remote file changes between attempts, pyhaul detects the mismatch via ETag (a server-side fingerprint) and starts over cleanly instead of gluing mismatched halves together.
  • Your HTTP client is borrowed, not owned. pyhaul sets per-request headers and returns your session untouched. It never creates, configures, or closes sessions.
  • Transport errors pass through unwrapped. httpx.ReadTimeout stays httpx.ReadTimeout. You catch the types you already know.
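The "no partial file at the final path" guarantee rests on a common pattern: write to a sibling temporary file, then rename it into place. The sketch below shows that general pattern under simple assumptions; it is not pyhaul's implementation, the function name write_atomically is hypothetical, and the only detail taken from the text above is the .part suffix.

```python
import os
from pathlib import Path
from typing import Iterable


def write_atomically(dest: Path, chunks: Iterable[bytes]) -> None:
    """Append chunks to '<dest>.part', then move it to dest in one step."""
    part = dest.with_name(dest.name + ".part")
    # Append mode keeps bytes saved by an earlier, interrupted attempt.
    with open(part, "ab") as f:
        for chunk in chunks:
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before publishing it
    # os.replace is atomic when source and target are on the same filesystem,
    # so readers see either no file at dest or the complete file.
    os.replace(part, dest)
```

If the chunk iterator raises partway through, the function exits before os.replace runs, so the bytes received so far stay in the .part file and dest is never created.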

How it fits into your code

One haul() call makes one HTTP request. It either succeeds and returns CompleteHaul, or it raises, possibly after saving progress to a .part file that lets the next call resume. pyhaul never creates sessions, connections, or clients. Your HTTP library's native exceptions propagate unwrapped, so you can drop haul() into existing code without changing your error handling. Retries are your call: a for-loop, tenacity, or nothing. Concurrency limiting (e.g. asyncio.Semaphore) is also yours; pyhaul downloads one file per call and doesn't manage parallelism.
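Because retries are left to the caller, a small generic helper is often enough. The sketch below retries any zero-argument callable with exponential backoff; the name retry and its parameters are hypothetical and not part of pyhaul.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def retry(
    attempt: Callable[[], T],
    retryable: Type[BaseException],
    *,
    tries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call attempt() until it succeeds or the retry budget is spent."""
    if tries < 1:
        raise ValueError("tries must be at least 1")
    for n in range(tries):
        try:
            return attempt()
        except retryable:
            if n == tries - 1:
                raise  # budget spent: surface the last error unchanged
            sleep(base_delay * 2 ** n)  # wait 1x, 2x, 4x, ... base_delay
    raise AssertionError("unreachable")
```

With pyhaul this could be used as retry(lambda: haul(url, client, dest=dest), PartialHaulError). Any other exception type propagates immediately, which matches the pass-through behavior described above.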

def haul(url, client, *, dest, state=None) -> CompleteHaul: ...
async def haul_async(url, client, *, dest, state=None) -> CompleteHaul: ...

state is an optional HaulState object, updated in place as bytes land on disk; it works identically in sync and async. See docs/DESIGN.md for the exception hierarchy, transport adapters, and download lifecycle.
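Since pyhaul downloads one file per call and leaves parallelism to you, a caller-side limit with asyncio.Semaphore might look like the sketch below. The helper run_bounded is hypothetical and works with any coroutine factory; it does not depend on pyhaul.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")


async def run_bounded(
    jobs: Iterable[Callable[[], Awaitable[T]]], limit: int
) -> list[T]:
    """Run every job, with at most `limit` of them in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with sem:  # wait here while `limit` jobs are already running
            return await job()

    # gather returns results in the same order as the jobs were given.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Each job would typically be a lambda wrapping haul_async(url, client, dest=dest) for one URL and destination pair. Note that gather, as used here, raises the first exception it sees, so per-file retry handling belongs inside each job.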

Documentation

Full documentation →

  • docs/DESIGN.md — Transport adapters, checkpoint state, and the download lifecycle.
  • docs/WHY.md — Silent failure modes in HTTP range/resume, and how pyhaul compares to curl, wget, and aria2c.
  • docs/SPEC.md — Control file and checkpoint format (implementers / compatible tools).


