Resumable, cursor-based, CDN-safe HTTP downloads for Python
pyhaul
Resumable HTTP downloads for Python. Bring your own client: pyhaul borrows your existing session and handles byte-range negotiation, crash-safe checkpointing, and validation.
```sh
pip install "pyhaul[httpx]"  # or: niquests, requests, urllib3, aiohttp
```
```python
import httpx
from pathlib import Path
from pyhaul import haul, PartialHaulError

dest = Path("big.zip")
with httpx.Client() as client:
    for _ in range(10):
        try:
            result = haul("https://example.com/big.zip", client, dest=dest)
            break  # result is a CompleteHaul
        except PartialHaulError:
            pass  # the only retryable error; others propagate
    else:
        raise RuntimeError("download still incomplete after 10 attempts")
print(f"done: {dest.stat().st_size:,} bytes")
```
What is it?
A small, pure-Python library that makes HTTP downloads resumable.
To download a file, call haul() with a URL, your existing HTTP
client, and a destination path. pyhaul handles byte-range
negotiation for resume, ETag validation, crash-safe
checkpointing, and atomic file completion. Supports both sync and
async across multiple HTTP client libraries.
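The resume-and-validate handshake described above can be sketched with plain HTTP semantics. This is a minimal illustration of the standard Range / If-Range mechanism from RFC 9110, not pyhaul's internal code; the Checkpoint class and the resume_headers and write_offset helpers are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass
class Checkpoint:
    """Hypothetical resume state: bytes already on disk plus the ETag seen then."""

    offset: int
    etag: Optional[str]


def resume_headers(cp: Checkpoint) -> Dict[str, str]:
    """Headers for a resume request (RFC 9110 Range and If-Range)."""
    # Nothing saved yet, or no strong validator: request the whole resource.
    # If-Range only accepts strong ETags, so weak ones (W/"...") can't guard a resume.
    if cp.offset == 0 or cp.etag is None or cp.etag.startswith("W/"):
        return {}
    # If the ETag still matches, the server replies 206 with the remaining bytes;
    # if the resource changed, it ignores Range and replies 200 with the full body.
    return {"Range": f"bytes={cp.offset}-", "If-Range": cp.etag}


def write_offset(cp: Checkpoint, status: int, headers: Mapping[str, str]) -> int:
    """Where the response body belongs in the .part file (0 means start over)."""
    if status == 206:
        # Expect "Content-Range: bytes <start>-<end>/<total>" starting at our offset.
        # (httpx/requests header mappings are case-insensitive; a plain dict is not.)
        unit, _, rest = headers.get("Content-Range", "").partition(" ")
        start = rest.split("-", 1)[0]
        if unit != "bytes" or not start.isdigit() or int(start) != cp.offset:
            raise ValueError(f"unexpected Content-Range for offset {cp.offset}")
        return cp.offset
    if status == 200:
        # Full body: the resource changed or the server ignores ranges.
        # Discard saved bytes and rewrite from zero rather than splice halves.
        return 0
    raise ValueError(f"unexpected status {status}")
```

The key design point is that If-Range lets a single request cover both outcomes: the server either continues the old bytes or sends a fresh copy, so a changed resource can never be glued onto a stale prefix.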
Each call to haul() upholds these guarantees:
- One haul() makes one request. You are responsible for retry loops, but retrying just means calling haul() again.
- The destination file will not exist until the download is complete. There is no state where a partially written file sits at the final path. Incomplete data lives in a temporary .part file; on completion it is atomically moved into place.
- Interrupted downloads resume when possible. Checkpoint state lives on disk, not in memory. Kill the process, lose the network, or get a 503, and the next haul() picks up from the last durable byte. If the resource hasn't changed, no data is downloaded twice.
- If the remote resource changes, retrying will not corrupt the file. If the remote file changes between attempts, pyhaul detects the mismatch via ETag (a server-side fingerprint) and starts over cleanly instead of gluing mismatched halves together.
- Your HTTP client is borrowed, not owned. pyhaul sets per-request headers and returns your session untouched. It never creates, configures, or closes sessions.
- Transport errors pass through unwrapped. httpx.ReadTimeout stays httpx.ReadTimeout, so you catch the types you already know.
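The .part and on-disk checkpoint guarantees rest on a common pattern: write to a sidecar file, make each write durable, and rename only at the end. Below is a minimal standard-library sketch of that pattern. The "<name>.part" naming and the part_path, append_chunks, and finalize helpers are assumptions for illustration; pyhaul's actual control file and checkpoint format are described in docs/SPEC.md.

```python
import os
from pathlib import Path
from typing import Iterable


def part_path(dest: Path) -> Path:
    # Hypothetical naming scheme: "big.zip" -> "big.zip.part" next to the target.
    return dest.with_name(dest.name + ".part")


def append_chunks(dest: Path, chunks: Iterable[bytes]) -> int:
    """Append chunks to the .part file and fsync; return the next resume offset."""
    part = part_path(dest)
    with open(part, "ab") as f:
        for chunk in chunks:
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # bytes are durable before we treat them as saved
    return part.stat().st_size  # here the file size doubles as the checkpoint


def finalize(dest: Path) -> None:
    """Atomically move the completed .part file to its final name."""
    os.replace(part_path(dest), dest)  # atomic when both paths share a filesystem
    # On POSIX, fsync the parent directory so the rename itself survives a crash.
    if hasattr(os, "O_DIRECTORY"):
        fd = os.open(dest.parent, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
```

Because the final path only appears through os.replace, a reader that sees dest at all sees the complete file; a crash at any earlier point leaves only the .part file behind for the next attempt.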
How it fits into your code
One haul() = one HTTP request. It either succeeds and returns
CompleteHaul, or it raises, possibly after saving progress to a
.part file that lets the next call resume. pyhaul never creates
sessions, connections, or clients. Your HTTP library's native
exceptions propagate unwrapped, so you can drop haul() into existing
code without changing your error handling. Retries are your call: a
for-loop, tenacity, or nothing at all. Concurrency limiting (e.g.
asyncio.Semaphore) is also yours; pyhaul downloads one file per call
and doesn't manage parallelism.
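Because retries and concurrency are left to the caller, a small helper can cover both. The sketch below is generic asyncio code, not part of pyhaul: retry, fetch_all, and their parameters are hypothetical names, and it assumes each job is a zero-argument callable that returns an awaitable.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Type, TypeVar

T = TypeVar("T")


async def retry(
    attempt: Callable[[], Awaitable[T]],
    retryable: Type[BaseException],
    *,
    tries: int = 10,
    base_delay: float = 0.5,
) -> T:
    """Await attempt() until it succeeds; only `retryable` errors are retried."""
    for n in range(tries):
        try:
            return await attempt()
        except retryable:
            if n == tries - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2**n)  # exponential backoff
    raise ValueError("tries must be at least 1")


async def fetch_all(
    jobs: Sequence[Callable[[], Awaitable[T]]],
    retryable: Type[BaseException],
    *,
    limit: int = 4,
) -> List[T]:
    """Run every job with at most `limit` in flight, each with its own retries."""
    sem = asyncio.Semaphore(limit)

    async def run(job: Callable[[], Awaitable[T]]) -> T:
        async with sem:  # caps concurrent jobs, not the total number of jobs
            return await retry(job, retryable)

    # gather preserves input order; the first error that escapes a job's
    # retries propagates to the caller.
    return list(await asyncio.gather(*(run(job) for job in jobs)))
```

With pyhaul, each job could be functools.partial(haul_async, url, client, dest=path), with PartialHaulError as the retryable type and an httpx.AsyncClient you already own. Holding the semaphore through the backoff sleep keeps the in-flight count honest at the cost of an idle slot; release it between attempts if that trade-off matters.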
```python
def haul(url, client, *, dest, state=None) -> CompleteHaul: ...
async def haul_async(url, client, *, dest, state=None) -> CompleteHaul: ...
```
state is an optional HaulState bag, updated in place as bytes
land on disk; it works identically in sync and async code. See
docs/DESIGN.md for the exception hierarchy, transport
adapters, and download lifecycle.
Documentation
- docs/DESIGN.md: Transport adapters, checkpoint state, and the download lifecycle.
- docs/WHY.md: Silent failure modes in HTTP range/resume, and how pyhaul compares to curl, wget, and aria2c.
- docs/SPEC.md: Control file and checkpoint format (for implementers and compatible tools).
Download files
File details
Details for the file pyhaul-0.5.0.tar.gz.
File metadata
- Download URL: pyhaul-0.5.0.tar.gz
- Upload date:
- Size: 42.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 119d1292ef9c31fb80df362a1b0dd804e65365278cfacdb324f9fe0f5c3b7cf2 |
| MD5 | 20a1f8c1ea4f9637a51621631c5d045f |
| BLAKE2b-256 | 9b90a0e99e467ba60e1d91f1536f95ebe8cc6023fc0179b39e30df70badccb90 |
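The digests listed for each file can be checked locally before installing. A minimal standard-library sketch follows; file_digests is a hypothetical helper, and BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest, which is the variant PyPI reports.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_digests(path: Path, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Compute the three digests PyPI lists, streaming the file in chunks."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 256-bit BLAKE2b
    }
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files never sit fully in memory.
        while chunk := f.read(chunk_size):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

For example, file_digests(Path("pyhaul-0.5.0.tar.gz"))["SHA256"] should equal the SHA256 value in the table above; the same check applies to the wheel and its own table.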
Provenance
The following attestation bundles were made for pyhaul-0.5.0.tar.gz:
Publisher: release.yml on chad-loder/pyhaul
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: pyhaul-0.5.0.tar.gz
  - Subject digest: 119d1292ef9c31fb80df362a1b0dd804e65365278cfacdb324f9fe0f5c3b7cf2
- Sigstore transparency entry: 1396726965
- Sigstore integration time:
- Permalink: chad-loder/pyhaul@a35e5c51e2f2a6dbae53ba4a88c926a64d7793ab
- Branch / Tag: refs/heads/main
- Owner: https://github.com/chad-loder
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a35e5c51e2f2a6dbae53ba4a88c926a64d7793ab
- Trigger Event: push
File details
Details for the file pyhaul-0.5.0-py3-none-any.whl.
File metadata
- Download URL: pyhaul-0.5.0-py3-none-any.whl
- Upload date:
- Size: 53.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 09d253fecb5236ec0617afc2e081fc895e24e7b8e309e41eef797c40a3723d31 |
| MD5 | 6811a762e0cbbac8c62bdf1f69eec77a |
| BLAKE2b-256 | 939b43b6a41985395a4f3d2295e3e049c84d28261529d703b8fec38260cd00a5 |
Provenance
The following attestation bundles were made for pyhaul-0.5.0-py3-none-any.whl:
Publisher: release.yml on chad-loder/pyhaul
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: pyhaul-0.5.0-py3-none-any.whl
  - Subject digest: 09d253fecb5236ec0617afc2e081fc895e24e7b8e309e41eef797c40a3723d31
- Sigstore transparency entry: 1396726972
- Sigstore integration time:
- Permalink: chad-loder/pyhaul@a35e5c51e2f2a6dbae53ba4a88c926a64d7793ab
- Branch / Tag: refs/heads/main
- Owner: https://github.com/chad-loder
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a35e5c51e2f2a6dbae53ba4a88c926a64d7793ab
- Trigger Event: push