Concurrent HTTP downloader with in-memory stream reordering (httpx + uvloop).
HydraStream
HydraStream is a concurrent HTTP downloader written in Python. It supports multipart downloading and in-memory chunk reordering, allowing you to stream remote files directly to stdout without writing to disk.
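Multipart downloading rests on HTTP `Range` requests. The sketch below shows one plausible way to split a file of known size into byte ranges, one per connection. `split_ranges` and `range_header` are hypothetical names, and the total size is assumed to come from the server's `Content-Length` (for example, from a HEAD request). HydraStream's own chunking strategy may differ.

```python
def split_ranges(total_size: int, parts: int) -> list[tuple[int, int]]:
    """Split bytes [0, total_size) into contiguous, inclusive (start, end) ranges.

    Hypothetical helper for illustration; not part of HydraStream's API.
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    parts = max(1, min(parts, total_size))  # never more parts than bytes
    base, extra = divmod(total_size, parts)
    ranges: list[tuple[int, int]] = []
    start = 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first parts
        end = start + length - 1  # HTTP Range end offsets are inclusive
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start: int, end: int) -> dict[str, str]:
    """Build the request header for one part, e.g. {'Range': 'bytes=0-3'}."""
    return {"Range": f"bytes={start}-{end}"}
```

For a 10-byte file and 3 connections this yields `(0, 3)`, `(4, 6)` and `(7, 9)`. A server that honors the header answers each request with `206 Partial Content`; because Range ends are inclusive, each `end` is `start + length - 1`.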
Motivation
Standard tools like wget or curl stream sequentially but are limited to a single connection. Tools like aria2 download concurrently but require disk I/O to reassemble the file.
This project bridges the gap: it fetches chunks concurrently via httpx and uvloop, buffers them in memory using a min-heap, and yields a sequential byte stream. This is useful for piping large remote files (e.g., genomics data, DB dumps) directly into Unix tools (zcat, grep, tar) when local disk space is constrained.
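The min-heap reordering step can be sketched as follows. This is a minimal illustration, not HydraStream's implementation: `ChunkReorderer` is a hypothetical name, and chunks are assumed to be non-overlapping and tagged with their absolute byte offset. Chunks that arrive early wait in the heap; whenever the smallest buffered offset equals the next expected byte, contiguous data is released in order.

```python
import heapq


class ChunkReorderer:
    """Turn out-of-order (offset, data) chunks into an in-order byte sequence.

    Hypothetical sketch; assumes chunks never overlap.
    """

    def __init__(self, start_offset: int = 0) -> None:
        self._heap: list[tuple[int, bytes]] = []  # min-heap keyed by byte offset
        self._next = start_offset  # offset of the next byte the consumer expects

    def push(self, offset: int, data: bytes) -> list[bytes]:
        """Buffer one chunk and return every chunk that is now contiguous, in order."""
        heapq.heappush(self._heap, (offset, data))
        ready: list[bytes] = []
        # Pop while the smallest buffered offset is exactly the one needed next.
        while self._heap and self._heap[0][0] == self._next:
            off, chunk = heapq.heappop(self._heap)
            ready.append(chunk)
            self._next = off + len(chunk)
        return ready

    @property
    def buffered_bytes(self) -> int:
        """Bytes held back because an earlier chunk has not arrived yet."""
        return sum(len(chunk) for _, chunk in self._heap)
```

Feeding `(6, b"ghi")`, `(0, b"abc")` and `(3, b"def")` in that order releases nothing, then `b"abc"`, then `b"def"` and `b"ghi"` together. `buffered_bytes` is the quantity a memory cap would need to bound.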
Features
- Concurrent Downloading: Uses HTTP Range requests to fetch parts simultaneously.
- Stream Reordering: Converts out-of-order chunks into a sequential stream via an internal priority queue.
- Rate Limiting & Backoff: An AIMD-based rate limiter handles `429 Too Many Requests` responses, and exponential backoff handles network drops.
- Resumption: Saves partial state for disk-mode downloads so they can resume after interruptions.
- POSIX Compliance: In stream mode or `--quiet` mode, logs are routed to `stderr` and data to `stdout`.
Installation
Requires Python 3.11+.
```bash
uv tool install git+https://github.com/Zhukovetski/HydraStream.git
```
or
```bash
pipx install git+https://github.com/Zhukovetski/HydraStream.git
```
Usage
1. Download to Disk
Download a file using 20 connections:
```bash
hs "https://ftp.ncbi.nlm.nih.gov/.../genome.fna.gz" -t 20 --output ./data
```
2. Stream to stdout (Pipe)
Download concurrently and pipe directly into a decompressor:
```bash
hs "https://ftp.ncbi.nlm.nih.gov/.../genome.fna.gz" -t 20 --stream -q | zcat | wc -l
```
3. Python API
```python
import asyncio

from hydrastream import HydraClient


async def main():
    urls = ["https://example.com/file1.gz"]
    async with HydraClient(threads=10, quiet=True) as client:
        async for filename, stream in client.stream(urls):
            async for chunk in stream:
                pass  # Process chunk bytes


if __name__ == "__main__":
    asyncio.run(main())
```
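The `pass` placeholder in the API example is where incremental processing would go. As a stdlib-only sketch that mirrors the `zcat | wc -l` pipeline, the hypothetical `GzipLineCounter` below decompresses gzip bytes as they arrive and counts newlines without holding the whole file in memory. It assumes the stream is gzip data, possibly made of several concatenated members, which `zcat` also accepts.

```python
import zlib


class GzipLineCounter:
    """Hypothetical helper: count lines of a gzip stream fed chunk by chunk."""

    GZIP_WBITS = zlib.MAX_WBITS | 16  # +16 tells zlib to expect a gzip header and trailer

    def __init__(self) -> None:
        self._d = zlib.decompressobj(wbits=self.GZIP_WBITS)
        self.lines = 0

    def feed(self, chunk: bytes) -> None:
        data = chunk
        while data:
            self.lines += self._d.decompress(data).count(b"\n")
            if self._d.eof:
                # One gzip member ended; leftover bytes begin the next member.
                data = self._d.unused_data
                self._d = zlib.decompressobj(wbits=self.GZIP_WBITS)
            else:
                data = b""

    def finish(self) -> int:
        """Flush any remaining output and return the total line count."""
        self.lines += self._d.flush().count(b"\n")
        return self.lines
```

Inside the `async for chunk in stream` loop, `pass` would become `counter.feed(chunk)`, with `counter.finish()` called once the loop ends.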
CLI Options
| Option | Shortcut | Default | Description |
|---|---|---|---|
| `URLS` | - | Required | One or more URLs to download. |
| `--threads` | `-t` | `1` | Number of concurrent connections. |
| `--output` | `-o` | `download/` | Output directory. |
| `--stream` | `-s` | `False` | Enable streaming mode (writes data to stdout). |
| `--no-ui` | `-nu` | `False` | Disable progress bars and keep plain-text logs. |
| `--quiet` | `-q` | `False` | Silence console output. Logs are still written to a file. |
| `--md5` | - | `None` | Expected MD5 hash (single URL only). |
| `--buffer` | `-b` | `threads` × 10 MB | Maximum stream buffer size in bytes. |
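`--buffer` caps how much data can sit in memory waiting to be emitted in order (by default `threads` × 10 MB, so 200 MB with `-t 20`). A hypothetical sketch of such a byte budget built on `asyncio.Condition`: a fetch reserves its size before starting, and the consumer releases those bytes after writing them out. It assumes one dispatcher reserves ranges in ascending offset order; under that assumption the range the consumer needs next is always already admitted, so a full buffer slows fetching down instead of deadlocking.

```python
import asyncio


class ByteBudget:
    """Hypothetical cap on bytes fetched but not yet written downstream."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.in_flight = 0
        self._cond = asyncio.Condition()

    async def reserve(self, n: int) -> None:
        async with self._cond:
            # Always admit when nothing is reserved, so an oversized part cannot stall.
            await self._cond.wait_for(
                lambda: self.in_flight == 0 or self.in_flight + n <= self.max_bytes
            )
            self.in_flight += n

    async def release(self, n: int) -> None:
        async with self._cond:
            self.in_flight -= n
            self._cond.notify_all()  # wake dispatchers waiting for room
```

With a 10-byte budget, a second 6-byte reservation waits until the first 6 bytes are released.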
Roadmap
- v1.2: Autonomous Worker Scaling: Transition from a static thread pool to adaptive concurrency based on network conditions and downstream backpressure.
- v2.0: Rust Core: Port the core engine to Rust (`tokio`/`reqwest`) with a `PyO3` wrapper to bypass the Python GIL and improve multi-core execution.
License
MIT License.
Download files
File details
Details for the file hydrastream-1.1.0.tar.gz.
File metadata
- Download URL: hydrastream-1.1.0.tar.gz
- Upload date:
- Size: 24.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Linux Mint","version":"22.3","id":"zena","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ffee2c1aed560b716e93d0d2e6ccc95eefe7570ab7fb87583d07d251351c1243 |
| MD5 | 5e1a8601d21f287b05d07049814cf636 |
| BLAKE2b-256 | cf0f7d69a2c18ccf1e5e177acbe622f69cec11162ef3e7b8f1c12d984026d705 |
File details
Details for the file hydrastream-1.1.0-py3-none-any.whl.
File metadata
- Download URL: hydrastream-1.1.0-py3-none-any.whl
- Upload date:
- Size: 26.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Linux Mint","version":"22.3","id":"zena","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 531e1a0419e7db0804fac3b10e21915af0c32f92583d21b68bb306508fdbe50e |
| MD5 | 316b1cd0e4c5a76d3fc9971e0f6dbc76 |
| BLAKE2b-256 | 3197475a92ac0ff375dcf2248792453ca23be933647bacd105c1db4a1183a434 |