Concurrent HTTP downloader with in-memory stream reordering (httpx + uvloop).

HydraStream

HydraStream is a concurrent HTTP downloader written in Python. It supports multipart downloading and in-memory chunk reordering, allowing you to stream remote files directly to stdout without writing to disk.

Motivation

Standard tools like wget or curl stream sequentially but are limited to a single connection. Tools like aria2 download concurrently but require disk I/O to reassemble the file.

This project bridges the gap: it fetches chunks concurrently via httpx and uvloop, buffers them in memory using a min-heap, and yields a sequential byte stream. This is useful for piping large remote files (e.g., genomics data, DB dumps) directly into Unix tools (zcat, grep, tar) when local disk space is constrained.
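The reordering step can be illustrated with a minimal sketch. It is not HydraStream's actual code: the function name `reorder_chunks` and the `(index, payload)` tuple layout are assumptions made for this example. Chunks are pushed onto a min-heap as they arrive and released only when the smallest index is the one the consumer is waiting for.

```python
import heapq
from typing import Iterable, Iterator


def reorder_chunks(chunks: Iterable[tuple[int, bytes]]) -> Iterator[bytes]:
    """Yield payloads in index order from (index, payload) pairs that arrive in any order.

    Hypothetical helper for illustration; indices are assumed unique and to start at 0.
    """
    heap: list[tuple[int, bytes]] = []
    next_index = 0  # index of the next chunk the consumer is waiting for
    for index, payload in chunks:
        heapq.heappush(heap, (index, payload))
        # Release every buffered chunk that is now contiguous with the emitted prefix.
        while heap and heap[0][0] == next_index:
            yield heapq.heappop(heap)[1]
            next_index += 1


# Chunks 2, 0, 1, 3 arrive out of order and are emitted sequentially.
arrivals = [(2, b"c"), (0, b"a"), (1, b"b"), (3, b"d")]
ordered = b"".join(reorder_chunks(arrivals))
```

In this sketch the heap only holds chunks that arrived ahead of their turn, so memory use depends on how far the slowest connection lags behind the others.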

Features

  • Concurrent Downloading: Uses HTTP Range requests to fetch parts simultaneously.
  • Stream Reordering: Converts out-of-order chunks into a sequential stream via an internal priority queue.
  • Rate Limiting & Backoff: An AIMD (additive increase, multiplicative decrease) rate limiter handles 429 Too Many Requests responses, and exponential backoff handles network drops.
  • Resumption: Saves partial state for disk-mode downloads to resume after interruptions.
  • POSIX Compliance: In stream mode or --quiet mode, logs are routed to stderr and only file data is written to stdout, so the output can be piped safely.
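As a rough illustration of the rate limiting and backoff feature, the sketch below shows the general AIMD idea and a capped exponential backoff. The class name `AimdLimiter`, the function `backoff_delay`, and every constant are assumptions chosen for the example; HydraStream's real parameters and structure may differ.

```python
from dataclasses import dataclass


@dataclass
class AimdLimiter:
    """Additive-increase / multiplicative-decrease control of a request rate.

    Hypothetical sketch; all default values are illustrative assumptions.
    """

    rate: float = 10.0      # currently allowed requests per second
    min_rate: float = 1.0   # never throttle below this
    max_rate: float = 100.0 # never grow above this
    increase: float = 1.0   # added to the rate after each successful request
    decrease: float = 0.5   # multiplier applied after a 429 response

    def on_success(self) -> None:
        # Additive increase: probe for more capacity slowly.
        self.rate = min(self.max_rate, self.rate + self.increase)

    def on_throttled(self) -> None:
        # Multiplicative decrease: back off quickly when the server pushes back.
        self.rate = max(self.min_rate, self.rate * self.decrease)


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based), doubling each time up to `cap`."""
    return min(cap, base * 2 ** attempt)


limiter = AimdLimiter()
limiter.on_success()    # rate grows by a fixed step
limiter.on_success()
limiter.on_throttled()  # rate is halved after a simulated 429
```

The asymmetry is the point of the technique: the rate climbs in small steps while requests succeed and drops sharply on the first sign of throttling.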

Installation

Requires Python 3.11+.

uv tool install git+https://github.com/Zhukovetski/HydraStream.git

or

pipx install git+https://github.com/Zhukovetski/HydraStream.git

Usage

1. Download to Disk

Download a file using 20 connections:

hs "https://ftp.ncbi.nlm.nih.gov/.../genome.fna.gz" -t 20 --output ./data
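To give a sense of how a file can be divided among connections, here is a minimal sketch that splits a known file size into inclusive byte ranges and builds the matching HTTP Range header. The helper names `split_ranges` and `range_header` are hypothetical, and the even split is an assumption; HydraStream may choose chunk boundaries differently.

```python
def split_ranges(total_size: int, parts: int) -> list[tuple[int, int]]:
    """Split bytes 0..total_size-1 into up to `parts` inclusive (start, end) ranges.

    Hypothetical helper: an even split is assumed here for illustration only.
    """
    if total_size <= 0 or parts <= 0:
        return []
    parts = min(parts, total_size)  # never create empty ranges
    base, extra = divmod(total_size, parts)
    ranges: list[tuple[int, int]] = []
    start = 0
    for i in range(parts):
        # The first `extra` ranges take one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(start: int, end: int) -> dict[str, str]:
    """Build the Range header for an inclusive byte range."""
    return {"Range": f"bytes={start}-{end}"}


ranges = split_ranges(10, 3)
headers = [range_header(start, end) for start, end in ranges]
```

Range ends are inclusive in HTTP, which is why each range ends at `start + length - 1`.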

2. Stream to stdout (Pipe)

Download concurrently and pipe directly into a decompressor:

hs "https://ftp.ncbi.nlm.nih.gov/.../genome.fna.gz" -t 20 --stream -q | zcat | wc -l

3. Python API

import asyncio
from hydrastream import HydraClient

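async def main():
    urls = ["https://example.com/file1.gz"]
    async with HydraClient(threads=10, quiet=True) as client:
        # Iterate over (filename, stream) pairs for the given URLs.
        async for filename, stream in client.stream(urls):
            # Each stream yields the file's bytes in sequential order.
            async for chunk in stream:
                pass  # Process chunk bytes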

if __name__ == "__main__":
    asyncio.run(main())
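As one possible way to process the chunks, the sketch below mirrors the `zcat | wc -l` pipeline in Python: it decompresses gzip data incrementally and counts lines without holding the whole file in memory. It does not import hydrastream. The function `count_decompressed_lines` and the stand-in generator `fake_stream` are hypothetical names, and any async iterator of bytes could take the place of the stand-in.

```python
import asyncio
import gzip
import zlib
from typing import AsyncIterator


async def count_decompressed_lines(chunks: AsyncIterator[bytes]) -> int:
    """Count newline characters in a gzip stream delivered as sequential byte chunks.

    Handles a single gzip member only; multi-member archives would need extra logic.
    """
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    lines = 0
    async for chunk in chunks:
        lines += decompressor.decompress(chunk).count(b"\n")
    # Flush any data still buffered inside the decompressor.
    lines += decompressor.flush().count(b"\n")
    return lines


async def fake_stream(data: bytes, size: int = 7) -> AsyncIterator[bytes]:
    """Stand-in for a download stream: yields `data` in small sequential pieces."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


payload = gzip.compress(b"line one\nline two\nline three\n")
line_count = asyncio.run(count_decompressed_lines(fake_stream(payload)))
```

Because the decompressor keeps its own state between calls, chunk boundaries do not need to align with gzip block boundaries.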

CLI Options

Option     | Shortcut | Default        | Description
URLS       | -        | Required       | One or multiple URLs to download.
--threads  | -t       | 1              | Number of concurrent connections.
--output   | -o       | download/      | Output directory.
--stream   | -s       | False          | Enable streaming mode (redirects data to stdout).
--no-ui    | -nu      | False          | Disables progress bars, leaves plain text logs.
--quiet    | -q       | False          | Silence console output. Logs are still written to file.
--md5      | -        | None           | Expected MD5 hash (single URL only).
--buffer   | -b       | threads * 10MB | Maximum stream buffer size in bytes.
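For context on the --md5 option, a checksum can be verified incrementally while chunks are being consumed, so the file never has to be loaded whole. The sketch below uses only the standard library. The function name `md5_matches` is hypothetical, and this is not necessarily how HydraStream performs its check.

```python
import hashlib
from typing import Iterable


def md5_matches(chunks: Iterable[bytes], expected_hex: str) -> bool:
    """Return True if the MD5 of the concatenated chunks equals `expected_hex`.

    Hypothetical helper for illustration; comparison ignores case and surrounding whitespace.
    """
    digest = hashlib.md5()
    for chunk in chunks:
        # Feeding chunks one at a time gives the same digest as hashing the whole payload.
        digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


expected = hashlib.md5(b"hello world").hexdigest()
is_valid = md5_matches([b"hello", b" ", b"world"], expected)
```

MD5 is suitable here only as an integrity check against accidental corruption, not as protection against deliberate tampering.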

Roadmap

  • v1.2: Autonomous Worker Scaling: Transition from a static thread pool to adaptive concurrency based on network conditions and downstream backpressure.
  • v2.0: Rust Core: Port the core engine to Rust (tokio/reqwest) with a PyO3 wrapper to bypass the Python GIL and improve multi-core execution.

License

MIT License.
