S3impleClient

A simple, fast, and robust async S3/HTTP downloader and uploader with pipelined parallel transfers.

Features

  • Pipelined Parallel I/O: Download/upload large chunks while writing/reading the previous one
  • Two-Level Chunking: Large chunks for disk I/O, small chunks for network requests
  • Async/Sync Support: Use in both async and synchronous contexts
  • HuggingFace Hub Integration: Patch huggingface_hub for faster model downloads/uploads
  • Progress Tracking: Built-in tqdm progress bars with [S3C] prefix
  • Configurable Logging: Debug upload/download operations with configure_logging()
  • Automatic Fallback: Falls back to single-stream for servers without range support
  • Retry Logic: Exponential backoff retry for failed chunks
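The retry schedule can be sketched with standard exponential backoff (a minimal illustration only; the `base`, `cap`, and jitter values here are assumptions, not the library's actual parameters):

```python
import random

def backoff_delays(max_retries=5, base=0.5, cap=30.0, jitter=False):
    """Delay (seconds) before each retry: base * 2^attempt, capped at cap.

    With jitter=True, each delay is scaled by a random factor in [0.5, 1.0]
    to avoid retry storms when many chunks fail at once.
    """
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            delay *= random.uniform(0.5, 1.0)
        delays.append(delay)
    return delays

# Without jitter the schedule is deterministic: 0.5s, 1s, 2s, 4s, 8s
```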

Installation

pip install s3impleclient

Quick Start

Download

import s3impleclient as s3c

# Synchronous download
result = s3c.download(
    url="https://example.com/large-file.bin",
    dest="./downloads/file.bin",
)

if result.success:
    print(f"Downloaded {result.total_bytes:,} bytes")

Upload (Multipart)

import s3impleclient as s3c

# Upload with pre-signed multipart URLs (from S3 or similar)
result = s3c.upload(
    file_path="./large-file.bin",
    part_urls=["https://s3.../part1", "https://s3.../part2", ...],
    chunk_size=64 * 1024 * 1024,  # 64MB per part (from server)
    completion_url="https://s3.../complete",  # optional
)

if result.success:
    print(f"Uploaded {result.total_bytes:,} bytes in {len(result.parts)} parts")

HuggingFace Hub Integration

import logging
import s3impleclient as s3c
from huggingface_hub import hf_hub_download, upload_folder

# Enable logging to see transfer details
s3c.configure_logging(logging.INFO)

# Patch both download and upload
s3c.patch_all()

# Downloads now use S3impleClient (look for [S3C] in progress bar)
path = hf_hub_download(
    repo_id="username/model",
    filename="model.safetensors",
)

# Uploads also use parallel multipart
upload_folder(
    folder_path="./my-model",
    repo_id="username/model",
)

# Restore original behavior
s3c.unpatch_all()

CLI Usage

# Download
s3c download https://example.com/file.bin
s3c download https://example.com/file.bin -o ./myfile.bin
s3c download https://example.com/file.bin -w 16 -c 20  # 16 workers, 20MB chunks

# Upload (requires pre-signed URLs in JSON file)
s3c upload ./file.bin --url https://s3.../upload  # single part
s3c upload ./file.bin --part-urls parts.json --chunk-size 67108864  # multipart
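The parts.json file is assumed here to be a JSON array of pre-signed part URLs, mirroring the part_urls argument in the Upload quick start (the exact schema is an assumption; check the CLI help):

```json
[
  "https://s3.../part1",
  "https://s3.../part2"
]
```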

How It Works

Download Pipeline

S3impleClient uses a pipelined approach for maximum throughput:

Time ->
┌─────────────────────────────────────────────────────────────┐
│ Download Large Chunk 0 (parallel HTTP range requests)       │
│                        │ Write Chunk 0 │ Download Chunk 1   │
│                                        │ Write 1 │ Download │
│                                                  │ Write... │
└─────────────────────────────────────────────────────────────┘

Two-level chunking:

  • Large chunks (128MB default): the unit of each disk write; sized to fit in memory for efficient I/O
  • Small chunks (4MB default): the unit of each HTTP range request; fetched in parallel within a large chunk

Large Chunk 0 (128MB)
├── HTTP Range 0-4MB      ─┐
├── HTTP Range 4-8MB       │
├── HTTP Range 8-12MB      ├── Parallel (8 workers)
├── ...                    │
└── HTTP Range 124-128MB  ─┘
         │
         ▼
    Write to disk (while downloading next large chunk)
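The two-level split above can be illustrated with a small helper that computes the byte ranges (a sketch of the chunking arithmetic only; the function name and signature are mine, not the library's API):

```python
def range_plan(total_size, large_chunk=128 * 2**20, small_chunk=4 * 2**20):
    """Group HTTP byte ranges by large chunk.

    Returns one list per large chunk; each inner (start, end) pair is an
    inclusive byte range suitable for a "Range: bytes=start-end" header.
    """
    plan = []
    for lo in range(0, total_size, large_chunk):
        hi = min(lo + large_chunk, total_size)
        plan.append([(s, min(s + small_chunk, hi) - 1)
                     for s in range(lo, hi, small_chunk)])
    return plan

# A 10MB file with 8MB large chunks and 4MB small chunks yields
# two 4MB ranges for chunk 0 and one 2MB tail range for chunk 1.
```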

Upload Pipeline

Uploads are pipelined similarly, with prefetch:

Time ->
┌─────────────────────────────────────────────────────────────┐
│ Read Large Chunk 0 (32 parts)                               │
│                          │ Upload Parts 0-7   (parallel)    │
│                          │ Upload Parts 8-15  (parallel)    │
│                          │ Upload Parts 16-23 (parallel)    │
│                          │ Upload Parts 24-31 │ Read Chunk 1│
│                                               │ Upload...   │
└─────────────────────────────────────────────────────────────┘

Upload chunking:

  • Large chunk: max_workers_per_file * prefetch_factor * part_size bytes read at once
  • Part size: Defined by server (e.g., 64MB for HuggingFace)
  • Parallel uploads: Limited by max_workers_per_file semaphore

With defaults (8 workers, 4 prefetch, 64MB parts):

  • Large chunk = 8 * 4 * 64MB = 2GB read into memory
  • 8 parts upload in parallel at any time
  • While uploading, next 2GB is being read
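The memory math above is plain multiplication; a one-line helper makes it explicit (parameter names mirror the config fields shown below, but the helper itself is not part of the library):

```python
def large_chunk_bytes(max_workers_per_file=8, prefetch_factor=4,
                      part_size=64 * 2**20):
    """Bytes read into memory per large upload chunk."""
    return max_workers_per_file * prefetch_factor * part_size

# Defaults: 8 workers * 4 prefetch * 64MB parts = 2GB per read
```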

Configuration

Download Config

import s3impleclient as s3c

s3c.configure_download(s3c.DownloadConfig(
    chunk_size=4 * 1024 * 1024,         # 4MB per HTTP request
    write_chunk_size=128 * 1024 * 1024, # 128MB per disk write
    max_workers=8,                       # Parallel HTTP requests
    timeout=30.0,
    max_retries=5,
))

Upload Config

s3c.configure_upload(s3c.UploadConfig(
    max_workers_per_file=8,   # Parallel uploads per file
    max_file_concurrency=4,   # Parallel files (for multi-file upload)
    prefetch_factor=4,        # Read 8*4=32 parts at once
    timeout=60.0,
    max_retries=5,
))

Logging

import logging
import s3impleclient as s3c

# See upload/download configuration
s3c.configure_logging(logging.INFO)

# See per-chunk progress details
s3c.configure_logging(logging.DEBUG)

API Reference

Download

  • download(url, dest, ...) - Sync download to file
  • download_async(url, dest, ...) - Async download to file
  • configure_download(config) - Set default download config
  • Downloader(config) - Create a custom downloader instance

Upload

  • upload(file_path, ...) - Sync upload of a single file
  • upload_async(file_path, ...) - Async upload of a single file
  • upload_files(files, ...) - Sync upload of multiple files
  • upload_files_async(files, ...) - Async upload of multiple files
  • configure_upload(config) - Set default upload config
  • Uploader(config) - Create a custom uploader instance

HuggingFace Patching

  • patch_huggingface_hub(config) - Patch downloads only
  • patch_huggingface_hub_upload(config) - Patch uploads only
  • patch_all(dl_config, ul_config) - Patch both
  • unpatch_huggingface_hub() - Restore original download behavior
  • unpatch_huggingface_hub_upload() - Restore original upload behavior
  • unpatch_all() - Restore both
  • is_patched() - Check download patch status
  • is_upload_patched() - Check upload patch status

Logging

  • configure_logging(level) - Set logging level (default: WARNING)

Documentation

See the docs/ directory for detailed documentation:

  • Concepts
  • Implementation

Examples

See the examples/ directory:

  • basic_download.py - Sync and async download usage
  • huggingface_download.py - HuggingFace Hub download integration
  • huggingface_patch.py - Patching details
  • progress_callback.py - Custom progress tracking
  • huggingface_upload.py - HuggingFace Hub upload integration

License

Apache-2.0
