S3impleClient
A simple, fast, and robust async S3/HTTP downloader and uploader with pipelined parallel transfers.
Features
- Pipelined Parallel I/O: Download/upload large chunks while writing/reading the previous one
- Two-Level Chunking: Large chunks for disk I/O, small chunks for network requests
- Async/Sync Support: Use in both async and synchronous contexts
- HuggingFace Hub Integration: Patch
huggingface_hubfor faster model downloads/uploads - Progress Tracking: Built-in tqdm progress bars with
[S3C]prefix - Configurable Logging: Debug upload/download operations with
configure_logging() - Automatic Fallback: Falls back to single-stream for servers without range support
- Retry Logic: Exponential backoff retry for failed chunks
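The retry schedule itself isn't documented here; a minimal sketch of what exponential backoff for failed chunks typically looks like (the base delay and cap below are illustrative assumptions, not the library's actual values):

```python
def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    # Delay before each retry doubles, capped so a persistently
    # failing chunk never waits unboundedly long.
    return [min(base * 2 ** attempt, cap) for attempt in range(max_retries)]

print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```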
Installation
pip install s3impleclient
Quick Start
Download
import s3impleclient as s3c
# Synchronous download
result = s3c.download(
    url="https://example.com/large-file.bin",
    dest="./downloads/file.bin",
)
if result.success:
    print(f"Downloaded {result.total_bytes:,} bytes")
Upload (Multipart)
import s3impleclient as s3c
# Upload with pre-signed multipart URLs (from S3 or similar)
result = s3c.upload(
    file_path="./large-file.bin",
    part_urls=["https://s3.../part1", "https://s3.../part2", ...],
    chunk_size=64 * 1024 * 1024,  # 64MB per part (from server)
    completion_url="https://s3.../complete",  # optional
)
if result.success:
    print(f"Uploaded {result.total_bytes:,} bytes in {len(result.parts)} parts")
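The list of pre-signed part URLs must cover the whole file; a quick way to check how many parts are needed (the file size here is hypothetical, the 64MB part size matches the example above):

```python
import math

file_size = 200 * 1024 * 1024   # e.g. a 200MB file (hypothetical)
chunk_size = 64 * 1024 * 1024   # 64MB per part, matching the example above
num_parts = math.ceil(file_size / chunk_size)
print(num_parts)  # 4 part URLs needed
```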
HuggingFace Hub Integration
import logging
import s3impleclient as s3c
from huggingface_hub import hf_hub_download, upload_folder
# Enable logging to see transfer details
s3c.configure_logging(logging.INFO)
# Patch both download and upload
s3c.patch_all()
# Downloads now use S3impleClient (look for [S3C] in progress bar)
path = hf_hub_download(
    repo_id="username/model",
    filename="model.safetensors",
)
# Uploads also use parallel multipart
upload_folder(
    folder_path="./my-model",
    repo_id="username/model",
)
# Restore original behavior
s3c.unpatch_all()
CLI Usage
# Download
s3c download https://example.com/file.bin
s3c download https://example.com/file.bin -o ./myfile.bin
s3c download https://example.com/file.bin -w 16 -c 20 # workers, chunk MB
# Upload (requires pre-signed URLs in JSON file)
s3c upload ./file.bin --url https://s3.../upload # single part
s3c upload ./file.bin --part-urls parts.json --chunk-size 67108864 # multipart
How It Works
Download Pipeline
S3impleClient uses a pipelined approach for maximum throughput:
Time ->
┌─────────────────────────────────────────────────────────────┐
│ Download Large Chunk 0 (parallel HTTP range requests) │
│ │ Write Chunk 0 │ Download Chunk 1 │
│ │ Write 1 │ Download │
│ │ Write... │
└─────────────────────────────────────────────────────────────┘
Two-level chunking:
- Large chunks (128MB default): Units for disk writes - fits in memory, efficient I/O
- Small chunks (4MB default): Units for HTTP range requests - parallel within large chunk
Large Chunk 0 (128MB)
├── HTTP Range 0-4MB ─┐
├── HTTP Range 4-8MB │
├── HTTP Range 8-12MB ├── Parallel (8 workers)
├── ... │
└── HTTP Range 124-128MB ─┘
│
▼
Write to disk (while downloading next large chunk)
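The splitting above can be sketched in plain Python (illustrative only, not the library's internal code):

```python
def range_headers(chunk_start: int, chunk_size: int, request_size: int) -> list[str]:
    # Split one large chunk into the inclusive byte ranges used
    # for parallel HTTP Range requests.
    end = chunk_start + chunk_size
    return [
        f"bytes={start}-{min(start + request_size, end) - 1}"
        for start in range(chunk_start, end, request_size)
    ]

MB = 1024 * 1024
ranges = range_headers(0, 128 * MB, 4 * MB)
print(len(ranges))  # 32 range requests per 128MB large chunk
print(ranges[0])    # bytes=0-4194303
```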
Upload Pipeline
Similar pipelining for uploads with prefetch:
Time ->
┌─────────────────────────────────────────────────────────────┐
│ Read Large Chunk 0 (32 parts) │
│ │ Upload Parts 0-7 (parallel) │
│ │ Upload Parts 8-15 (parallel) │
│ │ Upload Parts 16-23 (parallel) │
│ │ Upload Parts 24-31 │ Read Chunk 1│
│ │ Upload... │
└─────────────────────────────────────────────────────────────┘
Upload chunking:
- Large chunk: `max_workers_per_file * prefetch_factor * part_size` bytes read at once
- Part size: Defined by server (e.g., 64MB for HuggingFace)
- Parallel uploads: Limited by `max_workers_per_file` semaphore
With defaults (8 workers, 4 prefetch, 64MB parts):
- Large chunk = 8 * 4 * 64MB = 2GB read into memory
- 8 parts upload in parallel at any time
- While uploading, next 2GB is being read
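The arithmetic above, spelled out with the stated default values:

```python
MB = 1024 * 1024
max_workers_per_file = 8   # parallel uploads
prefetch_factor = 4        # read-ahead multiplier
part_size = 64 * MB        # server-defined part size

large_chunk = max_workers_per_file * prefetch_factor * part_size
print(large_chunk // MB)          # 2048 MB (2GB) read into memory per cycle
print(large_chunk // part_size)   # 32 parts buffered at once
```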
Configuration
Download Config
import s3impleclient as s3c
s3c.configure_download(s3c.DownloadConfig(
    chunk_size=4 * 1024 * 1024,          # 4MB per HTTP request
    write_chunk_size=128 * 1024 * 1024,  # 128MB per disk write
    max_workers=8,                       # Parallel HTTP requests
    timeout=30.0,
    max_retries=5,
))
Upload Config
s3c.configure_upload(s3c.UploadConfig(
    max_workers_per_file=8,   # Parallel uploads per file
    max_file_concurrency=4,   # Parallel files (for multi-file upload)
    prefetch_factor=4,        # Read 8*4=32 parts at once
    timeout=60.0,
    max_retries=5,
))
Logging
import logging
import s3impleclient as s3c
# See upload/download configuration
s3c.configure_logging(logging.INFO)
# See per-chunk progress details
s3c.configure_logging(logging.DEBUG)
API Reference
Download
| Function | Description |
|---|---|
| `download(url, dest, ...)` | Sync download to file |
| `download_async(url, dest, ...)` | Async download to file |
| `configure_download(config)` | Set default download config |
| `Downloader(config)` | Create custom downloader instance |
Upload
| Function | Description |
|---|---|
| `upload(file_path, ...)` | Sync upload single file |
| `upload_async(file_path, ...)` | Async upload single file |
| `upload_files(files, ...)` | Sync upload multiple files |
| `upload_files_async(files, ...)` | Async upload multiple files |
| `configure_upload(config)` | Set default upload config |
| `Uploader(config)` | Create custom uploader instance |
HuggingFace Patching
| Function | Description |
|---|---|
| `patch_huggingface_hub(config)` | Patch downloads only |
| `patch_huggingface_hub_upload(config)` | Patch uploads only |
| `patch_all(dl_config, ul_config)` | Patch both |
| `unpatch_huggingface_hub()` | Restore original download |
| `unpatch_huggingface_hub_upload()` | Restore original upload |
| `unpatch_all()` | Restore both |
| `is_patched()` | Check download patch status |
| `is_upload_patched()` | Check upload patch status |
Logging
| Function | Description |
|---|---|
| `configure_logging(level)` | Set logging level (default: WARNING) |
Documentation
See the docs/ directory for detailed documentation:
Concepts
- Parallel Range Downloads - How parallel downloads work
- Parallel Multipart Uploads - How parallel uploads work
- HuggingFace Hub Download Flow - Download integration details
- HuggingFace Hub Upload Flow - Upload integration details
Implementation
- Architecture - Code structure and design
- API Reference - Full API documentation
Examples
See the examples/ directory:
- `basic_download.py` - Sync and async download usage
- `huggingface_download.py` - HuggingFace Hub download integration
- `huggingface_patch.py` - Patching details
- `progress_callback.py` - Custom progress tracking
- `huggingface_upload.py` - HuggingFace Hub upload integration
License
Apache-2.0