Skip to main content

Chunked, resumable, encrypted pipe to S3-compatible object storage

Project description

s3duct

Chunked, resumable, encrypted pipe to S3-compatible object storage.

Stream data from stdin directly to S3 in fixed-size chunks with integrity verification, optional encryption, cache management, and automatic resume on failure.

Features

  • Chunked streaming - pipes stdin to S3 without loading the full file into memory
  • Resumable uploads - interrupted uploads pick up where they left off using a signature-chained resume log
  • Dual-hash integrity - every chunk is verified with both SHA-256 and SHA3-256
  • HMAC signature chain - cryptographic chain across all chunks prevents tampering or reordering
  • Encryption - optional client-side encryption: AES-256-GCM (symmetric) or age (asymmetric, with optional post-quantum keys)
  • Backpressure - disk-aware flow control prevents scratch directory from filling up
  • S3-compatible - works with AWS S3, Cloudflare R2, MinIO, Backblaze B2, Wasabi, and any S3-compatible endpoint via --endpoint-url

Installation

pip install s3duct

For development:

git clone <repo-url>
cd s3duct
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

AWS Credentials

s3duct uses boto3's standard credential chain. It does not handle authentication itself. Credentials are resolved in this order:

  1. Environment variablesAWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
  2. Shared credentials file~/.aws/credentials (set up via aws configure)
  3. Config file~/.aws/config (profiles, SSO)
  4. IAM instance role — automatic on EC2, ECS, Lambda
  5. SSO / credential process — configured in ~/.aws/config

For non-AWS S3-compatible services, set the appropriate credentials for that provider and use --endpoint-url.

Quick Start

Upload

# No encryption
tar czf - /my/data | s3duct put --bucket mybucket --name mybackup

# AES-256-GCM encryption (recommended for most users)
tar czf - /my/data | s3duct put --bucket mybucket --name mybackup \
  --key hex:$(openssl rand -hex 32)

# age encryption (asymmetric — see Encryption section below)
tar czf - /my/data | s3duct put --bucket mybucket --name mybackup \
  --age-identity ~/.age/key.txt

# Custom chunk size, storage class, and tags
pg_dump mydb | s3duct put --bucket mybucket --name db-backup \
  --chunk-size 256M --storage-class STANDARD_IA --tag env=prod

Download

# Unencrypted
s3duct get --bucket mybucket --name mybackup > restored.tar.gz

# AES-256-GCM
s3duct get --bucket mybucket --name mybackup --key hex:... | zpool import

# age
s3duct get --bucket mybucket --name mybackup \
  --age-identity ~/.age/key.txt | tar xzf -

List stored streams

s3duct list --bucket mybucket

Verify integrity

s3duct verify --bucket mybucket --name mybackup

Encryption

s3duct supports two encryption methods. Both encrypt each chunk client-side before upload. The manifest records which method was used, so get auto-detects the decryption path.

Manifest encryption is currently only supported for AES-256; age will be supported in a later release.

If the manifest is unencrypted, it exposes metadata only (not plaintext): stream name/prefix, chunk count, per-chunk sizes, hashes, ETags, timestamps, storage class, tags, and the encryption method/recipient. With read access to the bucket, many of these (object keys, sizes, storage class, timestamps, and ETags) are already inferable from object listings and HEADs, but the manifest makes it trivial to enumerate and correlate. This is usually low-risk but can leak file sizes, change cadence, and other patterns — use --encrypt-manifest if that matters.

AES-256-GCM (symmetric)

Simple, fast, quantum-resistant. You manage one 32-byte key.

# Generate a key (save this — you need it to decrypt)
openssl rand -hex 32
# e.g., a1b2c3d4...64 hex chars

# Upload
tar czf - /data | s3duct put --bucket b --name n --key hex:a1b2c3...

# Download
s3duct get --bucket b --name n --key hex:a1b2c3... > data.tar.gz

Key formats:

  • hex:AABBCC... — 64 hex characters (32 bytes)
  • file:/path/to/keyfile — raw 32-byte file
  • env:VAR_NAME — environment variable containing hex key

To also encrypt the manifest (hides chunk count, sizes, timestamps):

s3duct put --bucket b --name n --key hex:... --encrypt-manifest

Each chunk is encrypted with a random 96-bit nonce. A nonce collision under the same key (including cross-session key reuse) becomes probable around 2^48 chunks (the birthday bound), which at the default 512MB chunk size is ~128 exabytes. Use a different key per stream to reset the bound. If you anticipate your use case exceeding the birthday bound within a single stream and reasonable block size, age does not suffer from birthday bound issues. If symmetric is a hard requirement, please open an issue and make a donation to Trans Lifeline. Plus we'd just be interested to hear about how well it hyperscales.

age (asymmetric)

Uses age for public-key encryption. Useful when the encryptor shouldn't hold the decryption key (e.g., automated backups encrypting to an offline recovery key).

age uses X25519 (classic) or ML-KEM-768+X25519 (PQ hybrid) to wrap a per-file key, with HKDF-SHA256 and ChaCha20-Poly1305. Payload encryption also uses HKDF-SHA256 and ChaCha20-Poly1305, with an HMAC-SHA256 header MAC.

# Install age
# macOS: brew install age
# Linux: xbps-install age / apt install age / apk add age / pacman -S age / etc. - see https://age-encryption.org/

# Generate a keypair
age-keygen -pq -o ~/.age/key.txt
# Prints the public key (recipient): age1...

# Upload (uses the keypair — extracts public key automatically)
tar czf - /data | s3duct put --bucket b --name n --age-identity ~/.age/key.txt

# Download (needs the private key)
s3duct get --bucket b --name n --age-identity ~/.age/key.txt > data.tar.gz

Post-quantum keys

age supports hybrid post-quantum keys (X25519 + ML-KEM-768) via the -pq flag. These protect against both classical and future quantum attacks. This is recommended if long-term confidentiality matters (e.g., encrypted archives that may sit in storage for years), and is likely to become the default in a future release (note that -pq was already specified in the previous section - the only downside to using post-quantum keys is an overhead of ~2KB/block).

Comparison

AES-256-GCM age
Quantum resistant Yes Yes, with -pq
Key model Symmetric Asymmetric
External dependency None age CLI
Best for Simple backups Multi-party / offline keys

Storage Classes

# Standard (default)
tar czf - /data | s3duct put --bucket mybucket --name backup --no-encrypt

# Infrequent Access
tar czf - /data | s3duct put --bucket mybucket --name backup --no-encrypt \
  --storage-class STANDARD_IA

# Glacier (retrieval requires thaw — see below)
tar czf - /data | s3duct put --bucket mybucket --name archive --no-encrypt \
  --storage-class GLACIER

# Glacier Deep Archive (cheapest, slowest retrieval)
tar czf - /data | s3duct put --bucket mybucket --name deep-archive --no-encrypt \
  --storage-class DEEP_ARCHIVE

Supported values: STANDARD, REDUCED_REDUNDANCY, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, GLACIER, DEEP_ARCHIVE, GLACIER_IR.

Note: the manifest is always uploaded as STANDARD so it remains immediately accessible regardless of the chunk storage class.

Glacier/Deep Archive retrieval: chunks stored in Glacier or Deep Archive cannot be downloaded directly. You must first restore (thaw) the objects using the AWS CLI or console, then run s3duct get once the restore completes. Automated thaw support is planned.

S3-compatible endpoints

# Cloudflare R2
s3duct put --bucket mybucket --name backup --no-encrypt \
  --endpoint-url https://<account-id>.r2.cloudflarestorage.com

# MinIO
s3duct put --bucket mybucket --name backup --no-encrypt \
  --endpoint-url http://localhost:9000

Options

s3duct put

Option Default Description
--bucket (required) S3 bucket name
--name (required) Stream name (used as S3 key prefix)
--chunk-size 512M Chunk size (e.g., 256M, 1G)
--key AES-256-GCM key (hex:..., file:..., or env:...)
--age-identity Path to age identity file (mutually exclusive with --key)
--no-encrypt Disable encryption even if key/identity provided
--encrypt-manifest Also encrypt the manifest under the same AES key
--tag Custom metadata tag (key=value, repeatable)
--storage-class STANDARD S3 storage class
--region AWS region
--prefix S3 key prefix
--endpoint-url Custom S3 endpoint URL
--strict-resume/--no-strict-resume on Fail if stdin ends before all resume-log chunks are re-verified
--summary text Summary output format: text, json, or none
--diskspace-limit auto Max scratch disk usage (e.g., 2G)
--buffer-chunks auto Max buffered chunks before backpressure

s3duct get

Option Default Description
--bucket (required) S3 bucket name
--name (required) Stream name to restore
--key AES-256-GCM key (hex:..., file:..., or env:...)
--age-identity Path to age identity file (mutually exclusive with --key)
--no-decrypt Skip decryption (download raw encrypted chunks)
--summary text Summary output format: text, json, or none
--region AWS region
--prefix S3 key prefix
--endpoint-url Custom S3 endpoint URL

s3duct list

Option Default Description
--bucket (required) S3 bucket name
--prefix Filter by prefix
--region AWS region
--endpoint-url Custom S3 endpoint URL

s3duct verify

Option Default Description
--bucket (required) S3 bucket name
--name (required) Stream name to verify
--summary text Summary output format: text, json, or none
--region AWS region
--prefix S3 key prefix
--endpoint-url Custom S3 endpoint URL

How It Works

Upload Pipeline

stdin → chunk (512MB default)
     → SHA-256 + SHA3-256 dual hash
     → HMAC-SHA256 signature chain
     → optional encryption (AES-256-GCM or age)
     → S3 upload with retry
     → append to resume log
     → repeat until EOF
     → upload manifest

Integrity Chain

Each chunk's signature is computed as:

chain[0] = HMAC-SHA256(genesis_key, sha256(chunk) || sha3_256(chunk))
chain[n] = HMAC-SHA256(chain[n-1], sha256(chunk) || sha3_256(chunk))

This creates a cryptographic chain where any modification, deletion, or reordering of chunks is detectable.

Resume

If an upload is interrupted, the resume log (stored locally and in S3) records which chunks were successfully uploaded along with their chain signatures. On restart, s3duct fast-forwards through stdin, verifying each chunk's hash matches the log, then continues uploading from where it left off.

Edge case — truncated resume input: if stdin ends before all previously-uploaded chunks are re-verified (e.g., the source process crashed or you piped a shorter stream), s3duct will by default return an error. --no-strict-resume is available to disable this safety check.

Shell pipefail

s3duct reads stdin until EOF. It cannot detect whether the upstream process exited successfully or crashed — both produce the same EOF. Use set -o pipefail in bash (or setopt PIPE_FAIL in zsh) so your shell reports upstream failures:

set -o pipefail
pg_dump mydb | s3duct put --bucket b --name db-backup || echo "upload or dump failed"

Without pipefail, a crash in pg_dump would go unnoticed — s3duct would upload whatever partial data it received, and the pipeline would exit 0.

If stdin is a regular file (not a pipe), s3duct checks the file size against bytes read and warns on mismatch, as this may indicate the file was modified during upload.

Backpressure

s3duct monitors available disk space in its scratch directory and pauses reading from stdin when approaching the limit. By default it auto-tunes based on available disk (2-10 chunks buffered). Use --diskspace-limit or --buffer-chunks for explicit control.

Development

python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest

License

Elastic License 2.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

s3duct-0.3.0.tar.gz (59.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

s3duct-0.3.0-py3-none-any.whl (32.7 kB view details)

Uploaded Python 3

File details

Details for the file s3duct-0.3.0.tar.gz.

File metadata

  • Download URL: s3duct-0.3.0.tar.gz
  • Upload date:
  • Size: 59.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for s3duct-0.3.0.tar.gz
Algorithm Hash digest
SHA256 b07f705317cfa817127f0b2020e87c27083809b9b1b83d1a31ad63c6b46ebd34
MD5 17fa9546b88af8117e59945c6e6341cf
BLAKE2b-256 9464f767c0d935759b55fd8e663e16a49cbbb4863b82b9562eb10d9922227d63

See more details on using hashes here.

File details

Details for the file s3duct-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: s3duct-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 32.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for s3duct-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3377c6716f33edec7daa1119b42395bbc5b62ddc185b9c455c5ba3e30546c05e
MD5 50021e21ecd32a8eea73bfec533dc94e
BLAKE2b-256 96f26ee8dc3b0fa54404d2b5d9a062441266f06e624260e19e7777223d8475f6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page