S3 overlay proxy for transparent remote object caching

S3 Overlay Proxy

The S3 overlay proxy is a standalone package that transparently caches S3 objects from a remote bucket in a local MinIO instance. Built with Litestar, it sits in front of the local MinIO process and mirrors objects on demand from the remote bucket.

All local reads and writes still target MinIO. On a cache miss, the proxy fetches the object from the upstream bucket and returns the payload to the caller; when caching is enabled (S3_OVERLAY_CACHE_ENABLED=true), it also stores the object in MinIO so later requests stay local.

Features

  • Transparent caching: GET/HEAD requests automatically fetch missing objects from remote S3
  • Local-first: All writes go directly to local MinIO
  • Auto-bucket creation: Buckets are created automatically when objects are mirrored
  • Range request support: Partial content requests are properly proxied
  • Partial caching: Large files are cached in chunks, so a range request does not require downloading the entire object
  • Zero-config local mode: Works without remote configuration for local-only development

Installation

uv add s3-overlay

Or add to your pyproject.toml:

dependencies = [
  "s3-overlay",
]

Docker

docker run -p 8000:8000 ghcr.io/elohmeier/s3-overlay:latest

Pass configuration via environment variables (-e), for example:

docker run -p 8000:8000 \
  -e S3_OVERLAY_REMOTE_ENDPOINT=https://s3.eu-central-1.amazonaws.com \
  -e S3_OVERLAY_REMOTE_REGION=eu-central-1 \
  -e S3_OVERLAY_REMOTE_ACCESS_KEY_ID=AKIA... \
  -e S3_OVERLAY_REMOTE_SECRET_ACCESS_KEY=secret... \
  -e S3_OVERLAY_BUCKET_MAPPING=local-bucket:remote-bucket \
  ghcr.io/elohmeier/s3-overlay:latest

Usage

Running the Proxy

The proxy is served with Granian. Start it through the Litestar CLI, pointing it at the create_app application factory:

uv run litestar --app s3_overlay.app:create_app run --host 0.0.0.0

How It Works

  • Writes (PUT, multipart uploads, deletes) are handled only by MinIO
  • Reads (GET, HEAD) hit MinIO first and fall back to the remote source if the object is missing locally
  • When caching is enabled (S3_OVERLAY_CACHE_ENABLED=true), objects fetched from the remote source are written into MinIO so subsequent requests stay local
  • When caching is disabled (default), objects are streamed directly from the remote source without storing them locally
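
The read and write rules above can be sketched with a toy in-memory model. This is an illustration only, not the package's implementation: the class name OverlaySketch is hypothetical, and plain dicts stand in for MinIO and the remote bucket.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OverlaySketch:
    """Toy model of the overlay read path (hypothetical, not the real S3OverlayProxy)."""

    local: dict = field(default_factory=dict)   # stands in for MinIO
    remote: dict = field(default_factory=dict)  # stands in for the remote bucket
    cache_enabled: bool = False                 # same default as S3_OVERLAY_CACHE_ENABLED

    def put(self, key: str, body: bytes) -> None:
        # Writes only ever touch the local store.
        self.local[key] = body

    def get(self, key: str) -> Optional[bytes]:
        # 1. Reads hit the local store first.
        if key in self.local:
            return self.local[key]
        # 2. On a miss, fall back to the remote source.
        body = self.remote.get(key)
        if body is None:
            return None  # an HTTP proxy would answer 404 here
        # 3. Keep a local copy only when caching is enabled.
        if self.cache_enabled:
            self.local[key] = body
        return body
```

With cache_enabled left at False, a remote hit is returned but never stored, so every read of that key goes upstream again; with cache_enabled=True the second read is served from the local dict.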

Configuration

Configure the proxy using environment variables:

Local MinIO (Required)

Variable                            Description                             Default
S3_OVERLAY_LOCAL_ENDPOINT           Local MinIO endpoint URL                http://127.0.0.1:9000
S3_OVERLAY_LOCAL_ACCESS_KEY         Local MinIO access key                  minioadmin
S3_OVERLAY_LOCAL_SECRET_KEY         Local MinIO secret key                  minioadmin
S3_OVERLAY_LOCAL_REGION             AWS region for local MinIO              us-east-1
S3_OVERLAY_DEFAULT_BUCKET_LOCATION  Default bucket location constraint      us-east-1
S3_OVERLAY_CHUNK_THRESHOLD          File size threshold for chunking        52428800 (50 MiB)
S3_OVERLAY_CHUNK_SIZE               Chunk size for partial caching          16777216 (16 MiB)
S3_OVERLAY_CACHE_BUCKET             Bucket name for storing chunks          s3-overlay-cache
S3_OVERLAY_CACHE_ENABLED            Enable local caching of remote objects  false
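
To make the chunk settings concrete, here is a minimal sketch of the arithmetic that maps a requested byte range onto fixed-size chunks. The function names are hypothetical, and two details are assumptions rather than documented behavior: that chunks are aligned to multiples of the chunk size, and that chunking applies to objects strictly larger than the threshold.

```python
CHUNK_THRESHOLD = 52_428_800  # default S3_OVERLAY_CHUNK_THRESHOLD (50 MiB)
CHUNK_SIZE = 16_777_216       # default S3_OVERLAY_CHUNK_SIZE (16 MiB)


def should_chunk(object_size: int, threshold: int = CHUNK_THRESHOLD) -> bool:
    """Assumption: only objects strictly larger than the threshold are chunked."""
    return object_size > threshold


def chunk_indices(start: int, end: int, chunk_size: int = CHUNK_SIZE) -> range:
    """Indices of the chunks overlapping the inclusive byte range [start, end]."""
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    return range(start // chunk_size, end // chunk_size + 1)


def chunk_byte_range(index: int, object_size: int, chunk_size: int = CHUNK_SIZE):
    """Inclusive (first, last) byte offsets covered by chunk `index`."""
    first = index * chunk_size
    if index < 0 or first >= object_size:
        raise ValueError("chunk index out of range")
    # The final chunk may be shorter than chunk_size.
    last = min(first + chunk_size, object_size) - 1
    return first, last
```

Under these assumptions, a 1 MiB range starting at offset 40 MiB of a 100 MiB object touches only chunk 2, so 16 MiB would be fetched instead of the full 100 MiB.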

Remote S3 (Optional)

Set the following variables to enable remote backfilling. Leave unset to operate in local-only mode.

Variable                             Description
S3_OVERLAY_REMOTE_ENDPOINT           Optional custom URL for upstream S3 API (defaults to AWS)
S3_OVERLAY_REMOTE_REGION             AWS region for the remote bucket (e.g. eu-central-1)
S3_OVERLAY_REMOTE_ADDRESSING_STYLE   S3 addressing style: virtual or path (default: virtual)
S3_OVERLAY_REMOTE_ACCESS_KEY_ID      Credentials with read access to the remote bucket
S3_OVERLAY_REMOTE_SECRET_ACCESS_KEY  Secret key for remote bucket
S3_OVERLAY_REMOTE_SESSION_TOKEN      Optional session token for temporary credentials
S3_OVERLAY_BUCKET_MAPPING            Map local bucket names to remote bucket names (format: local1:remote1,local2:remote2)
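
As an illustration of the S3_OVERLAY_BUCKET_MAPPING format, the following sketch parses a local1:remote1,local2:remote2 string into a dictionary. The function name parse_bucket_mapping is hypothetical, and how the package itself treats whitespace or malformed entries is not documented here, so the strictness below is an assumption.

```python
def parse_bucket_mapping(raw: str) -> dict:
    """Parse 'local1:remote1,local2:remote2' into {'local1': 'remote1', ...}."""
    mapping = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty input and trailing commas
        local, sep, remote = pair.partition(":")
        local, remote = local.strip(), remote.strip()
        if not sep or not local or not remote:
            raise ValueError(f"expected 'local:remote', got {pair!r}")
        mapping[local] = remote
    return mapping
```

A request for a local bucket would then look up its remote counterpart with mapping.get(local_bucket); what the proxy does for a bucket without an entry is not specified here.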

Example with AWS S3:

environment:
  - S3_OVERLAY_REMOTE_ENDPOINT=https://s3.eu-central-1.amazonaws.com
  - S3_OVERLAY_REMOTE_REGION=eu-central-1
  - S3_OVERLAY_REMOTE_ADDRESSING_STYLE=path
  - S3_OVERLAY_REMOTE_ACCESS_KEY_ID=AKIA...
  - S3_OVERLAY_REMOTE_SECRET_ACCESS_KEY=secret...
  - S3_OVERLAY_BUCKET_MAPPING=local-bucket:remote-bucket

Example with Hetzner Cloud Object Storage:

environment:
  - S3_OVERLAY_REMOTE_ENDPOINT=https://fsn1.your-objectstorage.com
  - S3_OVERLAY_REMOTE_REGION=us-east-1
  - S3_OVERLAY_REMOTE_ADDRESSING_STYLE=virtual # Uses bucket.endpoint.com format
  - S3_OVERLAY_REMOTE_ACCESS_KEY_ID=YOUR_ACCESS_KEY
  - S3_OVERLAY_REMOTE_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
  - S3_OVERLAY_BUCKET_MAPPING=local-bucket:remote-bucket

With virtual-hosted-style addressing, the bucket name becomes part of the hostname, so the proxy accesses remote objects at: https://remote-bucket.fsn1.your-objectstorage.com/organizations/...
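
The difference between the two addressing styles can be shown with a small URL builder. This is a sketch for illustration only: object_url is a hypothetical helper, the object key in the usage note is made up, and the real proxy leaves URL construction to its S3 client.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def object_url(endpoint: str, bucket: str, key: str, addressing_style: str = "virtual") -> str:
    """Build the object URL for 'virtual' or 'path' addressing (illustrative only)."""
    parts = urlsplit(endpoint)
    encoded_key = quote(key, safe="/")  # keep '/' so key prefixes stay readable
    if addressing_style == "virtual":
        # Bucket goes into the hostname: https://bucket.endpoint/key
        netloc = f"{bucket}.{parts.netloc}"
        path = f"/{encoded_key}"
    elif addressing_style == "path":
        # Bucket goes into the path: https://endpoint/bucket/key
        netloc = parts.netloc
        path = f"/{bucket}/{encoded_key}"
    else:
        raise ValueError(f"unknown addressing style: {addressing_style!r}")
    return urlunsplit((parts.scheme, netloc, path, "", ""))
```

For example, object_url("https://fsn1.your-objectstorage.com", "remote-bucket", "organizations/a.txt") yields a URL on the remote-bucket.fsn1.your-objectstorage.com host, while passing addressing_style="path" keeps the original host and prefixes the path with /remote-bucket/.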

Buckets are created automatically in MinIO when the proxy mirrors an object. Override S3_OVERLAY_DEFAULT_BUCKET_LOCATION if the MinIO cluster expects a non-us-east-1 location constraint.

Development

Running Tests

Tests use pytest-databases to spin up real MinIO containers:

uv run pytest tests/ -v

Tests verify:

  • Object caching from remote to local
  • HEAD request handling
  • Local cache hits without remote access
  • 404 handling for missing objects
  • Bucket auto-creation

Architecture

The package is structured as follows:

s3-overlay/
├── s3_overlay/
│   ├── __init__.py      # Public API exports
│   ├── proxy.py         # S3OverlayProxy, LocalSettings, RemoteSettings
│   └── app.py           # Litestar ASGI application factory
├── tests/
│   ├── conftest.py      # Pytest fixtures for MinIO
│   ├── test_proxy.py    # Unit tests
│   └── test_integration.py  # Integration tests with test.txt
├── pyproject.toml
└── README.md

Key Components

  • S3OverlayProxy: Core proxy logic with boto3 clients for local and remote S3
  • LocalSettings/RemoteSettings: Configuration models loaded from environment
  • create_app(): Litestar application factory for ASGI servers
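
To show what "configuration models loaded from environment" can look like, here is a minimal sketch built on a dataclass. It is not the package's LocalSettings: the class name LocalSettingsSketch, its field names, and the set of strings accepted as true are all assumptions; only the variable names and defaults come from the configuration table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LocalSettingsSketch:
    """Hypothetical stand-in for a settings model; defaults mirror the table above."""

    endpoint: str = "http://127.0.0.1:9000"
    access_key: str = "minioadmin"
    secret_key: str = "minioadmin"
    region: str = "us-east-1"
    cache_enabled: bool = False

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "LocalSettingsSketch":
        prefix = "S3_OVERLAY_"
        # Assumption: these spellings count as "true"; the real parser may differ.
        truthy = {"1", "true", "yes", "on"}
        return cls(
            endpoint=env.get(prefix + "LOCAL_ENDPOINT", cls.endpoint),
            access_key=env.get(prefix + "LOCAL_ACCESS_KEY", cls.access_key),
            secret_key=env.get(prefix + "LOCAL_SECRET_KEY", cls.secret_key),
            region=env.get(prefix + "LOCAL_REGION", cls.region),
            cache_enabled=env.get(prefix + "CACHE_ENABLED", "false").strip().lower() in truthy,
        )
```

Passing the environment as a mapping keeps the sketch easy to exercise in tests: LocalSettingsSketch.from_env({}) returns the defaults, and a dict with S3_OVERLAY_CACHE_ENABLED set to "true" flips cache_enabled.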

License

MIT. See LICENSE.
