S3-compatible storage backend for genblaze (B2, R2, MinIO, AWS)
genblaze-s3
S3-compatible storage backend for genblaze AI media pipelines — durable, content-addressable, dedup-ready. Works with Backblaze B2 (recommended default), Cloudflare R2, MinIO, and AWS S3.
genblaze-s3 plugs into the genblaze ObjectStorageSink to persist AI-generated video, image, and audio — plus their SHA-256 provenance manifests — onto any S3-compatible object store. It handles streaming downloads from provider CDNs, SHA-256 hashing, multipart uploads with retries, pre-signed URLs for private buckets, and Object Lock retention for tamper-evident manifests on Backblaze B2.
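The hashing half of that pipeline can be done incrementally, so a large video never has to sit fully in memory. Below is a minimal sketch using only Python's standard `hashlib`. `sha256_of_stream` is a hypothetical helper, not part of genblaze-s3's API, and the chunk list stands in for a provider CDN response body.

```python
import hashlib
from typing import Iterable, Tuple


def sha256_of_stream(chunks: Iterable[bytes]) -> Tuple[str, int]:
    """Hash a byte stream incrementally; memory use stays at one chunk."""
    hasher = hashlib.sha256()
    total = 0
    for chunk in chunks:
        hasher.update(chunk)  # feed each chunk as it arrives
        total += len(chunk)
    return hasher.hexdigest(), total


# Hypothetical stand-in for chunks arriving from a CDN download.
chunks = [b"hello ", b"world"]
digest, size = sha256_of_stream(chunks)
print(digest, size)
```

Because SHA-256 is computed over the concatenated bytes, the digest is the same no matter how the stream happens to be chunked.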
Why genblaze-s3
- Durable by default: assets and manifests land in object storage, never stuck behind a provider's expiring CDN URL.
- Backblaze B2 first-class: a one-line `S3StorageBackend.for_backblaze()` helper, plus Object Lock support for immutable provenance.
- Content-addressable dedup: `KeyStrategy.CONTENT_ADDRESSABLE` stores each unique asset once, keyed by SHA-256.
- Works with any S3 API: AWS S3, Backblaze B2, Cloudflare R2, MinIO, SeaweedFS, Wasabi, Ceph.
- Presigned URLs: private buckets get time-limited URLs; public buckets get permanent `public_url_base` links.
- Resilient multipart uploads: credential-preserving retries, preflight checks, no partial writes.
Backends
| Provider | Helper | Notes |
|---|---|---|
| Backblaze B2 | `S3StorageBackend.for_backblaze("bucket")` | Reads `B2_KEY_ID` / `B2_APP_KEY`; Object Lock retention supported |
| AWS S3 | `S3StorageBackend(bucket="...", region="...")` | Standard AWS credential chain |
| Cloudflare R2 | `S3StorageBackend(bucket="...", endpoint_url="https://<acct>.r2.cloudflarestorage.com")` | |
| MinIO / self-hosted | `S3StorageBackend(bucket="...", endpoint_url="https://minio.example.com")` | |
Install
```bash
pip install genblaze-s3
```
Quickstart — Backblaze B2 (recommended)
```bash
export B2_KEY_ID="..."
export B2_APP_KEY="..."
```
```python
from genblaze_core import KeyStrategy, ObjectStorageSink, Pipeline
from genblaze_s3 import S3StorageBackend
from genblaze_replicate import ReplicateProvider  # separate package: genblaze-replicate

backend = S3StorageBackend.for_backblaze(
    "my-genblaze-bucket",
    # Defaults to "us-west-004". Pass the region your bucket actually lives
    # in (e.g. "us-east-005", "eu-central-003") to skip the redirect hop;
    # the backend auto-corrects on first use, but a correct hint saves a round trip.
    region="us-west-004",
    # Optional: set public_url_base for public buckets so get_url returns
    # permanent URLs.
    public_url_base="https://f004.backblazeb2.com/file/my-genblaze-bucket",
    # Recommended in 0.3.0+: opt in to lifecycle defaults (cancel orphaned
    # multipart uploads after 7 days, expire noncurrent versions after 30
    # days). The default flipped to False to avoid silently mutating
    # bucket-wide config; pass True here or call
    # `backend.ensure_lifecycle_defaults()` after construction.
    auto_lifecycle=True,
)

sink = ObjectStorageSink(
    backend,
    prefix="genblaze-assets",
    key_strategy=KeyStrategy.CONTENT_ADDRESSABLE,  # dedupe by SHA-256
)

result = (
    Pipeline("b2-demo")
    .step(ReplicateProvider(), model="black-forest-labs/flux-schnell",
          prompt="a photorealistic cat wearing a tiny spacesuit")
    .run(sink=sink, timeout=120)
)

for step in result.run.steps:
    for asset in step.assets:
        print(asset.url, asset.sha256)

backend.close()
```
Resulting bucket layout with `CONTENT_ADDRESSABLE`:
```text
genblaze-assets/
├── assets/{sha[:2]}/{sha[2:4]}/{sha}.ext   # one object per unique asset
└── manifests/{run_id}.json                 # one manifest per run
```
Switch to `KeyStrategy.HIERARCHICAL` for a `runs/{date}/{run_id}/…` layout (better for browsing by run, worse for dedup).
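To make the content-addressable layout concrete, here is a minimal sketch of deriving an asset key from its bytes, assuming the `assets/{sha[:2]}/{sha[2:4]}/{sha}.ext` pattern shown above. `content_addressable_key` is a hypothetical helper; genblaze's own prefix joining and extension handling may differ.

```python
import hashlib


def content_addressable_key(prefix: str, data: bytes, ext: str) -> str:
    """Build a prefix/assets/{sha[:2]}/{sha[2:4]}/{sha}.ext key (hypothetical helper)."""
    sha = hashlib.sha256(data).hexdigest()
    return f"{prefix.rstrip('/')}/assets/{sha[:2]}/{sha[2:4]}/{sha}.{ext.lstrip('.')}"


a = content_addressable_key("genblaze-assets", b"same bytes", ".png")
b = content_addressable_key("genblaze-assets", b"same bytes", "png")
c = content_addressable_key("genblaze-assets", b"other bytes", "png")
print(a == b, a == c)  # → True False
```

Identical bytes always map to the same key, which is what makes a second upload of the same asset a no-op. The two hex-pair directories fan keys out across 256 × 256 = 65,536 sub-prefixes.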
Quickstart — AWS S3
```bash
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
```
```python
from genblaze_s3 import S3StorageBackend

backend = S3StorageBackend(bucket="my-genblaze-bucket", region="us-east-1")
# get_url() returns pre-signed URLs when public_url_base is not set
```
Quickstart — Cloudflare R2 / MinIO
```python
from genblaze_s3 import S3StorageBackend

# R2
backend = S3StorageBackend(
    bucket="my-bucket",
    endpoint_url="https://<account-id>.r2.cloudflarestorage.com",
    access_key_id="...", secret_access_key="...",
)

# MinIO
backend = S3StorageBackend(
    bucket="my-bucket",
    endpoint_url="https://minio.example.com",
    access_key_id="...", secret_access_key="...",
)
```
URL flavors and credential redaction
`backend.get_url(key)` returns either a public URL (when `public_url_base` is set) or a presigned SigV4 URL. Pass an explicit policy when your code path requires a specific flavor:
```python
from genblaze_s3 import URLPolicy

# Force public; raises URLPolicyError if public_url_base isn't configured.
url = backend.get_url("k", policy=URLPolicy.PUBLIC)

# Force presigned (even with public_url_base set), e.g. for paid feeds.
url = backend.get_url("k", policy=URLPolicy.PRESIGNED, expires_in=900)
```
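The selection rules described above can be summarized as a small decision function. This is a sketch only: `Policy`, `PolicyError`, `resolve_flavor`, and `public_url` are hypothetical stand-ins (with `AUTO` representing the default when no policy is passed), not genblaze-s3's `URLPolicy` implementation.

```python
from enum import Enum
from typing import Optional
from urllib.parse import quote


class Policy(Enum):  # hypothetical stand-in for genblaze_s3.URLPolicy
    AUTO = "auto"  # models the default behavior when no policy is passed
    PUBLIC = "public"
    PRESIGNED = "presigned"


class PolicyError(ValueError):  # hypothetical stand-in for URLPolicyError
    pass


def resolve_flavor(policy: Policy, public_url_base: Optional[str]) -> str:
    """Return "public" or "presigned" following the rules described above."""
    if policy is Policy.PUBLIC:
        if not public_url_base:
            raise PolicyError("PUBLIC requested but public_url_base is not configured")
        return "public"
    if policy is Policy.PRESIGNED:
        return "presigned"
    # AUTO: public when a base URL is configured, presigned otherwise.
    return "public" if public_url_base else "presigned"


def public_url(public_url_base: str, key: str) -> str:
    # Percent-encode the key but keep "/" so path segments survive.
    return f"{public_url_base.rstrip('/')}/{quote(key, safe='/')}"
```

Failing loudly on `PUBLIC` without a base URL, rather than silently falling back to a presigned link, keeps credential-bearing URLs out of code paths that expect permanent ones.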
For credential-bearing URLs handed to HTTP clients, prefer the dedicated methods. They return a `PresignedURL` value object that redacts the SigV4 signature in `repr()`, `str()`, and f-string interpolation, so an accidental log line no longer leaks credentials:
```python
import requests

download = backend.presigned_get("k", expires_in=3600)
upload = backend.presigned_put("k", expires_in=600, content_type="image/png")

print(f"download link: {download}")
# → download link: PresignedURL(... url='...?X-Amz-Signature=redacted...')

requests.get(download.url)  # explicit `.url` accessor for the unredacted form
```
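The redaction technique itself is easy to reproduce: override `__repr__`/`__str__` so every implicit string conversion masks the signature, while the raw value stays reachable through an explicit attribute. A minimal sketch with the standard library; `RedactedURL` is hypothetical and is not genblaze-s3's `PresignedURL`.

```python
from dataclasses import dataclass
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def _redact(url: str) -> str:
    """Replace the X-Amz-Signature query value with 'redacted'."""
    parts = urlsplit(url)
    query = [
        (k, "redacted" if k == "X-Amz-Signature" else v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


@dataclass(frozen=True)
class RedactedURL:  # hypothetical; not genblaze_s3.PresignedURL
    url: str  # unredacted value, only reachable via explicit `.url`

    def __repr__(self) -> str:
        # dataclass keeps this explicit __repr__ instead of generating one
        return f"RedactedURL(url={_redact(self.url)!r})"

    # f"{obj}" and str(obj) both route through __str__
    __str__ = __repr__
```

Keeping the secret behind an explicit `.url` access means the unsafe form only appears where a developer deliberately asks for it.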
`put()` no longer returns a presigned URL; it returns the storage key instead. This closes the credential-leak risk callers hit when they persisted the old return value to logs, manifests, or DB rows.
Server-side encryption (SSE)
Encryption is a typed value object accepted symmetrically by `put`, `get`, and `copy`:
```python
import secrets

from genblaze_s3 import Encryption

data = b"..."  # payload to store

# SSE-S3 (server-managed AES-256)
backend.put("k", data, encryption=Encryption.sse_s3())

# SSE-KMS
backend.put("k", data, encryption=Encryption.sse_kms("alias/my-app"))

# SSE-C: the same key is required on read; round-trips cleanly in 0.3.0+
key = secrets.token_bytes(32)
enc = Encryption.sse_c(key)
backend.put("k", data, encryption=enc)
backend.get("k", encryption=enc)
```
See the main feature doc for SSE-C key handling, KMS configuration, and migration notes from 0.2.x.
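Why SSE-C needs the key on every read: at the S3 protocol level, the client sends the key with each request, and the server does not retain the key itself. A minimal sketch of the three request headers the S3 API defines for SSE-C; `sse_c_headers` is a hypothetical helper illustrating the wire format, not genblaze-s3's `Encryption` internals.

```python
import base64
import hashlib
import secrets


def sse_c_headers(key: bytes) -> dict:
    """Build the three SSE-C request headers defined by the S3 API."""
    if len(key) != 32:
        raise ValueError("SSE-C requires a 256-bit (32-byte) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        # base64 of the raw key
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode("ascii"),
        # base64 of the key's MD5 digest, used as an integrity check
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode("ascii"),
    }


key = secrets.token_bytes(32)
headers = sse_c_headers(key)
```

Lose the key and the object is unrecoverable, so store it in a secrets manager rather than alongside the data.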
Read primitives (head / list / get_range / stream)
```python
# head: per-object metadata, or None for missing/inaccessible keys.
meta = backend.head("path/to/key")

# list: paginated walk via continuation_token; loop until next_token is None.
page = backend.list(prefix="run-", max_keys=1000)
while True:
    for entry in page.entries:
        ...
    if page.next_token is None:
        break
    page = backend.list(prefix="run-", max_keys=1000, continuation_token=page.next_token)

# get_range: partial-file read via an HTTP Range header.
header = backend.get_range("big.mp4", offset=0, length=4096)

# stream: chunked download, no full-file RAM load.
for chunk in backend.stream("big.mp4", chunk_size=8 * 1024 * 1024):
    ...
```
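One detail worth knowing about `get_range`: HTTP byte ranges are inclusive on both ends, so `offset=0, length=4096` corresponds to `bytes=0-4095`. A minimal sketch of that mapping, assuming `get_range` follows standard HTTP Range semantics; `range_header` is a hypothetical helper.

```python
def range_header(offset: int, length: int) -> str:
    """HTTP Range value for `length` bytes starting at `offset` (end is inclusive)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


print(range_header(0, 4096))  # → bytes=0-4095
```

Forgetting the `- 1` is a classic off-by-one that silently fetches one extra byte per request.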
Bulk deletes (delete_many / delete_prefix)
```python
# Explicit-key delete: dry_run=False by default.
result = backend.delete_many(["k1", "k2"])

# Prefix delete: dry_run=True by default, so you can preview what would go.
preview = backend.delete_prefix("temp/")
result = backend.delete_prefix("temp/", dry_run=False)
```
`delete_prefix` streams page by page (memory stays bounded for huge prefixes) and surfaces partial progress if it fails mid-walk.
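The bounded-memory, partial-progress behavior can be sketched as batched deletion over a lazy key iterator. S3's DeleteObjects call accepts at most 1,000 keys per request, hence the default batch size. `delete_in_batches` and `PartialDeleteError` are hypothetical names for illustration, not genblaze-s3's implementation.

```python
from itertools import islice
from typing import Callable, Iterable, List


class PartialDeleteError(RuntimeError):  # hypothetical
    def __init__(self, deleted: int, cause: Exception) -> None:
        super().__init__(f"failed after deleting {deleted} objects: {cause}")
        self.deleted = deleted  # how far we got before the failure


def delete_in_batches(
    keys: Iterable[str],
    delete_batch: Callable[[List[str]], None],
    batch_size: int = 1000,  # S3 DeleteObjects accepts at most 1000 keys per call
) -> int:
    """Delete keys batch by batch, holding at most one batch in memory."""
    it = iter(keys)
    deleted = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return deleted
        try:
            delete_batch(batch)
        except Exception as exc:
            raise PartialDeleteError(deleted, exc) from exc
        deleted += len(batch)
```

Memory stays bounded only if `keys` is itself lazy, for example a generator that lists one page at a time.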
Progress callbacks
```python
from genblaze_core import TransferProgress

def on_progress(p: TransferProgress) -> None:
    print(p.bytes_transferred)

backend.put("big.mp4", data, progress=on_progress)
backend.get("big.mp4", progress=on_progress)
for chunk in backend.stream("big.mp4", progress=on_progress):
    ...
```
The `put` callback is thread-safe across boto3's multipart workers. See the main feature doc for the full progress contract.
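Thread safety matters because boto3's transfer manager invokes its byte-count callback from several worker threads, each reporting the bytes moved since its previous call. A minimal sketch of the usual lock-guarded accumulator; `ProgressAggregator` is hypothetical and is not how genblaze's `TransferProgress` is implemented.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ProgressAggregator:
    """Sum incremental byte counts reported from many worker threads."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.transferred = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount: int) -> None:
        with self._lock:  # += on a shared int is not atomic across threads
            self.transferred += bytes_amount


agg = ProgressAggregator(total=8 * 1024 * 1024)
# Simulate 8 multipart workers, each reporting 1 MiB in 1 KiB increments.
with ThreadPoolExecutor(max_workers=8) as pool:
    for _ in range(8):
        pool.submit(lambda: [agg(1024) for _ in range(1024)])
print(agg.transferred == agg.total)  # → True
```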
Native async via aioboto3 (optional)
```bash
pip install 'genblaze-s3[async]'
```
```python
import asyncio

from genblaze_s3 import AsyncS3StorageBackend


async def main():
    # my_sync_backend: an existing S3StorageBackend instance
    async with AsyncS3StorageBackend.from_sync(my_sync_backend) as ab:
        data = await ab.aget("k")
        async for chunk in ab.astream("big.mp4"):
            ...
        result = await ab.aput("k", data)


asyncio.run(main())
```
`aget` and `astream` are native (`astream` is a real `AsyncIterator[bytes]`); other methods delegate to the sync backend via a threadpool. `from_sync` carries the sync backend's verified-region state forward, so the async path skips a redundant HeadBucket round trip.
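The threadpool-delegation pattern used for the non-native methods can be sketched with `asyncio.to_thread` (Python 3.9+), which runs a blocking call in the default executor so the event loop stays responsive. `SyncBackendStub` and `ThreadpoolDelegate` are hypothetical; in genblaze-s3, `aget` is native rather than delegated, but both methods are delegated here for brevity.

```python
import asyncio


class SyncBackendStub:  # hypothetical stand-in for a blocking backend
    def __init__(self) -> None:
        self._objects = {}

    def put(self, key: str, data: bytes) -> str:
        self._objects[key] = data
        return key  # mirrors put() returning the storage key

    def get(self, key: str) -> bytes:
        return self._objects[key]


class ThreadpoolDelegate:
    """Async wrappers that run blocking methods off the event loop."""

    def __init__(self, sync_backend) -> None:
        self._sync = sync_backend

    async def aput(self, key: str, data: bytes) -> str:
        return await asyncio.to_thread(self._sync.put, key, data)

    async def aget(self, key: str) -> bytes:
        return await asyncio.to_thread(self._sync.get, key)


async def main() -> bytes:
    ab = ThreadpoolDelegate(SyncBackendStub())
    await ab.aput("k", b"payload")
    return await ab.aget("k")


print(asyncio.run(main()))  # → b'payload'
```

Delegation is simple and correct, but each call occupies a worker thread; native aioboto3 calls avoid that cost, which is why the hot paths (`aget`, `astream`) are native.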
Object Lock for immutable manifests (Backblaze B2)
Genblaze can apply Object Lock retention to uploaded manifests, producing tamper-evident provenance suitable for compliance, legal, and content-authenticity workflows. See the main repo docs for the Object Lock guide.
Documentation
- Main repo: https://github.com/backblaze-labs/genblaze
- Storage feature doc: https://github.com/backblaze-labs/genblaze/blob/main/docs/features/object-storage.md
- Runnable examples: `b2_storage_pipeline.py`, `s3_storage_pipeline.py`
Related packages
- `genblaze-core`: the pipeline SDK
- Provider adapters: `genblaze-openai` · `genblaze-google` · `genblaze-runway` · `genblaze-luma` · `genblaze-replicate`
License
MIT
File details
Details for the file genblaze_s3-0.3.1.tar.gz.
File metadata
- Download URL: genblaze_s3-0.3.1.tar.gz
- Upload date:
- Size: 66.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `4ccd8a62a25d699e47b0ace11465d579d02cea3e4dae363971b28e73a2c2d0d8` |
| MD5 | `7e4986b77a89ac383aadb7f7851ec3a4` |
| BLAKE2b-256 | `41831237d01d84c7074e2c764495c9f7c5949fbfa702ae6ff5dc07a67a5a97ae` |
Provenance
The following attestation bundles were made for genblaze_s3-0.3.1.tar.gz:
Publisher: release.yml on backblaze-labs/genblaze
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: genblaze_s3-0.3.1.tar.gz
  - Subject digest: 4ccd8a62a25d699e47b0ace11465d579d02cea3e4dae363971b28e73a2c2d0d8
- Sigstore transparency entry: 1585794743
- Sigstore integration time:
- Permalink: backblaze-labs/genblaze@a9ffeea9f9da0942e7c1619fa18024290dcac30e
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/backblaze-labs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a9ffeea9f9da0942e7c1619fa18024290dcac30e
- Trigger Event: release
File details
Details for the file genblaze_s3-0.3.1-py3-none-any.whl.
File metadata
- Download URL: genblaze_s3-0.3.1-py3-none-any.whl
- Upload date:
- Size: 40.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `037b1c89892778ef02bbc81ad6dcff20d095e5a4da32910903e7ee9c998da981` |
| MD5 | `f7b79a3d01cf034803d94bafea51a067` |
| BLAKE2b-256 | `49fd2dc538c6e38afc2e68b270c630408d6d0a77888b9bcacf288c14ea63d9ac` |
Provenance
The following attestation bundles were made for genblaze_s3-0.3.1-py3-none-any.whl:
Publisher: release.yml on backblaze-labs/genblaze
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: genblaze_s3-0.3.1-py3-none-any.whl
  - Subject digest: 037b1c89892778ef02bbc81ad6dcff20d095e5a4da32910903e7ee9c998da981
- Sigstore transparency entry: 1585794872
- Sigstore integration time:
- Permalink: backblaze-labs/genblaze@a9ffeea9f9da0942e7c1619fa18024290dcac30e
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/backblaze-labs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a9ffeea9f9da0942e7c1619fa18024290dcac30e
- Trigger Event: release