
S3-compatible storage backend for genblaze (B2, R2, MinIO, AWS)


genblaze-s3

S3-compatible storage backend for genblaze AI media pipelines — durable, content-addressable, dedup-ready. Works with Backblaze B2 (recommended default), Cloudflare R2, MinIO, and AWS S3.

genblaze-s3 plugs into the genblaze ObjectStorageSink to persist AI-generated video, image, and audio — plus their SHA-256 provenance manifests — onto any S3-compatible object store. It handles streaming downloads from provider CDNs, SHA-256 hashing, multipart uploads with retries, pre-signed URLs for private buckets, and Object Lock retention for tamper-evident manifests on Backblaze B2.
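The streaming-download-plus-hashing step described above can be illustrated with the standard library alone. This is a minimal sketch, not genblaze-s3's code. `copy_and_hash` is a hypothetical helper, and the chunk list stands in for a real chunked HTTP download from a provider CDN.

```python
import hashlib
import io
from typing import BinaryIO, Iterable, Tuple


def copy_and_hash(chunks: Iterable[bytes], dest: BinaryIO) -> Tuple[int, str]:
    """Write each chunk to dest while feeding it to SHA-256.

    The function itself holds only one chunk at a time, so the digest is
    ready as soon as the last chunk arrives, with no second pass.
    """
    hasher = hashlib.sha256()
    total = 0
    for chunk in chunks:
        hasher.update(chunk)
        dest.write(chunk)
        total += len(chunk)
    return total, hasher.hexdigest()


# Hypothetical stand-in for a chunked download from a provider CDN.
fake_download = [b"frame-0001", b"frame-0002", b"frame-0003"]
buffer = io.BytesIO()  # a real pipeline would write to a temp file instead
size, digest = copy_and_hash(fake_download, buffer)
print(size, digest)
```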

Why genblaze-s3

  • Durable by default — Assets + manifests land in object storage, never stuck in a provider's expiring CDN URL.
  • Backblaze B2 first-class — One-line S3StorageBackend.for_backblaze() helper, Object Lock support for immutable provenance.
  • Content-addressable dedup: KeyStrategy.CONTENT_ADDRESSABLE stores each unique asset once, keyed by its SHA-256.
  • Works with any S3 API — AWS S3, Backblaze B2, Cloudflare R2, MinIO, SeaweedFS, Wasabi, Ceph.
  • Presigned URLs — private buckets get time-limited URLs; public buckets get permanent public_url_base links.
  • Resilient multipart uploads — credential-preserving retries, preflight checks, no partial writes.
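Retries like those in the last bullet are commonly built on capped exponential backoff with jitter. The sketch below shows only that general pattern. genblaze-s3's actual retry policy, including how it preserves credentials across attempts, is not shown, and `TransientError` and `retry_with_backoff` are hypothetical names.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (throttling, 5xx, resets)."""


def retry_with_backoff(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(), retrying TransientError with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts - 1):
        try:
            return op()
        except TransientError:
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))  # "full jitter": random wait up to cap
    return op()  # final attempt: any error propagates to the caller


# Usage: an operation that fails twice, then succeeds.
calls = {"n": 0}


def flaky_upload_part() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("503 Slow Down")
    return "etag-123"


print(retry_with_backoff(flaky_upload_part, sleep=lambda s: None))
```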

Backends

| Provider | Helper | Notes |
|---|---|---|
| Backblaze B2 | S3StorageBackend.for_backblaze("bucket") | Reads B2_KEY_ID / B2_APP_KEY; Object Lock retention supported |
| AWS S3 | S3StorageBackend(bucket="...", region="...") | Standard AWS credential chain |
| Cloudflare R2 | S3StorageBackend(bucket="...", endpoint_url="https://<acct>.r2.cloudflarestorage.com") | |
| MinIO / self-hosted | S3StorageBackend(bucket="...", endpoint_url="https://minio.example.com") | |
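For reference, the S3 API endpoint each provider uses follows a predictable pattern. The helper below is hypothetical and not part of genblaze-s3. It only spells out the URLs: B2's S3-compatible endpoint is `https://s3.<region>.backblazeb2.com`, AWS's regional endpoint is `https://s3.<region>.amazonaws.com`, the R2 form matches the R2 row, and self-hosted servers such as MinIO have no fixed pattern.

```python
from typing import Optional


def s3_endpoint(provider: str, *, region: Optional[str] = None,
                account_id: Optional[str] = None,
                custom_url: Optional[str] = None) -> str:
    """Return the S3 API endpoint URL for a provider (hypothetical helper)."""
    if provider in ("b2", "aws"):
        if not region:
            raise ValueError(f"{provider} endpoints need a region")
        if provider == "b2":
            # Backblaze B2's S3-compatible endpoint embeds the bucket's region.
            return f"https://s3.{region}.backblazeb2.com"
        return f"https://s3.{region}.amazonaws.com"
    if provider == "r2":
        if not account_id:
            raise ValueError("R2 endpoints need the Cloudflare account ID")
        return f"https://{account_id}.r2.cloudflarestorage.com"
    if provider == "minio":
        if not custom_url:
            raise ValueError("self-hosted endpoints must be given explicitly")
        return custom_url.rstrip("/")
    raise ValueError(f"unknown provider: {provider}")


print(s3_endpoint("b2", region="us-west-004"))
```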

Install

```shell
pip install genblaze-s3
```

Quickstart — Backblaze B2 (recommended)

```shell
export B2_KEY_ID="..."
export B2_APP_KEY="..."
```
```python
from genblaze_core import KeyStrategy, ObjectStorageSink, Pipeline
from genblaze_s3 import S3StorageBackend
from genblaze_replicate import ReplicateProvider

backend = S3StorageBackend.for_backblaze(
    "my-genblaze-bucket",
    # Defaults to "us-west-004". Pass the region your bucket actually lives
    # in (e.g. "us-east-005", "eu-central-003") to skip the redirect hop;
    # the backend auto-corrects on first use, but a correct hint saves a
    # round trip.
    region="us-west-004",
    # Optional: pass public_url_base for public buckets (get_url then
    # returns permanent URLs).
    public_url_base="https://f004.backblazeb2.com/file/my-genblaze-bucket",
    # Recommended in 0.3.0+: opt in to lifecycle defaults (cancel orphaned
    # multipart uploads after 7 days, expire noncurrent versions after 30
    # days). The default flipped to False to avoid silent bucket-wide config
    # changes; pass True here or call `backend.ensure_lifecycle_defaults()`
    # after construction.
    auto_lifecycle=True,
)

sink = ObjectStorageSink(
    backend,
    prefix="genblaze-assets",
    key_strategy=KeyStrategy.CONTENT_ADDRESSABLE,   # dedupe by SHA-256
)

try:
    result = (
        Pipeline("b2-demo")
        .step(ReplicateProvider(), model="black-forest-labs/flux-schnell",
              prompt="a photorealistic cat wearing a tiny spacesuit")
        .run(sink=sink, timeout=120)
    )

    for step in result.run.steps:
        for asset in step.assets:
            print(asset.url, asset.sha256)
finally:
    backend.close()  # release the underlying client even if the run fails
```
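In standard S3 lifecycle-configuration terms, the two rules named in the `auto_lifecycle` comment look roughly like the dict below. This is a sketch only. The rule IDs are made up, the dict is built locally without any network call, and whether `ensure_lifecycle_defaults()` sends exactly this shape (or how B2 maps it onto its own lifecycle rules) is an assumption.

```python
import json


def lifecycle_defaults(abort_days: int = 7, noncurrent_days: int = 30) -> dict:
    """Build an S3 LifecycleConfiguration with the two rules described above."""
    return {
        "Rules": [
            {
                "ID": "abort-incomplete-multipart",  # hypothetical rule ID
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix applies to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": abort_days},
            },
            {
                "ID": "expire-noncurrent-versions",  # hypothetical rule ID
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            },
        ]
    }


config = lifecycle_defaults()
print(json.dumps(config, indent=2))
# With boto3, this is the shape passed as
# s3_client.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)
```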

Resulting bucket layout with CONTENT_ADDRESSABLE:

```text
genblaze-assets/
├── assets/{sha[:2]}/{sha[2:4]}/{sha}.ext    # one object per unique asset
└── manifests/{run_id}.json                   # one manifest per run
```

Switch to KeyStrategy.HIERARCHICAL for a runs/{date}/{run_id}/… layout (better for browsing by run, worse for dedup).
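The content-addressable key can be derived from the bytes alone, which is what makes dedup possible: identical content always hashes to the same key. Below is a minimal sketch assuming the key pattern shown in the layout above. `content_addressable_key` and `put_if_absent` are hypothetical helpers, and a dict stands in for the bucket. A real backend would check for an existing object with a HEAD request instead.

```python
import hashlib
import posixpath
from typing import Dict


def content_addressable_key(data: bytes, ext: str, prefix: str = "genblaze-assets") -> str:
    """Map asset bytes to prefix/assets/{sha[:2]}/{sha[2:4]}/{sha}.ext."""
    sha = hashlib.sha256(data).hexdigest()
    return posixpath.join(prefix, "assets", sha[:2], sha[2:4], f"{sha}.{ext.lstrip('.')}")


uploaded: Dict[str, bytes] = {}  # stand-in for the bucket: key -> bytes


def put_if_absent(data: bytes, ext: str) -> bool:
    """Store data only if its key is new; return True when a write happened."""
    key = content_addressable_key(data, ext)
    if key in uploaded:
        return False  # duplicate content: nothing to write
    uploaded[key] = data
    return True


print(put_if_absent(b"same image bytes", "png"))  # True: first copy is stored
print(put_if_absent(b"same image bytes", "png"))  # False: deduplicated
print(len(uploaded))                               # 1
```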

Quickstart — AWS S3

```shell
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
```

```python
from genblaze_s3 import S3StorageBackend

backend = S3StorageBackend(bucket="my-genblaze-bucket", region="us-east-1")
# get_url() returns pre-signed URLs when public_url_base is not set
```

Quickstart — Cloudflare R2 / MinIO

```python
from genblaze_s3 import S3StorageBackend

# R2
backend = S3StorageBackend(
    bucket="my-bucket",
    endpoint_url="https://<account-id>.r2.cloudflarestorage.com",
    access_key_id="...", secret_access_key="...",
)

# MinIO
backend = S3StorageBackend(
    bucket="my-bucket",
    endpoint_url="https://minio.example.com",
    access_key_id="...", secret_access_key="...",
)
```

URL flavors and credential redaction

backend.get_url(key) returns either a public URL (when public_url_base is set) or a presigned SigV4 URL. Pass an explicit policy when your code path requires a specific flavor:

```python
from genblaze_s3 import URLPolicy

# Force public: raises URLPolicyError if public_url_base isn't configured.
url = backend.get_url("k", policy=URLPolicy.PUBLIC)

# Force presigned (even with public_url_base set); useful for paid feeds.
url = backend.get_url("k", policy=URLPolicy.PRESIGNED, expires_in=900)
```
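The selection rules above fit in a few lines: public when a base is configured, presigned otherwise, with explicit overrides. This is a sketch of the decision logic only. `Policy`, `PolicyError`, `choose_url`, and `fake_presign` are hypothetical stand-ins. The `AUTO` member is an assumption about what "no explicit policy" means, and the signer returns a placeholder rather than a real SigV4 URL.

```python
from enum import Enum
from typing import Callable, Optional
from urllib.parse import quote


class Policy(Enum):  # hypothetical stand-in for URLPolicy
    AUTO = "auto"
    PUBLIC = "public"
    PRESIGNED = "presigned"


class PolicyError(ValueError):
    """Raised when the requested URL flavor cannot be produced."""


def choose_url(key: str, policy: Policy, public_url_base: Optional[str],
               presign: Callable[[str, int], str], expires_in: int = 3600) -> str:
    """Resolve a storage key to a URL according to the policy rules."""
    if policy is Policy.PUBLIC or (policy is Policy.AUTO and public_url_base):
        if not public_url_base:
            raise PolicyError("PUBLIC requested but public_url_base is not set")
        # Permanent public URL: base + percent-encoded key, keeping '/' separators.
        return f"{public_url_base.rstrip('/')}/{quote(key, safe='/')}"
    # PRESIGNED, or AUTO without a public base: time-limited signed URL.
    return presign(key, expires_in)


def fake_presign(key: str, expires_in: int) -> str:
    # Placeholder signer; a real one computes an AWS SigV4 signature.
    return (f"https://s3.example.com/bucket/{quote(key, safe='/')}"
            f"?X-Amz-Expires={expires_in}&X-Amz-Signature=placeholder")


base = "https://f004.backblazeb2.com/file/my-genblaze-bucket"
print(choose_url("runs/a b.png", Policy.AUTO, base, fake_presign))
print(choose_url("k", Policy.PRESIGNED, base, fake_presign, expires_in=900))
```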

For credential-bearing URLs handed to HTTP clients, prefer the dedicated methods — they return a PresignedURL value object that redacts the SigV4 signature in repr()/str()/f"{...}", so accidental log-line interpolation no longer leaks credentials:

```python
import requests

download = backend.presigned_get("k", expires_in=3600)
upload = backend.presigned_put("k", expires_in=600, content_type="image/png")

print(f"download link: {download}")
# → download link: PresignedURL(... url='...?X-Amz-Signature=redacted...')

requests.get(download.url)  # explicit `.url` accessor for the unredacted form
```
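The redaction behaviour described above comes down to overriding `__repr__` and `__str__` on a small value type. Here is a minimal sketch with a hypothetical `RedactedURL` class. It is not genblaze-s3's `PresignedURL`, whose fields and output format may differ. Masking `X-Amz-Security-Token` as well is an assumption beyond the signature redaction the text describes.

```python
from dataclasses import dataclass
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters treated as secrets in this sketch.
_SECRET_PARAMS = {"X-Amz-Signature", "X-Amz-Security-Token"}


@dataclass(frozen=True)
class RedactedURL:
    """Hypothetical value object: safe to print, explicit .url for the real thing."""
    url: str

    def _redacted(self) -> str:
        parts = urlsplit(self.url)
        query = [(k, "redacted" if k in _SECRET_PARAMS else v)
                 for k, v in parse_qsl(parts.query, keep_blank_values=True)]
        return urlunsplit(parts._replace(query=urlencode(query)))

    # dataclass keeps an explicitly defined __repr__ rather than generating one.
    def __repr__(self) -> str:
        return f"RedactedURL(url={self._redacted()!r})"

    __str__ = __repr__  # so print() and f"{...}" are redacted too


link = RedactedURL("https://s3.example.com/b/k"
                   "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Signature=abc123")
print(f"download link: {link}")
```

Logging code that interpolates the object gets the safe form, and only code that deliberately reads `.url` sees the signature.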

put() no longer returns a presigned URL; it returns the storage key instead. This removes the credential-leak risk for callers who persisted the old return value to logs, manifests, or DB rows.

Server-side encryption (SSE)

Encryption is a typed value object accepted symmetrically by put, get, and copy:

```python
import secrets

from genblaze_s3 import Encryption

data = b"example payload"  # placeholder bytes to store

# SSE-S3 (server-managed AES-256)
backend.put("k", data, encryption=Encryption.sse_s3())

# SSE-KMS
backend.put("k", data, encryption=Encryption.sse_kms("alias/my-app"))

# SSE-C: the same key is required on read; round-trips cleanly in 0.3.0+
sse_key = secrets.token_bytes(32)  # 256-bit customer key; keep it safe
enc = Encryption.sse_c(sse_key)
backend.put("k", data, encryption=enc)
backend.get("k", encryption=enc)
```
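SSE-C round-trips only if the same key accompanies every request. On the wire, S3-compatible APIs carry it as three headers: the algorithm, the base64 key, and the base64 MD5 of the key. The sketch below builds them with the standard library to show why the key must be retained. `sse_c_headers` is a hypothetical helper. That `Encryption.sse_c` produces equivalent request parameters internally is an assumption.

```python
import base64
import hashlib
import secrets
from typing import Dict


def sse_c_headers(key: bytes) -> Dict[str, str]:
    """Return the three SSE-C request headers for a 256-bit customer key."""
    if len(key) != 32:
        raise ValueError("SSE-C requires a 32-byte (256-bit) AES key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key":
            base64.b64encode(key).decode("ascii"),
        # Lets the server detect a key corrupted in transit.
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key).digest()).decode("ascii"),
    }


sse_key = secrets.token_bytes(32)
headers = sse_c_headers(sse_key)  # the same headers must accompany the GET
print(sorted(headers))
```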

See the main feature doc for SSE-C key handling, KMS configuration, and migration notes from 0.2.x.

Read primitives (head / list / get_range / stream)

```python
# head: per-object metadata, None for missing/inaccessible.
meta = backend.head("path/to/key")

# list: paginated walk via continuation_token, until no token is returned.
page = backend.list(prefix="run-", max_keys=1000)
while True:
    for entry in page.entries:
        ...
    if page.next_token is None:
        break  # last page reached
    page = backend.list(prefix="run-", max_keys=1000,
                        continuation_token=page.next_token)

# get_range: partial read via the HTTP Range header.
header = backend.get_range("big.mp4", offset=0, length=4096)

# stream: chunked download, no full-file RAM load.
for chunk in backend.stream("big.mp4", chunk_size=8 * 1024 * 1024):
    ...
```
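HTTP Range bounds are inclusive, which invites an off-by-one when translating `offset`/`length` into a request. The sketch covers that translation plus a chunked walk over an object. It assumes, without confirmation here, that `get_range` and `stream` issue standard Range requests. `range_header` and `chunk_ranges` are hypothetical helpers.

```python
from typing import Iterator


def range_header(offset: int, length: int) -> str:
    """HTTP Range value for `length` bytes starting at `offset` (end is inclusive)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length > 0")
    return f"bytes={offset}-{offset + length - 1}"


def chunk_ranges(total_size: int, chunk_size: int) -> Iterator[str]:
    """Yield consecutive Range values covering [0, total_size) in chunk_size pieces."""
    for start in range(0, total_size, chunk_size):
        yield range_header(start, min(chunk_size, total_size - start))


print(range_header(0, 4096))      # bytes=0-4095
print(list(chunk_ranges(10, 4)))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```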

Bulk deletes (delete_many / delete_prefix)

```python
# Explicit-key delete: dry_run defaults to False.
result = backend.delete_many(["k1", "k2"])

# Prefix delete: dry_run defaults to True, so preview what would go first.
preview = backend.delete_prefix("temp/")
result = backend.delete_prefix("temp/", dry_run=False)
```

delete_prefix works page by page, so memory stays bounded even for huge prefixes, and it reports partial progress if a failure occurs mid-walk.
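A page-by-page prefix delete mostly comes down to batching, because S3's DeleteObjects API accepts at most 1,000 keys per request. The sketch below is not genblaze-s3's implementation. `batched` and `delete_prefix_sketch` are hypothetical, and the listing pages and delete call are injected so it runs without a bucket. It shows a dry-run default, bounded memory, and partial progress reported on failure.

```python
from typing import Callable, Iterable, Iterator, List, Optional

MAX_KEYS_PER_REQUEST = 1000  # S3 DeleteObjects accepts at most 1,000 keys per call


def batched(keys: Iterable[str], size: int = MAX_KEYS_PER_REQUEST) -> Iterator[List[str]]:
    """Group keys into lists of at most `size`."""
    batch: List[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def delete_prefix_sketch(pages: Iterable[List[str]],
                         delete_batch: Callable[[List[str]], None],
                         dry_run: bool = True) -> dict:
    """Delete (or just count) keys page by page; report progress even on failure."""
    matched, deleted = 0, 0
    error: Optional[Exception] = None
    try:
        for page in pages:  # only one listing page is held in memory at a time
            for batch in batched(page):
                matched += len(batch)
                if not dry_run:
                    delete_batch(batch)
                    deleted += len(batch)
    except Exception as exc:  # surface how far we got instead of losing it
        error = exc
    return {"dry_run": dry_run, "matched": matched, "deleted": deleted, "error": error}


pages = [[f"temp/{i}" for i in range(1500)], ["temp/last"]]
sent: List[List[str]] = []
preview = delete_prefix_sketch(pages, sent.append)                # dry run: nothing sent
result = delete_prefix_sketch(pages, sent.append, dry_run=False)
print(preview["matched"], result["deleted"], [len(b) for b in sent])  # 1501 1501 [1000, 500, 1]
```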

Progress callbacks

```python
from genblaze_core import TransferProgress


def on_progress(p: TransferProgress) -> None:
    print(p.bytes_transferred)


backend.put("big.mp4", data, progress=on_progress)
backend.get("big.mp4", progress=on_progress)
for chunk in backend.stream("big.mp4", progress=on_progress):
    ...
```

The put callback is thread-safe across boto3's multipart workers. See the main feature doc for the full progress contract.
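Thread safety matters here because boto3's transfer manager invokes its byte-count callback from several worker threads, each reporting an increment. Below is a minimal sketch of a lock-protected accumulator. `ProgressTotal` is hypothetical and does not model genblaze's `TransferProgress` beyond a running byte count.

```python
import threading


class ProgressTotal:
    """Lock-protected running total fed by concurrent byte-count increments."""

    def __init__(self, total_size: int) -> None:
        self.total_size = total_size
        self.transferred = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount: int) -> None:
        # boto3-style callbacks report an increment (not a running total) and
        # may fire from several worker threads; `+=` is not atomic, so lock it.
        with self._lock:
            self.transferred += bytes_amount

    @property
    def fraction(self) -> float:
        with self._lock:
            return self.transferred / self.total_size if self.total_size else 1.0


PART = 256 * 1024
tracker = ProgressTotal(total_size=4 * 8 * PART)


def upload_parts(n_parts: int) -> None:
    # Stand-in for a multipart worker reporting each finished part.
    for _ in range(n_parts):
        tracker(PART)


workers = [threading.Thread(target=upload_parts, args=(8,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(tracker.transferred, tracker.fraction)  # 8388608 1.0
```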

Native async via aioboto3 (optional)

```shell
pip install 'genblaze-s3[async]'
```

```python
import asyncio

from genblaze_s3 import AsyncS3StorageBackend


async def main():
    # my_sync_backend: an existing S3StorageBackend, e.g. from a quickstart above
    async with AsyncS3StorageBackend.from_sync(my_sync_backend) as ab:
        data = await ab.aget("k")
        async for chunk in ab.astream("big.mp4"):
            ...
        result = await ab.aput("k", data)


asyncio.run(main())
```

aget and astream are native (astream is a real AsyncIterator[bytes]); the other methods run the sync backend's methods on a thread pool. from_sync carries the sync backend's verified-region state forward, so the async path skips a redundant HeadBucket round trip.
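Delegating to a thread pool means wrapping each blocking call so it runs on a worker thread while the event loop keeps serving other tasks. Below is a minimal sketch of that pattern with `asyncio.to_thread` (Python 3.9+). `BlockingBackend` and `ThreadDelegatingAsync` are hypothetical stand-ins, not genblaze-s3 classes.

```python
import asyncio
import threading
from typing import Dict, List


class BlockingBackend:
    """Hypothetical synchronous backend standing in for the boto3-based one."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.thread_ids: List[int] = []

    def put(self, key: str, data: bytes) -> str:
        self.thread_ids.append(threading.get_ident())  # record which thread ran us
        self.objects[key] = data
        return key


class ThreadDelegatingAsync:
    """Async facade whose methods run the sync ones on the default thread pool."""

    def __init__(self, sync_backend: BlockingBackend) -> None:
        self._sync = sync_backend

    async def aput(self, key: str, data: bytes) -> str:
        # The blocking call runs on a worker thread; the event loop stays free.
        return await asyncio.to_thread(self._sync.put, key, data)


async def main(sync: BlockingBackend) -> List[str]:
    ab = ThreadDelegatingAsync(sync)
    # Several puts in flight at once; gather returns results in call order.
    return await asyncio.gather(*(ab.aput(f"k{i}", b"x") for i in range(3)))


sync_backend = BlockingBackend()
keys = asyncio.run(main(sync_backend))
print(keys)  # ['k0', 'k1', 'k2']
```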

Object Lock for immutable manifests (Backblaze B2)

Genblaze can apply Object Lock retention to uploaded manifests, producing tamper-evident provenance suitable for compliance, legal, and content-authenticity workflows. See the main repo docs for the Object Lock guide.
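At the S3 API level, retention is set per object with a lock mode and a retain-until timestamp. GOVERNANCE can be bypassed by specially privileged users, while COMPLIANCE cannot be shortened. The bucket itself must have Object Lock enabled for such a request to succeed. The sketch builds the two `put_object` parameters with the standard library. `object_lock_params` is hypothetical, the 365-day period is an arbitrary example, and how genblaze-s3 picks mode and duration is not covered here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def object_lock_params(retention_days: int, mode: str = "GOVERNANCE",
                       now: Optional[datetime] = None) -> dict:
    """Build the per-object retention parameters accepted by S3 PutObject."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be 'GOVERNANCE' or 'COMPLIANCE'")
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    now = now or datetime.now(timezone.utc)
    return {
        "ObjectLockMode": mode,
        "ObjectLockRetainUntilDate": now + timedelta(days=retention_days),
    }


params = object_lock_params(365, mode="COMPLIANCE",
                            now=datetime(2025, 1, 1, tzinfo=timezone.utc))
print(params["ObjectLockRetainUntilDate"].isoformat())  # 2026-01-01T00:00:00+00:00
# With a boto3 client (not run here):
# s3.put_object(Bucket="...", Key="manifests/run.json", Body=b"...", **params)
```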


License

MIT

Download files


Source Distribution

genblaze_s3-0.3.1.tar.gz (66.3 kB)

Uploaded Source

Built Distribution


genblaze_s3-0.3.1-py3-none-any.whl (40.5 kB)

Uploaded Python 3

File details

Details for the file genblaze_s3-0.3.1.tar.gz.

File metadata

  • Download URL: genblaze_s3-0.3.1.tar.gz
  • Upload date:
  • Size: 66.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for genblaze_s3-0.3.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4ccd8a62a25d699e47b0ace11465d579d02cea3e4dae363971b28e73a2c2d0d8 |
| MD5 | 7e4986b77a89ac383aadb7f7851ec3a4 |
| BLAKE2b-256 | 41831237d01d84c7074e2c764495c9f7c5949fbfa702ae6ff5dc07a67a5a97ae |


Provenance

The following attestation bundles were made for genblaze_s3-0.3.1.tar.gz:

Publisher: release.yml on backblaze-labs/genblaze

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file genblaze_s3-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: genblaze_s3-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 40.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for genblaze_s3-0.3.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 037b1c89892778ef02bbc81ad6dcff20d095e5a4da32910903e7ee9c998da981 |
| MD5 | f7b79a3d01cf034803d94bafea51a067 |
| BLAKE2b-256 | 49fd2dc538c6e38afc2e68b270c630408d6d0a77888b9bcacf288c14ea63d9ac |


Provenance

The following attestation bundles were made for genblaze_s3-0.3.1-py3-none-any.whl:

Publisher: release.yml on backblaze-labs/genblaze

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
