S3-compatible storage backend for genblaze (B2, R2, MinIO, AWS)

genblaze-s3

S3-compatible storage backend for genblaze AI media pipelines — durable, content-addressable, dedup-ready. Works with Backblaze B2 (recommended default), Cloudflare R2, MinIO, and AWS S3.

genblaze-s3 plugs into the genblaze ObjectStorageSink to persist AI-generated video, image, and audio — plus their SHA-256 provenance manifests — onto any S3-compatible object store. It handles streaming downloads from provider CDNs, SHA-256 hashing, multipart uploads with retries, pre-signed URLs for private buckets, and Object Lock retention for tamper-evident manifests on Backblaze B2.
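The streaming-plus-hashing step described above can be pictured with a minimal standard-library sketch. It is not genblaze-s3's implementation: `iter_chunks` and `sha256_of_stream` are hypothetical names, and an in-memory `BytesIO` stands in for a provider CDN response. The point is only that the digest is updated chunk by chunk, so the full asset never has to sit in memory.

```python
import hashlib
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield fixed-size chunks until the stream is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> tuple[str, int]:
    """Hash a stream incrementally; return (hex digest, total bytes read)."""
    digest = hashlib.sha256()
    total = 0
    for chunk in iter_chunks(stream, chunk_size):
        digest.update(chunk)  # only one chunk is held in memory at a time
        total += len(chunk)
    return digest.hexdigest(), total


# Hypothetical payload standing in for a downloaded asset.
payload = b"example asset bytes" * 1000
hex_digest, size = sha256_of_stream(io.BytesIO(payload), chunk_size=4096)
```

The incremental digest matches a one-shot `hashlib.sha256(payload)` over the same bytes, which is what lets a single hash serve for both provenance and deduplication.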

Why genblaze-s3

  • Durable by default — Assets + manifests land in object storage, never stuck in a provider's expiring CDN URL.
  • Backblaze B2 first-class — One-line S3StorageBackend.for_backblaze() helper, Object Lock support for immutable provenance.
  • Content-addressable dedup: KeyStrategy.CONTENT_ADDRESSABLE stores each unique asset once, keyed by its SHA-256.
  • Works with any S3 API — AWS S3, Backblaze B2, Cloudflare R2, MinIO, SeaweedFS, Wasabi, Ceph.
  • Presigned URLs — private buckets get time-limited URLs; public buckets get permanent public_url_base links.
  • Resilient multipart uploads — credential-preserving retries, preflight checks, no partial writes.

Backends

| Provider | Helper | Notes |
| --- | --- | --- |
| Backblaze B2 | S3StorageBackend.for_backblaze("bucket") | Reads B2_KEY_ID / B2_APP_KEY; Object Lock retention supported |
| AWS S3 | S3StorageBackend(bucket="...", region="...") | Standard AWS credential chain |
| Cloudflare R2 | S3StorageBackend(bucket="...", endpoint_url="https://<acct>.r2.cloudflarestorage.com") | |
| MinIO / self-hosted | S3StorageBackend(bucket="...", endpoint_url="https://minio.example.com") | |

Install

pip install genblaze-s3

Quickstart — Backblaze B2 (recommended)

export B2_KEY_ID="..."
export B2_APP_KEY="..."

from genblaze_core import KeyStrategy, ObjectStorageSink, Pipeline
from genblaze_s3 import S3StorageBackend
from genblaze_replicate import ReplicateProvider

backend = S3StorageBackend.for_backblaze(
    "my-genblaze-bucket",
    # Defaults to "us-west-004". Pass the region your bucket actually lives
    # in (e.g. "us-east-005", "eu-central-003") to skip the redirect hop.
    # The backend auto-corrects on first use, but a correct hint saves a
    # round trip.
    region="us-west-004",
    # Optional: pass public_url_base for public buckets (get_url returns
    # permanent URLs).
    public_url_base="https://f004.backblazeb2.com/file/my-genblaze-bucket",
    # Recommended in 0.3.0+: opt in to lifecycle defaults (cancel orphaned
    # multipart uploads after 7 days, expire noncurrent versions after 30
    # days). Default flipped to False to avoid silent bucket-wide config
    # mutation; pass True or call `backend.ensure_lifecycle_defaults()`
    # post-construction.
    auto_lifecycle=True,
)

sink = ObjectStorageSink(
    backend,
    prefix="genblaze-assets",
    key_strategy=KeyStrategy.CONTENT_ADDRESSABLE,   # dedupe by SHA-256
)

result = (
    Pipeline("b2-demo")
    .step(ReplicateProvider(), model="black-forest-labs/flux-schnell",
          prompt="a photorealistic cat wearing a tiny spacesuit")
    .run(sink=sink, timeout=120)
)

for step in result.run.steps:
    for asset in step.assets:
        print(asset.url, asset.sha256)

backend.close()

Resulting bucket layout with CONTENT_ADDRESSABLE:

genblaze-assets/
├── assets/{sha[:2]}/{sha[2:4]}/{sha}.ext    # one object per unique asset
└── manifests/{run_id}.json                   # one manifest per run

Switch to KeyStrategy.HIERARCHICAL for a runs/{date}/{run_id}/… layout, which is better for browsing assets grouped by run but gives up deduplication.
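To make the content-addressable layout concrete, here is a small sketch of how such a key could be derived from an asset's bytes. `content_addressable_key` is a hypothetical helper written for illustration, not a genblaze function; it only mirrors the `assets/{sha[:2]}/{sha[2:4]}/{sha}.ext` pattern shown above.

```python
import hashlib


def content_addressable_key(data: bytes, ext: str, prefix: str = "genblaze-assets") -> str:
    """Build a key from the SHA-256 of the content (hypothetical helper)."""
    sha = hashlib.sha256(data).hexdigest()
    # Two levels of fan-out keep any single "directory" from growing too large.
    return f"{prefix}/assets/{sha[:2]}/{sha[2:4]}/{sha}.{ext}"


key_a = content_addressable_key(b"same bytes", "png")
key_b = content_addressable_key(b"same bytes", "png")
key_c = content_addressable_key(b"other bytes", "png")
```

Identical bytes always map to the same key (`key_a == key_b`), so a second upload of the same asset overwrites or skips rather than duplicating, while different bytes get a different key.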

Quickstart — AWS S3

export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."

from genblaze_s3 import S3StorageBackend

backend = S3StorageBackend(bucket="my-genblaze-bucket", region="us-east-1")
# get_url() returns pre-signed URLs when public_url_base is not set

Quickstart — Cloudflare R2 / MinIO

from genblaze_s3 import S3StorageBackend

# R2
backend = S3StorageBackend(
    bucket="my-bucket",
    endpoint_url="https://<account-id>.r2.cloudflarestorage.com",
    access_key_id="...", secret_access_key="...",
)

# MinIO
backend = S3StorageBackend(
    bucket="my-bucket",
    endpoint_url="https://minio.example.com",
    access_key_id="...", secret_access_key="...",
)

URL flavors and credential redaction

backend.get_url(key) returns either a public URL (when public_url_base is set) or a presigned SigV4 URL. Pass an explicit policy when your code path requires a specific flavor:

from genblaze_s3 import URLPolicy

# Force public — raises URLPolicyError if public_url_base isn't configured.
url = backend.get_url("k", policy=URLPolicy.PUBLIC)

# Force presigned (even with public_url_base set) — useful for paid feeds.
url = backend.get_url("k", policy=URLPolicy.PRESIGNED, expires_in=900)

For credential-bearing URLs handed to HTTP clients, prefer the dedicated methods — they return a PresignedURL value object that redacts the SigV4 signature in repr()/str()/f"{...}", so accidental log-line interpolation no longer leaks credentials:

import requests

download = backend.presigned_get("k", expires_in=3600)
upload = backend.presigned_put("k", expires_in=600, content_type="image/png")

print(f"download link: {download}")
# → download link: PresignedURL(... url='...?X-Amz-Signature=redacted...')

requests.get(download.url)  # explicit `.url` accessor for the unredacted form

put() no longer returns a presigned URL; it returns the storage key instead. This removes the credential-leak risk callers hit when they persisted the old return value to logs, manifests, or DB rows.
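The redaction idea can be illustrated with a small value object. This is a sketch under stated assumptions, not the library's PresignedURL class: `RedactedURL` is a hypothetical name, and the regular expression only masks the `X-Amz-Signature` query parameter.

```python
import re
from dataclasses import dataclass

# Matches the SigV4 signature query parameter and its value.
_SIGNATURE = re.compile(r"(X-Amz-Signature=)[^&]+")


@dataclass(frozen=True, repr=False)
class RedactedURL:
    """Hypothetical wrapper: the raw URL is reachable only via `.url`."""

    url: str

    def __repr__(self) -> str:
        redacted = _SIGNATURE.sub(r"\g<1>redacted", self.url)
        return f"RedactedURL(url={redacted!r})"

    def __str__(self) -> str:
        # f-strings and print() go through __str__, so they are redacted too.
        return repr(self)


raw = (
    "https://bucket.example.com/k"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Signature=abc123def&X-Amz-Expires=900"
)
link = RedactedURL(raw)
log_line = f"download link: {link}"
```

Because both `__repr__` and `__str__` redact, the signature is absent from `log_line`, while `link.url` still carries the full URL for the HTTP client.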

Server-side encryption (SSE)

Encryption is a typed value object accepted symmetrically by put, get, and copy:

import secrets

from genblaze_s3 import Encryption

# SSE-S3 (server-managed AES-256)
backend.put("k", data, encryption=Encryption.sse_s3())

# SSE-KMS
backend.put("k", data, encryption=Encryption.sse_kms("alias/my-app"))

# SSE-C — same key required on read; round-trips cleanly in 0.3.0+
key = secrets.token_bytes(32)
enc = Encryption.sse_c(key)
backend.put("k", data, encryption=enc)
backend.get("k", encryption=enc)

See the main feature doc for SSE-C key handling, KMS configuration, and migration notes from 0.2.x.
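For background on why SSE-C needs the same key on read: in the S3 API the customer key travels with every request as headers, and the service does not retain it. The sketch below builds those headers by hand with the standard library. `sse_c_headers` is a hypothetical helper; how genblaze-s3 or boto3 assemble these headers internally is not shown in this document.

```python
import base64
import hashlib
import secrets


def sse_c_headers(key: bytes) -> dict[str, str]:
    """Build the S3 SSE-C request headers for a 256-bit key (illustrative)."""
    if len(key) != 32:
        raise ValueError("SSE-C requires a 256-bit (32-byte) key")
    key_md5 = hashlib.md5(key).digest()  # integrity check on the key, not a security measure
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode("ascii"),
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(key_md5).decode("ascii"),
    }


key = secrets.token_bytes(32)
headers = sse_c_headers(key)
```

Losing the key means losing the object, so the key needs to be stored somewhere durable that is separate from the bucket.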

Read primitives (head / list / get_range / stream)

# head — per-object metadata, None for missing/inaccessible.
meta = backend.head("path/to/key")

# list — paginated walk via continuation_token.
page = backend.list(prefix="run-", max_keys=1000)
for entry in page.entries:
    ...
if page.next_token is not None:
    page = backend.list(prefix="run-", continuation_token=page.next_token)

# get_range — partial-file via HTTP Range header.
header = backend.get_range("big.mp4", offset=0, length=4096)

# stream — chunked download, no full-file RAM load.
for chunk in backend.stream("big.mp4", chunk_size=8 * 1024 * 1024):
    ...
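The list snippet above fetches a single follow-up page. Draining every page is a loop over the continuation token, sketched here against an in-memory stand-in so it runs without a bucket. `Page`, `make_fake_lister`, and `iter_entries` are hypothetical names; only the `entries`, `next_token`, and `continuation_token` names are taken from the snippet above.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class Page:
    entries: list[str]
    next_token: Optional[str]


def make_fake_lister(keys: list[str], page_size: int) -> Callable[..., Page]:
    """Stand-in for backend.list: pages over an in-memory key list."""

    def list_page(prefix: str = "", continuation_token: Optional[str] = None) -> Page:
        matching = sorted(k for k in keys if k.startswith(prefix))
        start = int(continuation_token) if continuation_token else 0
        end = start + page_size
        token = str(end) if end < len(matching) else None
        return Page(entries=matching[start:end], next_token=token)

    return list_page


def iter_entries(list_page: Callable[..., Page], prefix: str) -> Iterator[str]:
    """Walk every page, following next_token until it is None."""
    token: Optional[str] = None
    while True:
        page = list_page(prefix=prefix, continuation_token=token)
        yield from page.entries
        token = page.next_token
        if token is None:
            return


keys = [f"run-{i:03d}" for i in range(7)] + ["other-000"]
all_run_keys = list(iter_entries(make_fake_lister(keys, page_size=3), prefix="run-"))
```

Writing the walk as a generator keeps memory bounded to one page at a time, which matters for prefixes with many objects.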

Bulk deletes (delete_many / delete_prefix)

# Explicit-key delete: dry_run=False default.
result = backend.delete_many(["k1", "k2"])

# Prefix delete: dry_run=True default — see what would go before deleting.
preview = backend.delete_prefix("temp/")
result = backend.delete_prefix("temp/", dry_run=False)

delete_prefix streams page-by-page (memory bounded for huge prefixes) and surfaces partial progress on a mid-walk failure.
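The dry-run-by-default behavior can be sketched with a dict standing in for a bucket. This is an illustration of the calling pattern only: `delete_prefix_sketch` and `DeleteResult` are hypothetical, and unlike the real method this version materializes the matching keys up front rather than streaming them.

```python
from dataclasses import dataclass, field


@dataclass
class DeleteResult:
    dry_run: bool
    keys: list[str] = field(default_factory=list)  # keys deleted, or that would be


def delete_prefix_sketch(
    store: dict[str, bytes], prefix: str, *, dry_run: bool = True, page_size: int = 2
) -> DeleteResult:
    """Delete (or preview deleting) every key under a prefix, one page at a time."""
    result = DeleteResult(dry_run=dry_run)
    matching = sorted(k for k in store if k.startswith(prefix))
    for start in range(0, len(matching), page_size):
        page = matching[start:start + page_size]
        if not dry_run:
            for key in page:
                del store[key]
        # Recording per page means a failure mid-walk still leaves partial progress.
        result.keys.extend(page)
    return result


store = {"temp/a": b"1", "temp/b": b"2", "temp/c": b"3", "keep/x": b"4"}
preview = delete_prefix_sketch(store, "temp/")           # nothing removed
count_after_preview = len(store)
result = delete_prefix_sketch(store, "temp/", dry_run=False)
```

Defaulting the destructive path to a preview means a mistyped prefix shows up in the preview's key list before any object is removed.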

Progress callbacks

from genblaze_core import TransferProgress

backend.put("big.mp4", data, progress=lambda p: print(p.bytes_transferred))
backend.get("big.mp4", progress=...)
for chunk in backend.stream("big.mp4", progress=...):
    ...

The put callback is thread-safe across boto3's multipart workers. See the main feature doc for the full progress contract.
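One common way to make a progress callback safe across concurrent upload workers is to serialize updates behind a lock. The sketch below shows that pattern with the standard library. `Progress` and `ProgressTracker` are hypothetical, and treating `bytes_transferred` as a cumulative count is an assumption, not something this document specifies.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Progress:
    bytes_transferred: int  # assumed cumulative for this sketch
    total_bytes: int


class ProgressTracker:
    """Aggregates per-part byte counts from many threads into one callback."""

    def __init__(self, total_bytes: int, callback: Callable[[Progress], None]) -> None:
        self._lock = threading.Lock()
        self._transferred = 0
        self._total = total_bytes
        self._callback = callback

    def add(self, n: int) -> None:
        # Update and invoke the callback under the lock so calls never interleave.
        with self._lock:
            self._transferred += n
            self._callback(Progress(self._transferred, self._total))


part_sizes = [5 * 1024 * 1024] * 8  # eight simulated multipart parts
seen: list[int] = []
tracker = ProgressTracker(sum(part_sizes), lambda p: seen.append(p.bytes_transferred))

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(tracker.add, part_sizes))
```

Because the callback runs while the lock is held, the values it observes are strictly increasing and end at the total. The trade-off is that a slow callback stalls the workers.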

Native async via aioboto3 (optional)

pip install 'genblaze-s3[async]'

import asyncio
from genblaze_s3 import AsyncS3StorageBackend

async def main():
    # my_sync_backend: an S3StorageBackend you have already constructed.
    async with AsyncS3StorageBackend.from_sync(my_sync_backend) as ab:
        data = await ab.aget("k")
        async for chunk in ab.astream("big.mp4"):
            ...
        result = await ab.aput("k", data)

asyncio.run(main())

aget and astream are native (real AsyncIterator[bytes] for streaming); other methods threadpool-delegate to the sync backend. from_sync carries the sync backend's verified-region state forward so no redundant HeadBucket round-trip happens on the async path.
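The split between native async methods and threadpool-delegated ones can be sketched with the standard library alone. `SyncStore` and `AsyncStore` are hypothetical stand-ins, not the genblaze-s3 classes. `aput` delegates to a blocking call through `asyncio.to_thread`, while `astream` is an async generator that only demonstrates the `AsyncIterator[bytes]` shape (a real implementation would read from a non-blocking HTTP body).

```python
import asyncio
from typing import AsyncIterator


class SyncStore:
    """In-memory stand-in for a blocking storage backend."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> str:
        self._objects[key] = data
        return key  # returns the storage key, as put() is described above

    def get(self, key: str) -> bytes:
        return self._objects[key]


class AsyncStore:
    def __init__(self, sync_store: SyncStore) -> None:
        self._sync = sync_store

    async def aput(self, key: str, data: bytes) -> str:
        # Threadpool delegation: the blocking call runs off the event loop.
        return await asyncio.to_thread(self._sync.put, key, data)

    async def astream(self, key: str, chunk_size: int = 4) -> AsyncIterator[bytes]:
        data = self._sync.get(key)  # in-memory here; real I/O would be awaited
        for start in range(0, len(data), chunk_size):
            yield data[start:start + chunk_size]
            await asyncio.sleep(0)  # hand control back to the loop between chunks


async def demo() -> tuple[str, bytes]:
    store = AsyncStore(SyncStore())
    key = await store.aput("k", b"0123456789")
    chunks = [chunk async for chunk in store.astream("k", chunk_size=4)]
    return key, b"".join(chunks)


stored_key, roundtrip = asyncio.run(demo())
```

Delegating to a thread keeps the event loop responsive without rewriting the sync code path, at the cost of one worker thread per in-flight call.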

Object Lock for immutable manifests (Backblaze B2)

Genblaze can apply Object Lock retention to uploaded manifests, producing tamper-evident provenance suitable for compliance, legal, and content-authenticity workflows. See the main repo docs for the Object Lock guide.

License

MIT

