
BucketHead

In-memory SQLite backed by periodic snapshots to S3-compatible bucket storage (AWS S3, Cloudflare R2, MinIO).

Your app writes to a regular sqlite3.Connection. BucketHead keeps the database in memory, snapshots the whole thing to your bucket on a timer, and restores it on startup. No Redis, no managed service — just SQLite and one bucket.

Why

  • Fast. Reads and writes hit an in-memory SQLite; microsecond latencies. BucketHead is never in the hot path.
  • Durable. Snapshots land on S3 / R2 / MinIO at a configurable cadence, plus one final flush on shutdown.
  • Cheap. A dirty-bit optimization skips the upload whenever the database hasn't changed since the last flush, so an idle workload costs nothing (see the sketch after this list).
  • Structured. You're using SQLite, not a hash table. Schemas, indexes, transactions, joins — all the usual stuff.
  • R2-friendly. Zero egress fees plus the dirty-bit optimization mean snapshot cost is dominated by storage, not requests.
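
The dirty-bit check itself is internal to BucketHead; the snippet below is only a rough sketch of the idea using sqlite3's total_changes counter, not BucketHead's actual implementation (for one thing, it would miss schema-only changes).

import sqlite3

class DirtyTracker:
    """Illustrative only: decide whether a snapshot upload can be skipped."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._changes_at_last_flush = conn.total_changes

    def is_dirty(self) -> bool:
        # total_changes counts INSERT / UPDATE / DELETE on this connection,
        # so an idle database compares equal and the flush becomes a no-op.
        return self._conn.total_changes != self._changes_at_last_flush

    def mark_flushed(self) -> None:
        self._changes_at_last_flush = self._conn.total_changes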

Install

uv add buckethead                # runtime only
uv add 'buckethead[profiling]'   # + memray / pyinstrument hooks

Python 3.13+.

Quickstart

from pathlib import Path
from buckethead import BucketConfig, BucketSQLite

cfg = BucketConfig.for_r2(
    account_id="<cloudflare account id>",
    bucket="my-bucket",
    access_key_id="<r2 s3 api access key>",
    secret_access_key="<r2 s3 api secret>",
)

with BucketSQLite(cfg) as bh:
    # Raw SQL
    bh.connection.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    bh.connection.execute("INSERT INTO kv VALUES ('answer', '42')")
    bh.connection.commit()

    # Key/value interface
    bh.kv.set("user/123", "alice")
    bh.kv.get("user/123")                      # "alice"

    # File store — content-addressable, dedup'd
    bh_key = bh.files.put(Path("/tmp/big.bin"))
    bh.files.get(bh_key, dest=Path("/tmp/out.bin"))

# On exit: final flush → snapshot uploaded to R2.
# On next startup: restored automatically.

Env-driven config

For 12-factor deployments, use BucketSettings:

from buckethead import BucketSettings, BucketSQLite

cfg = BucketSettings.from_env().to_bucket_config()
# reads R2_ACCOUNT_ID (or R2_ENDPOINT_URL), R2_BUCKET,
# R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, and BUCKETHEAD_KEY
bh = BucketSQLite(cfg)

Three typed views on your data

Attribute        Access pattern                                        What it's for
bh.connection    raw SQL                                               anything — full SQLite is yours
bh.kv            string-keyed set / get / dict protocol                small configuration, cache entries, session data
bh.files         content-addressable SHA-256 bh-key → bytes in R2      arbitrary files (uploads, artifacts, ML inputs)
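
The quickstart already shows bh.kv.set / bh.kv.get. Since the table mentions a dict protocol, mapping-style access along these lines should also work (exactly which mapping methods are implemented is an assumption on this page):

# Continuing the quickstart's `bh`
bh.kv["session/abc"] = "token-123"     # same effect as bh.kv.set(...)
token = bh.kv["session/abc"]           # same effect as bh.kv.get(...)
if "session/abc" in bh.kv:             # membership test
    print("session cached")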

Branches

BucketHead can maintain multiple named snapshots against the same bucket, one per "branch". Useful for trying risky migrations or running experiments without polluting main state.

bh.branches.create("experiment-1")      # fork from current
bh.branches.switch("experiment-1")      # flush outgoing, reload from target
bh.connection.execute("...")             # writes go to experiment-1's snapshot
bh.branches.switch("main")              # back to main
bh.branches.list()                      # ["experiment-1", "main"]

# Experiment succeeded — make main look like experiment-1:
bh.branches.switch("experiment-1")
bh.branches.overwrite("main")           # identity becomes main; in-memory unchanged
bh.branches.delete("experiment-1")      # optional cleanup

  • Branches are R2 keys: main is <BucketConfig.key>; branch X is <BucketConfig.key>.branch.X. Listed via live ListObjectsV2 — no registry.
  • Names must match [a-zA-Z0-9_-]+; main is reserved.
  • FileStore blobs are shared across branches. Content-hash keys dedupe automatically; deleting a branch does not delete blobs that were unique to it. Run bh.files.gc() for orphan cleanup.
  • Default branch at startup is main; override via BucketConfig(initial_branch="X") or BUCKETHEAD_BRANCH=X.

The connection lives in memory; kv rows and files metadata also live in memory (they're SQLite tables). Only files blobs are stored outside SQLite — one R2 object per blob, under a configurable files/ prefix.

CLI

buckethead inspect                         # schema + row counts of the snapshot
buckethead restore <local.db>              # download the snapshot to disk
buckethead files list                      # paginated listing
buckethead files get <bh-key> <dest>       # download one blob
buckethead files gc --dry-run              # what would orphan cleanup delete?
buckethead files gc --grace-seconds 300    # actually clean up

All CLI commands read credentials from the same env vars as BucketSettings.from_env().

Observability

Pass callbacks to BucketSQLite to wire metrics or tracing:

def on_flush_start() -> None: ...
def on_flush_complete(duration_s: float, bytes_uploaded: int) -> None: ...
def on_flush_error(exc: BaseException) -> None: ...

bh = BucketSQLite(
    cfg,
    on_flush_start=on_flush_start,
    on_flush_complete=on_flush_complete,
    on_flush_error=on_flush_error,
)

bytes_uploaded == 0 means the dirty-bit skipped the upload — the DB didn't change since the last flush.
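
For example, a callback that distinguishes skipped flushes from real uploads (logger name and messages are illustrative):

import logging

log = logging.getLogger("buckethead.metrics")

def on_flush_complete(duration_s: float, bytes_uploaded: int) -> None:
    if bytes_uploaded == 0:
        # dirty-bit skipped the upload: nothing changed since the last flush
        log.debug("flush skipped: database unchanged")
    else:
        log.info("snapshot uploaded: %d bytes in %.3f s", bytes_uploaded, duration_s)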

For deeper profiling, enable the built-in hooks:

from buckethead import BucketSQLite, ProfilingConfig

bh = BucketSQLite(
    cfg,
    profiling_config=ProfilingConfig(
        io_counters=True,     # JSON summary of bytes / ops per R2 call
        memory=True,          # requires buckethead[profiling]
        cpu=True,             # requires buckethead[profiling]
    ),
)

On bh.stop(), profiling artifacts are written under ProfilingConfig.output_dir (default ./buckethead-profiles).

Configuration

BucketConfig

Field                                Default                 Notes
bucket                               required                bucket name
key                                  bucketsqlite/snap.db    where the snapshot lives
endpoint_url                         None                    set for R2/MinIO; leave None for AWS S3
region                               "auto"                  R2 wants "auto"; AWS wants a real region
access_key_id / secret_access_key    None                    or use IAM / env
files_prefix                         "files/"                FileStore objects go under this prefix

For R2, use BucketConfig.for_r2(account_id, bucket, access_key_id, secret_access_key) to skip the endpoint-URL boilerplate.
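
For MinIO (or any other S3-compatible endpoint), the same fields from the table above can be set directly; the endpoint URL and credentials below are placeholders:

from buckethead import BucketConfig

cfg = BucketConfig(
    bucket="my-bucket",
    key="bucketsqlite/snap.db",            # default, shown for clarity
    endpoint_url="http://localhost:9000",  # local MinIO
    region="auto",
    access_key_id="<minio access key>",
    secret_access_key="<minio secret key>",
)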

SnapshotConfig

Field                   Default    Notes
interval_seconds        60.0       background flush cadence
min_interval_seconds    5.0        debounce for flush() (manual)
keep_previous           True       save <key>.prev before overwriting
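
A construction sketch using only the fields from the table. Importing SnapshotConfig from the top-level package is an assumption, and this page doesn't show where the object is attached (BucketSQLite argument vs. BucketConfig field), so that wiring is omitted:

from buckethead import SnapshotConfig  # import path assumed

snap_cfg = SnapshotConfig(
    interval_seconds=30.0,       # snapshot every 30 s instead of the default 60 s
    min_interval_seconds=5.0,    # manual flush() calls are debounced to this
    keep_previous=True,          # keep <key>.prev as a one-step rollback
)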

Env vars (for BucketSettings)

Variable                Purpose
R2_ACCOUNT_ID           Cloudflare account id (used to build endpoint URL)
R2_ENDPOINT_URL         full endpoint; overrides R2_ACCOUNT_ID
R2_BUCKET               bucket name
R2_ACCESS_KEY_ID        R2 S3 API token access key
R2_SECRET_ACCESS_KEY    R2 S3 API token secret
BUCKETHEAD_KEY          snapshot key (default bucketsqlite/snap.db)
BUCKETHEAD_BRANCH       initial branch the process attaches to (default main)

Both R2_ACCOUNT_ID and R2_ENDPOINT_URL are optional — if neither is set, endpoint_url stays None and boto3 connects to AWS S3 with its normal credential discovery.

Constraints and scope

  • Single-process. The in-memory database lives in the process that constructs BucketSQLite. Multi-threaded access in that process is fine (bh.connect() vends a new connection per thread), but cross-process sharing is not supported.
  • Durability window. A hard crash (OOM, SIGKILL, power loss) loses up to interval_seconds of writes. Call bh.force_flush() after any write that must not be lost (see the sketch after this list).
  • DB size. Snapshot wall-time is ~1 ms per MB of DB. Comfortable below 100 MB, usable up to ~500 MB, noticeable stalls above that.
  • Not a Redis drop-in. No wire protocol, no pub/sub, no replication.
  • Not a distributed database. Single writer, no HA.
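
The durability-window sketch referenced above: commit the critical write, then flush before moving on (table name and values are illustrative):

bh.connection.execute(
    "INSERT INTO payments (id, amount_cents) VALUES (?, ?)",
    ("pay_123", 4200),
)
bh.connection.commit()
bh.force_flush()   # snapshot now contains this write, regardless of the timer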

Deeper reading

License

TBD.



Download files


Source Distribution

buckethead-0.1.0.tar.gz (43.1 kB)


Built Distribution


buckethead-0.1.0-py3-none-any.whl (54.0 kB)


File details

Details for the file buckethead-0.1.0.tar.gz.

File metadata

  • Download URL: buckethead-0.1.0.tar.gz
  • Size: 43.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for buckethead-0.1.0.tar.gz
Algorithm      Hash digest
SHA256         884043bf7cf433da2c8b14adce2884d625bd80a1aa591dfbd1c17fa3e39f593c
MD5            60e7e8d8f6410fcbce5d66e17b6a3d71
BLAKE2b-256    6e2c94620fae6ee203dc61bdd85e091a7df02c20d5ae4be91e059f3c62d4d0b7


Provenance

The following attestation bundles were made for buckethead-0.1.0.tar.gz:

Publisher: workflow.yaml on cloutfront/buckethead

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file buckethead-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: buckethead-0.1.0-py3-none-any.whl
  • Size: 54.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for buckethead-0.1.0-py3-none-any.whl
Algorithm      Hash digest
SHA256         48a1abc3b4c45ea3495ec57275b6a3000fa31c85558113bac116b1780a0cf98d
MD5            c9d7e18f9e54802d97cf745ee4ecd924
BLAKE2b-256    4fe9cff1d024389a5bf5651b0bc2d5e5d4986f90d10db639eb7cea351f9055dc


Provenance

The following attestation bundles were made for buckethead-0.1.0-py3-none-any.whl:

Publisher: workflow.yaml on cloutfront/buckethead

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
