BucketHead
Migrating to 0.4.0? The internal layout was reorganized into feature subpackages. The public surface (from buckethead import ...) is unchanged, but direct submodule imports moved:

| Old | New |
|---|---|
| buckethead.bucket, .snapshot, .lifecycle, .branches | buckethead.storage.* |
| buckethead.filestore, .local_files | buckethead.files.* (store, tracker) |
| buckethead.sharing (module), .sharing_cli | buckethead.sharing.service, buckethead.sharing.cli |
| buckethead.profiling, .status | buckethead.observability.* |
| buckethead.settings, .user_config, .secret_refs | buckethead.config.* (settings, user, secret_refs) |

If you only ever did from buckethead import BucketSQLite, BucketConfig, ..., nothing changes. This note will be removed in 0.5.0.
In-memory SQLite backed by periodic snapshots to S3-compatible bucket storage (AWS S3, Cloudflare R2, MinIO).
Your app writes to a regular sqlite3.Connection. BucketHead keeps the
database in memory, snapshots the whole thing to your bucket on a timer,
and restores it on startup. No Redis, no managed service — just SQLite
and one bucket.
Why
- Fast. Reads and writes hit an in-memory SQLite; microsecond latencies. BucketHead is never in the hot path.
- Durable. Snapshots land on S3 / R2 / MinIO at a configurable cadence, plus one final flush on shutdown.
- Cheap. Dirty-bit optimization skips the upload whenever the database hasn't changed since the last flush — so an idle workload costs nothing.
- Structured. You're using SQLite, not a hash table. Schemas, indexes, transactions, joins — all the usual stuff.
- R2-friendly. Zero egress fees plus the dirty-bit optimization mean snapshot cost is dominated by storage, not requests.
Install
uv add buckethead # runtime only
uv add 'buckethead[profiling]' # + memray / pyinstrument hooks
Python 3.13+.
Quickstart
from pathlib import Path

from buckethead import BucketConfig, BucketSQLite

cfg = BucketConfig.for_r2(
    account_id="<cloudflare account id>",
    bucket="my-bucket",
    access_key_id="<r2 s3 api access key>",
    secret_access_key="<r2 s3 api secret>",
)

with BucketSQLite(cfg) as bh:
    # Raw SQL
    bh.connection.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    bh.connection.execute("INSERT INTO kv VALUES ('answer', '42')")
    bh.connection.commit()

    # Key/value interface
    bh.kv.set("user/123", "alice")
    bh.kv.get("user/123")  # "alice"

    # File store — content-addressable, dedup'd
    bh_key = bh.files.put(Path("/tmp/big.bin"))
    bh.files.get(bh_key, dest=Path("/tmp/out.bin"))

# On exit: final flush → snapshot uploaded to R2.
# On next startup: restored automatically.
Env-driven config
For 12-factor deployments, skip the BucketConfig entirely —
BucketSQLite() auto-loads it from BUCKETHEAD_* env vars and
~/.config/buckethead/config.toml:
from buckethead import BucketSQLite
bh = BucketSQLite()
# reads BUCKETHEAD_BUCKET__NAME, BUCKETHEAD_BUCKET__ACCESS_KEY_ID,
# BUCKETHEAD_BUCKET__SECRET_ACCESS_KEY, and optionally
# BUCKETHEAD_BUCKET__ENDPOINT_URL or BUCKETHEAD_CLOUDFLARE__ACCOUNT_ID.
# [env] entries in the user config TOML are merged in via setdefault.
If you need the intermediate BucketConfig object, build it explicitly
via BucketHeadSettings().to_bucket_config() and pass it in.
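A minimal sketch of that explicit path, assuming BucketHeadSettings is importable from the top-level package and the BUCKETHEAD_* variables above are already exported:

from buckethead import BucketHeadSettings, BucketSQLite

settings = BucketHeadSettings()      # env vars merged with the user config TOML
cfg = settings.to_bucket_config()    # the intermediate BucketConfig
print(cfg.bucket, cfg.endpoint_url)  # inspect what was resolved before connecting

bh = BucketSQLite(cfg)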
Four typed views on your data
| Attribute | Access pattern | What it's for |
|---|---|---|
| bh.connection | raw SQL | anything — full SQLite is yours |
| bh.kv | string-keyed set / get / dict protocol | small configuration, cache entries, session data |
| bh.docs | named collections of JSON documents with a Mongo-lite filter DSL | structured-ish records you want to query by field (users, events, config blobs) |
| bh.files | content-addressable SHA-256 bh-key → bytes in R2 | arbitrary files (uploads, artifacts, ML inputs) |
users = bh.docs.collection("users")
users.insert({"name": "alice", "age": 30, "tags": ["beta"]})
users.find({"age": {"$gte": 18}, "tags": {"$in": ["beta"]}})
DocStore rows live in SQLite — they snapshot and branch with the rest
of the database. Escape hatch for queries the DSL doesn't cover:
bh.connection.execute("SELECT doc FROM bh_docs WHERE ...") with
json_extract.
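For illustration, a sketch of that escape hatch; the bh_docs column names used here (collection, doc) are assumptions about the table layout, not documented:

rows = bh.connection.execute(
    """
    SELECT doc
    FROM bh_docs
    WHERE collection = ?
      AND json_extract(doc, '$.age') >= 18
    """,
    ("users",),
).fetchall()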
Branches
BucketHead can maintain multiple named snapshots against the same bucket, one per "branch". Useful for trying risky migrations or running experiments without polluting main state.
bh.branches.create("experiment-1") # fork from current
bh.branches.switch("experiment-1") # flush outgoing, reload from target
bh.connection.execute("...") # writes go to experiment-1's snapshot
bh.branches.switch("main") # back to main
bh.branches.list() # ["experiment-1", "main"]
# Experiment succeeded — make main look like experiment-1:
bh.branches.switch("experiment-1")
bh.branches.overwrite("main") # identity becomes main; in-memory unchanged
bh.branches.delete("experiment-1") # optional cleanup
- Branches are R2 keys: main is <BucketConfig.key>; branch X is <BucketConfig.key>.branch.X. Listed via live ListObjectsV2 — no registry.
- Names must match [a-zA-Z0-9_-]+; main is reserved.
- FileStore blobs are shared across branches. Content-hash keys dedupe automatically; deleting a branch does not delete blobs that were unique to it. Run bh.files.gc() for orphan cleanup.
- Default branch at startup is main; override via BucketConfig(initial_branch="X") or BUCKETHEAD_SNAPSHOT__BRANCH=X.
The connection lives in memory; kv rows and files metadata also live
in memory (they're SQLite tables). Only files blobs are stored outside
SQLite — one R2 object per blob, under a configurable files/ prefix.
Tracking files on disk
LocalFileTracker keeps local filesystem paths in sync with FileStore
blobs and retains a full version history per path. Each sync() that
detects changed content appends a FileVersion row — nothing is ever
overwritten in place.
from pathlib import Path

from buckethead import BucketSQLite, LocalFileTracker

with BucketSQLite(cfg) as bh:
    tracker = LocalFileTracker(bh.connection, bh.files)

    # Initial track — hashes the file, uploads the blob, records version 1.
    tracker.track(Path("/etc/app/settings.json"))

    # Periodically (or on demand) — re-hash every tracked path,
    # append a new version if anything changed.
    report = tracker.sync()
    # SyncReport(scanned=1, unchanged=0, updated=1, missing=[])

    # Which bh-key does a path currently point at?
    tracker.current(Path("/etc/app/settings.json"))

    # Full history, newest first.
    for v in tracker.history(Path("/etc/app/settings.json")):
        print(v.synced_at, v.bh_key, v.size)
Metadata lives in bh_local_files (one row per tracked path) and
bh_local_file_versions (append-only history). Both tables snapshot
with the rest of the database, so the version log survives restarts
and travels across branches. Blobs themselves live in FileStore under
content-addressable keys — identical content across different paths or
branches dedupes automatically.
Sharing files
A FileStore blob can be exposed to the outside world through a separate share bucket — either a public-read R2 bucket where the URL is a stable public path, or a private bucket where the URL is a sig-v4 presigned GET. Provision it once per project:
buckethead provision share-bucket --project my-project
Then point BucketSQLite at the project by name:
from buckethead import BucketSQLite
bh = BucketSQLite(project="my-project")
bh.start()
bh_key = bh.files.put(b"hello", filename="note.txt")
result = bh.shares.share(bh_key) # copies into share bucket
print(result.url) # https://files.example.com/note/abcd1234.txt
project= reads the share bucket name from
~/.config/buckethead/config.toml and pulls bucket credentials from the
configured secret store (the same conventions buckethead shares uses).
If a project has no share bucket attached, bh.shares raises on use, so
you only pay for the lookup when sharing is actually configured. Build
the ShareConfig standalone via ShareConfig.from_project("my-project")
if you'd rather compose it yourself.
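A sketch of the compose-it-yourself route, assuming ShareConfig is importable from the top-level package; the share_config keyword below is a hypothetical parameter name, not taken from the documented API:

from buckethead import BucketSQLite, ShareConfig

# Same resolution as project="my-project": share bucket name from
# ~/.config/buckethead/config.toml, credentials from the secret store.
share_cfg = ShareConfig.from_project("my-project")

# Hypothetical keyword — check the API reference for the actual name.
bh = BucketSQLite(cfg, share_config=share_cfg)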
CLI
buckethead status # probe bucket + current snapshot key, no hydration
buckethead inspect # schema + row counts of the snapshot
buckethead restore <local.db> # download the snapshot to disk
buckethead files list # paginated listing
buckethead files get <bh-key> <dest> # download one blob
buckethead files gc --dry-run # what would orphan cleanup delete?
buckethead files gc --grace-seconds 300 # actually clean up
buckethead provision bucket --project <n> # one-time: create bucket + store creds
buckethead shares share <bh-key> --project <n> # copy to share bucket, print URL
buckethead config show # inspect ~/.config/buckethead/config.toml
buckethead bench run <preset> # KV latency / throughput / YCSB
buckethead stress run <scenario> # cost + perf scenarios against real R2
All CLI commands read credentials from the same env vars as
BucketHeadSettings(). See the CLI reference for the
full surface — provision, shares, config, bench, and stress
each have sub-commands beyond the ones shown above.
Observability
Pass callbacks to BucketSQLite to wire metrics or tracing:
def on_flush_start() -> None: ...
def on_flush_complete(duration_s: float, bytes_uploaded: int) -> None: ...
def on_flush_error(exc: BaseException) -> None: ...
bh = BucketSQLite(
    cfg,
    on_flush_start=on_flush_start,
    on_flush_complete=on_flush_complete,
    on_flush_error=on_flush_error,
)
bytes_uploaded == 0 means the dirty-bit skipped the upload — the DB
didn't change since the last flush.
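As a sketch of how the hooks might be wired up (only the callback signatures above come from the library; the logging is ours):

import logging

log = logging.getLogger("buckethead.flush")

def on_flush_complete(duration_s: float, bytes_uploaded: int) -> None:
    if bytes_uploaded == 0:
        log.debug("flush skipped: database unchanged since last snapshot")
    else:
        log.info("flushed %d bytes in %.3fs", bytes_uploaded, duration_s)

def on_flush_error(exc: BaseException) -> None:
    log.error("snapshot flush failed", exc_info=exc)

bh = BucketSQLite(cfg, on_flush_complete=on_flush_complete, on_flush_error=on_flush_error)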
For deeper profiling, enable the built-in hooks:
from buckethead import BucketSQLite, ProfilingConfig
bh = BucketSQLite(
    cfg,
    profiling_config=ProfilingConfig(
        io_counters=True,  # JSON summary of bytes / ops per R2 call
        memory=True,       # requires buckethead[profiling]
        cpu=True,          # requires buckethead[profiling]
    ),
)
On bh.stop(), profiling artifacts are written under
ProfilingConfig.output_dir (default ./buckethead-profiles).
Configuration
BucketConfig
| Field | Default | Notes |
|---|---|---|
| bucket | required | bucket name |
| key | bucketsqlite/snap.db | where the snapshot lives |
| endpoint_url | None | set for R2/MinIO; leave None for AWS S3 |
| region | "auto" | R2 wants "auto"; AWS wants a real region |
| access_key_id / secret_access_key | None | or use IAM / env |
| files_prefix | "files/" | FileStore objects go under this prefix |
For R2, use BucketConfig.for_r2(account_id, bucket, access_key_id, secret_access_key)
to skip the endpoint-URL boilerplate.
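For MinIO or another S3-compatible endpoint, a sketch constructing the config explicitly from the fields above (endpoint and credentials are placeholders):

from buckethead import BucketConfig

cfg = BucketConfig(
    bucket="my-bucket",
    key="myapp/snap.db",                   # where the snapshot lives
    endpoint_url="http://localhost:9000",  # leave None for AWS S3
    region="auto",
    access_key_id="minioadmin",
    secret_access_key="minioadmin",
    files_prefix="files/",
)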
SnapshotConfig
| Field | Default | Notes |
|---|---|---|
| interval_seconds | 60.0 | background flush cadence |
| min_interval_seconds | 5.0 | debounce for flush() (manual) |
| keep_previous | True | save <key>.prev before overwriting |
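A sketch of a tighter flush cadence. The SnapshotConfig fields come from the table above, but the snapshot_config keyword (mirroring profiling_config) and the top-level import are assumptions, not documented:

from buckethead import BucketSQLite, SnapshotConfig

bh = BucketSQLite(
    cfg,
    snapshot_config=SnapshotConfig(
        interval_seconds=15.0,      # background flush every 15 s instead of 60 s
        min_interval_seconds=2.0,   # debounce for manual flush() calls
        keep_previous=True,         # keep <key>.prev as a one-deep backup
    ),
)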
Env vars (for BucketHeadSettings)
All vars use the BUCKETHEAD_ prefix with __ as the nesting delimiter
(pydantic-settings convention). So BUCKETHEAD_BUCKET__NAME lands in
settings.bucket.name.
| Variable | Purpose |
|---|---|
| BUCKETHEAD_BUCKET__NAME | bucket name (required) |
| BUCKETHEAD_BUCKET__ACCESS_KEY_ID | S3-API access key (required) |
| BUCKETHEAD_BUCKET__SECRET_ACCESS_KEY | S3-API secret (required) |
| BUCKETHEAD_BUCKET__ENDPOINT_URL | full endpoint; set for R2/MinIO/B2 |
| BUCKETHEAD_BUCKET__REGION | defaults to auto (R2); set a real region for AWS |
| BUCKETHEAD_CLOUDFLARE__ACCOUNT_ID | alternative to ENDPOINT_URL for R2 — endpoint is derived when BUCKETHEAD_CLOUD=cloudflare-r2 (default) |
| BUCKETHEAD_SNAPSHOT__KEY | snapshot key (default bucketsqlite/snap.db) |
| BUCKETHEAD_SNAPSHOT__BRANCH | initial branch the process attaches to (default main) |
| BUCKETHEAD_CLOUD | cloud backend name; default cloudflare-r2 |
| BUCKETHEAD_SECRET_STORE | secret-store backend name; default 1password |
Both BUCKETHEAD_BUCKET__ENDPOINT_URL and
BUCKETHEAD_CLOUDFLARE__ACCOUNT_ID are optional — if neither is set,
endpoint_url stays None and boto3 connects to AWS S3 with its normal
credential discovery.
Constraints and scope
- Single-process. The in-memory database lives in the process that constructs BucketSQLite. Multi-threaded access in that process is fine (bh.connect() vends a new connection per thread), but cross-process sharing is not supported.
- Durability window. A hard crash (OOM, SIGKILL, power loss) loses up to interval_seconds of writes. Call bh.force_flush() after any write that must not be lost (see the sketch after this list).
- DB size. Snapshot wall-time is ~1 ms per MB of DB. Comfortable below 100 MB, usable up to ~500 MB, noticeable stalls above that.
- Not a Redis drop-in. No wire protocol, no pub/sub, no replication.
- Not a distributed database. Single writer, no HA.
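A minimal sketch of the durability and threading points above; bh.force_flush() and the per-thread bh.connect() come from this list, and the kv table is the one created in the Quickstart:

import threading

# Critical write: don't wait for the next timed flush.
bh.connection.execute("INSERT INTO kv VALUES ('last_order', '42')")
bh.connection.commit()
bh.force_flush()                  # the snapshot reaches the bucket now

# Worker threads get their own connection to the same in-memory database.
def worker() -> None:
    conn = bh.connect()           # one connection per thread
    print(conn.execute("SELECT v FROM kv WHERE k = 'last_order'").fetchone())

threading.Thread(target=worker).start()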
Deeper reading
- docs/diagrams.md — sequence diagrams for the lifecycle, FileStore.put, and FileStore.gc flows
- Full docs site — API reference and CLI usage
License
TBD.
File details
Details for the file buckethead-0.4.0.tar.gz.
File metadata
- Download URL: buckethead-0.4.0.tar.gz
- Size: 73.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3919cff93cd1e2705ab72ba4dbbbf74439d705a29aa4cfd4e5f934517e827a1a |
| MD5 | f7b9c585dad3116d8ca58d3f81bfc755 |
| BLAKE2b-256 | 15af91bceaa6d0cf358a521bc8dcbcc95380df7d38d2a5c305f30d33a5f475b1 |
Provenance
The following attestation bundles were made for buckethead-0.4.0.tar.gz:
Publisher: workflow.yaml on cloutfront/buckethead
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: buckethead-0.4.0.tar.gz
- Subject digest: 3919cff93cd1e2705ab72ba4dbbbf74439d705a29aa4cfd4e5f934517e827a1a
- Sigstore transparency entry: 1394641513
- Permalink: cloutfront/buckethead@132ef2fad9c932f24aa62a8229fc58d4a17bce40
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/cloutfront
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yaml@132ef2fad9c932f24aa62a8229fc58d4a17bce40
- Trigger Event: release
File details
Details for the file buckethead-0.4.0-py3-none-any.whl.
File metadata
- Download URL: buckethead-0.4.0-py3-none-any.whl
- Size: 94.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 76d04169e4beb68ecca4c2ca072886d5202af73d127763d94771b82f9098b673 |
| MD5 | 3d3685ba96e457e5c1f71307cb2ad079 |
| BLAKE2b-256 | b72af8028233328a0105e3bb41f2e8033daaf61d630c57dd3fd38ff6b3805de1 |
Provenance
The following attestation bundles were made for buckethead-0.4.0-py3-none-any.whl:
Publisher: workflow.yaml on cloutfront/buckethead
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: buckethead-0.4.0-py3-none-any.whl
- Subject digest: 76d04169e4beb68ecca4c2ca072886d5202af73d127763d94771b82f9098b673
- Sigstore transparency entry: 1394641522
- Permalink: cloutfront/buckethead@132ef2fad9c932f24aa62a8229fc58d4a17bce40
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/cloutfront
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yaml@132ef2fad9c932f24aa62a8229fc58d4a17bce40
- Trigger Event: release