# BucketHead
In-memory SQLite backed by periodic snapshots to S3-compatible bucket storage (AWS S3, Cloudflare R2, MinIO).
Your app writes to a regular `sqlite3.Connection`. BucketHead keeps the
database in memory, snapshots the whole thing to your bucket on a timer,
and restores it on startup. No Redis, no managed service — just SQLite
and one bucket.
## Why
- Fast. Reads and writes hit an in-memory SQLite; microsecond latencies. BucketHead is never in the hot path.
- Durable. Snapshots land on S3 / R2 / MinIO at a configurable cadence, plus one final flush on shutdown.
- Cheap. Dirty-bit optimization skips the upload whenever the database hasn't changed since the last flush — so an idle workload costs nothing.
- Structured. You're using SQLite, not a hash table. Schemas, indexes, transactions, joins — all the usual stuff.
- R2-friendly. Zero egress fees plus the dirty-bit optimization mean snapshot cost is dominated by storage, not requests.
## Install

```bash
uv add buckethead                 # runtime only
uv add 'buckethead[profiling]'    # + memray / pyinstrument hooks
```

Requires Python 3.13+.
## Quickstart

```python
from pathlib import Path

from buckethead import BucketConfig, BucketSQLite

cfg = BucketConfig.for_r2(
    account_id="<cloudflare account id>",
    bucket="my-bucket",
    access_key_id="<r2 s3 api access key>",
    secret_access_key="<r2 s3 api secret>",
)

with BucketSQLite(cfg) as bh:
    # Raw SQL
    bh.connection.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    bh.connection.execute("INSERT INTO kv VALUES ('answer', '42')")
    bh.connection.commit()

    # Key/value interface
    bh.kv.set("user/123", "alice")
    bh.kv.get("user/123")  # "alice"

    # File store: content-addressable, dedup'd
    bh_key = bh.files.put(Path("/tmp/big.bin"))
    bh.files.get(bh_key, dest=Path("/tmp/out.bin"))

# On exit: final flush, snapshot uploaded to R2.
# On next startup: restored automatically.
```
## Env-driven config

For 12-factor deployments, use `BucketSettings`:

```python
from buckethead import BucketSettings, BucketSQLite

# Reads R2_ACCOUNT_ID (or R2_ENDPOINT_URL), R2_BUCKET,
# R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, and BUCKETHEAD_KEY.
cfg = BucketSettings.from_env().to_bucket_config()
bh = BucketSQLite(cfg)
```
## Three typed views on your data

| Attribute | Access pattern | What it's for |
|---|---|---|
| `bh.connection` | raw SQL | anything; full SQLite is yours |
| `bh.kv` | string-keyed `set` / `get`, dict protocol | small configuration, cache entries, session data |
| `bh.files` | content-addressable SHA-256 bh-key → bytes in R2 | arbitrary files (uploads, artifacts, ML inputs) |
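Only `set` / `get` are demonstrated elsewhere in this README, so treat the bracket, membership, and `del` forms below as an assumed sketch of the dict protocol the table mentions (continuing from the Quickstart's `bh`):

```python
# Assumed dict-protocol forms; only kv.set / kv.get are shown in the docs above.
bh.kv["session/abc"] = "alive"   # like bh.kv.set("session/abc", "alive")
value = bh.kv["session/abc"]     # like bh.kv.get("session/abc")
if "session/abc" in bh.kv:       # membership test (assumed)
    del bh.kv["session/abc"]     # deletion (assumed)
```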
## Branches

BucketHead can maintain multiple named snapshots against the same bucket, one per "branch". Useful for trying risky migrations or running experiments without polluting main state.

```python
bh.branches.create("experiment-1")   # fork from current
bh.branches.switch("experiment-1")   # flush outgoing, reload from target
bh.connection.execute("...")         # writes go to experiment-1's snapshot
bh.branches.switch("main")           # back to main
bh.branches.list()                   # ["experiment-1", "main"]

# Experiment succeeded: make main look like experiment-1.
bh.branches.switch("experiment-1")
bh.branches.overwrite("main")        # identity becomes main; in-memory unchanged
bh.branches.delete("experiment-1")   # optional cleanup
```

- Branches are R2 keys: `main` is `<BucketConfig.key>`; branch `X` is `<BucketConfig.key>.branch.X`. Listed via live ListObjectsV2; no registry.
- Names must match `[a-zA-Z0-9_-]+`; `main` is reserved.
- FileStore blobs are shared across branches. Content-hash keys dedupe automatically; deleting a branch does not delete blobs that were unique to it. Run `bh.files.gc()` for orphan cleanup.
- Default branch at startup is `main`; override via `BucketConfig(initial_branch="X")` or `BUCKETHEAD_BRANCH=X`.
The connection lives in memory; `kv` rows and `files` metadata also live in memory (they're SQLite tables). Only `files` blobs are stored outside SQLite: one R2 object per blob, under a configurable `files/` prefix.
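Content addressing means deduplication is visible from the API; a minimal sketch, reusing the Quickstart's `bh` and file:

```python
from pathlib import Path

# Identical bytes hash to the same SHA-256, so the second put
# dedupes against the first instead of uploading a new blob.
k1 = bh.files.put(Path("/tmp/big.bin"))
k2 = bh.files.put(Path("/tmp/big.bin"))
assert k1 == k2  # same content, same bh-key, same R2 object
```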
## CLI

```bash
buckethead inspect                       # schema + row counts of the snapshot
buckethead restore <local.db>            # download the snapshot to disk
buckethead files list                    # paginated listing
buckethead files get <bh-key> <dest>     # download one blob
buckethead files gc --dry-run            # what would orphan cleanup delete?
buckethead files gc --grace-seconds 300  # actually clean up
```

All CLI commands read credentials from the same env vars as `BucketSettings.from_env()`.
## Observability

Pass callbacks to `BucketSQLite` to wire up metrics or tracing:

```python
def on_flush_start() -> None: ...
def on_flush_complete(duration_s: float, bytes_uploaded: int) -> None: ...
def on_flush_error(exc: BaseException) -> None: ...

bh = BucketSQLite(
    cfg,
    on_flush_start=on_flush_start,
    on_flush_complete=on_flush_complete,
    on_flush_error=on_flush_error,
)
```

`bytes_uploaded == 0` means the dirty-bit skipped the upload: the DB didn't change since the last flush.
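One hedged way to consume these callbacks: log real uploads and count dirty-bit skips. The logger and counter here are illustrative, not part of BucketHead:

```python
import logging

log = logging.getLogger("buckethead.flush")
skipped_flushes = 0  # illustrative process-local counter

def on_flush_complete(duration_s: float, bytes_uploaded: int) -> None:
    global skipped_flushes
    if bytes_uploaded == 0:
        skipped_flushes += 1  # dirty-bit skip: DB unchanged since last flush
        return
    log.info("flushed %d bytes in %.3fs", bytes_uploaded, duration_s)

def on_flush_error(exc: BaseException) -> None:
    log.warning("flush failed: %r", exc)

bh = BucketSQLite(
    cfg,
    on_flush_complete=on_flush_complete,
    on_flush_error=on_flush_error,
)
```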
For deeper profiling, enable the built-in hooks:

```python
from buckethead import BucketSQLite, ProfilingConfig

bh = BucketSQLite(
    cfg,
    profiling_config=ProfilingConfig(
        io_counters=True,  # JSON summary of bytes / ops per R2 call
        memory=True,       # requires buckethead[profiling]
        cpu=True,          # requires buckethead[profiling]
    ),
)
```

On `bh.stop()`, profiling artifacts are written under `ProfilingConfig.output_dir` (default `./buckethead-profiles`).
## Configuration

### BucketConfig

| Field | Default | Notes |
|---|---|---|
| `bucket` | required | bucket name |
| `key` | `bucketsqlite/snap.db` | where the snapshot lives |
| `endpoint_url` | `None` | set for R2/MinIO; leave `None` for AWS S3 |
| `region` | `"auto"` | R2 wants `"auto"`; AWS wants a real region |
| `access_key_id` / `secret_access_key` | `None` | or use IAM / env |
| `files_prefix` | `"files/"` | FileStore objects go under this prefix |

For R2, use `BucketConfig.for_r2(account_id, bucket, access_key_id, secret_access_key)` to skip the endpoint-URL boilerplate.
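For a non-R2 endpoint such as a local MinIO, the fields above compose directly. A sketch, assuming `BucketConfig` takes these fields as keyword arguments; the endpoint and credentials are placeholders:

```python
from buckethead import BucketConfig

cfg = BucketConfig(
    bucket="my-bucket",
    endpoint_url="http://localhost:9000",  # MinIO; leave None for AWS S3
    region="auto",
    access_key_id="minioadmin",            # placeholder credentials
    secret_access_key="minioadmin",
    key="bucketsqlite/snap.db",            # default snapshot location
)
```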
### SnapshotConfig

| Field | Default | Notes |
|---|---|---|
| `interval_seconds` | `60.0` | background flush cadence |
| `min_interval_seconds` | `5.0` | debounce for `flush()` (manual) |
| `keep_previous` | `True` | save `<key>.prev` before overwriting |
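A hedged sketch of a tighter cadence, assuming `SnapshotConfig` is importable from `buckethead` and that `BucketSQLite` accepts it via a `snapshot_config` keyword (neither is confirmed above):

```python
from buckethead import BucketSQLite, SnapshotConfig

snap = SnapshotConfig(
    interval_seconds=15.0,     # flush every 15 s instead of the default 60 s
    min_interval_seconds=2.0,  # debounce manual flush() calls
    keep_previous=True,        # keep <key>.prev as a one-step undo
)

bh = BucketSQLite(cfg, snapshot_config=snap)  # keyword name is an assumption
```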
### Env vars (for BucketSettings)

| Variable | Purpose |
|---|---|
| `R2_ACCOUNT_ID` | Cloudflare account id (used to build the endpoint URL) |
| `R2_ENDPOINT_URL` | full endpoint; overrides `R2_ACCOUNT_ID` |
| `R2_BUCKET` | bucket name |
| `R2_ACCESS_KEY_ID` | R2 S3 API token access key |
| `R2_SECRET_ACCESS_KEY` | R2 S3 API token secret |
| `BUCKETHEAD_KEY` | snapshot key (default `bucketsqlite/snap.db`) |
| `BUCKETHEAD_BRANCH` | initial branch the process attaches to (default `main`) |

Both `R2_ACCOUNT_ID` and `R2_ENDPOINT_URL` are optional: if neither is set, `endpoint_url` stays `None` and boto3 connects to AWS S3 with its normal credential discovery.
## Constraints and scope

- Single-process. The in-memory database lives in the process that constructs `BucketSQLite`. Multi-threaded access in that process is fine (`bh.connect()` vends a new connection per thread), but cross-process sharing is not supported.
- Durability window. A hard crash (OOM, SIGKILL, power loss) loses up to `interval_seconds` of writes. Call `bh.force_flush()` after any write that must not be lost; see the sketch after this list.
- DB size. Snapshot wall-time is ~1 ms per MB of DB. Comfortable below 100 MB, usable up to ~500 MB, noticeable stalls above that.
- Not a Redis drop-in. No wire protocol, no pub/sub, no replication.
- Not a distributed database. Single writer, no HA.
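A minimal sketch of the durability pattern from the second bullet; the `payments` table and SQL are illustrative:

```python
# A write that must survive a crash: don't wait for the background timer.
bh.connection.execute(
    "INSERT INTO payments (id, amount_cents) VALUES (?, ?)",
    ("ord-123", 4999),
)
bh.connection.commit()
bh.force_flush()  # snapshot immediately; closes the durability window for this write
```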
## Deeper reading

- `plan/project-spec.md`: the design
- `plan/build-plan.md`: phased build log + decisions
- `docs/diagrams.md`: sequence diagrams
## License
TBD.
## File details

Details for the file `buckethead-0.1.2.tar.gz`.

### File metadata

- Download URL: buckethead-0.1.2.tar.gz
- Size: 43.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `07e398870aa31b0f197d0eaad2abba75cd7e1bd8b56a628a370108b4867ef335` |
| MD5 | `bc18d4568c73b7e5130dd8449ee6fb81` |
| BLAKE2b-256 | `dc4ecaef37100acf222959ce207e862e0fd4e2d3f0639903500bb01c5fd8230d` |
### Provenance

The following attestation bundle was made for `buckethead-0.1.2.tar.gz`:

Publisher: `workflow.yaml` on cloutfront/buckethead

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: buckethead-0.1.2.tar.gz
- Subject digest: 07e398870aa31b0f197d0eaad2abba75cd7e1bd8b56a628a370108b4867ef335
- Sigstore transparency entry: 1350808055
- Permalink: cloutfront/buckethead@46fe7f95f7475cb8d39f56126abcb04eb1ee164c
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/cloutfront
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yaml@46fe7f95f7475cb8d39f56126abcb04eb1ee164c
- Trigger Event: release
## File details

Details for the file `buckethead-0.1.2-py3-none-any.whl`.

### File metadata

- Download URL: buckethead-0.1.2-py3-none-any.whl
- Size: 54.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `5695dfd6588b0256f5e38bc4614e5425fbf091b5661c07d6bb70ffabcc0aa468` |
| MD5 | `bdf6487cb57ab90b6ec89d28cf0f46a7` |
| BLAKE2b-256 | `d6f80a7f061d46bcfe05983bfac0c113ec9aedf86640aa4878073db5ae61c4fa` |
### Provenance

The following attestation bundle was made for `buckethead-0.1.2-py3-none-any.whl`:

Publisher: `workflow.yaml` on cloutfront/buckethead

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: buckethead-0.1.2-py3-none-any.whl
- Subject digest: 5695dfd6588b0256f5e38bc4614e5425fbf091b5661c07d6bb70ffabcc0aa468
- Sigstore transparency entry: 1350808152
- Permalink: cloutfront/buckethead@46fe7f95f7475cb8d39f56126abcb04eb1ee164c
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/cloutfront
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yaml@46fe7f95f7475cb8d39f56126abcb04eb1ee164c
- Trigger Event: release