distributed-sqlite

A distributed SQLite-compatible storage engine backed solely by AWS S3.

Overview

distributed-sqlite provides a standard SQLAlchemy/DBAPI2 interface over an append-only, segment-based storage model on S3. It supports:

  • Snapshot isolation — each transaction reads from a consistent snapshot
  • Optimistic concurrency — CAS-based manifest commits with automatic retry
  • Conflict detection — write-set intersection check; raises ConflictError on true conflicts
  • Exponential backoff with jitter — full jitter retry up to 10 attempts
  • WAL-like semantics — immutable segments + versioned manifests, never mutates committed data
  • Crash recovery — orphaned segments (written but not committed) are detected and safely ignored
  • Alembic migrations — Alembic sees a standard SQLite interface; all DDL and migration ops work unchanged
  • Local caching — LRU disk cache for segments, in-memory snapshot cache
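
The commit path implied by the list above (write-set intersection for conflict detection, a CAS publish, full-jitter retry) can be sketched with a toy in-memory model. `ConflictError` is the library's exception name; the `manifests` dict and the `commit()` signature below are invented here purely for illustration:

```python
import random
import time

class ConflictError(Exception):
    """Raised when a concurrent writer committed an overlapping write set."""

# Toy stand-in for the S3 manifest store: version number -> keys written.
manifests = {0: set()}

def commit(write_set, base_version, max_retries=10,
           base_delay=0.05, max_delay=30.0):
    """Optimistically commit `write_set` against snapshot `base_version`."""
    for attempt in range(max_retries):
        head = max(manifests)
        # Conflict detection: did any version after our snapshot touch our keys?
        for v in range(base_version + 1, head + 1):
            overlap = manifests[v] & write_set
            if overlap:
                raise ConflictError(f"keys {overlap!r} changed since snapshot")
        # CAS publish: version head+1 succeeds only if nobody claimed it first.
        if head + 1 not in manifests:
            manifests[head + 1] = set(write_set)
            return head + 1
        # Lost the race: full-jitter backoff, then re-read and retry.
        time.sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise RuntimeError("commit retries exhausted")
```

Note the asymmetry: a lost CAS race is retried transparently, but a true write-set overlap surfaces as `ConflictError` so the application can decide whether to re-run the transaction.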

Storage Layout

{bucket}/{prefix}/
  manifests/v{N:020d}.json   # Immutable manifest per version
  segments/{uuid}.seg        # Immutable append-only segments (msgpack)
  root.json                  # Eventually-consistent version hint
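
The 20-digit zero padding in manifest names makes lexicographic order match numeric order, so a plain S3 LIST returns versions already sorted. A quick illustration (`manifest_key` is a hypothetical helper, not the library's API):

```python
# Zero-padded version numbers sort lexicographically in numeric order,
# which is the order an S3 LIST returns keys in.
def manifest_key(version: int) -> str:
    return f"manifests/v{version:020d}.json"

keys = [manifest_key(v) for v in (1, 9, 10, 100)]
assert keys == sorted(keys)  # padding preserves numeric ordering
```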

Connection URL

distributed_sqlite+distributed_sqlite:///<bucket>/<prefix>
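
A URL of this shape splits into its bucket and prefix straightforwardly; for illustration (`parse_url` is a hypothetical helper, not part of the library):

```python
def parse_url(url: str) -> tuple[str, str]:
    """Split a distributed_sqlite URL into (bucket, prefix)."""
    scheme = "distributed_sqlite+distributed_sqlite:///"
    if not url.startswith(scheme):
        raise ValueError(f"unexpected URL: {url!r}")
    # Everything up to the first "/" is the bucket; the rest is the prefix.
    bucket, _, prefix = url[len(scheme):].partition("/")
    return bucket, prefix
```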

Quick Start

from distributed_sqlite.engine import bootstrap, open_connection, create_engine

# Initialize the store (idempotent)
bootstrap("my-bucket", "mydb")

# Raw DBAPI2 connection
with open_connection("my-bucket", "mydb") as conn:
    cur = conn.cursor()
    cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute("INSERT INTO users VALUES (1, 'Alice')")
    conn.commit()

# SQLAlchemy engine
import sqlalchemy as sa
engine = create_engine("distributed_sqlite+distributed_sqlite:///my-bucket/mydb")

boto3 sessions (STS, LocalStack, custom credential chains)

Pass a boto3.Session so the library never has to rely on mutating the process environment (AWS_ACCESS_KEY_ID, etc.) for credentials. The S3 client is built as session.client("s3", endpoint_url=...).

import boto3
from distributed_sqlite.engine import bootstrap, create_engine

session = boto3.Session(
    aws_access_key_id="...",
    aws_secret_access_key="...",
    aws_session_token="...",  # from STS
    region_name="us-east-1",
)
bootstrap("my-bucket", "mydb", boto3_session=session)
engine = create_engine(
    "distributed_sqlite+distributed_sqlite:///my-bucket/mydb",
    endpoint_url=None,  # or your LocalStack / MinIO URL
    boto3_session=session,
)

The same boto3_session= argument works on open_connection(), bootstrap(), recovery_scan(), and S3Backend(...).

If you use sqlalchemy.create_engine directly, pass the session in connect_args:

sa.create_engine(
    "distributed_sqlite+distributed_sqlite:///my-bucket/mydb",
    connect_args={"boto3_session": session, "endpoint_url": "http://localhost:4566"},
)

Long-lived processes: botocore refreshes credentials automatically when the session uses a refreshable provider (e.g. AssumeRole). If you instead hold temporary static keys (from GetSessionToken), obtain a new session before they expire and open a new engine/connection with it; cached clients on an existing S3Backend do not pick up swapped credentials.

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| AWS_ACCESS_KEY_ID | (none) | AWS credentials |
| AWS_SECRET_ACCESS_KEY | (none) | AWS credentials |
| AWS_DEFAULT_REGION | us-east-1 | AWS region |
| AWS_ENDPOINT_URL | (none) | Custom endpoint (LocalStack, MinIO) |
| DISTRIBUTED_SQLITE_CACHE_DIR | ~/.distributed_sqlite/cache | Local cache directory |
| DISTRIBUTED_SQLITE_CHECKPOINT_INTERVAL | 50 | Delta segments between checkpoints |
| DISTRIBUTED_SQLITE_MAX_RETRIES | 10 | Max commit retry attempts |
| DISTRIBUTED_SQLITE_RETRY_BASE_SECONDS | 0.05 | Backoff base delay (seconds) |
| DISTRIBUTED_SQLITE_RETRY_MAX_SECONDS | 30.0 | Max backoff delay (seconds) |
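
With the defaults above, each full-jitter delay is drawn uniformly from [0, min(RETRY_MAX_SECONDS, RETRY_BASE_SECONDS * 2**attempt)]. A sketch (`backoff_delay` is illustrative, not the library's API):

```python
import random

def backoff_delay(attempt: int, base: float = 0.05, cap: float = 30.0) -> float:
    """Full-jitter delay for retry `attempt` (0-indexed)."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))

# Upper bounds for the default settings: 0.05, 0.1, 0.2, ..., 25.6 seconds.
caps = [min(30.0, 0.05 * 2 ** a) for a in range(10)]
```

With the default 10 attempts the ceilings double from 0.05 s up to 25.6 s, so the 30 s cap only binds if DISTRIBUTED_SQLITE_MAX_RETRIES is raised.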

Architecture

See docs/architecture.md for the full design narrative.

Development

cp .env.example .env   # fill in your AWS credentials
uv sync
uv run pytest tests/ -v
