SQLite-backed KV store (Rust+PyO3) for large media blobs

KohakuVault

High-performance, SQLite-backed storage with dual interfaces: dict-like for blobs (key-value) and list-like for sequences (columnar). Rust core with Pythonic APIs.

Quick Start

pip install kohakuvault

1. KV Store - Dict-like for Binary Blobs

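from kohakuvault import KVault

# Placeholder payloads; substitute real binary data
image_bytes = video_bytes = thumbnail_data = b"\x00" * 1024

# Basic operations
vault = KVault("data.db")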
vault["image:123"] = image_bytes
vault["video:456"] = video_bytes

# Dict-like interface
if "image:123" in vault:
    data = vault["image:123"]

# Bulk operations with automatic caching
with vault.cache(64*1024*1024):  # 64MB cache
    for i in range(10000):
        vault[f"thumb:{i}"] = thumbnail_data
# Auto-flushes here!

# Streaming large files
with open("large_video.mp4", "rb") as f:
    vault.put_file("video:789", f)

2. Columnar - List-like for Typed Sequences

from kohakuvault import ColumnVault

# Create columnar storage (can share DB with KVault)
cv = ColumnVault("data.db")

# Fixed-size types
cv.create_column("temperatures", "f64")
cv.create_column("timestamps", "i64")

temps = cv["temperatures"]
temps.extend([23.5, 24.1, 25.0])  # Like a Python list
print(temps[0])      # 23.5
print(temps[-1])     # 25.0

# Variable-size bytes (strings, JSON, etc.)
cv.create_column("log_messages", "bytes")
logs = cv["log_messages"]
logs.append(b"Server started")
logs.append(b"Request processed in 5.2ms")

for msg in logs:
    print(msg.decode())

Features

  • Dual interfaces: Dict for blobs (KVault), List for sequences (ColumnVault)
  • Zero external dependencies: Single SQLite file, no services required
  • Memory efficient: Stream multi-GB files, dynamic chunk growth
  • Type-safe columnar: Fixed-size (i64, f64, bytes:N) and variable-size (bytes)
  • Rust performance: Native speed with Pythonic ergonomics
  • Smart caching: Auto-flush context manager, daemon thread, capacity enforcement

Best Practices

Handling Many Large Binary Files

For thousands of large binaries (images, videos, documents), use a hybrid approach:

import time

from kohakuvault import KVault, ColumnVault

kv = KVault("media.db")
cv = ColumnVault(kv)  # Share same database

# Store metadata in columnar (efficient for large lists)
cv.create_column("image_ids", "i64")
cv.create_column("image_names", "bytes")
cv.create_column("image_sizes", "i64")
cv.create_column("upload_times", "i64")

ids = cv["image_ids"]
names = cv["image_names"]
sizes = cv["image_sizes"]
times = cv["upload_times"]

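# Ingest: metadata goes to the columns, binaries go to the KV store.
# image_stream is your own iterable of (id, data, name) tuples; name must be bytes.
for img_id, img_data, img_name in image_stream: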
    # Metadata in columnar (fast append, efficient iteration/filtering)
    ids.append(img_id)
    names.append(img_name)
    sizes.append(len(img_data))
    times.append(int(time.time()))

    # Binary data in KV (optimized for large blobs)
    kv[f"blob:{img_id}"] = img_data

# Query metadata without loading binaries
for img_id, name, size in zip(ids, names, sizes):
    if size > 1024 * 1024:  # Find images > 1MB
        print(f"Large image: {name.decode()}")
        # Load binary only when needed
        data = kv[f"blob:{img_id}"]

Why this pattern?

  • ✅ Columnar optimized for append-heavy metadata (millions of entries)
  • ✅ KV optimized for large binary blobs (streaming, caching)
  • ✅ Can query/filter metadata without loading binaries
  • ✅ Both share same SQLite file (single-file deployment)
  • ✅ Efficient iteration over metadata, lazy loading of binaries

Installation

pip install kohakuvault  # From PyPI
pip install .            # From source

Platform Support:

  • ✅ Linux (x86_64)
  • ✅ Windows (x86_64)
  • ✅ macOS (Apple Silicon M1/M2/M3/M4 only - ARM64)
  • ❌ macOS Intel (x86_64) - not supported

Development

Prerequisites: Python 3.10+, Rust (rustup.rs)

# Setup
git clone https://github.com/KohakuBlueleaf/KohakuVault.git
cd KohakuVault
python -m venv .venv && source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -e ".[dev]"  # quotes keep zsh from globbing the brackets
maturin develop  # Build Rust extension (once)

# Workflow
# - Edit Python files → changes live immediately
# - Edit Rust files → run `maturin develop` to rebuild

# Tools
pytest                  # Run tests
black src/kohakuvault   # Format Python
cargo fmt               # Format Rust
maturin build --release # Build production wheel

Usage

Basic Operations

vault = KVault("media.db")

# Dict-like interface
vault["key"] = b"value"
data = vault["key"]
del vault["key"]
if "key" in vault: ...

# Safe retrieval
data = vault.get("key", default=b"")

# Iteration
for key in vault:
    print(f"{key}: {len(vault[key])} bytes")

Streaming Large Files

vault = KVault("media.db", chunk_size=1024*1024)  # 1 MiB chunks

# Stream from file → vault
with open("large_video.mp4", "rb") as f:
    vault.put_file("video:789", f)

# Stream from vault → file
with open("output.mp4", "wb") as f:
    vault.get_to_file("video:789", f)
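Conceptually, streaming means the file is never held in memory as a whole: the reader is consumed `chunk_size` bytes at a time and each piece is written on its own. The sketch below illustrates that idea with Python's built-in `sqlite3`, storing each chunk as a separate row in a hypothetical `chunks` table. It is not KohakuVault's on-disk layout (the Rust core may use incremental BLOB I/O instead), and `stream_put`/`stream_get` are illustrative names, not part of the library.

```python
import io
import sqlite3


def open_store(path=":memory:"):
    """Create a hypothetical chunk table: one row per (key, chunk index)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        " key TEXT NOT NULL, idx INTEGER NOT NULL, data BLOB NOT NULL,"
        " PRIMARY KEY (key, idx))"
    )
    return conn


def stream_put(conn, key, reader, chunk_size=1024 * 1024):
    """Copy reader -> database, holding at most one chunk in memory."""
    total = 0
    with conn:  # one transaction for the whole file
        conn.execute("DELETE FROM chunks WHERE key = ?", (key,))  # allow overwrite
        idx = 0
        while True:
            piece = reader.read(chunk_size)
            if not piece:  # EOF
                break
            conn.execute(
                "INSERT INTO chunks (key, idx, data) VALUES (?, ?, ?)",
                (key, idx, piece),
            )
            idx += 1
            total += len(piece)
    return total


def stream_get(conn, key, writer):
    """Copy database -> writer one chunk at a time, in chunk order."""
    cursor = conn.execute(
        "SELECT data FROM chunks WHERE key = ? ORDER BY idx", (key,)
    )
    written = 0
    for (piece,) in cursor:  # rows are fetched lazily from the cursor
        writer.write(piece)
        written += len(piece)
    return written
```

Any object with `read(n)` or `write(b)` works on either side, which is why `io.BytesIO` and real file handles are interchangeable here.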

Bulk Operations with Caching

Recommended: Use context manager for automatic flush

vault = KVault("media.db")
data = b"\x00" * 4096  # placeholder payload

# Safest: context manager auto-flushes on exit
with vault.cache(cap_bytes=64*1024*1024):
    for i in range(1000):
        vault[f"item:{i}"] = data
# Auto-flushed here, guaranteed!

# Long-running: a daemon thread auto-flushes every 5 seconds
vault.enable_cache(cap_bytes=64*1024*1024, flush_interval=5.0)
try:
    while True:
        vault["sensor_data"] = read_sensor()  # read_sensor(): your own data source
finally:
    vault.disable_cache()  # Flushes anything the daemon has not written yet

# Manual control (backward compatible)
vault.enable_cache(cap_bytes=64*1024*1024)
for i in range(1000):
    vault[f"item:{i}"] = data
vault.flush_cache()  # Manual flush
vault.disable_cache()  # Auto-flushes before disabling
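The write-back behaviour above (buffer writes in memory, enforce a byte capacity, flush when leaving `cache()`) can be modelled in a few lines of pure Python. This is an illustrative sketch only: `WriteBackCache`, `cached`, and the plain `dict` backend are hypothetical stand-ins, the real cache lives in the Rust core, and the daemon-thread variant (`flush_interval`) is left out.

```python
from contextlib import contextmanager


class WriteBackCache:
    """Buffer writes in memory and push them to a backend in batches."""

    def __init__(self, backend, cap_bytes):
        self.backend = backend      # any dict-like store (hypothetical stand-in)
        self.cap_bytes = cap_bytes  # flush once buffered values reach this size
        self.pending = {}           # key -> latest value not yet written
        self.pending_bytes = 0

    def put(self, key, value):
        old = self.pending.get(key)
        if old is not None:
            self.pending_bytes -= len(old)  # an overwrite replaces the old value
        self.pending[key] = value
        self.pending_bytes += len(value)
        if self.pending_bytes >= self.cap_bytes:
            self.flush()  # capacity enforcement

    def get(self, key):
        # Read-your-writes: consult the buffer before the backend.
        if key in self.pending:
            return self.pending[key]
        return self.backend[key]

    def flush(self):
        """Write all pending entries and return how many were written."""
        count = len(self.pending)
        self.backend.update(self.pending)  # a SQLite backend would batch this in one transaction
        self.pending.clear()
        self.pending_bytes = 0
        return count


@contextmanager
def cached(backend, cap_bytes):
    cache = WriteBackCache(backend, cap_bytes)
    try:
        yield cache
    finally:
        cache.flush()  # runs on normal exit and when an exception escapes
```

The `finally` clause is what makes the context-manager form the safest option: buffered writes are flushed even if the loop body raises.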

Configuration

vault = KVault(
    path="media.db",
    chunk_size=2*1024*1024,   # Streaming chunk size
    retries=10,                # Retry attempts for busy DB
    enable_wal=True,           # Write-Ahead Logging
    cache_kb=20000,            # SQLite cache size
)
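These options line up with standard SQLite settings. The sketch below shows how such values are commonly applied through Python's `sqlite3`; the exact mapping (for example, `cache_kb` becoming a negative `cache_size`, which SQLite reads as KiB) is an assumption about what KohakuVault does internally, and `open_tuned` is a hypothetical helper.

```python
import sqlite3


def open_tuned(path, enable_wal=True, page_size=4096,
               mmap_size=268435456, cache_kb=20000):
    """Open a SQLite connection with settings analogous to KVault's options.

    PRAGMA values cannot be bound with '?', so they are formatted in
    after int() validation.
    """
    conn = sqlite3.connect(path)
    # page_size only takes effect before the file is first written
    # (or after VACUUM), so it is set first.
    conn.execute(f"PRAGMA page_size = {int(page_size)}")
    if enable_wal:
        # WAL lets readers proceed while a single writer appends to the log.
        conn.execute("PRAGMA journal_mode = WAL")
    # Memory-map up to mmap_size bytes of the database file for reads.
    conn.execute(f"PRAGMA mmap_size = {int(mmap_size)}")
    # A negative cache_size is interpreted by SQLite as a size in KiB.
    conn.execute(f"PRAGMA cache_size = -{int(cache_kb)}")
    return conn
```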

Columnar Storage (NEW!)

List-like interface for typed sequences (timeseries, logs, events):

from kohakuvault import ColumnVault

cv = ColumnVault("data.db")

# Fixed-size types: i64, f64, bytes:N
cv.create_column("sensor_temps", "f64")
cv.create_column("timestamps", "i64")
cv.create_column("hashes", "bytes:32")  # 32-byte fixed

temps = cv["sensor_temps"]
temps.append(23.5)
temps.extend([24.1, 25.0, 25.3])
print(temps[0], temps[-1], len(temps))  # 23.5 25.3 4

# Variable-size bytes (for strings, JSON, etc.)
cv.create_column("log_messages", "bytes")  # No size = variable!
logs = cv["log_messages"]
logs.append(b"Short message")
logs.append(b"This is a much longer log entry with details...")
print(logs[0])  # Exact bytes, no padding

# Iterate
for temp in temps:
    print(temp)
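The split between fixed-size and variable-size columns comes down to addressing. With a fixed width, element `i` starts at byte `i * width`, so random access is a single multiplication; variable-size `bytes` need a separate offset index. The sketch below models both with a `bytearray` and the `struct` module. The little-endian encodings, zero-padding of short `bytes:N` values, and the class names are assumptions for illustration, not KohakuVault's storage format.

```python
import struct


class FixedColumn:
    """Fixed-width elements packed back to back: i64, f64, or bytes:N."""

    FORMATS = {"i64": "<q", "f64": "<d"}  # assumed little-endian encodings

    def __init__(self, dtype):
        if dtype.startswith("bytes:"):
            self.width = int(dtype.split(":", 1)[1])
            self.fmt = None
        else:
            self.fmt = self.FORMATS[dtype]
            self.width = struct.calcsize(self.fmt)  # 8 bytes for both
        self.buf = bytearray()

    def append(self, value):
        if self.fmt is None:
            if len(value) > self.width:
                raise ValueError(f"value longer than {self.width} bytes")
            self.buf += value.ljust(self.width, b"\x00")  # assumed zero-padding
        else:
            self.buf += struct.pack(self.fmt, value)

    def __len__(self):
        return len(self.buf) // self.width

    def __getitem__(self, i):
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        start = i * self.width  # O(1) address computation
        raw = bytes(self.buf[start:start + self.width])
        return raw if self.fmt is None else struct.unpack(self.fmt, raw)[0]


class VarBytesColumn:
    """Variable-size elements: one data buffer plus an offset index."""

    def __init__(self):
        self.data = bytearray()
        self.offsets = [0]  # element i spans offsets[i]..offsets[i + 1]

    def append(self, value):
        self.data += value
        self.offsets.append(len(self.data))

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Exact bytes back, no padding.
        return bytes(self.data[self.offsets[i]:self.offsets[i + 1]])
```

Because both classes raise `IndexError` past the end, plain `for x in column:` iteration works through `__getitem__`, mirroring the list-like usage above.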

Why columnar?

  • Append-heavy workloads (O(1) amortized, like Python list)
  • Typed data (int/float/bytes with type safety)
  • Efficient iteration and random access
  • Dynamic chunk growth (128KB → 16MB, exponential like std::vector)
  • Cross-chunk element support (byte-based addressing)
  • Minimal memory overhead (incremental BLOB I/O)
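The chunk-growth and cross-chunk points can be made concrete. Assuming chunk capacities start at 128 KiB and double until they reach 16 MiB (the factor of 2 is an assumption; the text only says "exponential"), a logical byte offset maps to a (chunk index, offset within chunk) pair, and an element whose bytes straddle a boundary is read as several slices. `chunk_size`, `locate`, and `split_range` are hypothetical helpers, not library functions.

```python
MIN_CHUNK = 128 * 1024        # first chunk: 128 KiB
MAX_CHUNK = 16 * 1024 * 1024  # chunks stop growing at 16 MiB


def chunk_size(n):
    """Capacity of the n-th chunk: doubles each time, capped at MAX_CHUNK."""
    return min(MIN_CHUNK << n, MAX_CHUNK)


def locate(offset):
    """Map a logical byte offset to (chunk index, offset inside that chunk).

    A linear scan keeps the sketch simple; a real implementation could
    compute the chunk index in closed form.
    """
    n = 0
    while offset >= chunk_size(n):
        offset -= chunk_size(n)
        n += 1
    return n, offset


def split_range(start, length):
    """Split bytes [start, start + length) into (chunk, begin, end) slices.

    An element that crosses a chunk boundary yields more than one slice.
    """
    slices = []
    while length > 0:
        n, begin = locate(start)
        take = min(length, chunk_size(n) - begin)
        slices.append((n, begin, begin + take))
        start += take
        length -= take
    return slices
```

Under these assumptions, 8-byte `f64` values fill the 128 KiB first chunk exactly (16,384 of them), so splits only arise for element widths that do not divide the chunk size, such as some `bytes:N` columns or variable-size entries.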

See docs/COLUMNAR_GUIDE.md and examples/columnar_demo.py for the complete guide.

API Reference

Constructor

KVault(path, chunk_size=1048576, retries=4, backoff_base=0.02,
       table="kvault", enable_wal=True, page_size=4096,
       mmap_size=268435456, cache_kb=20000)
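`retries` and `backoff_base` suggest a retry loop for SQLite "database is locked/busy" errors. The sketch below shows exponential backoff under the assumption that the delay starts at `backoff_base` seconds and doubles per attempt; KohakuVault's actual schedule (and whether it adds jitter) is not documented here, and `with_retries` is a hypothetical helper. The `sleep` parameter exists only so the behaviour can be tested without waiting.

```python
import sqlite3
import time


def with_retries(op, retries=4, backoff_base=0.02, sleep=time.sleep):
    """Call op(); on a 'database is locked/busy' error, wait and try again.

    Makes at most retries + 1 attempts, sleeping backoff_base * 2**attempt
    seconds between them, then re-raises the last error.
    """
    for attempt in range(retries + 1):
        try:
            return op()
        except sqlite3.OperationalError as exc:
            message = str(exc).lower()
            busy = "locked" in message or "busy" in message
            if not busy or attempt == retries:
                raise  # not a contention error, or out of attempts
            sleep(backoff_base * (2 ** attempt))
```

Non-contention errors (for example a malformed query) are re-raised immediately, so backoff only applies where waiting can help.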

Methods

Storage

  • put(key, value) - Store bytes
  • put_file(key, reader, size=None, chunk_size=None) - Stream from file-like
  • get(key, default=None) - Retrieve bytes
  • get_to_file(key, writer, chunk_size=None) - Stream to file-like
  • delete(key) - Remove key
  • exists(key) - Check existence

Caching

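  • cache(cap_bytes) - Context manager: enables the cache and flushes on exit
  • enable_cache(cap_bytes, flush_threshold, flush_interval) - Enable write-back cache; pass flush_interval (seconds) to start the daemon auto-flush thread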
  • disable_cache() - Disable and flush cache
  • flush_cache() - Commit cached writes, returns count

Maintenance

  • optimize() - VACUUM database
  • close() - Flush and close

Dict Interface: vault[key], del vault[key], key in vault, len(vault), vault.keys(), vault.values(), vault.items(), etc.

Exceptions: KohakuVaultError, NotFound, DatabaseBusy, InvalidArgument, IoError

Architecture

Python wrapper (src/kohakuvault/proxy.py)
    ↓ PyO3 bindings
Rust core (src/kvault-rust/lib.rs)
    ↓ rusqlite
SQLite database (bundled)

Why hybrid? Rust handles SQLite operations safely and efficiently. Python provides the ergonomic dict-like interface.
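The split described here (a storage engine underneath, the dict protocol on top) can be illustrated in pure Python. This sketch uses `collections.abc.MutableMapping` over the standard-library `sqlite3` module as a stand-in for the Rust core; `TinyVault` and its table layout are hypothetical, and the real `proxy.py` adds streaming, caching, and retries on top.

```python
import sqlite3
from collections.abc import MutableMapping


class TinyVault(MutableMapping):
    """Dict-like bytes store; MutableMapping supplies get/keys/items/etc."""

    def __init__(self, path=":memory:", table="kvault"):
        self.conn = sqlite3.connect(path)
        self.table = table  # interpolated into SQL: must be trusted config, not user input
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )

    def __setitem__(self, key, value):
        with self.conn:  # commit on success
            self.conn.execute(
                f"INSERT OR REPLACE INTO {self.table} (key, value) VALUES (?, ?)",
                (key, bytes(value)),
            )

    def __getitem__(self, key):
        row = self.conn.execute(
            f"SELECT value FROM {self.table} WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)  # lets `in` and .get() work via the mixins
        return row[0]

    def __delitem__(self, key):
        with self.conn:
            cur = self.conn.execute(
                f"DELETE FROM {self.table} WHERE key = ?", (key,)
            )
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        # Materialize keys so mutation during iteration cannot disturb the cursor.
        keys = [k for (k,) in self.conn.execute(f"SELECT key FROM {self.table}")]
        return iter(keys)

    def __len__(self):
        return self.conn.execute(f"SELECT COUNT(*) FROM {self.table}").fetchone()[0]
```

Implementing the five abstract methods is enough for `MutableMapping` to provide `in`, `get`, `keys()`, `values()`, `items()`, `pop()`, and `update()`, which is the same kind of surface listed under "Dict Interface".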

Contributing

# Setup
git checkout -b feature-name
# Make changes
black src/kohakuvault && cargo fmt  # Format
pytest                               # Test
git commit && git push
# Open PR

Releasing

GitHub Actions automatically builds wheels and publishes to PyPI when you push a tag:

# 1. Update version in pyproject.toml and Cargo.toml
# 2. Commit changes
git add pyproject.toml Cargo.toml
git commit -m "Bump version to 0.1.0"

# 3. Create and push tag
git tag v0.1.0
git push origin main --tags

# 4. GitHub Actions will:
#    - Build wheels for all platforms
#    - Create GitHub Release with wheels attached
#    - Publish to PyPI (with skip-existing for safety)

What happens:

  • Wheels are built for Linux, Windows, macOS (Apple Silicon)
  • All wheels are uploaded to the GitHub Release (downloadable)
  • Wheels are published to PyPI
  • If some wheels already exist on PyPI, they're skipped (no error)

License

Apache 2.0 - see LICENSE

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • kohakuvault-0.2.2.tar.gz (78.3 kB)

Built Distributions

  • kohakuvault-0.2.2-cp313-cp313-win_amd64.whl (1.1 MB): CPython 3.13, Windows x86-64
  • kohakuvault-0.2.2-cp313-cp313-manylinux_2_34_x86_64.whl (1.4 MB): CPython 3.13, manylinux (glibc 2.34+) x86-64
  • kohakuvault-0.2.2-cp313-cp313-macosx_11_0_arm64.whl (1.2 MB): CPython 3.13, macOS 11.0+ ARM64
  • kohakuvault-0.2.2-cp312-cp312-win_amd64.whl (1.1 MB): CPython 3.12, Windows x86-64
  • kohakuvault-0.2.2-cp312-cp312-manylinux_2_34_x86_64.whl (1.4 MB): CPython 3.12, manylinux (glibc 2.34+) x86-64
  • kohakuvault-0.2.2-cp312-cp312-macosx_11_0_arm64.whl (1.2 MB): CPython 3.12, macOS 11.0+ ARM64
  • kohakuvault-0.2.2-cp311-cp311-win_amd64.whl (1.1 MB): CPython 3.11, Windows x86-64
  • kohakuvault-0.2.2-cp311-cp311-manylinux_2_34_x86_64.whl (1.4 MB): CPython 3.11, manylinux (glibc 2.34+) x86-64
  • kohakuvault-0.2.2-cp311-cp311-macosx_11_0_arm64.whl (1.2 MB): CPython 3.11, macOS 11.0+ ARM64
  • kohakuvault-0.2.2-cp310-cp310-win_amd64.whl (1.1 MB): CPython 3.10, Windows x86-64
  • kohakuvault-0.2.2-cp310-cp310-manylinux_2_34_x86_64.whl (1.4 MB): CPython 3.10, manylinux (glibc 2.34+) x86-64
  • kohakuvault-0.2.2-cp310-cp310-macosx_11_0_arm64.whl (1.2 MB): CPython 3.10, macOS 11.0+ ARM64

File details

Details for the file kohakuvault-0.2.2.tar.gz.

File metadata

  • Download URL: kohakuvault-0.2.2.tar.gz
  • Size: 78.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for kohakuvault-0.2.2.tar.gz
Algorithm Hash digest
SHA256 a64cd16b032445a765c8853b2258578a552ba3b3789f95f811684bce59335666
MD5 2fb89aea7bda655c63161832e316dc16
BLAKE2b-256 e2ba975ef0243c855d80ddd67d5733e017b9e0b4a3f0767fb16704fce5b15d95

Provenance

The following attestation bundles were made for kohakuvault-0.2.2.tar.gz:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file kohakuvault-0.2.2-cp313-cp313-win_amd64.whl.

File hashes

Hashes for kohakuvault-0.2.2-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 3ea879165c767a77487ffe6b2a896719e8eccd09b794083542311cc099ae7f70
MD5 e9177ccfd89b0972eae3db23599215c1
BLAKE2b-256 508e744486030dfbfe6d25b15660e9aafe8afd29c6b4ed6225b59db077802d44

Provenance

The following attestation bundles were made for kohakuvault-0.2.2-cp313-cp313-win_amd64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.2.2-cp313-cp313-manylinux_2_34_x86_64.whl.

File hashes

Hashes for kohakuvault-0.2.2-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 6ef128e9f799293a0eec4f4cca62c212f037e176108d1c9a972fe03698506a15
MD5 a082219729a89f19aebf8af5de7c6da1
BLAKE2b-256 000a36ac3631bb26149712d62b35bb182fc8914278c6727fa6464da59388076a

Provenance

The following attestation bundles were made for kohakuvault-0.2.2-cp313-cp313-manylinux_2_34_x86_64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.2.2-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for kohakuvault-0.2.2-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ce582b3dd84a00782778660aa94f061f3258773c475ebb73c927d2c4e24803bb
MD5 9d9fdf2110a62cd53cfce2b446e4e75d
BLAKE2b-256 30205136fffb4afde3a9d09a262e2acc148944ebe853cfa88711a2217559a47b

Provenance

The following attestation bundles were made for kohakuvault-0.2.2-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.2.2-cp312-cp312-win_amd64.whl.

File hashes

Hashes for kohakuvault-0.2.2-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 fd2a176833b605e930e1ea897d0317d0eb6f26bd7a66662bfd5c0eaf62412874
MD5 9ad35f4a986267681598d0a45db95160
BLAKE2b-256 95ac7d78f1eb421d5f6004a008f0f1c66544339f5336791dadb6d7150676bbb1

Provenance

The following attestation bundles were made for kohakuvault-0.2.2-cp312-cp312-win_amd64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.2.2-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for kohakuvault-0.2.2-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 60bbe0d57d78df245d930e013534547acf466c5a6c6588517ca1d44a6f96b3c1
MD5 66d2b80b223babb5049961434144ae68
BLAKE2b-256 6e7da494a7fa82a16d0e220e46ac13f3648a866077cfba301280c14193ede63c

Provenance

The following attestation bundles were made for kohakuvault-0.2.2-cp312-cp312-manylinux_2_34_x86_64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.2.2-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for kohakuvault-0.2.2-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1a709d1c1f1e24cf5af92fe5d506b088e96475b995425076a09ef3a22b2b70a7
MD5 4248c6baba10a5ff1f323d6f6526a82e
BLAKE2b-256 62cb8b27966ef4a5b58ba8ecaa03c0c7b69d94e4d492e2a1f657a0c17eee2631

Provenance

The following attestation bundles were made for kohakuvault-0.2.2-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.2.2-cp311-cp311-win_amd64.whl.

File hashes

Hashes for kohakuvault-0.2.2-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 642a66c83155e9e124683cec17a1d43f551a2b17f27f30e057945e47ff91f159
MD5 74b80f60822c9c5f2b4177f1090739b6
BLAKE2b-256 8c25e17cc60ca6cff2f169007c4ea4288473e34f3b201e13c14058dfd407114d

Provenance

The following attestation bundles were made for kohakuvault-0.2.2-cp311-cp311-win_amd64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.2.2-cp311-cp311-manylinux_2_34_x86_64.whl.

File hashes

Hashes for kohakuvault-0.2.2-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 92dec40db7eb4e23b60bfde8b64b8d1c1daa7b33f0f7e6c08e8ab1f09e2c46cf
MD5 4e306d270f8e407f516057a1d5db39f0
BLAKE2b-256 1154e4f133df56312e0aed4fdf89e815a01002a8fc8bda02fdd35a1a9a28970d

Provenance

The following attestation bundles were made for kohakuvault-0.2.2-cp311-cp311-manylinux_2_34_x86_64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.2.2-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for kohakuvault-0.2.2-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4f02464433ee9bb2165f46f4287561163c0b7abf150f1b5dd8627c51440d5dcd
MD5 ea7c24585744747bd38f293723f4ac16
BLAKE2b-256 72b2bb4724e8dab1c879c6eb179c9bad93042a6ed269821f47db0a21af96cf26

Provenance

The following attestation bundles were made for kohakuvault-0.2.2-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.2.2-cp310-cp310-win_amd64.whl.

File hashes

Hashes for kohakuvault-0.2.2-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 a054b179a1ca866ee8d199a23b52e169d331cd9284f455bf22bc268d34187f06
MD5 54d22ca044c1d9a597b59f03ebc013de
BLAKE2b-256 ba06e22478316d5a407294886abd07c2fa8692a0d7f21fdb826ff06821cc5c60

Provenance

The following attestation bundles were made for kohakuvault-0.2.2-cp310-cp310-win_amd64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.2.2-cp310-cp310-manylinux_2_34_x86_64.whl.

File hashes

Hashes for kohakuvault-0.2.2-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 bdbdde869967f0c8f57113fe81d4562a8082a1acd97bdf0962d62bcd04d953de
MD5 4bfcb602f9e6cef8db31086cea8fe8eb
BLAKE2b-256 53f735463ec70f003244298b2e628378abd22d78410492a131ee1b4f0fe26b8c

Provenance

The following attestation bundles were made for kohakuvault-0.2.2-cp310-cp310-manylinux_2_34_x86_64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.2.2-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for kohakuvault-0.2.2-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 8d178b219c1abf21a9be615d695d9724a6f8d31cbd1771e3f72aee6361f8d6da
MD5 cc66dd6c2e50cfc24393958e2f2c5a2a
BLAKE2b-256 f3cfdd851ec4bdb1d3e0919ee09e4894b5948f187c09e80cb293f45733f27670

Provenance

The following attestation bundles were made for kohakuvault-0.2.2-cp310-cp310-macosx_11_0_arm64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault
