SQLite-backed KV store (Rust+PyO3) for large media blobs with structured data support

Project description

KohakuVault

High-performance, SQLite-backed storage with dual interfaces: dict-like for blobs (key-value) and list-like for sequences (columnar). Rust core with Pythonic APIs.

Quick Start

pip install kohakuvault

KV Store - Dict-like interface for binary blobs (images, videos, documents):

from kohakuvault import KVault

vault = KVault("data.db")
vault["image:123"] = image_bytes   # image_bytes / video_bytes: any bytes objects
vault["video:456"] = video_bytes
data = vault["image:123"]

# Bulk writes with smart caching (NEW in v0.2.2!)
with vault.cache(64*1024*1024):  # cache capacity: 64 MiB
    for i in range(10000):
        vault[f"key:{i}"] = data
# Auto-flushes on exit!

Columnar Storage - List-like interface for typed sequences (timeseries, logs, events):

from kohakuvault import ColumnVault

cv = ColumnVault("data.db")

# Primitives
cv.create_column("temperatures", "f64")
temps = cv["temperatures"]
temps.extend([23.5, 24.1, 25.0])
print(temps[0])  # 23.5

# Structured data (NEW in v0.3.0!)
cv.create_column("users", "msgpack")
users = cv["users"]
users.append({"name": "Alice", "age": 30, "tags": ["vip"]})
print(users[0])  # {'name': 'Alice', 'age': 30, 'tags': ['vip']}

# Strings with encoding (NEW in v0.3.0!)
cv.create_column("messages", "str:utf8")
messages = cv["messages"]
messages.append("Hello, 世界!")
print(messages[0])  # 'Hello, 世界!'

DataPacker - Rust-based serialization (NEW in v0.3.0!):

from kohakuvault import DataPacker

# MessagePack for structured data
packer = DataPacker("msgpack")
packed = packer.pack({"user": "alice", "score": 95.5})
data = packer.unpack(packed, 0)

# Bulk operations
records = [{"id": i, "val": i*1.5} for i in range(1000)]
packed_all = packer.pack_many(records)  # Concatenated bytes

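# Unpack with offsets (required for variable-size encodings such as msgpack):
# offsets[i] is where record i starts, i.e. the sum of all earlier packed lengths
offsets, pos = [], 0
for r in records:
    offsets.append(pos)
    pos += len(packer.pack(r))
unpacked = packer.unpack_many(packed_all, offsets=offsets)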
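For variable-size records packed back to back, each start offset is the running sum of the preceding record lengths, and a record's end is the next record's start. The sketch below shows that bookkeeping with the standard library only; `json` stands in for MessagePack, and the helpers `pack_many_with_offsets` and `unpack_at` are hypothetical illustrations, not part of the DataPacker API.

```python
import json
from itertools import accumulate


def encode(record: dict) -> bytes:
    # Stand-in encoder (assumption); a real column would use MessagePack or CBOR.
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


def pack_many_with_offsets(records):
    """Concatenate encoded records and return (blob, start_offsets)."""
    chunks = [encode(r) for r in records]
    # Start of record i = total length of records 0..i-1.
    offsets = [0, *accumulate(len(c) for c in chunks)][:-1]
    return b"".join(chunks), offsets


def unpack_at(blob: bytes, offsets, i: int) -> dict:
    """Decode record i by slicing from its start to the next record's start."""
    start = offsets[i]
    end = offsets[i + 1] if i + 1 < len(offsets) else len(blob)
    return json.loads(blob[start:end])


records = [{"id": i, "val": i * 1.5} for i in range(1000)]
blob, offsets = pack_many_with_offsets(records)
print(unpack_at(blob, offsets, 999))  # {'id': 999, 'val': 1498.5}
```

A fixed-size encoding would not need the offsets list at all, since record `i` would start at `i * record_size`.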

Features

  • Dual interfaces: Dict for blobs (KVault), List for sequences (ColumnVault)
  • Zero external dependencies: Single SQLite file, no services required
  • Memory efficient: Stream multi-GB files, dynamic chunk growth
  • Type-safe columnar: Fixed-size (i64, f64, bytes:N) and variable-size (bytes, str, msgpack, cbor)
  • Rust core: storage and serialization run in native code behind a Pythonic API
  • Smart caching: Auto-flush context manager, daemon thread, capacity enforcement
  • Structured data: Store dicts/lists directly with MessagePack/CBOR (NEW in v0.3.0!)
  • DataPacker: Rust-based serialization with multi-encoding support (NEW in v0.3.0!)

Best Practices

Handling Many Large Binary Files

For thousands of large binaries (images, videos, documents), use a hybrid approach:

import time

from kohakuvault import KVault, ColumnVault

kv = KVault("media.db")
cv = ColumnVault(kv)  # Share same database

# Store metadata in columnar (efficient for large lists)
cv.create_column("image_ids", "i64")
cv.create_column("image_names", "bytes")
cv.create_column("image_sizes", "i64")
cv.create_column("upload_times", "i64")

ids = cv["image_ids"]
names = cv["image_names"]
sizes = cv["image_sizes"]
times = cv["upload_times"]

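# Store actual binaries in KV store
# image_stream: your own iterable of (id: int, data: bytes, name: bytes) tuples
for img_id, img_data, img_name in image_stream:
    # Metadata in columnar (fast append, efficient iteration/filtering)
    ids.append(img_id)
    names.append(img_name)  # "bytes" column: pass bytes, e.g. "cat.png".encode()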
    sizes.append(len(img_data))
    times.append(int(time.time()))

    # Binary data in KV (optimized for large blobs)
    kv[f"blob:{img_id}"] = img_data

# Query metadata without loading binaries
for i in range(len(ids)):
    if sizes[i] > 1024 * 1024:  # Find images > 1MB
        print(f"Large image: {names[i].decode()}")
        # Load binary only when needed
        data = kv[f"blob:{ids[i]}"]

Why this pattern?

  • ✅ Columnar optimized for append-heavy metadata (millions of entries)
  • ✅ KV optimized for large binary blobs (streaming, caching)
  • ✅ Can query/filter metadata without loading binaries
  • ✅ Both share same SQLite file (single-file deployment)
  • ✅ Efficient iteration over metadata, lazy loading of binaries
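The same split can be illustrated with Python's built-in `sqlite3` module: small metadata rows in one table, blobs in another, both in a single file. This is only a sketch of the pattern; the table names and schema are assumptions chosen for illustration and do not reflect KohakuVault's internal layout.

```python
import sqlite3


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    # Narrow metadata table, separate blob table, same database file.
    conn.execute("CREATE TABLE IF NOT EXISTS meta (id INTEGER PRIMARY KEY, name TEXT, size INTEGER)")
    conn.execute("CREATE TABLE IF NOT EXISTS blobs (id INTEGER PRIMARY KEY, data BLOB)")
    return conn


def add_image(conn, img_id, name, data):
    with conn:  # one transaction per image; commits on success
        conn.execute("INSERT INTO meta VALUES (?, ?, ?)", (img_id, name, len(data)))
        conn.execute("INSERT INTO blobs VALUES (?, ?)", (img_id, data))


def large_image_names(conn, min_size):
    # Reads only the metadata table; no blob bytes are loaded.
    rows = conn.execute("SELECT name FROM meta WHERE size > ? ORDER BY id", (min_size,))
    return [name for (name,) in rows]


def load_blob(conn, img_id):
    # Fetch the binary lazily, only when it is actually needed.
    row = conn.execute("SELECT data FROM blobs WHERE id = ?", (img_id,)).fetchone()
    return None if row is None else row[0]


conn = open_store()
add_image(conn, 1, "small.png", b"x" * 10)
add_image(conn, 2, "big.png", b"y" * 2_000_000)
print(large_image_names(conn, 1024 * 1024))  # ['big.png']
```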

Installation

pip install kohakuvault  # From PyPI
pip install .            # From source

Platform Support:

  • ✅ Linux (x86_64)
  • ✅ Windows (x86_64)
  • ✅ macOS (Apple Silicon M1/M2/M3/M4 only - ARM64)
  • ❌ macOS Intel (x86_64) - not supported

Development

Prerequisites: Python 3.10+, Rust (rustup.rs)

# Setup
git clone https://github.com/KohakuBlueleaf/KohakuVault.git
cd KohakuVault
python -m venv .venv && source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -e ".[dev]"  # quotes keep shells like zsh from expanding the brackets
maturin develop  # Build Rust extension (once)

# Workflow
# - Edit Python files → changes live immediately
# - Edit Rust files → run `maturin develop` to rebuild

# Tools
pytest                  # Run tests
black src/kohakuvault   # Format Python
cargo fmt               # Format Rust
maturin build --release # Build production wheel

Usage

Basic Operations

vault = KVault("media.db")

# Dict-like interface
vault["key"] = b"value"
data = vault["key"]
del vault["key"]
if "key" in vault: ...

# Safe retrieval
data = vault.get("key", default=b"")

# Iteration
for key in vault:
    print(f"{key}: {len(vault[key])} bytes")

Streaming Large Files

vault = KVault("media.db", chunk_size=1024*1024)  # 1 MiB chunks

# Stream from file → vault
with open("large_video.mp4", "rb") as f:
    vault.put_file("video:789", f)

# Stream from vault → file
with open("output.mp4", "wb") as f:
    vault.get_to_file("video:789", f)
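Streaming here means moving data in bounded pieces instead of reading a whole file into memory. A minimal sketch of that loop for any pair of file-like objects follows; `copy_in_chunks` is a hypothetical helper, and KohakuVault's own `put_file`/`get_to_file` may be implemented differently internally.

```python
import io


def copy_in_chunks(reader, writer, chunk_size=1024 * 1024):
    """Copy reader -> writer, holding at most chunk_size bytes in memory at once."""
    total = 0
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:  # b"" signals end of stream
            break
        writer.write(chunk)
        total += len(chunk)
    return total


src = io.BytesIO(b"\x00" * (3 * 1024 * 1024 + 17))
dst = io.BytesIO()
copied = copy_in_chunks(src, dst)
print(copied)  # 3145745
```

Peak memory is bounded by `chunk_size` regardless of file size, which is why a larger `chunk_size` trades memory for fewer round trips.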

Bulk Operations with Caching

Recommended: Use context manager for automatic flush

vault = KVault("media.db")

# Safest: Context manager auto-flushes
with vault.cache(cap_bytes=64*1024*1024):
    for i in range(1000):
        vault[f"item:{i}"] = data
# Auto-flushed here, guaranteed!

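# Long-running: a daemon thread auto-flushes every 5 seconds
vault.enable_cache(cap_bytes=64*1024*1024, flush_interval=5.0)
while True:
    vault["sensor_data"] = read_sensor()  # read_sensor(): your own data source
    time.sleep(1.0)                       # requires `import time`
    # No manual flush needed; the daemon thread commits pending writes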

# Manual control (backward compatible)
vault.enable_cache(cap_bytes=64*1024*1024)
for i in range(1000):
    vault[f"item:{i}"] = data
vault.flush_cache()  # Manual flush
vault.disable_cache()  # Auto-flushes before disabling
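Conceptually, a write-back cache buffers writes in memory and commits them in batches, either when a byte threshold is reached or when the context exits. The sketch below models that behavior around a plain dict standing in for the database; `WriteBackCache` and `cached` are hypothetical names, not KohakuVault's implementation (which additionally offers the daemon flush thread described above).

```python
from contextlib import contextmanager


class WriteBackCache:
    """Buffers key/value writes and flushes them to `backend` in batches."""

    def __init__(self, backend: dict, cap_bytes: int):
        self.backend = backend
        self.cap_bytes = cap_bytes
        self.pending = {}
        self.pending_bytes = 0
        self.flushes = 0

    def put(self, key, value: bytes):
        old = self.pending.get(key)
        if old is not None:  # overwriting a pending key frees its old size
            self.pending_bytes -= len(old)
        self.pending[key] = value
        self.pending_bytes += len(value)
        if self.pending_bytes >= self.cap_bytes:  # capacity reached: commit now
            self.flush()

    def flush(self) -> int:
        """Commit pending writes as one batch; return how many were written."""
        count = len(self.pending)
        if count:
            self.backend.update(self.pending)
            self.flushes += 1
        self.pending.clear()
        self.pending_bytes = 0
        return count


@contextmanager
def cached(backend: dict, cap_bytes: int):
    cache = WriteBackCache(backend, cap_bytes)
    try:
        yield cache
    finally:
        cache.flush()  # runs even if the body raises


store = {}
with cached(store, cap_bytes=1000) as cache:
    for i in range(25):
        cache.put(f"item:{i}", b"x" * 100)
print(len(store), cache.flushes)  # 25 3
```

Two flushes are triggered by the 1000-byte cap (after items 10 and 20) and the third by leaving the `with` block, which is the guarantee the context-manager form provides.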

Configuration

vault = KVault(
    path="media.db",
    chunk_size=2*1024*1024,   # Streaming chunk size
    retries=10,                # Retry attempts for busy DB
    enable_wal=True,           # Write-Ahead Logging
    cache_kb=20000,            # SQLite page cache size (KiB)
)
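The `retries` option (together with `backoff_base` in the constructor signature) suggests retrying when SQLite reports the database as busy. A minimal sketch of exponential backoff follows, assuming a delay of `backoff_base * 2**attempt` and using the standard `sqlite3.OperationalError` as the busy signal; the exact schedule KohakuVault uses is not documented here, and `with_retries` is a hypothetical helper.

```python
import sqlite3
import time


def with_retries(op, retries=4, backoff_base=0.02, sleep=time.sleep):
    """Run op(); on a 'locked'/'busy' error, wait and retry up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return op()
        except sqlite3.OperationalError as exc:
            msg = str(exc).lower()
            busy = "locked" in msg or "busy" in msg
            if not busy or attempt == retries:
                raise  # not a busy error, or out of attempts
            sleep(backoff_base * (2 ** attempt))  # 0.02, 0.04, 0.08, ...


# Usage: simulate two busy failures before success, recording the waits.
waits = []
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise sqlite3.OperationalError("database is locked")
    return "ok"


print(with_retries(flaky, sleep=waits.append), waits)  # ok [0.02, 0.04]
```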

Columnar Storage (NEW!)

List-like interface for typed sequences (timeseries, logs, events):

from kohakuvault import ColumnVault

cv = ColumnVault("data.db")

# Fixed-size types: i64, f64, bytes:N
cv.create_column("sensor_temps", "f64")
cv.create_column("timestamps", "i64")
cv.create_column("hashes", "bytes:32")  # 32-byte fixed

temps = cv["sensor_temps"]
temps.append(23.5)
temps.extend([24.1, 25.0, 25.3])
print(temps[0], temps[-1], len(temps))  # 23.5 25.3 4

# Variable-size bytes (for strings, JSON, etc.)
cv.create_column("log_messages", "bytes")  # No size = variable!
logs = cv["log_messages"]
logs.append(b"Short message")
logs.append(b"This is a much longer log entry with details...")
print(logs[0])  # Exact bytes, no padding

# Iterate
for temp in temps:
    print(temp)
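For fixed-size types, element `i` lives at byte offset `i * elem_size`, which is what makes random access cheap; variable-size columns need a separate record of offsets instead. A sketch of the fixed-size case with `struct`, assuming little-endian 8-byte `f64` values in one contiguous buffer (KohakuVault's on-disk byte order is not stated here); `FixedColumn` is a hypothetical name.

```python
import struct


class FixedColumn:
    """Append-only column of f64 values stored back to back in one bytearray."""

    FMT = "<d"                   # assumed: little-endian IEEE-754 double
    SIZE = struct.calcsize(FMT)  # 8 bytes per element

    def __init__(self):
        self.buf = bytearray()

    def append(self, value: float) -> None:
        self.buf += struct.pack(self.FMT, value)

    def extend(self, values) -> None:
        for v in values:
            self.append(v)

    def __len__(self) -> int:
        return len(self.buf) // self.SIZE

    def __getitem__(self, i: int) -> float:
        if i < 0:  # support negative indices like a list
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Offset comes straight from the index; no lookup table needed.
        return struct.unpack_from(self.FMT, self.buf, i * self.SIZE)[0]


temps = FixedColumn()
temps.append(23.5)
temps.extend([24.1, 25.0, 25.3])
print(temps[0], temps[-1], len(temps))  # 23.5 25.3 4
```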

Why columnar?

  • Append-heavy workloads (O(1) amortized, like Python list)
  • Typed data (int/float/bytes with type safety)
  • Efficient iteration and random access
  • Dynamic chunk growth (128KB → 16MB, exponential like std::vector)
  • Cross-chunk element support (byte-based addressing)
  • Minimal memory overhead (incremental BLOB I/O)
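Exponential chunk growth keeps small columns in small chunks while bounding the chunk count for large ones, and byte-based addressing lets a single element straddle a chunk boundary. The sketch below assumes chunk sizes double from 128 KiB up to a 16 MiB cap (the growth factor of 2 is an assumption) and maps a logical byte range to per-chunk pieces; `chunk_size` and `locate` are hypothetical helpers, not KohakuVault functions.

```python
MIN_CHUNK = 128 * 1024        # first chunk: 128 KiB
MAX_CHUNK = 16 * 1024 * 1024  # growth stops at 16 MiB


def chunk_size(index: int) -> int:
    """Size of chunk `index`: doubles each time (assumed factor 2) up to the cap."""
    return min(MIN_CHUNK << index, MAX_CHUNK)


def locate(offset: int, length: int):
    """Split the logical byte range [offset, offset + length) into
    (chunk_index, offset_in_chunk, nbytes) pieces."""
    index, chunk_start = 0, 0
    # Skip whole chunks that end at or before `offset`.
    while offset >= chunk_start + chunk_size(index):
        chunk_start += chunk_size(index)
        index += 1
    pieces, pos, remaining = [], offset, length
    while remaining > 0:
        inner = pos - chunk_start
        take = min(remaining, chunk_size(index) - inner)
        pieces.append((index, inner, take))
        pos += take
        remaining -= take
        # Any leftover bytes continue at the start of the next chunk.
        chunk_start += chunk_size(index)
        index += 1
    return pieces


print([chunk_size(i) // 1024 for i in range(9)])
# [128, 256, 512, 1024, 2048, 4096, 8192, 16384, 16384]
print(locate(MIN_CHUNK - 8, 16))  # [(0, 131064, 8), (1, 0, 8)]
```

The second call shows a 16-byte element that starts 8 bytes before the end of chunk 0 and is split across chunks 0 and 1.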

See docs/COLUMNAR_GUIDE.md and examples/columnar_demo.py for the complete guide.

API Reference

Constructor

KVault(path, chunk_size=1048576, retries=4, backoff_base=0.02,
       table="kvault", enable_wal=True, page_size=4096,
       mmap_size=268435456, cache_kb=20000)

Methods

Storage

  • put(key, value) - Store bytes
  • put_file(key, reader, size=None, chunk_size=None) - Stream from file-like
  • get(key, default=None) - Retrieve bytes
  • get_to_file(key, writer, chunk_size=None) - Stream to file-like
  • delete(key) - Remove key
  • exists(key) - Check existence

Caching

  • enable_cache(cap_bytes, flush_threshold) - Enable write-back cache
  • disable_cache() - Disable and flush cache
  • flush_cache() - Commit cached writes, returns count

Maintenance

  • optimize() - VACUUM database
  • close() - Flush and close

Dict Interface: vault[key], del vault[key], key in vault, len(vault), vault.keys(), vault.values(), vault.items(), etc.

Exceptions: KohakuVaultError, NotFound, DatabaseBusy, InvalidArgument, IoError

Architecture

Python wrapper (src/kohakuvault/proxy.py)
    ↓ PyO3 bindings
Rust core (src/kvault-rust/lib.rs)
    ↓ rusqlite
SQLite database (bundled)

Why hybrid? Rust handles SQLite operations safely and efficiently, while Python provides the ergonomic dict-like (KVault) and list-like (ColumnVault) interfaces.

Contributing

# Setup
git checkout -b feature-name
# Make changes
black src/kohakuvault && cargo fmt  # Format
pytest                               # Test
git add -A && git commit -m "Describe your change"
git push -u origin feature-name
# Open a PR

Releasing

GitHub Actions automatically builds wheels and publishes to PyPI when you push a tag:

# 1. Update version in pyproject.toml and Cargo.toml
# 2. Commit changes
git add pyproject.toml Cargo.toml
git commit -m "Bump version to 0.1.0"

# 3. Create and push tag
git tag v0.1.0
git push origin main --tags

# 4. GitHub Actions will:
#    - Build wheels for all platforms
#    - Create GitHub Release with wheels attached
#    - Publish to PyPI (with skip-existing for safety)

What happens:

  • Wheels are built for Linux, Windows, macOS (Apple Silicon)
  • All wheels are uploaded to the GitHub Release (downloadable)
  • Wheels are published to PyPI
  • If some wheels already exist on PyPI, they're skipped (no error)

License

Apache 2.0 - see LICENSE


Download files

Source distribution

  • kohakuvault-0.3.0.tar.gz (113.9 kB)

Built distributions

  • kohakuvault-0.3.0-cp313-cp313-win_amd64.whl (2.9 MB) - CPython 3.13, Windows x86-64
  • kohakuvault-0.3.0-cp313-cp313-manylinux_2_34_x86_64.whl (3.6 MB) - CPython 3.13, manylinux (glibc 2.34+) x86-64
  • kohakuvault-0.3.0-cp313-cp313-macosx_11_0_arm64.whl (3.1 MB) - CPython 3.13, macOS 11.0+ ARM64
  • kohakuvault-0.3.0-cp312-cp312-win_amd64.whl (2.9 MB) - CPython 3.12, Windows x86-64
  • kohakuvault-0.3.0-cp312-cp312-manylinux_2_34_x86_64.whl (3.6 MB) - CPython 3.12, manylinux (glibc 2.34+) x86-64
  • kohakuvault-0.3.0-cp312-cp312-macosx_11_0_arm64.whl (3.1 MB) - CPython 3.12, macOS 11.0+ ARM64
  • kohakuvault-0.3.0-cp311-cp311-win_amd64.whl (2.9 MB) - CPython 3.11, Windows x86-64
  • kohakuvault-0.3.0-cp311-cp311-manylinux_2_34_x86_64.whl (3.6 MB) - CPython 3.11, manylinux (glibc 2.34+) x86-64
  • kohakuvault-0.3.0-cp311-cp311-macosx_11_0_arm64.whl (3.1 MB) - CPython 3.11, macOS 11.0+ ARM64
  • kohakuvault-0.3.0-cp310-cp310-win_amd64.whl (2.9 MB) - CPython 3.10, Windows x86-64
  • kohakuvault-0.3.0-cp310-cp310-manylinux_2_34_x86_64.whl (3.6 MB) - CPython 3.10, manylinux (glibc 2.34+) x86-64
  • kohakuvault-0.3.0-cp310-cp310-macosx_11_0_arm64.whl (3.1 MB) - CPython 3.10, macOS 11.0+ ARM64

File details

Details for the file kohakuvault-0.3.0.tar.gz.

File metadata

  • Download URL: kohakuvault-0.3.0.tar.gz
  • Size: 113.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for kohakuvault-0.3.0.tar.gz
Algorithm Hash digest
SHA256 c0f329ecdb070ee360586eb984c31e8fca49fa64cda077358b23b72021a32fb7
MD5 c3c698c66418cbe38189ac5a53118235
BLAKE2b-256 c873090b600c031c641254eb7335daa3fea239be9bca9796f4c5e192af7e1571

Provenance

The following attestation bundles were made for kohakuvault-0.3.0.tar.gz:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details (built distributions)

Each wheel below has an attestation bundle published by release.yml on KohakuBlueleaf/KohakuVault.

Hashes for kohakuvault-0.3.0-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 dc171d064f513cd524d54219e3b28229ede4ffb379350cf8494423cf1e3fc667
MD5 4a4412fa56cdc569b8e4d792b2bda8ed
BLAKE2b-256 dcde2cced06d105cba16b175d5cfe99d55a8cff504aeb4abc3e9d96857ab12db

Hashes for kohakuvault-0.3.0-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 44b6d59e3e0c79ccec5d350c562f435b81071c87d802aad1dc6a823be3940594
MD5 71a8d6fbb7707459cbf7bc3beba0f182
BLAKE2b-256 7b116834e1cf8b7a72497b4fa27bf6b71b8256903e658c5bacfc7a571543fb04

Hashes for kohakuvault-0.3.0-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 b8e7066b1049b4f9bcbba389253f3329a5bd2873707bfbf9e363b52fb0d74894
MD5 de2838ae89de8a02c64b7476a48dea30
BLAKE2b-256 d0f120731c8ede6fc18af8b62bc780666c8ad9a625ce1e864ae69ff652dbbffe

Hashes for kohakuvault-0.3.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 17bc36da63cb6676301953a844039b775898f1d4dbebe410e4a59740dfbe9388
MD5 c0119d70a67b54c57d27e7dd771ccb4e
BLAKE2b-256 620da33a446333c8384e80486614e4b046f2b714a7ff0aaf90118c77196157e9

Hashes for kohakuvault-0.3.0-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 1ee8f0c6bbdc0fc259e73f83fa68cb17c2d1cc2b7e8e265023494ed28e428f16
MD5 0d1298fb1b6cd24bcbfcbb70b6e47f93
BLAKE2b-256 7058452368b5c3132a43f83ed22e7ccc658e41088c26d3cfb615c7703fa2c9a3

Hashes for kohakuvault-0.3.0-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 c80a375d59cb11e02d625920bd7562b7c6635cad129405d1cc2fbab8f6d8c102
MD5 bc26d7c659bd6054e4c96a38c6b71345
BLAKE2b-256 d637b5e64913931a2c5c010c37a5b95d8061d1018cccc47f2f6bc3bdf979d1d2

Hashes for kohakuvault-0.3.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 df0d0cc0f3c62a258073e335142cfb9ea8355d96aecc4b832f167fd1e345eafc
MD5 e68fca7ba28b68d12b119a7b8da7126b
BLAKE2b-256 31f1711253de578d7a43e88b68e8d08d4c99e3011c2b1bacae560dfcbecca8cb

Hashes for kohakuvault-0.3.0-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 a673298fdb04a007038e5d0dbff8ad074beeac57b7ed9f506741ebb61d91fd53
MD5 ebcde7b2aa2e632aaac20836ed9b91d6
BLAKE2b-256 7d0d4e56b60eabc96cd05a4c7f3c8b216d3ada440baa94c3610cbe0bc9b9f3e8

Hashes for kohakuvault-0.3.0-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 814ac3db87dbce28c1e6c84235bbaba6ddd4c4ae1ec0bcd278b4a62ef0a86ad7
MD5 377346bd10a261f0666337d5a1b78671
BLAKE2b-256 1fe27e4f4c61973c2b529a38b2d8b851c6b584f00cfca160760065ac7c789470

Hashes for kohakuvault-0.3.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 d67d51e466589289c984ccf78b12eb7651980906d4c384474b05671452a4aa1d
MD5 4afa06fd8b07f2978b04160b561a5681
BLAKE2b-256 9872b31d5e827f8dc6f93e6bae93ad22a203d4759800eac8784f8c027310f551

Hashes for kohakuvault-0.3.0-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 43c304d3b21dedeb3ca7f13215764c743728c5267c3280fc9395d2d37c1d3697
MD5 5bee411865ca83e4fe6fc5bf9c500bca
BLAKE2b-256 0972ca36994155079d0fd1b3b141d11ee11a8a6fb249ac14633433f336ba2c55

Hashes for kohakuvault-0.3.0-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 b850298ca1df01ca1c2249052f8cf8f493f1471457de3973a6179371286bd62c
MD5 581870a164172dcfec94955e5c7d6458
BLAKE2b-256 dc792b8a890a19ae67a861ca7fe569c6746cca29da32d26ff2a01f50bd57d8fc
