SQLite-backed KV store (Rust+PyO3) for large media blobs with structured data support

KohakuVault

High-performance, SQLite-backed storage with dual interfaces: dict-like for blobs (key-value) and list-like for sequences (columnar). Rust core with Pythonic APIs.

Quick Start

pip install kohakuvault

KV Store - Dict-like interface for binary blobs (images, videos, documents):

from kohakuvault import KVault

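vault = KVault("data.db")
image_bytes = b"..."  # placeholder: raw bytes of an image file
video_bytes = b"..."  # placeholder: raw bytes of a video file
vault["image:123"] = image_bytes
vault["video:456"] = video_bytes
data = vault["image:123"]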

# Bulk writes with smart caching (NEW in v0.2.2!)
with vault.cache(64*1024*1024):
    for i in range(10000):
        vault[f"key:{i}"] = data
# Auto-flushes on exit!

Columnar Storage - List-like interface for typed sequences (timeseries, logs, events):

from kohakuvault import ColumnVault

cv = ColumnVault("data.db")

# Primitives
cv.create_column("temperatures", "f64")
temps = cv["temperatures"]
temps.extend([23.5, 24.1, 25.0])
print(temps[0])  # 23.5

# Structured data (NEW in v0.3.0!)
cv.create_column("users", "msgpack")
users = cv["users"]
users.append({"name": "Alice", "age": 30, "tags": ["vip"]})
print(users[0])  # {'name': 'Alice', 'age': 30, 'tags': ['vip']}

# Strings with encoding (NEW in v0.3.0!)
cv.create_column("messages", "str:utf8")
messages = cv["messages"]
messages.append("Hello, 世界!")
print(messages[0])  # 'Hello, 世界!'

DataPacker - Rust-based serialization (NEW in v0.3.0!):

from kohakuvault import DataPacker

# MessagePack for structured data
packer = DataPacker("msgpack")
packed = packer.pack({"user": "alice", "score": 95.5})
data = packer.unpack(packed, 0)

# Bulk operations
records = [{"id": i, "val": i*1.5} for i in range(1000)]
packed_all = packer.pack_many(records)  # Concatenated bytes

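# Unpack with offsets (for variable-size encodings)
# Assumes offsets[i] is the start position of record i inside packed_all
offsets, pos = [], 0
for r in records:
    offsets.append(pos)
    pos += len(packer.pack(r))
unpacked = packer.unpack_many(packed_all, offsets=offsets)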
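The `offsets` argument above appears to hold the start position of each record inside the concatenated bytes. The bookkeeping behind that can be sketched with the standard library alone. Here JSON stands in for MessagePack, and `pack_one`, `pack_many_with_offsets` and `unpack_many_with_offsets` are hypothetical helpers, not part of kohakuvault's API.

```python
import json
from itertools import accumulate


def pack_one(record: dict) -> bytes:
    # Stand-in encoder (compact JSON); DataPacker("msgpack") would emit MessagePack bytes instead
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


def pack_many_with_offsets(records):
    """Concatenate encoded records and return (blob, start_offsets)."""
    chunks = [pack_one(r) for r in records]
    # Start of record i = total size of records 0..i-1
    offsets = [0, *accumulate(len(c) for c in chunks[:-1])] if chunks else []
    return b"".join(chunks), offsets


def unpack_many_with_offsets(blob: bytes, offsets):
    """Slice each record from its start offset up to the next start (or the end of blob)."""
    ends = [*offsets[1:], len(blob)]
    return [json.loads(blob[start:end]) for start, end in zip(offsets, ends)]


records = [{"id": i, "val": i * 1.5} for i in range(1000)]
blob, offsets = pack_many_with_offsets(records)
restored = unpack_many_with_offsets(blob, offsets)
```

Only start offsets are stored; each record's end is implied by the next start, so N records need N integers.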

Features

  • Dual interfaces: Dict for blobs (KVault), List for sequences (ColumnVault)
  • Zero external dependencies: Single SQLite file, no services required
  • Memory efficient: Stream multi-GB files, dynamic chunk growth
  • Type-safe columnar: Fixed-size (i64, f64, bytes:N) and variable-size (bytes, str, msgpack, cbor)
  • Rust performance: Native speed with Pythonic ergonomics
  • Smart caching: Auto-flush context manager, daemon thread, capacity enforcement
  • Structured data: Store dicts/lists directly with MessagePack/CBOR (NEW in v0.3.0!)
  • DataPacker: Rust-based serialization with multi-encoding support (NEW in v0.3.0!)

Best Practices

Handling Many Large Binary Files

For thousands of large binaries (images, videos, documents), use a hybrid approach:

import time

from kohakuvault import KVault, ColumnVault

kv = KVault("media.db")
cv = ColumnVault(kv)  # Share same database

# Store metadata in columnar (efficient for large lists)
cv.create_column("image_ids", "i64")
cv.create_column("image_names", "bytes")
cv.create_column("image_sizes", "i64")
cv.create_column("upload_times", "i64")

ids = cv["image_ids"]
names = cv["image_names"]
sizes = cv["image_sizes"]
times = cv["upload_times"]

# Store actual binaries in KV store
for img_id, img_data, img_name in image_stream:  # your iterable of (int id, bytes data, bytes name)
    # Metadata in columnar (fast append, efficient iteration/filtering)
    ids.append(img_id)
    names.append(img_name)
    sizes.append(len(img_data))
    times.append(int(time.time()))

    # Binary data in KV (optimized for large blobs)
    kv[f"blob:{img_id}"] = img_data

# Query metadata without loading binaries
for i in range(len(ids)):
    if sizes[i] > 1024 * 1024:  # Find images > 1MB
        print(f"Large image: {names[i].decode()}")
        # Load binary only when needed
        data = kv[f"blob:{ids[i]}"]

Why this pattern?

  • ✅ Columnar optimized for append-heavy metadata (millions of entries)
  • ✅ KV optimized for large binary blobs (streaming, caching)
  • ✅ Can query/filter metadata without loading binaries
  • ✅ Both share same SQLite file (single-file deployment)
  • ✅ Efficient iteration over metadata, lazy loading of binaries
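The same split can be illustrated with plain `sqlite3`: small metadata rows in one table, large payloads in another, both in a single database. This is a stdlib analogue of the pattern, not KohakuVault's internal schema; the table and function names are hypothetical.

```python
import sqlite3
import time

# One database holds both kinds of data (":memory:" here; a file path in practice)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (id INTEGER PRIMARY KEY, name TEXT, size INTEGER, uploaded INTEGER)")
conn.execute("CREATE TABLE blobs (key TEXT PRIMARY KEY, data BLOB)")


def store_image(img_id: int, name: str, data: bytes) -> None:
    # Small, fixed-shape metadata row
    conn.execute("INSERT INTO meta VALUES (?, ?, ?, ?)", (img_id, name, len(data), int(time.time())))
    # Large payload stored separately, keyed like the KV example above
    conn.execute("INSERT INTO blobs VALUES (?, ?)", (f"blob:{img_id}", data))


def large_images(min_size: int):
    # Filters on metadata only; no blob bytes are read here
    return conn.execute("SELECT id, name FROM meta WHERE size > ? ORDER BY id", (min_size,)).fetchall()


def load_blob(img_id: int) -> bytes:
    # Lazy load: fetch the payload only for rows that passed the filter
    return conn.execute("SELECT data FROM blobs WHERE key = ?", (f"blob:{img_id}",)).fetchone()[0]


store_image(1, "small.png", b"\x00" * 10)
store_image(2, "big.png", b"\x00" * (2 * 1024 * 1024))
conn.commit()
for img_id, name in large_images(1024 * 1024):
    print(name, len(load_blob(img_id)))  # big.png 2097152
```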

Installation

pip install kohakuvault  # From PyPI
pip install .            # From source

Platform Support:

  • ✅ Linux (x86_64)
  • ✅ Windows (x86_64)
  • ✅ macOS (Apple Silicon M1/M2/M3/M4 only - ARM64)
  • ❌ macOS Intel (x86_64) - not supported

Development

Prerequisites: Python 3.10+, Rust (rustup.rs)

# Setup
git clone https://github.com/KohakuBlueleaf/KohakuVault.git
cd KohakuVault
python -m venv .venv && source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -e ".[dev]"
maturin develop  # Build Rust extension (once)

# Workflow
# - Edit Python files → changes live immediately
# - Edit Rust files → run `maturin develop` to rebuild

# Tools
pytest                  # Run tests
black src/kohakuvault   # Format Python
cargo fmt               # Format Rust
maturin build --release # Build production wheel

Usage

Basic Operations

vault = KVault("media.db")

# Dict-like interface
vault["key"] = b"value"
data = vault["key"]
del vault["key"]
if "key" in vault: ...

# Safe retrieval
data = vault.get("key", default=b"")

# Iteration
for key in vault:
    print(f"{key}: {len(vault[key])} bytes")

Streaming Large Files

vault = KVault("media.db", chunk_size=1024*1024)  # 1 MiB chunks

# Stream from file → vault
with open("large_video.mp4", "rb") as f:
    vault.put_file("video:789", f)

# Stream from vault → file
with open("output.mp4", "wb") as f:
    vault.get_to_file("video:789", f)
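Streaming keeps memory use bounded by reading and writing one chunk at a time. Below is a generic sketch of that loop over file-like objects; `copy_in_chunks` is a hypothetical helper, and this is not necessarily how `put_file`/`get_to_file` are implemented internally.

```python
import io
from typing import BinaryIO


def copy_in_chunks(reader: BinaryIO, writer: BinaryIO, chunk_size: int = 1024 * 1024) -> int:
    """Copy reader -> writer holding at most chunk_size bytes in memory; return bytes copied."""
    total = 0
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:  # b"" signals end of stream
            break
        writer.write(chunk)
        total += len(chunk)
    return total


src = io.BytesIO(b"x" * (3 * 1024 * 1024 + 17))  # 3 MiB + 17 bytes
dst = io.BytesIO()
copied = copy_in_chunks(src, dst, chunk_size=1024 * 1024)
```

With a 1 MiB chunk size this performs four data reads (three full chunks and one 17-byte tail), whatever the total file size.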

Bulk Operations with Caching

Recommended: Use context manager for automatic flush

vault = KVault("media.db")

# Safest: Context manager auto-flushes
with vault.cache(cap_bytes=64*1024*1024):
    for i in range(1000):
        vault[f"item:{i}"] = data
# Auto-flushed here, guaranteed!

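# Long-running: Daemon thread auto-flushes every 5 seconds
vault.enable_cache(cap_bytes=64*1024*1024, flush_interval=5.0)
while True:  # e.g. a service loop; read_sensor() stands in for your data source
    vault["sensor_data"] = read_sensor()
    # No explicit flush needed: the daemon thread flushes in the background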

# Manual control (backward compatible)
vault.enable_cache(cap_bytes=64*1024*1024)
for i in range(1000):
    vault[f"item:{i}"] = data
vault.flush_cache()  # Manual flush
vault.disable_cache()  # Auto-flushes before disabling
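The caching modes above share one idea: writes are buffered in memory and committed in batches when the buffer reaches `cap_bytes`, on an explicit flush, or when the context manager exits. Here is a pure-Python sketch of that write-back behaviour over a dict backend. `WriteBackCache` is hypothetical and omits the daemon thread and the SQLite transaction a real store would use.

```python
from typing import Dict, MutableMapping


class WriteBackCache:
    """Hypothetical write-back buffer in front of `backend`."""

    def __init__(self, backend: MutableMapping[str, bytes], cap_bytes: int):
        self.backend = backend
        self.cap_bytes = cap_bytes
        self.pending: Dict[str, bytes] = {}
        self.pending_bytes = 0

    def __setitem__(self, key: str, value: bytes) -> None:
        old = self.pending.pop(key, None)
        if old is not None:
            self.pending_bytes -= len(old)  # overwriting a buffered key replaces it
        self.pending[key] = value
        self.pending_bytes += len(value)
        if self.pending_bytes >= self.cap_bytes:
            self.flush()  # capacity enforcement

    def __getitem__(self, key: str) -> bytes:
        # Read-your-writes: check the buffer before the backend
        if key in self.pending:
            return self.pending[key]
        return self.backend[key]

    def flush(self) -> int:
        count = len(self.pending)
        self.backend.update(self.pending)  # a real store would do this in one transaction
        self.pending.clear()
        self.pending_bytes = 0
        return count

    def __enter__(self) -> "WriteBackCache":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.flush()  # this sketch flushes even if the block raised
        return False


backend: Dict[str, bytes] = {}
with WriteBackCache(backend, cap_bytes=100) as cache:
    cache["a"] = b"x" * 10   # buffered, not yet in backend
    a_visible_early = "a" in backend
    cache["b"] = b"y" * 95   # 105 bytes >= cap_bytes triggers a flush
    cache["c"] = b"z"        # buffered again
    c_visible_early = "c" in backend
# leaving the block flushed "c"
```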

Configuration

vault = KVault(
    path="media.db",
    chunk_size=2*1024*1024,   # Streaming chunk size
    retries=10,                # Retry attempts for busy DB
    enable_wal=True,           # Write-Ahead Logging
    cache_kb=20000,            # SQLite cache size
)
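The sketch below assumes that `enable_wal`, `page_size`, `mmap_size` and `cache_kb` map onto the standard SQLite PRAGMAs with the same purpose. That mapping is an assumption, so check the Rust source for the exact statements. It shows roughly what those settings mean when applied to a raw `sqlite3` connection; `open_tuned` is a hypothetical helper.

```python
import os
import sqlite3
import tempfile


def open_tuned(path: str, page_size: int = 4096, mmap_size: int = 268435456,
               cache_kb: int = 20000, enable_wal: bool = True) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # page_size only takes effect before the database file is first written (or after VACUUM)
    conn.execute(f"PRAGMA page_size = {int(page_size)}")
    # Allow SQLite to memory-map up to mmap_size bytes of the file for reads
    conn.execute(f"PRAGMA mmap_size = {int(mmap_size)}")
    # A negative cache_size is interpreted as KiB rather than as a page count
    conn.execute(f"PRAGMA cache_size = {-int(cache_kb)}")
    if enable_wal:
        # Write-Ahead Logging lets readers proceed while a writer commits
        conn.execute("PRAGMA journal_mode = WAL")
    return conn


with tempfile.TemporaryDirectory() as d:
    conn = open_tuned(os.path.join(d, "media.db"))
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]           # 'wal'
    cache_setting = conn.execute("PRAGMA cache_size").fetchone()[0]    # -20000
    conn.close()
```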

Columnar Storage (NEW!)

List-like interface for typed sequences (timeseries, logs, events):

from kohakuvault import ColumnVault

cv = ColumnVault("data.db")

# Fixed-size types: i64, f64, bytes:N
cv.create_column("sensor_temps", "f64")
cv.create_column("timestamps", "i64")
cv.create_column("hashes", "bytes:32")  # 32-byte fixed

temps = cv["sensor_temps"]
temps.append(23.5)
temps.extend([24.1, 25.0, 25.3])
print(temps[0], temps[-1], len(temps))  # 23.5 25.3 4

# Variable-size bytes (for strings, JSON, etc.)
cv.create_column("log_messages", "bytes")  # No size = variable!
logs = cv["log_messages"]
logs.append(b"Short message")
logs.append(b"This is a much longer log entry with details...")
print(logs[0])  # Exact bytes, no padding

# Iterate
for temp in temps:
    print(temp)

Why columnar?

  • Append-heavy workloads (O(1) amortized, like Python list)
  • Typed data (int/float/bytes with type safety)
  • Efficient iteration and random access
  • Dynamic chunk growth (128KB → 16MB, exponential like std::vector)
  • Cross-chunk element support (byte-based addressing)
  • Minimal memory overhead (incremental BLOB I/O)
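To make "dynamic chunk growth" and "byte-based addressing" concrete, here is a sketch that assumes chunks start at 128 KiB and double up to a 16 MiB cap. It maps an element of a fixed-size column to the chunk(s) holding its bytes, including an element that straddles a chunk boundary. The growth policy and function names are assumptions for illustration, not the library's exact layout.

```python
from typing import List, Tuple

KIB = 1024
MIB = 1024 * KIB


def chunk_sizes(n_chunks: int, first: int = 128 * KIB, cap: int = 16 * MIB) -> List[int]:
    """Assumed growth policy: each chunk doubles the previous one, capped at `cap`."""
    sizes, size = [], first
    for _ in range(n_chunks):
        sizes.append(size)
        size = min(size * 2, cap)
    return sizes


def locate(byte_offset: int, sizes: List[int]) -> Tuple[int, int]:
    """Map a global byte offset to (chunk_index, offset_within_chunk)."""
    start = 0
    for idx, size in enumerate(sizes):
        if byte_offset < start + size:
            return idx, byte_offset - start
        start += size
    raise IndexError("offset beyond allocated chunks")


def element_span(index: int, elem_size: int, sizes: List[int]) -> List[Tuple[int, int, int]]:
    """Split element `index` of a fixed-size column into (chunk, offset, length) pieces."""
    pos, remaining, pieces = index * elem_size, elem_size, []
    while remaining:
        chunk, off = locate(pos, sizes)
        take = min(remaining, sizes[chunk] - off)  # stop at the chunk boundary
        pieces.append((chunk, off, take))
        pos += take
        remaining -= take
    return pieces


sizes = chunk_sizes(10)
print([s // KIB for s in sizes])        # [128, 256, 512, 1024, 2048, 4096, 8192, 16384, 16384, 16384]
print(element_span(2730, 48, sizes))    # [(0, 131040, 32), (1, 0, 16)]
```

A 48-byte element (as in a `bytes:48` column) at index 2730 starts 32 bytes before the end of the first chunk, so its bytes are split across two chunks. An f64 element never straddles a boundary here, because 8 divides every chunk size.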

See docs/COLUMNAR_GUIDE.md and examples/columnar_demo.py for the complete guide.

API Reference

Constructor

KVault(path, chunk_size=1048576, retries=4, backoff_base=0.02,
       table="kvault", enable_wal=True, page_size=4096,
       mmap_size=268435456, cache_kb=20000)
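The `retries` and `backoff_base` parameters suggest retrying busy-database errors with a growing delay. This is a minimal sketch assuming exponential backoff of `backoff_base * 2**attempt` seconds. `with_retries` and `flaky` are hypothetical, and the library's actual retry policy may differ.

```python
import sqlite3
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(op: Callable[[], T], retries: int = 4, backoff_base: float = 0.02) -> T:
    """Run op; on a locked/busy SQLite error, wait and retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return op()
        except sqlite3.OperationalError as e:
            busy = "locked" in str(e) or "busy" in str(e)
            if not busy or attempt == retries:
                raise  # non-busy errors and the final failure propagate
            time.sleep(backoff_base * (2 ** attempt))  # 0.02, 0.04, 0.08, ... seconds
    raise AssertionError("unreachable for retries >= 0")


calls = {"n": 0}


def flaky() -> str:
    # Simulates a writer that is blocked for the first two attempts
    calls["n"] += 1
    if calls["n"] < 3:
        raise sqlite3.OperationalError("database is locked")
    return "ok"


result = with_retries(flaky, retries=4, backoff_base=0.001)
```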

Methods

Storage

  • put(key, value) - Store bytes
  • put_file(key, reader, size=None, chunk_size=None) - Stream from file-like
  • get(key, default=None) - Retrieve bytes
  • get_to_file(key, writer, chunk_size=None) - Stream to file-like
  • delete(key) - Remove key
  • exists(key) - Check existence

Caching

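  • cache(cap_bytes) - Context manager: enables the write-back cache and flushes on exit
  • enable_cache(cap_bytes, flush_threshold) - Enable write-back cache; pass flush_interval (seconds) to start the auto-flush daemon thread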
  • disable_cache() - Disable and flush cache
  • flush_cache() - Commit cached writes, returns count

Maintenance

  • optimize() - VACUUM database
  • close() - Flush and close

Dict Interface: vault[key], del vault[key], key in vault, len(vault), vault.keys(), vault.values(), vault.items(), etc.

Exceptions: KohakuVaultError, NotFound, DatabaseBusy, InvalidArgument, IoError

Architecture

Python wrapper (src/kohakuvault/proxy.py)
    ↓ PyO3 bindings
Rust core (src/kvault-rust/lib.rs)
    ↓ rusqlite
SQLite database (bundled)

Why hybrid? Rust handles SQLite operations safely and efficiently. Python provides the ergonomic dict-like interface.
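On the Python side, most of the dict-like behaviour can come from `collections.abc.MutableMapping`. Once five methods are implemented, `get`, `items`, `values`, `update`, `pop` and `in` follow automatically. Here is a sketch over a fake core object; `FakeCore` and `VaultProxy` are hypothetical stand-ins, not the contents of `proxy.py`.

```python
from collections.abc import MutableMapping
from typing import Dict, Iterator, List, Optional


class FakeCore:
    """Stand-in for a native core exposing plain methods (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = bytes(value)

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        return self._data.pop(key, None) is not None

    def keys(self) -> List[str]:
        return list(self._data)


class VaultProxy(MutableMapping):
    """Dict-like wrapper; MutableMapping derives the rest of the dict API from these methods."""

    def __init__(self, core: FakeCore) -> None:
        self._core = core

    def __getitem__(self, key: str) -> bytes:
        value = self._core.get(key)
        if value is None:
            raise KeyError(key)  # also makes `key in proxy` return False
        return value

    def __setitem__(self, key: str, value: bytes) -> None:
        self._core.put(key, value)

    def __delitem__(self, key: str) -> None:
        if not self._core.delete(key):
            raise KeyError(key)

    def __iter__(self) -> Iterator[str]:
        return iter(self._core.keys())

    def __len__(self) -> int:
        return len(self._core.keys())


vault = VaultProxy(FakeCore())
vault["a"] = b"1"
vault["b"] = b"22"
vault["tmp"] = b"x"
del vault["tmp"]
```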

Contributing

# Setup
git checkout -b feature-name
# Make changes
black src/kohakuvault && cargo fmt  # Format
cargo clippy                        # Basic linting
pytest                              # Test
git commit && git push
# Open PR

License

Apache 2.0 - see LICENSE

Download files

Source Distribution

  • kohakuvault-0.4.0.tar.gz (120.7 kB)

Built Distributions

  • kohakuvault-0.4.0-cp313-cp313-win_amd64.whl (2.9 MB): CPython 3.13, Windows x86-64
  • kohakuvault-0.4.0-cp313-cp313-manylinux_2_34_x86_64.whl (3.6 MB): CPython 3.13, manylinux (glibc 2.34+) x86-64
  • kohakuvault-0.4.0-cp313-cp313-macosx_11_0_arm64.whl (3.2 MB): CPython 3.13, macOS 11.0+ ARM64
  • kohakuvault-0.4.0-cp312-cp312-win_amd64.whl (2.9 MB): CPython 3.12, Windows x86-64
  • kohakuvault-0.4.0-cp312-cp312-manylinux_2_34_x86_64.whl (3.6 MB): CPython 3.12, manylinux (glibc 2.34+) x86-64
  • kohakuvault-0.4.0-cp312-cp312-macosx_11_0_arm64.whl (3.2 MB): CPython 3.12, macOS 11.0+ ARM64
  • kohakuvault-0.4.0-cp311-cp311-win_amd64.whl (2.9 MB): CPython 3.11, Windows x86-64
  • kohakuvault-0.4.0-cp311-cp311-manylinux_2_34_x86_64.whl (3.6 MB): CPython 3.11, manylinux (glibc 2.34+) x86-64
  • kohakuvault-0.4.0-cp311-cp311-macosx_11_0_arm64.whl (3.2 MB): CPython 3.11, macOS 11.0+ ARM64
  • kohakuvault-0.4.0-cp310-cp310-win_amd64.whl (2.9 MB): CPython 3.10, Windows x86-64
  • kohakuvault-0.4.0-cp310-cp310-manylinux_2_34_x86_64.whl (3.6 MB): CPython 3.10, manylinux (glibc 2.34+) x86-64
  • kohakuvault-0.4.0-cp310-cp310-macosx_11_0_arm64.whl (3.2 MB): CPython 3.10, macOS 11.0+ ARM64

File details

Details for the file kohakuvault-0.4.0.tar.gz.

File metadata

  • Download URL: kohakuvault-0.4.0.tar.gz
  • Upload date:
  • Size: 120.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for kohakuvault-0.4.0.tar.gz
Algorithm Hash digest
SHA256 d26a55f046dc51340a7cc5fcc12489ac86b2fb757fc305f0985cdc27c5b7a35d
MD5 03967007694fc07e992edce15572e077
BLAKE2b-256 944f7e25eb535aa8af42680bf3c93751c6cba2f3acd04c8592b068fb44532ca2

Provenance

The following attestation bundles were made for kohakuvault-0.4.0.tar.gz:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file kohakuvault-0.4.0-cp313-cp313-win_amd64.whl.

File hashes

Hashes for kohakuvault-0.4.0-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 a21da69e5ada914abab54f451b3ec1117837b7d2779f796204a9ba24c843a108
MD5 2583a630943ae0265a5f92a1ff78e992
BLAKE2b-256 f6a789ca8e33a83c038d5c80d38f7f5fcf06ae85ed05e38c5dde71ce34e8fdb3

Provenance

The following attestation bundles were made for kohakuvault-0.4.0-cp313-cp313-win_amd64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.4.0-cp313-cp313-manylinux_2_34_x86_64.whl.

File hashes

Hashes for kohakuvault-0.4.0-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 07ca91e79f10aedefad4753ba087872e79b9c404cbf5edc46c5470975d031edc
MD5 1f082a421486d87e6d110621b5729032
BLAKE2b-256 800a52a239deb379d2572bbf8596466f841490196b4589c36b58a74aa42725ac

Provenance

The following attestation bundles were made for kohakuvault-0.4.0-cp313-cp313-manylinux_2_34_x86_64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.4.0-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for kohakuvault-0.4.0-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 9271f3e1b57e61e780f5f280a8c7c852fb8200a01e50c6a4267ed2a88abb32e4
MD5 b66ec0ab3c5a2b139346bb97ef076db1
BLAKE2b-256 ec99b2147290de325a23d820e076c16a7b2656f4f1060b1334f76f752e57041a

Provenance

The following attestation bundles were made for kohakuvault-0.4.0-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.4.0-cp312-cp312-win_amd64.whl.

File hashes

Hashes for kohakuvault-0.4.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 c696a8857f487ae23683f9959b87705f9d395f75ae5ac9917d93505bfc1afbd0
MD5 74cfa2b47269fe71848039284ee921a7
BLAKE2b-256 a67680fc13bb9b603e045b4f5bc68cb0e1bc31264637933823fcfc0b52c3d502

Provenance

The following attestation bundles were made for kohakuvault-0.4.0-cp312-cp312-win_amd64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.4.0-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for kohakuvault-0.4.0-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 30b047f20876a554f785afaabcf58b5f70ae4374ea5e596b1ccecdf3ab8f9780
MD5 c8659ca9edf3ec219867e85b853c4ec7
BLAKE2b-256 e8b50bbaa311db405063165b12dc5bc9b06f4d31538e447789f86404397f50ad

Provenance

The following attestation bundles were made for kohakuvault-0.4.0-cp312-cp312-manylinux_2_34_x86_64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.4.0-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for kohakuvault-0.4.0-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 0523196d038507d02576aacf72159a81387e88cc5b39aa860f444e474fd3a0e7
MD5 3e6a290e3fbd648bf1977f132df7df3b
BLAKE2b-256 ec648ea9e05730220c6500ac108311df455ba186117da19d4c7f576c226711f3

Provenance

The following attestation bundles were made for kohakuvault-0.4.0-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.4.0-cp311-cp311-win_amd64.whl.

File hashes

Hashes for kohakuvault-0.4.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 5b3c3039b7bba901a76876dcfe732e5a119b41bedd0c8356b084f0b06ffaa6ae
MD5 53d3ce592c6acd9eb86be61c3ae6a581
BLAKE2b-256 12770ec704939526d40c1da41acc962ce7f76d81c506b853ef580fd9b89b1376

Provenance

The following attestation bundles were made for kohakuvault-0.4.0-cp311-cp311-win_amd64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.4.0-cp311-cp311-manylinux_2_34_x86_64.whl.

File hashes

Hashes for kohakuvault-0.4.0-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 5dcce6b96112ea848cfb5fffbb346093e853128ebb6e40f5a9cefba80079ff98
MD5 25deda0207199633427146085d67fc97
BLAKE2b-256 1b39d72fdcc73561ca1035464e24c0911154421513426604b11781e0e0db522d

Provenance

The following attestation bundles were made for kohakuvault-0.4.0-cp311-cp311-manylinux_2_34_x86_64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.4.0-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for kohakuvault-0.4.0-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 14de0ed6b220219cea3aab15c1fce08dd9db487bb797c5bd57af171b8dce855b
MD5 922e19d3a9005646b462fca83e888211
BLAKE2b-256 734a1009d40329dcebb585b5dae5934a1dac83e5b009a8cb654fa5cb799b341f

Provenance

The following attestation bundles were made for kohakuvault-0.4.0-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.4.0-cp310-cp310-win_amd64.whl.

File hashes

Hashes for kohakuvault-0.4.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 dd52fca135d50ef196adb8ebb78536d404fa2aafdc18674d4379a2cb4481f39f
MD5 725bec7ed61a61c3d47a5848881960e5
BLAKE2b-256 47ffc504a8e165e5cd05055c5a67beb64d85d6e675ea62e1ef2c8f01fee2d923

Provenance

The following attestation bundles were made for kohakuvault-0.4.0-cp310-cp310-win_amd64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.4.0-cp310-cp310-manylinux_2_34_x86_64.whl.

File hashes

Hashes for kohakuvault-0.4.0-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 1d2b32ebe1bce3ee5ef16494cbfef129055a82adb243d63bccf3fa3163bb2812
MD5 acf54bd1624b0606f4b56139dc17e709
BLAKE2b-256 64865d47795ec58ee1892d1faa8a86fc8b1f18bc2b953f32bd9dd9dc30721786

Provenance

The following attestation bundles were made for kohakuvault-0.4.0-cp310-cp310-manylinux_2_34_x86_64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault

File details

Details for the file kohakuvault-0.4.0-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for kohakuvault-0.4.0-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 517a9db5d1b140ea29bb22a4b9013d0f89f30967b0f3710126d9d7c703468252
MD5 9a00c19aaf24b57f7b77fe48117ea571
BLAKE2b-256 bbd93a4611959dfd0cb9bd18cdf1221585b5b379e18f93f43f300d18e052a648

Provenance

The following attestation bundles were made for kohakuvault-0.4.0-cp310-cp310-macosx_11_0_arm64.whl:

Publisher: release.yml on KohakuBlueleaf/KohakuVault
