SQLite-backed KV store (Rust+PyO3) for large media blobs with structured data support

KohakuVault

High-performance, SQLite-backed storage with dual interfaces: dict-like for blobs (key-value) and list-like for sequences (columnar). Rust core with Pythonic APIs.

Quick Start

pip install kohakuvault

KV Store - Dict-like interface for binary blobs (images, videos, documents):

from kohakuvault import KVault

vault = KVault("data.db")
vault["image:123"] = image_bytes
vault["video:456"] = video_bytes
data = vault["image:123"]

# Bulk writes with smart caching (NEW in v0.2.2!)
with vault.cache(64*1024*1024):
    for i in range(10000):
        vault[f"key:{i}"] = data
# Auto-flushes on exit!

Columnar Storage - List-like interface for typed sequences (timeseries, logs, events):

from kohakuvault import ColumnVault

cv = ColumnVault("data.db")

# Primitives
cv.create_column("temperatures", "f64")
temps = cv["temperatures"]
temps.extend([23.5, 24.1, 25.0])
print(temps[0])  # 23.5

# High-performance bulk writes with cache (NEW in v0.4.1!)
with temps.cache():
    for temp in sensor_readings:
        temps.append(temp)  # 10-100x faster!

# Structured data (NEW in v0.3.0!)
cv.create_column("users", "msgpack")
users = cv["users"]
users.append({"name": "Alice", "age": 30, "tags": ["vip"]})
print(users[0])  # {'name': 'Alice', 'age': 30, 'tags': ['vip']}

# Strings with encoding (NEW in v0.3.0!)
cv.create_column("messages", "str:utf8")
messages = cv["messages"]
messages.append("Hello, 世界!")
print(messages[0])  # 'Hello, 世界!'

DataPacker - Rust-based serialization (NEW in v0.3.0!):

from kohakuvault import DataPacker

# MessagePack for structured data
packer = DataPacker("msgpack")
packed = packer.pack({"user": "alice", "score": 95.5})
data = packer.unpack(packed, 0)

# Bulk operations
records = [{"id": i, "val": i*1.5} for i in range(1000)]
packed_all = packer.pack_many(records)  # Concatenated bytes

# Unpack with offsets (for variable-size)
offsets = [0, len(packer.pack(records[0]))]  # Calculate offsets
unpacked = packer.unpack_many(packed_all, offsets=offsets)
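The offset bookkeeping for variable-size records can be sketched with the standard library alone. This is not DataPacker's API: `start_offsets` and `split_at` are hypothetical helpers, and the sketch assumes that an offset marks the start position of each record inside the concatenated buffer, which the snippet above suggests but does not confirm.

```python
from itertools import accumulate


def start_offsets(chunks):
    """Return the start offset of each chunk inside b"".join(chunks)."""
    sizes = [len(c) for c in chunks]
    # Running totals give the end of each chunk; prepend 0 and drop the last total.
    return [0, *accumulate(sizes)][:-1]


def split_at(buffer, offsets):
    """Slice buffer back into records, given each record's start offset."""
    ends = [*offsets[1:], len(buffer)]
    return [buffer[s:e] for s, e in zip(offsets, ends)]


# Stand-in records of different lengths (real data would be packed bytes).
chunks = [b"a", b"bcd", b"ef"]
buffer = b"".join(chunks)
offsets = start_offsets(chunks)
restored = split_at(buffer, offsets)
```

With one offset per record, every record in the buffer can be recovered, not just the first two.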

Features

  • Dual interfaces: Dict for blobs (KVault), List for sequences (ColumnVault)
  • Zero external dependencies: Single SQLite file, no services required
  • Memory efficient: Stream multi-GB files, dynamic chunk growth
  • Type-safe columnar: Fixed-size (i64, f64, bytes:N) and variable-size (bytes, str, msgpack, cbor)
  • Rust performance: Native speed with Pythonic ergonomics
  • Smart caching: Write-back cache for 10-100x faster bulk writes (NEW in v0.4.1!)
  • Structured data: Store dicts/lists directly with MessagePack/CBOR (v0.3.0)
  • DataPacker: Rust-based serialization with multi-encoding support (v0.3.0)

Best Practices

Handling Many Large Binary Files

For thousands of large binaries (images, videos, documents), use a hybrid approach:

import time

from kohakuvault import KVault, ColumnVault

kv = KVault("media.db")
cv = ColumnVault(kv)  # Share same database

# Store metadata in columnar (efficient for large lists)
cv.create_column("image_ids", "i64")
cv.create_column("image_names", "bytes")
cv.create_column("image_sizes", "i64")
cv.create_column("upload_times", "i64")

ids = cv["image_ids"]
names = cv["image_names"]
sizes = cv["image_sizes"]
times = cv["upload_times"]

# Store actual binaries in KV store
for img_id, img_data, img_name in image_stream:
    # Metadata in columnar (fast append, efficient iteration/filtering)
    ids.append(img_id)
    names.append(img_name)
    sizes.append(len(img_data))
    times.append(int(time.time()))

    # Binary data in KV (optimized for large blobs)
    kv[f"blob:{img_id}"] = img_data

# Query metadata without loading binaries
for i in range(len(ids)):
    if sizes[i] > 1024 * 1024:  # Find images > 1MB
        print(f"Large image: {names[i].decode()}")
        # Load binary only when needed
        data = kv[f"blob:{ids[i]}"]

Why this pattern?

  • ✅ Columnar optimized for append-heavy metadata (millions of entries)
  • ✅ KV optimized for large binary blobs (streaming, caching)
  • ✅ Can query/filter metadata without loading binaries
  • ✅ Both share same SQLite file (single-file deployment)
  • ✅ Efficient iteration over metadata, lazy loading of binaries

Installation

pip install kohakuvault  # From PyPI
pip install .            # From source

Platform Support:

  • ✅ Linux (x86_64)
  • ✅ Windows (x86_64)
  • ✅ macOS (Apple Silicon M1/M2/M3/M4 only - ARM64)
  • ❌ macOS Intel (x86_64) - not supported

Development

Prerequisites: Python 3.10+, Rust (rustup.rs)

# Setup
git clone https://github.com/KohakuBlueleaf/KohakuVault.git
cd KohakuVault
python -m venv .venv && source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -e ".[dev]"
maturin develop  # Build Rust extension (once)

# Workflow
# - Edit Python files → changes live immediately
# - Edit Rust files → run `maturin develop` to rebuild

# Tools
pytest                  # Run tests
black src/kohakuvault   # Format Python
cargo fmt               # Format Rust
maturin build --release # Build production wheel

Usage

Basic Operations

from kohakuvault import KVault

vault = KVault("media.db")

# Dict-like interface
vault["key"] = b"value"
data = vault["key"]
del vault["key"]
if "key" in vault: ...

# Safe retrieval
data = vault.get("key", default=b"")

# Iteration
for key in vault:
    print(f"{key}: {len(vault[key])} bytes")

Streaming Large Files

vault = KVault("media.db", chunk_size=1024*1024)  # 1 MiB chunks

# Stream from file → vault
with open("large_video.mp4", "rb") as f:
    vault.put_file("video:789", f)

# Stream from vault → file
with open("output.mp4", "wb") as f:
    vault.get_to_file("video:789", f)
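What `chunk_size` buys you can be illustrated without the library: a chunked copy keeps at most one chunk in memory, regardless of the total size. The helper below is a hypothetical stand-in written against plain file-like objects, not the implementation behind `put_file` or `get_to_file`.

```python
import io


def copy_in_chunks(reader, writer, chunk_size=1024 * 1024):
    """Copy reader to writer, holding at most chunk_size bytes at a time."""
    total = 0
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        writer.write(chunk)
        total += len(chunk)
    return total


# In-memory streams stand in for a large file and its destination.
source = io.BytesIO(b"x" * 2_500_000)
sink = io.BytesIO()
copied = copy_in_chunks(source, sink)
```

Larger chunks mean fewer read/write calls at the cost of a larger peak buffer.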

Bulk Operations with Caching

Recommended: use a context manager so the cache is flushed automatically.

vault = KVault("media.db")

# Safest: Context manager auto-flushes
with vault.cache(cap_bytes=64*1024*1024):
    for i in range(1000):
        vault[f"item:{i}"] = data
# Auto-flushed here, guaranteed!

# Long-running: Daemon thread auto-flushes every 5 seconds
vault.enable_cache(cap_bytes=64*1024*1024, flush_interval=5.0)
while True:
    vault["sensor_data"] = read_sensor()
# Daemon flushes automatically

# Manual control (backward compatible)
vault.enable_cache(cap_bytes=64*1024*1024)
for i in range(1000):
    vault[f"item:{i}"] = data
vault.flush_cache()  # Manual flush
vault.disable_cache()  # Auto-flushes before disabling
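To show why a write-back cache helps bulk writes, here is a minimal sketch using only the standard `sqlite3` module. `WriteBackCache` and the `kv` table are hypothetical and are not KohakuVault internals; the sketch assumes the cache buffers writes in memory and commits them in one transaction once `cap_bytes` is reached or the context exits.

```python
import sqlite3


class WriteBackCache:
    """Hypothetical stand-in: buffers writes and flushes them in one transaction."""

    def __init__(self, conn, cap_bytes):
        self.conn = conn
        self.cap_bytes = cap_bytes
        self.pending = {}       # key -> value, last write wins
        self.pending_bytes = 0

    def put(self, key, value):
        old = self.pending.get(key)
        if old is not None:
            self.pending_bytes -= len(old)
        self.pending[key] = value
        self.pending_bytes += len(value)
        if self.pending_bytes >= self.cap_bytes:
            self.flush()

    def flush(self):
        """Write all buffered entries in a single transaction; return the count."""
        count = len(self.pending)
        if count:
            with self.conn:  # commits on success, rolls back on error
                self.conn.executemany(
                    "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                    list(self.pending.items()),
                )
            self.pending.clear()
            self.pending_bytes = 0
        return count

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.flush()  # flush whatever is left when the block ends
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value BLOB)")
with WriteBackCache(conn, cap_bytes=1024) as cache:
    for i in range(100):
        cache.put(f"item:{i}", b"x" * 50)
rows = conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
```

Each flush pays the transaction cost once for many rows, which is where the speedup for bulk writes comes from.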

Configuration

vault = KVault(
    path="media.db",
    chunk_size=2*1024*1024,   # Streaming chunk size
    retries=10,                # Retry attempts for busy DB
    enable_wal=True,           # Write-Ahead Logging
    cache_kb=20000,            # SQLite cache size
)
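Several of these options resemble standard SQLite settings. The sketch below shows roughly equivalent PRAGMAs using the standard `sqlite3` module; that KohakuVault applies exactly these statements is an assumption, and `open_tuned` is a hypothetical helper.

```python
import os
import sqlite3
import tempfile


def open_tuned(path, enable_wal=True, page_size=4096,
               mmap_size=268435456, cache_kb=20000):
    """Open a SQLite database and apply tuning PRAGMAs (assumed mapping)."""
    conn = sqlite3.connect(path)
    # page_size only takes effect before the database file is first written.
    conn.execute(f"PRAGMA page_size = {int(page_size)}")
    if enable_wal:
        conn.execute("PRAGMA journal_mode = WAL")
    conn.execute(f"PRAGMA mmap_size = {int(mmap_size)}")
    # A negative cache_size is interpreted by SQLite as a size in KiB.
    conn.execute(f"PRAGMA cache_size = {-int(cache_kb)}")
    return conn


path = os.path.join(tempfile.mkdtemp(), "tuned.db")
conn = open_tuned(path)
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
cache_size = conn.execute("PRAGMA cache_size").fetchone()[0]
```

WAL mode needs a real file, which is why the example uses a temporary path rather than an in-memory database.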

Columnar Storage (NEW!)

List-like interface for typed sequences (timeseries, logs, events):

from kohakuvault import ColumnVault

cv = ColumnVault("data.db")

# Fixed-size types: i64, f64, bytes:N
cv.create_column("sensor_temps", "f64")
cv.create_column("timestamps", "i64")
cv.create_column("hashes", "bytes:32")  # 32-byte fixed

temps = cv["sensor_temps"]
temps.append(23.5)
temps.extend([24.1, 25.0, 25.3])
print(temps[0], temps[-1], len(temps))  # 23.5 25.3 4

# Variable-size bytes (for strings, JSON, etc.)
cv.create_column("log_messages", "bytes")  # No size = variable!
logs = cv["log_messages"]
logs.append(b"Short message")
logs.append(b"This is a much longer log entry with details...")
print(logs[0])  # Exact bytes, no padding

# Iterate
for temp in temps:
    print(temp)

Why columnar?

  • Append-heavy workloads (O(1) amortized, like Python list)
  • Typed data (int/float/bytes with type safety)
  • Efficient iteration and random access
  • Dynamic chunk growth (128KB → 16MB, exponential like std::vector)
  • Cross-chunk element support (byte-based addressing)
  • Minimal memory overhead (incremental BLOB I/O)
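The chunk growth and byte-based addressing described above can be sketched as follows. The doubling factor is an assumption (the text only says growth is exponential from 128KB to 16MB), and both helper functions are hypothetical.

```python
def chunk_capacities(total_bytes, first=128 * 1024, cap=16 * 1024 * 1024):
    """Chunk sizes needed to hold total_bytes, doubling from first up to cap."""
    sizes, size, covered = [], first, 0
    while covered < total_bytes:
        sizes.append(size)
        covered += size
        size = min(size * 2, cap)
    return sizes


def locate(byte_offset, sizes):
    """Map an absolute byte offset to (chunk_index, offset_within_chunk)."""
    for index, size in enumerate(sizes):
        if byte_offset < size:
            return index, byte_offset
        byte_offset -= size
    raise IndexError("offset beyond allocated chunks")


sizes = chunk_capacities(1024 * 1024)   # chunks needed for 1 MiB of data
position = locate(20_000 * 8, sizes)    # element 20000 of an 8-byte column
```

For fixed-size types, the byte offset of element `i` is simply `i * element_size`, so random access reduces to one multiplication and a chunk lookup.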

See docs/COLUMNAR_GUIDE.md and examples/columnar_demo.py for a complete guide.

API Reference

Constructor

KVault(path, chunk_size=1048576, retries=4, backoff_base=0.02,
       table="kvault", enable_wal=True, page_size=4096,
       mmap_size=268435456, cache_kb=20000)
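The `retries` and `backoff_base` parameters suggest a retry loop around busy-database errors. A minimal sketch under that assumption follows; the exponential schedule, the error matching, and the `with_retries` helper are illustrative guesses, not the library's documented behavior.

```python
import sqlite3
import time


def with_retries(operation, retries=4, backoff_base=0.02, sleep=time.sleep):
    """Call operation(); on a locked or busy database, wait and try again."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            message = str(exc).lower()
            busy = "locked" in message or "busy" in message
            if not busy or attempt == retries:
                raise  # unrelated error, or out of attempts
            sleep(backoff_base * (2 ** attempt))  # 0.02, 0.04, 0.08, ...


calls = {"n": 0}
delays = []


def flaky():
    """Fails twice with a lock error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise sqlite3.OperationalError("database is locked")
    return "ok"


# Record the delays instead of sleeping so the example runs instantly.
result = with_retries(flaky, sleep=delays.append)
```

Raising `retries` (as in the configuration example with `retries=10`) trades longer worst-case latency for more tolerance of concurrent writers.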

Methods

Storage

  • put(key, value) - Store bytes
  • put_file(key, reader, size=None, chunk_size=None) - Stream from file-like
  • get(key, default=None) - Retrieve bytes
  • get_to_file(key, writer, chunk_size=None) - Stream to file-like
  • delete(key) - Remove key
  • exists(key) - Check existence

Caching

  • enable_cache(cap_bytes, flush_threshold) - Enable write-back cache
  • disable_cache() - Disable and flush cache
  • flush_cache() - Commit cached writes, returns count

Maintenance

  • optimize() - VACUUM database
  • close() - Flush and close

Dict Interface: vault[key], del vault[key], key in vault, len(vault), vault.keys(), vault.values(), vault.items(), etc.

Exceptions: KohakuVaultError, NotFound, DatabaseBusy, InvalidArgument, IoError

Architecture

Python wrapper (src/kohakuvault/proxy.py)
    ↓ PyO3 bindings
Rust core (src/kvault-rust/lib.rs)
    ↓ rusqlite
SQLite database (bundled)

Why hybrid? Rust handles SQLite operations safely and efficiently. Python provides the ergonomic dict-like interface.

Contributing

# Setup
git checkout -b feature-name
# Make changes
black src/kohakuvault && cargo fmt  # Format
cargo clippy                        # Basic linting
pytest                              # Test
git commit && git push
# Open PR

License

Apache 2.0 - see LICENSE
