
"Disk-based redis" - Built around LMDB and Rust, with sharding for maximum throughput

Project description

⚡ lightning-disk-kv

This project is an absurdly fast, sharded Key-Value storage engine designed for high-throughput Python applications.

It is a drop-in solution for machine learning pipelines that need to store millions of samples (embeddings or other data types) efficiently. It sidesteps the Global Interpreter Lock (GIL) bottleneck by offloading hashing, serialization, and disk I/O to parallel Rust threads.

🚀 Key Features

  • True Parallelism: Writes to multiple LMDB shards simultaneously using all CPU cores.
  • Zero-Copy Vectors: Specialized "Fast Path" for numpy arrays that writes raw bytes to disk (no pickling).
  • Generic Storage: Capable of storing arbitrary Python objects (Strings, Dicts, Lists) via optimized parallel pickling.
  • Crash Safe: Based on LMDB (Lightning Memory-Mapped Database), offering proven reliability.
  • Redis Compatible: Includes a wrapper that mimics the redis-py API for easy integration.

📦 Installation

Option A: Install via Pip (Recommended)

pip install lightning_disk_kv

Option B: Build from Source

If you are modifying the Rust code or building for a specific architecture:

# Requires Rust and Maturin
maturin develop --release

⚡ Usage Guide

1. Initialization

Initialize the database by specifying a base directory. The storage engine automatically handles sharding (splitting data across multiple files) to maximize write speed.

from lightning_disk_kv import LDKV

# Initialize with 5 shards.
# 'map_size' is the maximum virtual memory size. 
# It does NOT consume this amount of RAM immediately.
# Default is ~1TB, which is safe for 64-bit systems.
db = LDKV(
    base_path="./my_database", 
    num_shards=5, 
    map_size=100 * 1024**3  # 100 GB limit
)
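Exactly how keys are routed to shards is internal to the engine, but a reasonable mental model is hashing the key modulo the shard count, so each key always lands in the same shard file. An illustrative sketch (not the library's actual code):

```python
# Illustrative only: hash-of-key modulo num_shards routing, the usual
# scheme for spreading keys deterministically across shard files.
def shard_for(key: int, num_shards: int = 5) -> int:
    """Map a key to one of `num_shards` buckets deterministically."""
    return hash(key) % num_shards

# The same key always routes to the same shard...
assert shard_for(42) == shard_for(42)

# ...while a large key set spreads across all shards,
# letting writes proceed in parallel.
buckets = {shard_for(k) for k in range(1000)}
assert buckets == {0, 1, 2, 3, 4}
```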

2. Storing Vectors (The "Fast Path")

Use store_vectors when dealing with NumPy embeddings. This bypasses Python's serialization overhead entirely: the Rust backend reads the array's memory directly through its C-level buffer.

Requirement: Data must be np.float32.

import numpy as np

# Create dummy data
ids = [1, 2, 3]
vectors = np.random.rand(3, 128).astype(np.float32)

# Store in parallel
db.store_vectors(vectors, ids)

# Retrieve
# Returns a list of numpy arrays, or None if the ID doesn't exist
results = db.get_vectors([1, 999])

print(results[0].shape)  # (128,)
print(results[1])        # None
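What "no pickling" means in practice: the fast path writes the array's raw float32 buffer to disk and reinterprets those bytes on read. A round-trip sketch in NumPy (illustrative, not the library's internals):

```python
import numpy as np

vec = np.random.rand(128).astype(np.float32)

# Serialize: the raw 512-byte buffer, with no pickle framing
raw = vec.tobytes()          # 128 floats * 4 bytes each
assert len(raw) == 128 * 4

# Deserialize: reinterpret the bytes as float32 without pickling
restored = np.frombuffer(raw, dtype=np.float32)
assert np.array_equal(vec, restored)
```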

3. Storing Objects (The "Generic Path")

Use store_data for strings, dictionaries, images, or lists. While this path uses pickle internally, serialization and disk writes happen in parallel threads, making it significantly faster than a sequential Python loop.

ids = [100, 101]
data = [
    "A simple string", 
    {"key": "value", "meta": [1, 2, 3]}
]

db.store_data(data, ids)

results = db.get_data([100])
print(results[0]) # "A simple string"
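Conceptually, the generic path pickles each object and writes the resulting blobs concurrently. A pure-Python approximation with a thread pool (the real engine does this in Rust threads, outside the GIL):

```python
import pickle
from concurrent.futures import ThreadPoolExecutor

def serialize_batch(items):
    """Pickle many objects concurrently.

    A pure-Python sketch of the idea; the actual engine performs
    serialization in parallel Rust threads, not Python threads.
    """
    with ThreadPoolExecutor() as pool:
        return list(pool.map(
            lambda obj: pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL),
            items,
        ))

blobs = serialize_batch(["A simple string", {"key": "value", "meta": [1, 2, 3]}])
assert pickle.loads(blobs[0]) == "A simple string"
```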

4. Redis Compatibility API

We provide a redis-py compatible wrapper. This allows you to use lightning-disk-kv as an embedded, persistent Redis replacement without running a separate server process.

from lightning_redis import LDKV_RedisCompat

# Initialize (replaces host/port with a file path)
r = LDKV_RedisCompat(base_path="./redis_data", decode_responses=True)

# Basic Key-Value
r.set('foo', 'bar')
print(r.get('foo'))  # 'bar'

# TTL (Time To Live) - key automatically removed after 5 seconds
r.set('temp_key', 'hidden', ex=5)

# Atomic Counters
r.incr('visitor_count', amount=1)

# Hash Maps
r.hset('user:100', mapping={'name': 'Alice', 'role': 'admin'})
print(r.hgetall('user:100')) # {'name': 'Alice', 'role': 'admin'}

5. Management & Syncing

# Check total number of items across all shards
count = db.get_data_count()
print(f"Total items: {count}")

# Delete items
db.delete_data([1, 100])

# Force flush to disk
# The engine uses OS buffers for maximum speed. 
# Call .sync() to ensure data is physically written to the drive.
db.sync()

⚠️ Configuration & Safety

Understanding map_size

LMDB uses a memory map. You must set map_size larger than the maximum data you ever intend to store.

  • Don't worry about RAM: Setting this to 1TB does not use 1TB of RAM. It simply reserves virtual address space.
  • Error handling: If you exceed this limit, you will get a MapFull error.
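A quick way to choose a value is back-of-the-envelope arithmetic over your expected data volume, plus generous headroom. The numbers below are illustrative, not library defaults:

```python
# Back-of-the-envelope map_size sizing (illustrative numbers):
# 10 million embeddings of 768 float32 values, with generous headroom.
num_items = 10_000_000
bytes_per_item = 768 * 4          # float32 payload per vector
headroom = 4                      # keys, B-tree pages, future growth

map_size = num_items * bytes_per_item * headroom
print(f"{map_size / 1024**3:.0f} GB")  # ~114 GB of reserved address space
```

Since map_size only reserves virtual address space, over-provisioning is cheap; running out mid-write is not.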

Durability vs. Speed

To achieve maximum throughput, lightning_disk_kv sets the MDB_NOSYNC flag by default.

  • Application Crash: Data is safe.
  • OS Crash / Power Cut: Data currently in the OS buffer (last few seconds) might be lost.
  • Best Practice: If data durability is critical (e.g., you can't re-generate the data), call db.sync() periodically or after a large bulk insert.
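One way to apply that advice is to flush after every chunk of a bulk insert, bounding potential loss to a single chunk. The sketch below uses a stub in place of LDKV so it runs standalone; store_data and sync are the documented calls:

```python
# A common durability pattern: flush after every N writes.
# `FakeDB` stands in for LDKV so this sketch runs without the package.
class FakeDB:
    def __init__(self):
        self.synced = 0
    def store_data(self, data, ids):
        pass  # the real engine writes to LMDB shards here
    def sync(self):
        self.synced += 1  # the real engine forces a physical flush

def bulk_store(db, items, ids, sync_every=1000):
    """Write in chunks, forcing a flush to disk after each chunk."""
    for start in range(0, len(items), sync_every):
        db.store_data(items[start:start + sync_every],
                      ids[start:start + sync_every])
        db.sync()  # on power failure, at most one chunk is lost

db = FakeDB()
bulk_store(db, list(range(2500)), list(range(2500)), sync_every=1000)
assert db.synced == 3  # three chunks, three flushes
```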

🛠 Building from Source (Advanced)

If you cannot install via pip, you must compile the Rust backend manually.

  1. Install Rust:
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    
  2. Install the builder:
    pip install maturin
    
  3. Compile: Navigate to the project root and run:
    maturin develop --release
    
