"Disk-based redis" - Built around LMDB and Rust, with sharding for maximum throughput

⚡ lightning-disk-kv

This project is a fast, sharded key-value storage engine designed for high-throughput Python applications.

It is a drop-in solution for machine learning pipelines that need to store millions of embeddings (or samples of other data types) efficiently. It works around the Global Interpreter Lock (GIL) bottleneck by offloading hashing, serialization, and disk I/O to parallel Rust threads.

🚀 Key Features

  • True Parallelism: Writes to multiple LMDB shards simultaneously using all CPU cores.
  • Zero-Copy Vectors: Specialized "Fast Path" for numpy arrays that writes raw bytes to disk (no pickling).
  • Generic Storage: Capable of storing arbitrary Python objects (Strings, Dicts, Lists) via optimized parallel pickling.
  • Crash Safe: Built on LMDB (Lightning Memory-Mapped Database). Data survives application crashes; see "Durability vs. Speed" for the OS-crash caveat.
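
To make the sharding idea concrete, here is a minimal sketch of how integer IDs could be routed to shards so that each group can be written independently. The README does not say how lightning_disk_kv assigns keys to shards; the modulo rule and the names shard_for and group_by_shard are assumptions made only for illustration.

```python
def shard_for(key: int, num_shards: int) -> int:
    """Map an integer ID to a shard index (hypothetical modulo rule)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return key % num_shards


def group_by_shard(ids, num_shards):
    """Group IDs by shard so each group could be handed to its own writer."""
    groups = {shard: [] for shard in range(num_shards)}
    for key in ids:
        groups[shard_for(key, num_shards)].append(key)
    return groups


groups = group_by_shard(range(10), num_shards=5)
print(groups[0])  # [0, 5]
```

Because no two groups share a key, writers working on different shards never contend for the same database file, which is what allows the writes to run in parallel.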

📦 Installation

Option A: Install via Pip (Recommended)

pip install lightning_disk_kv

Option B: Build from Source

If you are modifying the Rust code or building for a specific architecture:

# Requires Rust and Maturin
maturin develop --release

⚡ Usage Guide

1. Initialization

Initialize the database by specifying a base directory. The storage engine automatically handles sharding (splitting data across multiple files) to maximize write speed.

from lightning_disk_kv import LDKV

# Initialize with 5 shards.
# 'map_size' is the maximum virtual memory size. 
# It does NOT consume this amount of RAM immediately.
# Default is ~1TB, which is safe for 64-bit systems.
db = LDKV(
    base_path="./my_database", 
    num_shards=5, 
    map_size=100 * 1024**3  # 100 GB limit
)

2. Storing Vectors (The "Fast Path")

Use store_vectors when dealing with NumPy embeddings. It skips pickling and reads the array's underlying memory buffer directly through its C pointer, avoiding per-element Python overhead.

Requirement: Data must be np.float32.
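
The fixed dtype is what makes a raw-bytes path possible: each float32 value occupies exactly 4 bytes, so a vector can be written and read back without any object serialization. The snippet below is a small standard-library illustration of that idea, not the library's implementation; array("f") is used as a stand-in for one float32 NumPy row.

```python
import pickle
from array import array

# Stand-in for one 128-dimensional float32 embedding (no numpy needed here).
vector = array("f", [0.5] * 128)

# Fixed-width encoding: every element is 4 bytes, so the payload size is
# known up front and the bytes can be written as-is.
raw = vector.tobytes()
print(len(raw))  # 512

# Reading back only needs the element type; no unpickling step is involved.
restored = array("f")
restored.frombytes(raw)

# Pickling the same values as a Python list adds per-element framing.
pickled = pickle.dumps(list(vector))
print(len(pickled) > len(raw))  # True
```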

import numpy as np

# Create dummy data
ids = [1, 2, 3]
vectors = np.random.rand(3, 128).astype(np.float32)

# Store in parallel
db.store_vectors(vectors, ids)

# Retrieve
# Returns a list with one entry per requested ID:
# a numpy array, or None if the ID doesn't exist
results = db.get_vectors([1, 999])

print(results[0].shape)  # (128,)
print(results[1])        # None
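
Since missing IDs come back as None, callers usually need to separate hits from misses. The helper below is one possible sketch; split_found_missing is a hypothetical name, and it assumes results are returned in the same order as the requested IDs, as the example above suggests. Plain lists stand in for NumPy arrays so the snippet runs without the package installed.

```python
def split_found_missing(ids, results):
    """Pair each requested ID with its result; None marks a missing ID."""
    found = {}
    missing = []
    for key, value in zip(ids, results):
        # Compare with `is None`: truth-testing a numpy array is ambiguous.
        if value is None:
            missing.append(key)
        else:
            found[key] = value
    return found, missing


# Stand-in for `db.get_vectors([1, 999])`.
requested = [1, 999]
results = [[0.1, 0.2, 0.3], None]

found, missing = split_found_missing(requested, results)
print(missing)  # [999]
```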

3. Storing Objects (The "Generic Path")

Use store_data for strings, dictionaries, images, or lists. Although this path uses pickle internally, serialization and disk writes happen in parallel threads, which makes it faster than pickling and writing items one at a time in a Python loop.

ids = [100, 101]
data = [
    "A simple string", 
    {"key": "value", "meta": [1, 2, 3]}
]

db.store_data(data, ids)

results = db.get_data([100])
print(results[0])  # A simple string

4. Management & Syncing

# Check total number of items across all shards
count = db.get_data_count()
print(f"Total items: {count}")

# Delete items
db.delete_data([1, 100])

# Force flush to disk
# The engine uses OS buffers for maximum speed. 
# Call .sync() to ensure data is physically written to the drive.
db.sync()

⚠️ Configuration & Safety

Understanding map_size

LMDB uses a memory map. You must set map_size larger than the total amount of data you ever intend to store.

  • Don't worry about RAM: Setting this to 1TB does not use 1TB of RAM. It simply reserves virtual address space.
  • Error handling: If you exceed this limit, you will get a MapFull error.
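
As a rough way to pick a value, the sketch below estimates the raw payload of a float32 vector collection and multiplies it by a headroom factor. The function name estimate_map_size and the factor of 2 are assumptions for illustration; the real on-disk footprint also includes keys and LMDB page overhead, and the README does not state whether map_size applies per shard or to the whole database, so treat the result as a lower bound to round up from.

```python
def estimate_map_size(num_vectors: int, dim: int, headroom: float = 2.0) -> int:
    """Rough byte budget for float32 vectors (hypothetical helper)."""
    bytes_per_vector = dim * 4  # float32 = 4 bytes per element
    payload = num_vectors * bytes_per_vector
    return int(payload * headroom)


# 10 million 128-dimensional embeddings with 2x headroom.
size = estimate_map_size(10_000_000, 128)
print(f"{size / 1024**3:.1f} GiB")  # 9.5 GiB
```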

Durability vs. Speed

To achieve maximum throughput, lightning_disk_kv sets the MDB_NOSYNC flag by default.

  • Application Crash: Data is safe.
  • OS Crash / Power Cut: Data currently in the OS buffer (last few seconds) might be lost.
  • Best Practice: If data durability is critical (e.g., you can't re-generate the data), call db.sync() periodically or after a large bulk insert.
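
The best practice above can be sketched as a batched bulk insert that flushes once at the end. Only store_data(data, ids) and sync() are taken from this README; bulk_store, its batch_size parameter, and the FakeDB stand-in are hypothetical and exist so the snippet runs without the real package.

```python
def bulk_store(db, data, ids, batch_size=10_000):
    """Write data in batches, then flush once at the end."""
    if len(data) != len(ids):
        raise ValueError("data and ids must have the same length")
    for start in range(0, len(ids), batch_size):
        stop = start + batch_size
        db.store_data(data[start:stop], ids[start:stop])
    db.sync()  # one explicit flush after the whole bulk insert


class FakeDB:
    """In-memory stand-in exposing the two methods used above."""

    def __init__(self):
        self.items = {}
        self.sync_calls = 0

    def store_data(self, data, ids):
        self.items.update(zip(ids, data))

    def sync(self):
        self.sync_calls += 1


db = FakeDB()
bulk_store(db, ["a", "b", "c"], [1, 2, 3], batch_size=2)
print(len(db.items), db.sync_calls)  # 3 1
```

Flushing once per bulk insert rather than once per batch keeps most of the throughput benefit while still bounding how much data an OS crash could lose.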

🛠 Building from Source (Advanced)

If you cannot install via pip, you must compile the Rust backend manually.

  1. Install Rust:
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    
  2. Install the builder:
    pip install maturin
    
  3. Compile: Navigate to the project root and run:
    maturin develop --release
    
