Looks like a dict and acts like a dict but is persistent via an LMDB db

PersistDict

A persistent dictionary implementation backed by an LMDB database. PersistDict looks and acts like a Python dictionary but persists data to disk, making it ideal for caching and persistent storage needs.

Overview

PersistDict provides a dictionary-like interface that stores data on disk using the high-performance LMDB (Lightning Memory-Mapped Database). It builds upon lmdb-dict to provide a robust, thread-safe persistent dictionary with additional features like automatic expiration, metadata tracking, and customizable serialization.

Why PersistDict?

I created PersistDict while developing wdoc, my RAG library, after encountering issues with langchain's caching mechanisms. Instead of relying on existing implementations that didn't handle concurrency well, I built PersistDict to be thread-safe and robust from the ground up.

PersistDict makes it simple to add persistent caching to any Python application. While earlier versions (before 2.0.0) used SQLite, the current version leverages LMDB for better performance and reliability in concurrent environments.

Key Features

  • Thread-safe: All operations are protected by a reentrant lock, allowing multiple threads to safely access the same database without corruption.
  • Background Processing: Integrity checks and expiration run in a background thread by default, avoiding blocking the main thread during initialization.
  • Automatic Expiration: Old entries are automatically removed after a configurable number of days to prevent unbounded growth.
  • Metadata Tracking: Each entry includes creation time (ctime) and last access time (atime) for advanced data management.
  • Performance Optimized: Uses LRUCache128 from cachetools for better performance with frequently accessed items.
  • Customizable Serialization: Supports custom serializers for both keys and values, enabling encryption, compression, or any custom data transformation.
  • Key Hashing: Keys are hashed and cropped to handle the LMDB key size limitation (default 511 bytes).
  • Robust Error Handling: Gracefully handles serialization errors and database corruption with detailed logging.
  • Collision Management: Properly handles key hash collisions to ensure data integrity.
  • Minimal Dependencies: Only requires lmdb-dict-full. Optionally uses beartype for type checking and loguru for logging if available.
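
The key hashing and collision handling described above can be pictured with a small standard-library sketch. This is not PersistDict's actual code: the helper names `fit_key`, `put`, and `get`, the choice of SHA-256, and the bucket layout are all assumptions made for illustration.

```python
import hashlib

KEY_SIZE_LIMIT = 511  # LMDB's default maximum key size, in bytes


def fit_key(raw_key: bytes, limit: int = KEY_SIZE_LIMIT) -> bytes:
    """Return raw_key unchanged if it fits, otherwise a cropped hash of it."""
    if len(raw_key) <= limit:
        return raw_key
    # A SHA-256 hex digest is 64 ASCII bytes; crop in case limit is smaller.
    return hashlib.sha256(raw_key).hexdigest().encode("ascii")[:limit]


def put(store: dict, raw_key: bytes, value) -> None:
    # Keep the original key next to the value so colliding keys stay separable.
    store.setdefault(fit_key(raw_key), {})[raw_key] = value


def get(store: dict, raw_key: bytes):
    # Raises KeyError if either the bucket or the original key is missing.
    return store[fit_key(raw_key)][raw_key]


store = {}  # stands in for the on-disk database (hypothetical)
long_key = b"x" * 2000
put(store, b"user:42", "short")
put(store, long_key, "long")
```

Storing the original key inside each bucket means that two different keys mapping to the same stored key can still be told apart on read, at the cost of one extra comparison.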

Installation

From PyPI

pip install PersistDict

From GitHub

git clone https://github.com/thiswillbeyourgithub/PersistDict
cd PersistDict
pip install -e .

Running Tests

cd PersistDict
python -m pytest tests/test_persistdict.py -v

Basic Usage

from PersistDict import PersistDict

# Create a persistent dictionary
d = PersistDict(
    database_path="/path/to/db",  # Path to the database directory
    expiration_days=30,           # Optional: entries older than this will be removed
    verbose=False,                # Optional: enable debug logging
    background_thread=True,       # Optional: run initialization tasks in background
)

# Use it like a regular dictionary
d["key"] = "value"
print(d["key"])         # "value"
print("key" in d)       # True
print(len(d))           # 1

# Dictionary-style initialization (only available once)
d = d(a=1, b="string", c=[1, 2, 3])

# Supports standard dictionary methods
for key in d.keys():
    print(key)

for value in d.values():
    print(value)

for key, value in d.items():
    print(f"{key}: {value}")

# Delete items
del d["a"]

# Clear the entire dictionary
d.clear()
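
To make the `expiration_days` option more concrete, here is a rough sketch of how age-based cleanup could work on entries that carry `ctime` and `atime` metadata. It is an illustration only: the function `purge_expired`, the entry layout, and the choice to measure age from `ctime` rather than `atime` are assumptions, not PersistDict's documented behavior.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def purge_expired(entries: dict, expiration_days: float,
                  now: Optional[float] = None) -> list:
    """Delete entries created more than expiration_days ago; return their keys."""
    now = time.time() if now is None else now
    cutoff = now - expiration_days * SECONDS_PER_DAY
    # Collect first, then delete, to avoid mutating the dict while iterating.
    expired = [key for key, meta in entries.items() if meta["ctime"] < cutoff]
    for key in expired:
        del entries[key]
    return expired


now = 1_000_000_000.0  # fixed timestamp so the example is reproducible
entries = {
    "fresh": {"value": 1, "ctime": now - 1 * SECONDS_PER_DAY, "atime": now},
    "stale": {"value": 2, "ctime": now - 45 * SECONDS_PER_DAY, "atime": now},
}
removed = purge_expired(entries, expiration_days=30, now=now)
```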

Advanced Usage

Custom Serialization

import json

import dill  # third-party: pip install dill

# Custom serializers for encryption, compression, etc.
d = PersistDict(
    database_path="/path/to/db",
    key_serializer=json.dumps,       # Custom key serializer
    key_unserializer=json.loads,     # Custom key deserializer
    value_serializer=dill.dumps,     # Custom value serializer
    value_unserializer=dill.loads,   # Custom value deserializer
    key_size_limit=511,              # Maximum key size before hashing
    caching=True,                    # Enable/disable LRU caching
    background_timeout=30,           # Maximum time for background operations
)
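
As an example of the compression use case, a serializer pair can be built from the standard library alone. The names `compress_dumps` and `compress_loads` are hypothetical, and the sketch assumes that `value_serializer` and `value_unserializer` accept any pair of callables that convert an object to bytes and back.

```python
import pickle
import zlib


def compress_dumps(obj) -> bytes:
    """Pickle the object, then compress the resulting bytes."""
    return zlib.compress(pickle.dumps(obj))


def compress_loads(blob: bytes):
    """Reverse compress_dumps: decompress, then unpickle."""
    return pickle.loads(zlib.decompress(blob))


payload = {"text": "spam " * 1000, "n": 3}
blob = compress_dumps(payload)
restored = compress_loads(blob)
```

Under that assumption, the pair would be passed as `value_serializer=compress_dumps, value_unserializer=compress_loads`. As with any pickle-based format, only load data from a database you trust.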

Shared Database Access

Multiple instances can safely access the same database:

# Create two instances pointing to the same database
d1 = PersistDict(database_path="/path/to/db")
d2 = PersistDict(database_path="/path/to/db")

# Changes in one instance are visible in the other
d1["shared_key"] = "shared_value"
assert d2["shared_key"] == "shared_value"
assert list(d1.keys()) == list(d2.keys())
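
For threads sharing a single instance, the Key Features section attributes safety to a reentrant lock. The sketch below shows that idea in isolation with a plain in-memory dict. `LockedDict` is a hypothetical class written for this example and is not part of PersistDict.

```python
import threading


class LockedDict:
    """Tiny dict wrapper whose operations are guarded by one reentrant lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.RLock()

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def increment(self, key):
        with self._lock:
            # get() takes the same lock again; an RLock allows this re-entry,
            # whereas a plain Lock would deadlock here.
            self._data[key] = self.get(key, 0) + 1


counter = LockedDict()


def worker():
    for _ in range(1000):
        counter.increment("hits")


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the read and the write happen under one lock acquisition, no increments are lost even with eight threads updating the same key.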

Background Thread Control

Control how initialization tasks run:

# Run in background (default)
d1 = PersistDict(database_path="/path/to/db", background_thread=True)

# Run in foreground (blocking)
d2 = PersistDict(database_path="/path/to/db", background_thread=False)

# Skip initialization tasks entirely
d3 = PersistDict(database_path="/path/to/db", background_thread="disabled")
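
The three modes above can be thought of as a small dispatch around a startup task. The following is a minimal sketch of that pattern, not PersistDict's implementation: `run_startup_tasks`, its `mode` parameter, and `fake_integrity_check` are hypothetical names, and the real library may schedule its work differently.

```python
import threading


def run_startup_tasks(task, mode=True):
    """Run task in a background thread, in the foreground, or not at all."""
    if mode == "disabled":
        return None  # skip the task entirely
    if mode:
        thread = threading.Thread(target=task, daemon=True)
        thread.start()
        return thread  # the caller continues without waiting
    task()  # foreground: block until the task finishes
    return None


results = []


def fake_integrity_check():
    results.append("checked")


bg = run_startup_tasks(fake_integrity_check, mode=True)
bg.join(timeout=5)  # wait here only so the example is deterministic
fg = run_startup_tasks(fake_integrity_check, mode=False)
skipped = run_startup_tasks(fake_integrity_check, mode="disabled")
```

The string check comes first because `"disabled"` is itself truthy and would otherwise be treated as the background mode.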

Named Instances

Create named instances for better logging:

d = PersistDict(
    database_path="/path/to/db",
    name="cache_db",    # Name for identifying this instance in logs
    verbose=True        # Enable logging
)
