
Looks like a dict and acts like a dict but is persistent via an LMDB db


PersistDict

A persistent dictionary implementation backed by an LMDB database. PersistDict looks and acts like a Python dictionary but persists data to disk. It makes heavy use of lmdb-dict behind the scenes.

Why?

I ran into issues with langchain's caches when developing wdoc (my RAG library), and after months of waiting for a fix I decided to solve the problem myself. Instead of trusting sqldict's implementation with langchain's concurrency, I made my own.

This makes it very easy to add persistent caching to anything. I initially wrote an implementation backed by SQLite (with support for encryption and compression, and with concurrency handled via a singleton), but then I discovered lmdb-dict, which is likely much better as it's developed by professionals. It's based on LMDB, which is more suitable for what I was after than SQLite3. If you want to use the SQLite version, check out versions before 2.0.0.
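As an illustration of the "persistent caching for anything" idea, here is a minimal sketch of a memoizing decorator that works with any dict-like store. The `cached_in` helper and the `repr`-based key format are hypothetical and not part of PersistDict; a plain `dict` stands in for the store so the snippet runs on its own. In practice a PersistDict instance could presumably be passed instead.

```python
import functools
from collections.abc import MutableMapping
from typing import Any, Callable


def cached_in(store: MutableMapping) -> Callable:
    """Memoize a function in any dict-like store (hypothetical helper)."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Build a deterministic string key from the function name and arguments.
            key = repr((func.__qualname__, args, sorted(kwargs.items())))
            if key in store:
                return store[key]
            result = func(*args, **kwargs)
            store[key] = result
            return result

        return wrapper

    return decorator


# A plain dict stands in for the persistent store in this sketch.
store: dict = {}
calls = []  # records every time the function body really runs


@cached_in(store)
def slow_square(x: int) -> int:
    calls.append(x)
    return x * x


first = slow_square(4)
second = slow_square(4)  # served from the store, the body is not re-run
```

Because the decorator only relies on `in`, `[]` lookup and `[]` assignment, swapping the in-memory dict for a disk-backed mapping should not require changes to the decorated functions.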

Features:

  • Thread-safe: All operations are protected by a reentrant lock. Multiple threads can safely access the same database without corruption.
  • Background processing: Integrity checks and expiration run in a background thread by default, avoiding blocking the main thread during initialization.
  • Automatic expiration: Old entries are automatically removed after a configurable number of days to prevent unbounded growth.
  • Metadata tracking: Each entry includes creation time (ctime) and last access time (atime).
  • Caching: Uses an LRUCache128 from cachetools for better performance.
  • Customizable serialization: Supports custom serializers for both keys and values, enabling encryption, compression, etc.
  • Key hashing: Keys are hashed and cropped to handle the LMDB key size limitation (default 511 bytes).
  • Robust error handling: Gracefully handles serialization errors and database corruption.
  • Minimal dependencies: Only requires lmdb-dict-full. Optionally uses beartype for type checking and loguru for logging if available.
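
The key hashing feature above can be pictured with a short sketch. This is not PersistDict's actual code: the choice of SHA-256, the hex encoding and the `fit_key` name are assumptions made for illustration. The idea is only that a serialized key longer than the limit is replaced by a fixed-size digest, cropped if needed, so it always fits within the LMDB key size.

```python
import hashlib

KEY_SIZE_LIMIT = 511  # default limit quoted in the feature list above


def fit_key(raw_key: bytes, limit: int = KEY_SIZE_LIMIT) -> bytes:
    """Return raw_key unchanged if it fits, else a digest cropped to `limit`.

    Hypothetical helper: the hash function used by PersistDict is not
    stated in this README, SHA-256 is an assumption.
    """
    if len(raw_key) <= limit:
        return raw_key
    # A SHA-256 hex digest is 64 ASCII bytes; crop it if the limit is smaller.
    digest = hashlib.sha256(raw_key).hexdigest().encode("ascii")
    return digest[:limit]


short = fit_key(b"user:42")       # small enough, stored as-is
long_key = fit_key(b"x" * 2000)   # too long, replaced by its digest
```

One consequence of this kind of scheme is that an oversized key cannot be recovered from its digest, so the original key has to be stored alongside the value if iteration over keys must return it.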

Installation:

  • From PyPI:
    pip install PersistDict
    
  • From GitHub:
    git clone https://github.com/thiswillbeyourgithub/PersistDict
    cd PersistDict
    pip install -e .
    
  • Run tests:
    cd PersistDict
    python -m pytest tests/test_persistdict.py -v
    

Basic Usage:

from PersistDict import PersistDict

# Create a persistent dictionary
d = PersistDict(
    database_path="/path/to/db",  # Path to the database directory
    expiration_days=30,           # Optional: entries older than this will be removed
    verbose=False,                # Optional: enable debug logging
    background_thread=True,       # Optional: run initialization tasks in background
)

# Use it like a regular dictionary
d["key"] = "value"
print(d["key"])  # "value"
print("key" in d)  # True
print(len(d))  # 1

# Dictionary-style initialization (only available once)
d = d(a=1, b="string", c=[1, 2, 3])

# Supports standard dictionary methods
for key in d.keys():
    print(key)
    
for value in d.values():
    print(value)
    
for key, value in d.items():
    print(f"{key}: {value}")

# Delete items
del d["a"]

# Clear the entire dictionary
d.clear()

Advanced Usage:

import json
import dill  # third-party package: pip install dill

# Custom serializers for encryption, compression, etc.
d = PersistDict(
    database_path="/path/to/db",
    key_serializer=json.dumps,       # Custom key serializer
    key_unserializer=json.loads,     # Custom key deserializer
    value_serializer=dill.dumps,     # Custom value serializer
    value_unserializer=dill.loads,   # Custom value deserializer
    key_size_limit=511,              # Maximum key size before hashing
    caching=True,                    # Enable/disable LRU caching
    background_timeout=30,           # Maximum time for background operations
)

# Multiple instances can safely access the same database
d2 = PersistDict(database_path="/path/to/db")
assert list(d.keys()) == list(d2.keys())
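
The feature list mentions that custom serializers enable compression. Below is a minimal sketch of such a serializer pair built only on the standard library. The function names `compress_dumps` and `compress_loads` are hypothetical, and it is an assumption that `value_serializer` and `value_unserializer` accept any callables that convert to and from bytes, as `dill.dumps` and `dill.loads` do in the example above.

```python
import pickle
import zlib
from typing import Any


def compress_dumps(value: Any) -> bytes:
    """Pickle a value, then compress the resulting bytes with zlib."""
    return zlib.compress(pickle.dumps(value))


def compress_loads(blob: bytes) -> Any:
    """Reverse compress_dumps: decompress, then unpickle."""
    return pickle.loads(zlib.decompress(blob))


# Round-trip check on a repetitive payload, which compresses well.
payload = {"text": "lorem ipsum " * 1000, "ids": list(range(100))}
blob = compress_dumps(payload)
restored = compress_loads(blob)

# Assumed usage (not verified against the library):
# d = PersistDict(
#     database_path="/path/to/db",
#     value_serializer=compress_dumps,
#     value_unserializer=compress_loads,
# )
```

Keep in mind that unpickling data from an untrusted source is unsafe, so a pair like this only makes sense for databases you control.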
