Simple persistent key-value store for Python. Values are stored as files on a local disk or as S3 objects in the AWS cloud.
persidict
Simple persistent dictionaries for distributed applications in Python.
What Is It?
persidict is a lightweight persistent key-value store for Python.
It saves a dictionary to either a local directory or an AWS S3 bucket,
storing each value as its own file or S3 object. Keys are limited to
URL/filename-safe strings or sequences of strings.
In contrast to traditional persistent dictionaries (e.g., Python's shelve),
persidict is designed
for distributed environments where multiple processes
on different machines concurrently work with the same store.
Why Use It?
persidict offers a small API surface, scalable storage backends, and explicit concurrency controls.
Features
- Persistent Storage: Save dictionaries to the local filesystem (FileDirDict) or AWS S3 (S3Dict).
- Standard Dictionary API: Use PersiDict objects like standard Python dictionaries (__getitem__, __setitem__, __delitem__, keys, values, items).
- Distributed Computing Ready: Designed for concurrent access in distributed environments.
- Flexible Serialization: Store values as pickles (pkl), JSON (json), or plain text.
- Type Safety: Optionally enforce that all values in a dictionary are instances of a specific class.
- Generic Type Parameters: Use FileDirDict[MyClass] for static type checking with mypy/pyright.
- Advanced Functionality: Includes write-once dictionaries, timestamping of entries, and tools for handling filesystem-safe keys.
- ETag-Based Conditional Operations: Optimistic concurrency helpers for conditional reads, writes, deletes, and transforms based on per-key ETags.
- Hierarchical Keys: Keys can be sequences of strings, creating a directory-like structure within the storage backend.
Use Cases
persidict is well-suited for a variety of applications, including:
- Caching: Store results of expensive computations and retrieve them later, even across different machines.
- Configuration Management: Manage application settings in a distributed environment, allowing for easy updates and access.
- Data Pipelines: Share data between different stages of a data processing pipeline.
- Distributed Task Queues: Store task definitions and results in a shared location.
- Memoization: Cache function call results in a persistent and distributed manner.
Usage
Storing Data on a Local Disk
The FileDirDict class saves your dictionary to a local folder.
Each key-value pair is stored as a separate file.
```python
from persidict import FileDirDict

# Create a dictionary that will be stored in the "my_app_data" folder.
# The folder will be created automatically if it doesn't exist.
app_settings = FileDirDict(base_dir="my_app_data")

# Add and update items just like a regular dictionary.
app_settings["username"] = "alex"
app_settings["theme"] = "dark"
app_settings["notifications_enabled"] = True

# Values can be any pickleable Python object.
app_settings["recent_projects"] = ["project_a", "project_b"]

print(f"Current theme is: {app_settings['theme']}")
# >>> Current theme is: dark

# The data persists!
# If you run the script again or create a new dictionary object
# pointing to the same folder, the data will be there.
reloaded_settings = FileDirDict(base_dir="my_app_data")
print(f"Number of settings: {len(reloaded_settings)}")
# >>> Number of settings: 4
print("username" in reloaded_settings)
# >>> True
```
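To make the file-per-key idea concrete, here is a minimal sketch of the storage model, assuming one pickle file per key named <key>.pkl and keys that are already filename-safe. MiniFileDict is a hypothetical toy class, not persidict's implementation, which also handles hierarchical keys, other serialization formats, and concurrent access.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any


class MiniFileDict:
    """Toy file-per-key store: each value is pickled into <base_dir>/<key>.pkl."""

    def __init__(self, base_dir: str) -> None:
        self.base_dir = Path(base_dir)
        # Create the folder on first use.
        self.base_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Assumes the key is already a filename-safe string.
        return self.base_dir / f"{key}.pkl"

    def __setitem__(self, key: str, value: Any) -> None:
        self._path(key).write_bytes(pickle.dumps(value))

    def __getitem__(self, key: str) -> Any:
        path = self._path(key)
        if not path.exists():
            raise KeyError(key)
        return pickle.loads(path.read_bytes())

    def __contains__(self, key: str) -> bool:
        return self._path(key).exists()

    def __len__(self) -> int:
        return sum(1 for _ in self.base_dir.glob("*.pkl"))


with tempfile.TemporaryDirectory() as tmp:
    writer = MiniFileDict(tmp)
    writer["theme"] = "dark"
    writer["recent_projects"] = ["project_a", "project_b"]
    # A second object pointing at the same folder sees the same data.
    reader = MiniFileDict(tmp)
    print(reader["theme"], len(reader), "theme" in reader)  # dark 2 True
```

Because every value lives in its own file, a second process only needs the folder path to see the data; no in-memory state is shared.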
Storing Data in the Cloud (AWS S3)
For distributed applications, you can use S3Dict to store data in
an AWS S3 bucket. The usage is identical, allowing you to switch
between local and cloud storage with minimal code changes.
```python
from persidict import S3Dict

# Create a dictionary that will be stored in an S3 bucket.
# The bucket will be created if it doesn't exist.
cloud_config = S3Dict(bucket_name="my-app-config-bucket")

# Use it just like a FileDirDict.
cloud_config["api_key"] = "ABC-123-XYZ"
cloud_config["timeout_seconds"] = 30

print(f"API Key: {cloud_config['api_key']}")
# >>> API Key: ABC-123-XYZ
```
Using Type Hints
persidict supports two complementary type safety mechanisms:
Static type checking with generic parameters (checked by mypy/pyright):
```python
from persidict import FileDirDict

# Create a typed dictionary
d: FileDirDict[int] = FileDirDict(base_dir="./data")
d["count"] = 42
val: int = d["count"]  # Type checker knows this is int

# Works with any PersiDict implementation
from persidict import LocalDict
cache: LocalDict[str] = LocalDict()
```
Runtime type enforcement with base_class_for_values (checked via isinstance):
```python
d = FileDirDict(base_dir="./data", base_class_for_values=int)
d["count"] = 42  # OK
d["name"] = "Alice"  # Raises TypeError at runtime
```
These mechanisms are kept separate because many type hints cannot be checked
at runtime. For example, Callable[[int], str], Literal["a", "b"],
TypedDict, and NewType have no isinstance equivalent. Use generics for
development-time safety; use base_class_for_values when you need runtime validation.
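The split between the two mechanisms can be seen with plain Python. The sketch below uses a hypothetical check_value helper that mimics the idea of base_class_for_values with isinstance, then shows that subscripted hints such as Callable[[int], str] and Literal["a", "b"] make isinstance raise TypeError. It illustrates the reasoning only; it is not persidict's validation code.

```python
from typing import Callable, Literal


def check_value(value: object, base_class: type) -> None:
    """Hypothetical runtime check in the spirit of base_class_for_values."""
    if not isinstance(value, base_class):
        raise TypeError(
            f"expected {base_class.__name__}, got {type(value).__name__}"
        )


check_value(42, int)  # OK: 42 is an int

try:
    check_value("Alice", int)
except TypeError as exc:
    print("rejected:", exc)  # rejected: expected int, got str

# Many static hints have no runtime isinstance equivalent.
for hint in (Callable[[int], str], Literal["a", "b"]):
    try:
        isinstance("a", hint)
    except TypeError:
        print(f"{hint} cannot be used with isinstance")
```

Because the helper relies on isinstance, subclasses pass as well: check_value(True, int) succeeds, since bool is a subclass of int.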
Conditional Operations
Use conditional operations to avoid lost updates in concurrent scenarios. The
insert-if-absent pattern uses ITEM_NOT_AVAILABLE with ETAG_IS_THE_SAME.
```python
from persidict import FileDirDict, ITEM_NOT_AVAILABLE, ETAG_IS_THE_SAME

d = FileDirDict(base_dir="./data")
# Write "v1" only if "token" does not exist yet (its expected ETag is "missing").
r = d.setdefault_if(
    "token",
    default_value="v1",
    condition=ETAG_IS_THE_SAME,
    expected_etag=ITEM_NOT_AVAILABLE,
)
```
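Why insert-if-absent matters is easiest to see with several writers racing for the same key. This is a toy in-memory model (ToyETagStore and setdefault_if_absent are hypothetical): a threading.Lock makes the "is the key still missing?" check and the write a single atomic step, which is the guarantee that ETAG_IS_THE_SAME combined with ITEM_NOT_AVAILABLE expresses. How persidict achieves that atomicity on disk or S3 is not shown here.

```python
import threading
from typing import Any


class ToyETagStore:
    """Toy in-memory store; each insert gets a fresh version string as its ETag."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: dict[str, tuple[str, Any]] = {}
        self._version = 0

    def setdefault_if_absent(self, key: str, default: Any) -> tuple[bool, Any]:
        """Insert default only while key is still missing; return (inserted, value)."""
        with self._lock:  # check and write happen as one atomic step
            if key in self._data:
                return False, self._data[key][1]
            self._version += 1
            self._data[key] = (f"v{self._version}", default)
            return True, default


store = ToyETagStore()
results: list[tuple[bool, Any]] = []


def worker(i: int) -> None:
    results.append(store.setdefault_if_absent("token", f"token-from-worker-{i}"))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

inserted = [value for did_insert, value in results if did_insert]
print(len(inserted), len({value for _, value in results}))  # 1 1
```

Exactly one worker inserts its token, and every worker ends up seeing that same value.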
Comparison With Python Built-in Dictionaries
Similarities
PersiDict subclasses can be used like regular Python dictionaries, supporting:
- Get, set, and delete operations with square brackets ([]).
- Iteration over keys, values, and items.
- Membership testing with in.
- Length checking with len().
- Standard methods like keys(), values(), items(), get(), clear(), setdefault(), and update().
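Much of this standard API can come for free in Python: a class that implements the five abstract methods of collections.abc.MutableMapping inherits keys(), values(), items(), get(), clear(), setdefault(), update(), and the in operator. The sketch shows this standard-library mechanism with a hypothetical in-memory TinyMapping; it is not a claim about how PersiDict itself is built.

```python
from collections.abc import MutableMapping
from typing import Any, Iterator


class TinyMapping(MutableMapping):
    """Implements only the five abstract methods; MutableMapping supplies the rest."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    def __getitem__(self, key: str) -> Any:
        return self._store[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._store[key] = value

    def __delitem__(self, key: str) -> None:
        del self._store[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._store)

    def __len__(self) -> int:
        return len(self._store)


m = TinyMapping()
m.update({"a": 1, "b": 2})  # inherited mixin method
m.setdefault("c", 3)        # inherited mixin method
print(sorted(m.items()), m.get("z", "missing"), "a" in m)
# [('a', 1), ('b', 2), ('c', 3)] missing True
m.clear()                   # inherited mixin method
print(len(m))  # 0
```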
Differences
- Persistence: Data is saved between program executions.
- Keys: Keys must be URL/filename-safe strings or sequences of such strings.
- Values: Values must be serializable in the chosen format (pickle, JSON, or text). You can also constrain values to a specific class.
- Order: Insertion order is not preserved.
- Additional Methods: PersiDict provides extra methods not in the standard dict API, such as timestamp(), etag(), random_key(), newest_keys(), subdicts(), discard(), get_params(), and more.
- Conditional Operations: ETag-based compare-and-swap reads and writes with structured results (see Conditional Operations).
- Special Values: Use KEEP_CURRENT to avoid updating a value and DELETE_CURRENT to delete a value during a write.
Glossary
Core Concepts
- PersiDict: The abstract base class that defines the common interface for all persistent dictionaries in the package. It's the foundation upon which everything else is built.
- NonEmptyPersiDictKey: A type hint that specifies what can be used as a key in any PersiDict. It can be a NonEmptySafeStrTuple, a single string, or a sequence of strings. When a PersiDict method requires a key as an input, it will accept any of these types and convert them to a NonEmptySafeStrTuple internally.
- NonEmptySafeStrTuple: The core data structure for keys. It's an immutable, flat tuple of non-empty, URL/filename-safe strings, ensuring that keys are consistent and safe for various storage backends. When a PersiDict method returns a key, it will always be in this format.
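The key normalization described above can be sketched as a small function. normalize_key is hypothetical, and the allowed character set (letters, digits, underscore, hyphen) is an assumption chosen for the example; persidict's real safety rules may be broader or stricter.

```python
import re
from typing import Sequence, Union

# Assumption for this sketch: a safe component uses only letters, digits, "_" and "-".
_SAFE_COMPONENT = re.compile(r"[A-Za-z0-9_-]+")


def normalize_key(key: Union[str, Sequence[str]]) -> tuple[str, ...]:
    """Turn a string or a sequence of strings into a flat tuple of safe components."""
    parts = (key,) if isinstance(key, str) else tuple(key)
    if not parts:
        raise ValueError("a key needs at least one component")
    for part in parts:
        # fullmatch also rejects empty strings, since the pattern needs 1+ characters.
        if not isinstance(part, str) or not _SAFE_COMPONENT.fullmatch(part):
            raise ValueError(f"unsafe or empty key component: {part!r}")
    return parts


print(normalize_key("username"))                      # ('username',)
print(normalize_key(["projects", "alpha", "notes"]))  # ('projects', 'alpha', 'notes')
```

Rejecting components such as ".." or "x/y" matters for a file-based backend, where each component becomes part of a path.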
Main Implementations
- FileDirDict: A primary, concrete implementation of PersiDict that stores each key-value pair as a separate file in a local directory.
- S3Dict: The other primary implementation of PersiDict, which stores each key-value pair as an object in an AWS S3 bucket, suitable for distributed environments.
Key Parameters
- serialization_format: A key parameter for FileDirDict and S3Dict that determines the serialization format used to store values. Common options are "pkl" (pickle) and "json". Any other value is treated as plain text for string storage.
- base_class_for_values: An optional parameter for any PersiDict that enforces type checking on all stored values, ensuring they are instances of a specific class.
- append_only: A boolean parameter that makes items inside a PersiDict immutable, preventing them from modification or deletion.
- digest_len: An integer that specifies the length of a hash suffix added to key components in FileDirDict to prevent collisions on case-insensitive file systems.
- base_dir: A string specifying the directory path where a FileDirDict stores its files. For S3Dict, this directory is used to cache files locally.
- bucket_name: A string specifying the name of the S3 bucket where an S3Dict stores its objects.
- region: An optional string specifying the AWS region for the S3 bucket.
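The collision that digest_len guards against is easy to reproduce: "Report" and "report" name the same file on a case-insensitive filesystem. The sketch appends a hash of the exact spelling to each component; the SHA-256 choice, the underscore separator, and the default length of 8 are assumptions for illustration, not persidict's actual suffix format.

```python
import hashlib


def with_digest_suffix(component: str, digest_len: int = 8) -> str:
    """Append a hash of the exact, case-sensitive spelling to a key component."""
    digest = hashlib.sha256(component.encode("utf-8")).hexdigest()[:digest_len]
    return f"{component}_{digest}"


names = ["Report", "report"]
# A case-insensitive filesystem folds these two names onto one file.
print(len({name.lower() for name in names}))  # 1

suffixed = [with_digest_suffix(name) for name in names]
# Different spellings give different suffixes with overwhelming probability,
# so the names stay distinct even after case-folding.
print(suffixed, len({name.lower() for name in suffixed}))
```

A longer suffix lowers the chance of two spellings sharing a digest prefix, at the cost of longer file names.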
Advanced and Supporting Classes
- WriteOnceDict: A wrapper that enforces write-once behavior on any PersiDict, ignoring subsequent writes to the same key. It also allows for random consistency checks to ensure subsequent writes to the same key always match the original value.
- OverlappingMultiDict: An advanced container that holds multiple PersiDict instances sharing the same storage but with different serialization_format values.
- LocalDict: An in-memory PersiDict backed by a RAM-only hierarchical store.
- EmptyDict: A minimal implementation of PersiDict that behaves like a null device in the OS: it accepts all writes, discards them, and returns nothing on reads. It always appears empty regardless of the operations performed on it.
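The write-once behavior with random consistency checks can be modeled in a few lines. ToyWriteOnce is a hypothetical in-memory class: the first write to a key wins, later writes are ignored, and with probability check_probability a later write is compared with the stored value and rejected with ValueError if it differs. WriteOnceDict's real parameters and error types may differ.

```python
import random
from typing import Any, Optional


class ToyWriteOnce:
    """First write wins; later writes are ignored but sometimes verified."""

    def __init__(self, check_probability: float = 0.1, seed: Optional[int] = None) -> None:
        self._data: dict[str, Any] = {}
        self._check_probability = check_probability
        self._rng = random.Random(seed)

    def __setitem__(self, key: str, value: Any) -> None:
        if key not in self._data:
            self._data[key] = value
            return
        # Repeat write: never stored, but randomly compared with the original.
        if self._rng.random() < self._check_probability and self._data[key] != value:
            raise ValueError(f"inconsistent write for key {key!r}")

    def __getitem__(self, key: str) -> Any:
        return self._data[key]


d = ToyWriteOnce(check_probability=1.0)  # check every repeat write
d["result"] = 42
d["result"] = 42  # ignored; matches the original
print(d["result"])  # 42
try:
    d["result"] = 43
except ValueError as exc:
    print(exc)  # inconsistent write for key 'result'
```

Checking only a random fraction of repeat writes keeps the common case cheap when comparing values is expensive, while still catching systematic inconsistencies over time.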
Special "Joker" Values
- Joker: The base class for special command-like values that can be assigned to a key to trigger an action instead of storing a value.
- KEEP_CURRENT: A "joker" value that, when assigned to a key, ensures the existing value is not changed.
- DELETE_CURRENT: A "joker" value that deletes the key-value pair from the dictionary when assigned to a key.
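The joker mechanism amounts to checking for sentinel objects by identity before storing anything. In this sketch, _Joker, KEEP, DELETE, and apply_write are hypothetical stand-ins operating on a plain dict; deleting an absent key is treated as a no-op here, which is a choice of the sketch rather than documented persidict behavior.

```python
class _Joker:
    """Hypothetical sentinel type; persidict's own Joker classes may differ."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


KEEP = _Joker("KEEP_CURRENT")
DELETE = _Joker("DELETE_CURRENT")


def apply_write(store: dict, key: str, value: object) -> None:
    """Interpret joker values before storing anything."""
    if value is KEEP:
        return                # leave the existing value (or its absence) untouched
    if value is DELETE:
        store.pop(key, None)  # remove the key if present
        return
    store[key] = value


settings = {"theme": "dark", "timeout": 30}
apply_write(settings, "theme", KEEP)
apply_write(settings, "timeout", DELETE)
apply_write(settings, "username", "alex")
print(settings)  # {'theme': 'dark', 'username': 'alex'}
```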
ETags and Conditional Flags
- ETagValue: Opaque per-key version string used for conditional operations.
- ETag conditions: ANY_ETAG (unconditional), ETAG_IS_THE_SAME (expected == actual), ETAG_HAS_CHANGED (expected != actual).
- ITEM_NOT_AVAILABLE: Sentinel used when a key is missing (stands in for the ETag).
- VALUE_NOT_RETRIEVED: Sentinel indicating a value exists but was not fetched.
API Highlights
PersiDict subclasses support the standard Python dictionary API, plus these additional methods:
| Method | Return Type | Description |
|---|---|---|
| timestamp(key) | float | Returns the POSIX timestamp (seconds since epoch) of a key's last modification. |
| random_key() | SafeStrTuple \| None | Selects and returns a single random key, useful for sampling from the dataset. |
| oldest_keys(max_n=None) | list[SafeStrTuple] | Returns a list of keys sorted by their modification time, from oldest to newest. |
| newest_keys(max_n=None) | list[SafeStrTuple] | Returns a list of keys sorted by their modification time, from newest to oldest. |
| oldest_values(max_n=None) | list[Any] | Returns a list of values corresponding to the oldest keys. |
| newest_values(max_n=None) | list[Any] | Returns a list of values corresponding to the newest keys. |
| get_subdict(prefix_key) | PersiDict | Returns a new PersiDict instance that provides a view into a subset of keys sharing a common prefix. |
| subdicts() | dict[str, PersiDict] | Returns a dictionary mapping all first-level key prefixes to their corresponding sub-dictionary views. |
| discard(key) | bool | Deletes a key-value pair if it exists and returns True; otherwise, returns False. |
| get_params() | dict | Returns a dictionary of the instance's configuration parameters, supporting the mixinforge API. |
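The ordering behind oldest_keys() and newest_keys() reduces to sorting keys by timestamp and truncating to max_n. The sketch assumes the modification times are already available as a dict of POSIX timestamps; keys_by_age is a hypothetical helper, and how persidict obtains timestamps from each backend is not shown.

```python
from collections.abc import Hashable
from typing import Optional


def keys_by_age(
    timestamps: dict[Hashable, float],
    newest_first: bool = False,
    max_n: Optional[int] = None,
) -> list[Hashable]:
    """Sort keys by modification time and keep at most max_n of them."""
    ordered = sorted(timestamps, key=timestamps.__getitem__, reverse=newest_first)
    return ordered if max_n is None else ordered[:max_n]


mtimes = {
    ("logs", "a"): 1_700_000_300.0,
    ("logs", "b"): 1_700_000_100.0,
    ("cfg",): 1_700_000_200.0,
}
print(keys_by_age(mtimes, max_n=2))                     # [('logs', 'b'), ('cfg',)]
print(keys_by_age(mtimes, newest_first=True, max_n=1))  # [('logs', 'a')]
```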
Conditional Operations (ETag-based)
PersiDict exposes explicit conditional operations for optimistic concurrency.
Each key has an ETag; missing keys use ITEM_NOT_AVAILABLE. Conditions are
ANY_ETAG (unconditional), ETAG_IS_THE_SAME (expected == actual), and
ETAG_HAS_CHANGED (expected != actual). Methods return a structured result
with whether the condition was satisfied, the actual ETag, the resulting ETag,
and the resulting value (or VALUE_NOT_RETRIEVED when value retrieval is
skipped).
Common methods and flags:
| Item | Kind | Notes |
|---|---|---|
| get_item_if(key, *, condition, expected_etag, retrieve_value=IF_ETAG_CHANGED) | Method | Conditional read. |
| set_item_if(key, *, value, condition, expected_etag, retrieve_value=IF_ETAG_CHANGED) | Method | Supports KEEP_CURRENT and DELETE_CURRENT. |
| setdefault_if(key, *, default_value, condition, expected_etag, retrieve_value=IF_ETAG_CHANGED) | Method | Insert-if-absent. |
| discard_if(key, *, condition, expected_etag) | Method | Conditional delete. |
| transform_item(key, *, transformer, n_retries=6) | Method | Retry loop for read-modify-write. |
| ETagValue | Type | NewType over str. |
| ITEM_NOT_AVAILABLE | Sentinel | Missing key marker. |
| VALUE_NOT_RETRIEVED | Sentinel | Value exists but was not fetched. |
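The three conditions and the structured result can be modeled with a toy read operation. Everything here is hypothetical: Condition, ToyResult, and toy_get_item_if stand in for ANY_ETAG, ETAG_IS_THE_SAME, ETAG_HAS_CHANGED, and the real result object. The field names condition_was_satisfied, actual_etag, and new_value follow the compare-and-swap example in this README, while resulting_etag is an invented name. Skipping the fetch when the caller's ETag is already current is this sketch's reading of the IF_ETAG_CHANGED default, not confirmed behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Condition(Enum):
    ANY_ETAG = "any"              # unconditional
    ETAG_IS_THE_SAME = "same"     # expected == actual
    ETAG_HAS_CHANGED = "changed"  # expected != actual


MISSING = "<item not available>"         # toy stand-in for ITEM_NOT_AVAILABLE
NOT_RETRIEVED = "<value not retrieved>"  # toy stand-in for VALUE_NOT_RETRIEVED


@dataclass(frozen=True)
class ToyResult:
    condition_was_satisfied: bool
    actual_etag: str
    resulting_etag: str
    new_value: Any


def condition_holds(condition: Condition, expected: str, actual: str) -> bool:
    if condition is Condition.ANY_ETAG:
        return True
    if condition is Condition.ETAG_IS_THE_SAME:
        return expected == actual
    return expected != actual


def toy_get_item_if(store: dict, key: str, condition: Condition, expected_etag: str) -> ToyResult:
    """store maps key -> (etag, value); a read leaves the ETag unchanged."""
    actual_etag, value = store.get(key, (MISSING, MISSING))
    satisfied = condition_holds(condition, expected_etag, actual_etag)
    if actual_etag == MISSING:
        new_value = MISSING        # nothing stored under this key
    elif actual_etag == expected_etag:
        new_value = NOT_RETRIEVED  # caller's copy is already current; skip the fetch
    else:
        new_value = value
    return ToyResult(satisfied, actual_etag, actual_etag, new_value)


store = {"count": ("etag-7", 41)}
print(toy_get_item_if(store, "count", Condition.ETAG_IS_THE_SAME, "etag-7"))
print(toy_get_item_if(store, "count", Condition.ETAG_IS_THE_SAME, "etag-3"))
print(toy_get_item_if(store, "absent", Condition.ANY_ETAG, MISSING))
```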
Example: compare-and-swap loop
```python
from persidict import FileDirDict, ANY_ETAG, ETAG_IS_THE_SAME, ITEM_NOT_AVAILABLE

d = FileDirDict(base_dir="./data")
while True:
    # Read the current value and its ETag (unconditionally).
    r = d.get_item_if("count", condition=ANY_ETAG, expected_etag=ITEM_NOT_AVAILABLE)
    new_value = 1 if r.new_value is ITEM_NOT_AVAILABLE else r.new_value + 1
    # Write only if nobody changed "count" since we read it.
    r2 = d.set_item_if("count", value=new_value, condition=ETAG_IS_THE_SAME, expected_etag=r.actual_etag)
    if r2.condition_was_satisfied:
        break
    # Otherwise another writer got there first: retry with fresh data.
```
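The compare-and-swap loop above can be exercised end to end against a toy in-memory store. ToyCASStore and increment are hypothetical: integer version counters play the role of ETags, and a lock makes each conditional write atomic, which the real backends must guarantee in their own way. Four threads each increment the counter 200 times; because a write only succeeds when the ETag is unchanged, no update is lost.

```python
import threading
from typing import Any

MISSING = object()  # toy stand-in for ITEM_NOT_AVAILABLE


class ToyCASStore:
    """Toy in-memory store; an integer version counter per key acts as its ETag."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: dict[str, tuple[int, Any]] = {}

    def read(self, key: str) -> tuple[Any, Any]:
        """Return (etag, value), or (MISSING, MISSING) for an absent key."""
        with self._lock:
            return self._data.get(key, (MISSING, MISSING))

    def write_if_same(self, key: str, value: Any, expected_etag: Any) -> bool:
        """Store value only if the key's ETag still equals expected_etag."""
        with self._lock:
            current_etag = self._data.get(key, (MISSING, None))[0]
            if current_etag != expected_etag:
                return False  # someone else wrote first
            new_etag = 1 if current_etag is MISSING else current_etag + 1
            self._data[key] = (new_etag, value)
            return True


def increment(store: ToyCASStore, key: str) -> None:
    """Read, compute, conditionally write; retry until the write succeeds."""
    while True:
        etag, value = store.read(key)
        new_value = 1 if value is MISSING else value + 1
        if store.write_if_same(key, new_value, etag):
            return


store = ToyCASStore()


def worker() -> None:
    for _ in range(200):
        increment(store, "count")


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.read("count"))  # (800, 800): 800 successful writes, no lost updates
```

A bounded variant would give up after a fixed number of attempts, which is what the n_retries parameter of transform_item suggests.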
Installation
The source code is hosted on GitHub at: https://github.com/pythagoras-dev/persidict
Binary installers for the latest released version are available at the Python package index at: https://pypi.org/project/persidict
You can install persidict using pip or your favorite package manager:
```bash
pip install persidict
```
To include the AWS S3 extra dependencies:
```bash
pip install "persidict[aws]"
```
For development, including test dependencies:
```bash
pip install "persidict[dev]"
```
Project Statistics
| Metric | Main code | Unit Tests | Total |
|---|---|---|---|
| Lines Of Code (LOC) | 7471 | 20500 | 27971 |
| Source Lines Of Code (SLOC) | 3303 | 13380 | 16683 |
| Classes | 37 | 40 | 77 |
| Functions / Methods | 296 | 1191 | 1487 |
| Files | 17 | 136 | 153 |
Contributing
Contributions are welcome! Please see the contributing guide for more details on how to get started, run tests, and submit pull requests.
For guidance on code quality, refer to:
License
persidict is licensed under the MIT License. See the LICENSE file for more details.
Key Contacts
Download files
File details
Details for the file persidict-0.309.0.tar.gz.
File metadata
- Download URL: persidict-0.309.0.tar.gz
- Upload date:
- Size: 207.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 388d5d92a607d9b2363f2830c9d62c15285409411cac57377508587156a66295 |
| MD5 | 3822a0cc7c4d470c37be1f9a4db172c3 |
| BLAKE2b-256 | 16bc378d57e207a2f2a08e4fd782a088ee29080e8331852c34d965740a50c8db |
File details
Details for the file persidict-0.309.0-py3-none-any.whl.
File metadata
- Download URL: persidict-0.309.0-py3-none-any.whl
- Upload date:
- Size: 79.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6362d30f8ed2c6afe31e8902c4caa0d3a54847d921d04aef0daf73c1aabf9a2c |
| MD5 | 131ea5e1ed6c9f24f7ca14f73cdbfecf |
| BLAKE2b-256 | 72ab64f6c00d21e6586d6ea272c3ecd1883a079281ecbd2e6dde4a5fc9884374 |