Library for simple key-value storage
Project description
NHKV
NHKV (no hassle key-value) is a library for on-disk key-value storage. The use case is primarily directed towards storing large objects. The goal was to make it lightweight and without external dependencies. Was created primarily for storing datasets for machine learning.
Installation
pip install nhrv
Quick Start
from nhkv import KVStore
storage = KVStore("path/to/storage/location") # a folder is created
key = 0
value = "python serializable object"
storage[key] = value
storage.save()
Available Storage Classes
DbDict
Data is stored in Sqlite3 database. THe idea id similar to SqliteDict
. Functionality of DbDict
is inferior to SqliteDict
, but overhead is also smaller.
from nhkv import DbDict
storage = DbDict("path/to/storage/location", key_type=str) # a database file is created
# can also use int keys
storage["string key"] = "python serializable object"
storage.save() # database transaction is not completed until saved explicitly
# frequent saving affects performance
CompactKeyValueStore
The data is kept in mmap file. The index is kept in memory, can become very large if many objects are stored. Mmap files are split in shards.
from nhkv import CompactKeyValueStore
storage = CompactKeyValueStore(
"path/to/storage/location", # folder is created
serializer=lambda string_: string_.encode("utf8"), # optional serializer
deserializer=lambda bytes_: bytes_.decode("utf8"), # optional deserializer
shard_size=1048576 # optional shard size in bytes
)
# key is any type that can be used with `dict`
storage["string key"] = "python serializable object"
storage.save() # save to ensure transaction is complete
# frequent saving affects performance
storage.close()
storage = CompactKeyValueStore.load("path/to/storage/location")
KVStore
The data is kept in mmap file. The index is kept either in sqlite or shelve database.
from nhkv import KVStore
storage = KVStore(
"path/to/storage/location", # folder is created
index_backend='sqlite', # possible options: sqlite | shelve
serializer=lambda string_: string_.encode("utf8"), # optional serializer
deserializer=lambda bytes_: bytes_.decode("utf8"), # optional deserializer
shard_size=1048576 # optional shard size in bytes
)
# sqlite uses int keys
# shelve uses str keys
storage[100] = "python serializable object"
storage.save() # save to ensure transaction is complete
# frequent saving affects performance
storage.close()
storage = KVStore.load("path/to/storage/location")
Alternatives
NHKV is closely related to libraries such as
Shelve
- has non-zero probability of key collision, very slowSqliteDict
- slower reads and writes, but more functionality (eg. multiprocessing)DiskCache
- slower reads and writes, but more functionality (eg. cache features)
Limitation
Storage classes in this library are better suited for the batch writes and consecutive batch reads. This represents the intended use case: storing datasets for machine learning. Alternating many reads and writes will result in reduced performance.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.