This Python module implements a simple key-value database based on JSON data stored in a directory.
jk_keyvaluestore
Introduction
This Python module implements a simple key-value database based on JSON data stored in a directory.
Information about this module can be found here:
Goals
This module provides a simple key-value-based data store. The current implementation does not aim for the highest performance but for ease of use, even across process boundaries. This means that ...
- You can instantiate a data store class right away and use it to read and write key-value data. The only thing you need to provide is a writable data directory.
- A data store can be read-write or read-only.
- A key can be associated with arbitrary amounts of JSON data.
- Other data store instances (sharing the same directory) can perform writes as well. These writes become visible to the other data store instances once they synchronize.
Limitations
The current implementation is pretty straightforward. Every change causes a write of a JSON file that contains that data. This way a) persistence is ensured and b) other instances of a data store will receive the new data. Therefore the concurrency model is very simple: The most recent write wins.
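The "most recent write wins" rule can be illustrated with a minimal sketch. This is not the module's actual implementation: the record layout, the file naming, the function names `write_record` and `merge_record`, and the use of `time.time()` as the time stamp are all assumptions made for illustration only.

```python
import json
import os
import tempfile
import time


def write_record(dir_path, key, value, timestamp=None):
    """Write one key-value pair as its own JSON file (hypothetical layout).

    Assumption: the key is safe to use as a file name.
    """
    record = {
        "key": key,
        "value": value,
        "t": time.time() if timestamp is None else timestamp,
    }
    # Write to a temporary file first, then rename it into place, so that
    # readers never see a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=dir_path, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(record, f)
    os.replace(tmp_path, os.path.join(dir_path, key + ".json"))
    return record


def merge_record(current, incoming):
    """Last write wins: keep whichever record has the newer time stamp."""
    if current is None or incoming["t"] > current["t"]:
        return incoming
    return current
```

Because the decision depends only on the time stamps, two writers with diverging clocks can override each other in an unexpected order, which is why the limitations below mention system time.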
However, this concept has drawbacks:
- Managing thousands of key-value pairs will be inefficient, as every key-value pair requires its own file to hold its data.
- Frequent changes of data will be inefficient as well.
- Time stamps are used to synchronize data store instances. If you use multiple processes residing on multiple hosts, you need to ensure that the system time does not differ much across all nodes.
This key-value store is meant for use cases where you want to keep things simple in your software because the amount of data you need to manage is quite limited.
Use cases
Use cases for this implementation are:
- Write out state information of an application and monitor it
- Implement simple interprocess communication
- Modify configuration data of an application while the application is running
Benefits of this approach
Nevertheless this approach has a variety of benefits:
- Portability: If you require the cooperation of programs written in different languages, this can be achieved very easily. Porting this Python implementation to other programming languages would be very simple.
How to use this module
Import this module
Please include this module into your application using the following code:
from jk_keyvaluestore import DirBasedKeyValueStore
Importing this way is recommended as currently DirBasedKeyValueStore is the only class. (This might change in the future.)
Using a single instance of the data store
Create an instance
Setting up a data store instance is easy:
ds = DirBasedKeyValueStore(dirPath="my/data/directory", identifier=1)
For clarification all arguments have been named in this example (in their order of declaration).
dirPath
must refer to an existing directory that will hold the data. If multiple instances are created they must refer to the same directory. If this is not a read-only instance, the directory must be writable.
identifier
must be either a valid string or a valid integer value. If it is a string, the identifier is matched against r[a-zA-Z0-9_+-\.]+. If it is an integer, the value must be greater than or equal to zero. If you specify None (which is the default) the data store will be read-only.
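The rules for `identifier` can be sketched as a small check. The function name `is_valid_identifier` is hypothetical and not part of the module. Whether the module requires the whole string to match, and how it treats booleans, are assumptions here.

```python
import re

# Pattern quoted from the description above. Inside the character class,
# "+-\." is a range from "+" to ".", so "," and "-" are accepted as well.
_IDENTIFIER_PATTERN = re.compile(r"[a-zA-Z0-9_+-\.]+")


def is_valid_identifier(identifier):
    """Return True if the identifier satisfies the rules described above."""
    if identifier is None:
        return True   # None is allowed and selects a read-only instance
    if isinstance(identifier, bool):
        return False  # assumption: booleans are not meant as integer identifiers
    if isinstance(identifier, int):
        return identifier >= 0
    if isinstance(identifier, str):
        # Assumption: the whole string must match, not just a prefix.
        return _IDENTIFIER_PATTERN.fullmatch(identifier) is not None
    return False
```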
Write data
Writing data is very simple. Example:
ds.put("someKey", [ "some", "value" ])
or:
ds["someKey"] = [ "some", "value" ]
The keys must be strings of arbitrary size. Values can be anything that is storable in JSON.
Read data
Reading data is very simple. Example:
value = ds.get("someKey")
or:
value = ds["someKey"]
This operation will return None by default if the key does not exist. (No exception will be raised.)
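Since `get()` returns None for a missing key, a key that was stored with a JSON null value cannot be told apart from a missing key by `get()` alone. A small helper can make the difference explicit. The name `get_or_default` is hypothetical. It relies only on the `get()` and `contains()` methods described in this document.

```python
def get_or_default(store, key, default):
    """Return the stored value, or `default` if the key does not exist."""
    # contains() distinguishes "key missing" from "key stored with value null".
    if store.contains(key):
        return store.get(key)
    return default
```

With a data store instance `ds`, a call such as `get_or_default(ds, "someKey", [])` would then yield an empty list instead of None when the key is absent.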
Check if a key exists
Checking whether a key exists in the data store is easy. Example:
bKeyValuePairExists = ds.contains("someKey")
Please note that the data store will maintain a list of all deleted entries to be able to still synchronize such information with other instances. The data returned by contains() will not reflect deleted entries. Example:
ds.put("someKey", "someValue")
ds.remove("someKey")
assert not ds.contains("someKey")
Delete a single key value pair
Deleting data is very simple. Example:
ds.remove("someKey")
or:
del ds["someKey"]
This operation is idempotent. No exception will be raised if the key has already been deleted.
Please note that in order to maintain synchronization capabilities with other data stores, information about this delete will be kept internally in the data store.
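Why a delete has to be remembered can be shown with a minimal in-memory sketch. The class `TombstoneIndex` and its layout are hypothetical and do not reflect the module's internal data structures. The sketch only illustrates the idea of keeping a marker for deleted keys.

```python
class TombstoneIndex:
    """In-memory index that remembers deletions (hypothetical sketch)."""

    def __init__(self):
        self._entries = {}  # key -> (timestamp, deleted flag, value)

    def put(self, key, value, timestamp):
        self._apply(key, (timestamp, False, value))

    def remove(self, key, timestamp):
        # Keep a marker instead of dropping the key, so that an older
        # put() arriving later from another instance cannot revive it.
        self._apply(key, (timestamp, True, None))

    def _apply(self, key, entry):
        current = self._entries.get(key)
        # The entry with the newer (or equal) time stamp wins.
        if current is None or entry[0] >= current[0]:
            self._entries[key] = entry

    def contains(self, key):
        entry = self._entries.get(key)
        return entry is not None and not entry[1]

    def keys(self):
        # Deleted entries are kept internally but never reported.
        return [k for k, e in self._entries.items() if not e[1]]
```

Without the marker, a write that happened before the delete but is seen after it would make the key reappear.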
Delete all key value pairs
Removing all data from a data store is simple. Example:
ds.clear()
This operation is idempotent. No exception will be raised if the data store is already empty.
Please note that in order to maintain synchronization capabilities with other data stores information about this delete will be kept internally in the data store.
Get a list of all keys
It is possible to get a list of all keys currently in use in the data store. Example:
allKeys = ds.keys()
The method keys() will return a list of keys.
Please note that the data store will maintain a list of all deleted entries. The data returned by keys() will not contain deleted entries.
Using multiple instances of the data store
Instantiation
Using multiple instances is easy. Example:
# In program A
dsA = DirBasedKeyValueStore(dirPath="my/data/directory", identifier=1)
# In program B
dsB = DirBasedKeyValueStore(dirPath="my/data/directory", identifier=2)
Please note that both instances must be distinguishable. Therefore you need to provide a unique identifier for each instance, matching the regular expression r[a-zA-Z0-9_+-\.]+. (If you specify an identifier of None you will create a read-only instance.)
After this you can use get(), put(), remove() and other methods.
Synchronization
If changes occurred in other instances of a data store in the meantime, these changes are not synchronized automatically to the current data store instance. Synchronization is entirely up to you: As synchronization is not cheap, you as the developer have to decide when a synchronization should be performed.
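One way to decide when to synchronize is to poll at a fixed interval while waiting for another process to write a key. The helper below is a sketch under assumptions: the name `wait_for_key` is hypothetical, and `synchronize()` is assumed to be callable without arguments (the method name appears in the performance section of this document, its signature does not).

```python
import time


def wait_for_key(store, key, timeout=5.0, poll_interval=0.5):
    """Poll until `key` appears in the store or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        store.synchronize()  # pick up changes written by other instances
        if store.contains(key):
            return store.get(key)
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

A longer `poll_interval` reduces the number of directory scans at the price of noticing changes later.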
During synchronization the directory of files storing the key-value pairs is scanned for new entries. If new entries are found they will be loaded. The information contained in these files will then be incorporated into the current data store instance. As the cost depends directly on the number of files written since the last synchronization, this might be a bit costly.
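The scan described above can be sketched as follows. This is an illustration, not the module's code: the function name `load_changed_files`, the `.json` file suffix, and the use of file modification times to detect changes are assumptions.

```python
import json
import os


def load_changed_files(dir_path, seen_mtimes):
    """Load JSON files that are new or modified since the previous scan.

    `seen_mtimes` maps file names to the modification time observed last
    time; it is updated in place. Returns {file name: parsed JSON content}.
    """
    changed = {}
    with os.scandir(dir_path) as entries:
        for entry in entries:
            if not entry.is_file() or not entry.name.endswith(".json"):
                continue
            mtime = entry.stat().st_mtime_ns
            if seen_mtimes.get(entry.name) == mtime:
                continue  # unchanged since the last scan
            with open(entry.path, "r", encoding="utf-8") as f:
                changed[entry.name] = json.load(f)
            seen_mtimes[entry.name] = mtime
    return changed
```

Every scan still has to list the whole directory, but only new or modified files are opened and parsed, so the cost of reading grows with the number of changes since the previous call.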
Performance
As files are read and written in order to manage all data the time required for file operations is the limiting factor. Here are some performance values based on experience in Python on Linux with a regular SATA-SSD:
Operation | Performance |
---|---|
get | Limited by python interpreter performance only. |
put | ~15ms |
synchronize single change | ~10ms |
As synchronize() is called during instantiation to read all existing data, instantiation performance is very similar to synchronization performance.
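To get a feeling for such numbers on your own hardware, the raw cost of writing small JSON files can be measured with a short script. This sketch does not use the module at all: the function name `time_json_writes` is hypothetical, and forcing each file to disk with `os.fsync()` is an assumption about what a durable write involves, not a statement about what the module does.

```python
import json
import os
import tempfile
import time


def time_json_writes(dir_path, count=20):
    """Return the average time in seconds for writing one small JSON file."""
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(dir_path, "bench_%d.json" % i)
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"key": "k%d" % i, "value": [i]}, f)
            f.flush()
            # Assumption: a durable write forces the data to disk.
            os.fsync(f.fileno())
    return (time.perf_counter() - start) / count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("average write: %.2f ms" % (time_json_writes(d) * 1000.0))
```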
(Possible) Future Development
The current implementation is based on synchronous I/O operations. Future implementation options therefore would be:
- provide an implementation based on asynchronous I/O
- provide an implementation for C, Java, C#, maybe even JavaScript
- maybe provide an implementation accessing data via SFTP
- explore even better ways of implementing such a data store:
  - using named pipes to communicate with a single process
Other things to do:
- test the current implementation on an NFS share
If you are interested in improving or porting the current implementation, you are welcome to do so. Feel free to contact me.
Contact Information
This is Open Source code. That not only gives you the possibility of freely using this code, it also allows you to contribute. Feel free to contact the author(s) of this software listed below, whether for comments, collaboration requests, suggestions for improvement or bug reports:
- Jürgen Knauth: jknauth@uni-goettingen.de, pubsrc@binary-overflow.de
License
This software is provided under the following license:
- Apache Software License 2.0