Remote dictionary backed up by cloud services
Project description
REMOTE DICT
RemoteDict is a Python library for hosting a dictionary in a cloud backend. Currently, only Azure Blob Storage is supported.
USAGE - AZURE
Grab a CONNECTION_STRING from your Azure Blob Storage account, then import AzureDictionary as follows:
>>> from remotedict.azure import AzureDictionary
>>> remote_dict = AzureDictionary(CONNECTION_STRING, container_name="mycontainer", folder_name="myfolder")
>>> remote_dict
Azure Blob Storage. Container: "mycontainer"; Folder: "myfolder"; Num elements: 0
>>> remote_dict["foo"] = "bar"
>>> remote_dict["foo"]
'bar'
remote_dict is an object that behaves like a Python dictionary. It also provides extensive functionality for working with large data and concurrent access.
HOW IT WORKS
Once remote_dict is instantiated, it can be used to store any kind of data:
import numpy as np
import pandas as pd

remote_dict['example'] = "hello"
remote_dict['example2'] = b"binary data of any size"
remote_dict['example3'] = 42
remote_dict['example4'] = {"this": {"is": b"a subdictionary", "that": "holds", "any": True, "data": 42}}
remote_dict['example5'] = ["even", "lists", "or", "numpy", "and", "pandas"]
remote_dict['example6'] = np.random.randn(10, 3, 1)
remote_dict['example7'] = pd.DataFrame([1,2,3,4])
Several assignments or reads can also be combined into a single atomic operation by passing a list of keys:
remote_dict[[
'example',
'example2',
'example3'
]] = "hello", b"binary data", 42
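The list-of-keys syntax above works because a custom mapping can inspect the key it receives before deciding what to do with it. The following is a minimal local sketch of that dispatch, not RemoteDict's implementation: the class name MultiKeyDict is hypothetical, everything lives in memory, and the atomicity that RemoteDict describes for the remote backend is not modelled here.

```python
class MultiKeyDict(dict):
    """Hypothetical in-memory dict that accepts a list of keys in one call."""

    def __setitem__(self, key, value):
        if isinstance(key, list):
            # One value per key: d[["a", "b"]] = 1, 2
            values = list(value)
            if len(values) != len(key):
                raise ValueError("number of keys and values must match")
            for k, v in zip(key, values):
                dict.__setitem__(self, k, v)
        else:
            dict.__setitem__(self, key, value)

    def __getitem__(self, key):
        if isinstance(key, list):
            # Return the values in the same order as the requested keys.
            return [dict.__getitem__(self, k) for k in key]
        return dict.__getitem__(self, key)


local_dict = MultiKeyDict()
local_dict[["example", "example2", "example3"]] = "hello", b"binary data", 42
several = local_dict[["example", "example3"]]
```

A plain dict would reject a list as a key because lists are unhashable; intercepting the list in __setitem__ and __getitem__ is what makes the batched form possible.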
Each entry is stored as an LZ4-compressed binary in a single file inside the container and folder that were specified when remote_dict was instantiated.
There is no soft limit on the size of a value.
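The "one compressed binary file per entry" layout can be illustrated with a short sketch. This is an illustration under stated assumptions, not RemoteDict's code: pickle is assumed as the serializer, zlib from the standard library stands in for LZ4 (which needs a third-party package), and the helper names encode_value, decode_value and blob_name are hypothetical. The "folder/key" naming is also an assumption.

```python
import pickle
import zlib


def encode_value(value) -> bytes:
    """Serialize any picklable value and compress it (zlib stands in for LZ4)."""
    raw = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
    return zlib.compress(raw)


def decode_value(blob: bytes):
    """Reverse encode_value: decompress, then deserialize."""
    return pickle.loads(zlib.decompress(blob))


def blob_name(folder_name: str, key: str) -> str:
    """Assumed naming scheme: one blob per key, placed under the folder."""
    return f"{folder_name}/{key}"


payload = {"this": {"is": b"a subdictionary", "any": True, "data": 42}}
stored = encode_value(payload)      # bytes that would be uploaded as one file
restored = decode_value(stored)     # what a later read would reconstruct
```

Only unpickle data from a container you trust, since pickle can execute arbitrary code while loading.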
INDEXES
RemoteDict maintains an index that allows all keys to be retrieved instantly, without listing the elements in the backend.
The index is stored in a dedicated file, and cloud leases on that file ensure that concurrent access cannot corrupt it.
For this reason, the folder "Index" in the cloud container is reserved and handled automatically by RemoteDict.
Rather than downloading the index file each time an index check is required, the class only checks the etag of the file (which is faster than downloading it). If the etag does not match the local etag, the index is downloaded again, so the local copy is always up to date.
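The etag check described above can be sketched with an in-memory stand-in for the remote index file. This is a simplified model rather than RemoteDict's implementation: FakeBlob and CachedIndex are hypothetical names, and a real backend would obtain the etag from the blob's properties instead of a Python attribute.

```python
import uuid


class FakeBlob:
    """In-memory stand-in for the remote index file."""

    def __init__(self, content):
        self._content = content
        self._etag = uuid.uuid4().hex
        self.downloads = 0  # counts full downloads, for demonstration

    def get_etag(self):
        # Cheap metadata-only request in a real backend.
        return self._etag

    def download(self):
        # Expensive request: returns the content and the etag it belongs to.
        self.downloads += 1
        return self._content, self._etag

    def upload(self, content):
        # Every write produces a new etag.
        self._content = content
        self._etag = uuid.uuid4().hex


class CachedIndex:
    """Keeps a local copy of the index and refreshes it only when the etag changes."""

    def __init__(self, blob):
        self._blob = blob
        self._local_etag = None
        self._local_copy = None

    @property
    def index(self):
        if self._blob.get_etag() != self._local_etag:
            self._local_copy, self._local_etag = self._blob.download()
        return self._local_copy


blob = FakeBlob({"example": "example/example"})
cached = CachedIndex(blob)
first = cached.index
second = cached.index          # etag unchanged, so no second download
downloads_before_write = blob.downloads
blob.upload({"example": "example/example", "example2": "example/example2"})
third = cached.index           # etag changed, so the index is fetched again
```

Returning the etag together with the content in download keeps the cached pair consistent even if the file changes between the check and the download.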
The index is a pd.Series object that can be accessed as follows:
>>> remote_dict.index
example example/example
example2 example/example2
example3 example/example3
example4 example/example4
example5 example/example5
example6 example/example6
example7 example/example7
Name: name, dtype: object
CONCURRENT ACCESS
Concurrent reads are allowed by nature; concurrent writes, however, are a bit more complex. RemoteDict deals with concurrency by allowing leases to be acquired on individual elements.
To lock an element:
>>> remote_dict.lock_item("example", duration=15) # duration in seconds
Once the element is locked, no other remote_dict (anywhere, even on different machines) can lock or write to this item until the item is manually unlocked or the duration expires.
If another remote_dict instance tries to lock it, that instance will wait for the item to be released (default behaviour) or raise an exception if wait=False.
While the lease is held, the item can only be written by the object that holds it.
To unlock the element:
>>> remote_dict.unlock_item("example")
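The lease semantics described in this section (a timed lock, waiting by default, an exception when wait=False, and writes restricted to the holder) can be modelled locally. The sketch below is an in-memory approximation, not RemoteDict's implementation: real leases are enforced by the cloud backend, and the names LeaseTable, LeaseError, can_write, the owner argument and poll_interval are hypothetical. It is single-process and not thread-safe.

```python
import time


class LeaseError(Exception):
    """Raised when an item is leased by someone else and wait=False."""


class LeaseTable:
    """In-memory model of timed leases on individual keys."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._leases = {}  # key -> (owner, expiry time)

    def _holder(self, key):
        lease = self._leases.get(key)
        if lease is None:
            return None
        owner, expires_at = lease
        if self._clock() >= expires_at:
            del self._leases[key]  # the lease expired on its own
            return None
        return owner

    def lock_item(self, owner, key, duration=15, wait=True, poll_interval=0.05):
        while True:
            holder = self._holder(key)
            if holder is None or holder == owner:
                self._leases[key] = (owner, self._clock() + duration)
                return
            if not wait:
                raise LeaseError(f"{key!r} is locked by {holder!r}")
            time.sleep(poll_interval)  # default behaviour: wait for release

    def unlock_item(self, owner, key):
        # Only the current holder can release the lease.
        if self._holder(key) == owner:
            del self._leases[key]

    def can_write(self, owner, key):
        holder = self._holder(key)
        return holder is None or holder == owner


now = [0.0]  # controllable clock, so the example does not need to sleep
table = LeaseTable(clock=lambda: now[0])
table.lock_item("machine-a", "example", duration=15)
a_can_write = table.can_write("machine-a", "example")
b_can_write = table.can_write("machine-b", "example")
now[0] = 16.0  # advance past the lease duration
b_can_write_after_expiry = table.can_write("machine-b", "example")
```

Injecting the clock keeps the expiry logic testable; with the default time.monotonic clock, a caller using wait=True simply polls until the lease is released or expires.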