
Looks like a dict and acts like a dict but is persistent via an sqlite3 db, like sqldict


PersistDict

Just a DIY version of sqldict: looks like a dict and acts like a dict but is persistent via an sqlite3 db.

Why?

I ran into issues with langchain's caches when developing wdoc (my RAG lib, optimized for my use) and after months of waiting I decided to fix it myself. Instead of trusting sqldict's implementation with langchain's concurrency, I made my own. This makes it very easy to add a persistent cache to anything. It was also easy to do thanks to my BrownieCutter. Note: after making this I stumbled upon lmdb-dict, which is very probably way better as it's done by pros. It's based on LMDB, which is more suitable than sqlite3 for what I was after when making PersistDict.

Features:

  • threadsafe: several threads can access the same db without problems, even while other threads are using a different db, thanks to a singleton class. And if several Python scripts run at the same time and try to access the same db, sqlite3 should make them wait appropriately.
  • atime and ctime: each entry includes a creation time and a last access time.
  • expiration: won't grow too large because old keys are automatically removed.
  • cached: an actual Python dict is used to cache access to the db. This cache is shared among instances, and dropped if another script uses the same db.
  • compression: using the builtin sqlite3 compression.
  • customizable serializer for the value: by default pickle is used, but could be numpy.npz, joblib.dumps, dill.dumps etc
  • encryption: using the UNMAINTAINED library pysqlcipher3, because it was very easy to add. In the future it will use an up-to-date library and encrypt values in place directly.
  • no dependencies needed: if you have beartype installed it will be used, same with loguru. Encryption comes from the UNMAINTAINED pysqlcipher3 lib, but only for now, as I plan to move on to a simple in-place encryption instead.
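
To make the features above more concrete, here is a minimal sketch of how a str-keyed, sqlite3-backed mapping with a pluggable serializer, ctime/atime columns, a lock and expiration could be put together using only the standard library. This is NOT PersistDict's actual code: the class name `TinySqliteDict`, the table layout, the `purge_expired` method and the choice to expire entries based on last access time are all assumptions made for illustration.

```python
import pickle
import sqlite3
import threading
import time


class TinySqliteDict:
    """Illustrative sketch only, not PersistDict's implementation."""

    def __init__(self, path=":memory:", expiration_days=30,
                 serializer=pickle.dumps, unserializer=pickle.loads):
        self._lock = threading.Lock()  # serializes access between threads
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._dumps, self._loads = serializer, unserializer
        self._expiration_seconds = expiration_days * 86400
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv ("
            "key TEXT PRIMARY KEY, value BLOB, ctime REAL, atime REAL)"
        )
        self.purge_expired()

    def purge_expired(self):
        # Assumption: an entry is "old" when it has not been accessed recently.
        cutoff = time.time() - self._expiration_seconds
        with self._lock:
            self._conn.execute("DELETE FROM kv WHERE atime < ?", (cutoff,))
            self._conn.commit()

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be str")
        blob, now = self._dumps(value), time.time()
        with self._lock:
            # Update first so that an existing entry keeps its creation time.
            cur = self._conn.execute(
                "UPDATE kv SET value = ?, atime = ? WHERE key = ?",
                (blob, now, key),
            )
            if cur.rowcount == 0:
                self._conn.execute(
                    "INSERT INTO kv (key, value, ctime, atime) VALUES (?, ?, ?, ?)",
                    (key, blob, now, now),
                )
            self._conn.commit()

    def __getitem__(self, key):
        with self._lock:
            row = self._conn.execute(
                "SELECT value FROM kv WHERE key = ?", (key,)
            ).fetchone()
            if row is None:
                raise KeyError(key)
            # Reading an entry refreshes its last access time.
            self._conn.execute(
                "UPDATE kv SET atime = ? WHERE key = ?", (time.time(), key)
            )
            self._conn.commit()
        return self._loads(row[0])


cache = TinySqliteDict()            # in-memory db, just for the demo
cache["answer"] = {"value": 42}
restored = cache["answer"]
```

Swapping `serializer`/`unserializer` for another pair of dumps/loads functions is all it takes to change the value format in this sketch, which is the same idea as the customizable serializer listed above.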

Differences with python dict:

  • keys have to be str, that's what the sqlite db table is expecting.
  • an object stored at self.missing_value is used to designate a MISSING value, so you can't pickle this object. By default it's dataclasses.MISSING.
  • .clear() will throw a NotImplementedError to avoid erasing the db. If you just want to clear the cache use self.clear_cache(). If you actually want to remove all data you can do self.__delitems__(list(self.keys())).
  • adds 3 methods to 'slice' the dict with multiple key/values:
    • .__getitems__
    • .__setitems__
    • .__delitems__
    • Note that calling .__getitems__ with some keys missing will not raise a KeyError but return self.missing_value for those keys, which by default is dataclasses.MISSING.
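
The sentinel behaviour described above can be illustrated with a plain dict. The sketch below is an assumption about the general pattern, not PersistDict's code: the helper `getitems` is hypothetical and a regular dict stands in for the persistent store. It shows why a dedicated sentinel such as dataclasses.MISSING is needed: None is a perfectly valid stored value, so it cannot be used to signal an absent key.

```python
from dataclasses import MISSING


def getitems(mapping, keys, missing_value=MISSING):
    """Return one value per key, with `missing_value` standing in for absent keys."""
    return [mapping.get(k, missing_value) for k in keys]


store = {"a": 1, "b": None}         # plain dict standing in for the db
wanted = ["a", "b", "zzz"]
values = getitems(store, wanted)

# Keep only the keys that were really present; "b" survives even though
# its value is None, because absence is tested by identity with the sentinel.
found = {k: v for k, v in zip(wanted, values) if v is not MISSING}
```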

Usage:

  • Download from pypi with pip install PersistDict
  • Or from git:
    • git clone https://github.com/thiswillbeyourgithub/PersistDict
    • cd PersistDict
    • pip install -e .
    • To test that the code works fine: cd PersistDict ; python PersistDict.py
from PersistDict import PersistDict

# create the object (a_path is a placeholder for the path to your sqlite3 db file)
d = PersistDict(
    database_path=a_path,
    # compression=True,
    # password="J4mesB0nd",
    # verbose=True,
    # expiration_days=30,
)
# then treat it like a dict:
d["a"] = 1
d["b"] = "b"
d["c"] = str

# You can even create it via __call__, like a dict:
# d = d(a=1, b="b", c=str)  # this actually calls __call__ but is only
# allowed once per PersistDict, just like regular dict

# it's a child from dict
assert isinstance(d, dict)

# prints like a dict
print(d)
# {'a': 1, 'b': 'b', 'c': str}

# supports the same methods as dict
assert sorted(list(d.keys())) == ["a", "b", "c"], d
assert "b" in d
del d["b"]
assert list(d.keys()) == ["a", "c"], d
assert len(d) == 2, d
assert d.__repr__() == {"a": 1, "c": str}.__repr__()
assert d.__str__() == {"a": 1, "c": str}.__str__()

# supports as values all the types that pickle supports (or more if you
# change the serializer)
d["d"] = None

# new method to get and set multiple elements at the same time
assert d.__getitems__(["c", "d", "a"]) == [str, None, 1]

d.__setitems__(( ("a", 1), ("b", 2), ("c", 3), ('d', 4)))
assert d.__getitems__(["c", "d", "a", "b"]) == [3, 4, 1, 2], d.__getitems__(["c", "d", "a", "b"])

d.__delitems__(["c", "a"])
assert d.__getitems__(["b", "d"]) == [2, 4], d
assert len(d) == 2, d

# If you create another object pointing at the same db, they will share the
# same cache and won't corrupt the db:
d2 = PersistDict(
    database_path=a_path,
    # compression=True,
    # password="J4mesB0nd",
    verbose=True,
)
assert list(d.keys()) == list(d2.keys()), d2
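
As noted in the motivation, the point of a dict-like persistent store is that it makes it easy to add a persistent cache to anything. Here is a minimal sketch of that idea as a decorator. The name `persistent_cache` and the key scheme (a SHA-256 hash of the pickled call signature, so that keys are always str) are assumptions made for illustration and are not part of PersistDict. A plain dict is used as the store so the example runs on its own; presumably any str-keyed mapping, such as a PersistDict instance, could be passed instead.

```python
import functools
import hashlib
import pickle


def persistent_cache(store):
    """Memoize a function in `store`, any mapping that accepts str keys."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Build a str key from the function name and its arguments.
            raw = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
            key = hashlib.sha256(raw).hexdigest()
            if key in store:
                return store[key]
            result = func(*args, **kwargs)
            store[key] = result
            return result
        return wrapper
    return decorator


calls = []
store = {}  # stand-in for a persistent, dict-like store


@persistent_cache(store)
def square(x):
    calls.append(x)  # records each real computation
    return x * x


first = square(4)
second = square(4)  # served from the store, square's body is not run again
```

This only works for arguments that pickle to the same bytes on every call, which holds for simple values such as numbers and strings but is not guaranteed for arbitrary objects.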
