File-based key-value storage for pickle-serializable keys and values.

pickledir

File-based key-value storage.

Keys and values are serialized with pickle. Data is kept in files in the specified directory.

CI-tested with Python 3.8-3.9 on macOS, Ubuntu and Windows.


The storage has zero initialization time, fast random access, fast reads and writes.

Unlike shelve, the data saved by PickleDir is cross-platform: you can write it on Linux and read the same files on Windows. Unlike most database-backed caching solutions (including shelve), PickleDir does not require you to "open" and "close" the storage. It is always open, since it is just a directory in the file system.

PickleDir is better for casual data storage. Database-backed solutions are preferable when your storage holds many elements (3 thousand or more). They will also be faster under a predictably high read and write load.

Install

$ pip3 install pickledir

Use

Create

from pickledir import PickleDir

cache = PickleDir('path/to/my_cache_dir')

Write

Keys do not need to be hashable. They only need to be serializable with pickle.

When you assign a value, the data is literally written to a file.

cache['key'] = 'hello, user!'
cache[5] = 23
cache[{'a', 'b', 'c'}] = 'abc'
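
The difference between "hashable" and "serializable with pickle" can be checked with the standard library alone. The sketch below does not use pickledir at all; it only shows that a set, which cannot be a dict key, still round-trips through pickle.

```python
import pickle

key = {'a', 'b', 'c'}

# A set is mutable, so it is not hashable and cannot be used as a dict key.
try:
    hash(key)
    hashable = True
except TypeError:
    hashable = False

# The same set serializes with pickle and is restored to an equal object.
restored = pickle.loads(pickle.dumps(key))

print(hashable)         # False
print(restored == key)  # True
```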

Read

print(cache['key'])
print(cache[5])
print(cache[{'a', 'b', 'c'}])

Read all values

for key, value in cache.items():
    print(key, value)

Delete item

del cache['key']

Type hints

# declaring PickleDir with string keys and integer values:

cache: PickleDir[str, int] = PickleDir('path/to/my_cache_dir')

Set expiration time on writing

The expired items will be removed from the storage.

import datetime
import time

cache.set('a', 1000, max_age=datetime.timedelta(seconds=1))
print(cache.get('a'))  # 1000
time.sleep(2)
print(cache.get('a'))  # None (and removed from storage)

Set expiration time on reading

The expired items will not be returned, but kept in the storage.

cache['b'] = 1000
time.sleep(2)
print(cache.get('b', max_age=datetime.timedelta(seconds=1)))  # None
print(cache.get('b', max_age=datetime.timedelta(seconds=9)))  # 1000
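
One way to picture both expiration modes: each item carries the time it was written, and max_age is compared against the item's age. The helper is_expired below is hypothetical and not part of the pickledir API, and whether the library treats an age exactly equal to max_age as expired is an assumption here. The sketch only illustrates the comparison.

```python
import datetime

def is_expired(written_at, now, max_age):
    """Hypothetical check: True if the item is older than max_age."""
    return now - written_at > max_age

# Fixed timestamps stand in for time.sleep(2) so the result is deterministic.
written_at = datetime.datetime(2021, 1, 1, 12, 0, 0)
now = written_at + datetime.timedelta(seconds=2)

print(is_expired(written_at, now, datetime.timedelta(seconds=1)))  # True
print(is_expired(written_at, now, datetime.timedelta(seconds=9)))  # False
```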

Set data version

Setting the data version makes it easy to mark old data as obsolete.

For example, you cached the result of a function, and then changed the implementation of that function. In this case, there is no need to delete old files from the cache. Just change the version number.

cache = PickleDir('path/to/dir', version=1)
cache['a'] = 'some_data'

You can read all stored data while the version value is 1.

cache = PickleDir('path/to/dir', version=1)
print(cache.get('a'))  # 'some_data'

If you decide that all the data in the cache is out of date, just pass the constructor a version number that you haven't used before.

cache = PickleDir('path/to/dir', version=2)
print(cache.get('a'))  # None

Now only the data saved with version 2 is considered current. Data saved with any other version is considered obsolete and will be gradually removed.
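
The idea can be modelled in a few lines. The VersionedStore class below is a toy in-memory stand-in, not pickledir's implementation: it assumes every record remembers the version that wrote it and that a reader ignores records written under a different version. It does not model the gradual removal of obsolete data.

```python
class VersionedStore:
    """Toy model: each record is stored together with the writer's version."""

    def __init__(self, records, version):
        self._records = records    # shared dict: key -> (version, value)
        self._version = version

    def set(self, key, value):
        self._records[key] = (self._version, value)

    def get(self, key):
        record = self._records.get(key)
        if record is None or record[0] != self._version:
            return None            # missing, or written under another version
        return record[1]

shared = {}  # stands in for the directory on disk
v1 = VersionedStore(shared, version=1)
v1.set('a', 'some_data')
v2 = VersionedStore(shared, version=2)

print(v1.get('a'))  # some_data
print(v2.get('a'))  # None
```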

Do not create a PickleDir with an old version number. Doing so makes the stored data unpredictable.

cacheV1 = PickleDir('path/to/dir', version=1)  # ok
cacheV1['a'] = 'old A'
cacheV1['b'] = 'old B'

cacheV2 = PickleDir('path/to/dir', version=2)  # ok
cacheV2['a'] = 'new A'

cacheV1 = PickleDir('path/to/dir', version=1)  # don't do this
print(cacheV1.get('b'))  # Schrödinger's data ('old B' or None)

Benchmarks

Casually saving 10 items and reading them again:

for i in range(10):
    cache[str(i)] = {"data": i, "other": None}
for i in range(10):
    _ = cache[str(i)]

Storage          Time
PickleDir        0.42
shelve           6.68
diskcache.Cache  1.09

Measured on macOS, Python 3.8, SATA HDD (not SSD), Journaled HFS+.

See sources in benchmark dir.

The main advantage of pickledir is that it spends no time creating a database or initializing tables. If we saved 1000 items in a row instead of 10, shelve and diskcache would be faster than pickledir.
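
A measurement of this kind can be sketched with the standard library. This is not the benchmark from the project's benchmark dir: the helper time_roundtrip is an illustrative name, only shelve is timed, and the numbers will differ from the table above. Opening the storage is deliberately included in the timed region, because setup cost is what the comparison is about.

```python
import shelve
import tempfile
import time
from pathlib import Path

def time_roundtrip(storage, n=10):
    """Write n small items to a mapping-like storage, read them back, return seconds."""
    start = time.perf_counter()
    for i in range(n):
        storage[str(i)] = {"data": i, "other": None}
    for i in range(n):
        _ = storage[str(i)]
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as tmp:
    start = time.perf_counter()
    # Creating the shelf file is part of what we measure.
    with shelve.open(str(Path(tmp) / "bench")) as db:
        time_roundtrip(db)
    shelve_seconds = time.perf_counter() - start

print(f"shelve: {shelve_seconds:.4f} s")
```

Any object that supports item assignment and lookup, including a PickleDir instance, could be passed to time_roundtrip in the same way.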

Under the hood

Serialized data is stored inside files in the same directory. Each file contains one or more items. The maximum number of files is limited to 4096. The values are uniformly distributed between the files.

Reading is slower when a file contains more than one item. Therefore, PickleDir is best suited to storages with no more than a few thousand items.
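
To see why items start sharing files well before the 4096-file limit is reached, consider a hypothetical mapping from keys to file names. The function bucket_name and the choice of SHA-256 over the pickled key are assumptions made for illustration; the scheme pickledir actually uses may differ.

```python
import hashlib
import pickle

MAX_FILES = 4096  # 16 ** 3, so three hex digits are enough to name every file

def bucket_name(key):
    """Hypothetical mapping from a picklable key to one of MAX_FILES file names."""
    digest = hashlib.sha256(pickle.dumps(key)).hexdigest()
    return digest[:3]

# Count how many distinct files 3000 keys would occupy under this scheme.
names = {bucket_name(i) for i in range(3000)}
print(len(names))
```

With a uniform distribution, some of the 3000 keys land in the same file, and each such file has to hold more than one item.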
