Simple Object Storage - Persistent dicts and lists for python.

pySOS: Simple Objects Storage

Persistent dictionaries and lists for Python

This is ideal for lists or dictionaries which either need persistence, are too big to fit in memory or both.

There are existing alternatives like shelve, which are very good too. The main differences with pysos are:

  • only the index is kept in memory, not the values (so you can hold more data than what would fit in memory)
  • it provides both persistent dicts and lists
  • objects must be JSON "dumpable" (no cyclic references, etc.)
  • it's fast (much faster than shelve on Windows, but slightly slower than the native dbm on Linux)
  • it's unbuffered by design: when the function returns, you are sure the data has been written to disk
  • it's safe: even if the machine crashes in the middle of a big write, data will not be corrupted
  • it is platform independent, unlike shelve which relies on an underlying dbm implementation, which may vary from system to system
  • the data is stored in a plain text format
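To make the JSON "dumpable" requirement concrete, here is a small sketch that uses only the standard json module to check a value before storing it. The helper name is_json_dumpable is made up for this illustration and is not part of pysos.

```python
import json

def is_json_dumpable(obj):
    """Return True if obj can be serialized by the standard json module."""
    try:
        json.dumps(obj)
        return True
    except (TypeError, ValueError):
        # TypeError: unsupported type (e.g. a set)
        # ValueError: circular reference detected
        return False

cyclic = []
cyclic.append(cyclic)  # a list that contains itself

print(is_json_dumpable({"name": "pysos", "tags": ["dict", "list"]}))  # True
print(is_json_dumpable({1, 2, 3}))                                    # False
print(is_json_dumpable(cyclic))                                       # False
```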

Usage

pip install pysos

Dictionaries:

import pysos
db = pysos.Dict('somefile')
db['hello'] = 'persistence!'

Lists:

import pysos
db = pysos.List('somefile')
db.append('it is now saved in the file')

Performance

Just to give a ballpark figure, there is a mini benchmark included in test_benchmark.py. Here are the results on my laptop:

Writes: 28521 / second
Reads: 188502 / second

The test is just writing 100k small key/values, and reading them all too. It's just meant to give a rough idea.
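A benchmark of that kind could look roughly like the sketch below. It is not the content of test_benchmark.py; the function ops_per_second is an assumption made for illustration, and a plain dict stands in for the persistent one so the snippet runs without any file.

```python
import time

def ops_per_second(mapping, n=100_000):
    """Time n small writes, then n reads, against any dict-like object."""
    start = time.perf_counter()
    for i in range(n):
        mapping[str(i)] = i
    write_rate = n / (time.perf_counter() - start)

    start = time.perf_counter()
    for i in range(n):
        _ = mapping[str(i)]
    read_rate = n / (time.perf_counter() - start)
    return write_rate, read_rate

# A plain dict is used here; pass a persistent dict instead to measure disk-backed rates.
writes, reads = ops_per_second({}, n=10_000)
print(f"Writes: {writes:.0f} / second")
print(f"Reads: {reads:.0f} / second")
```

The numbers depend heavily on hardware, OS and data, so they are only useful for rough comparisons on the same machine.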

It writes every time you set a value, but only the key/value pair. So the cost of adding an item stays constant. On the other hand, lots of updates / deletes / re-inserts would lead to data fragmentation in the file. This might deteriorate performance in the long run.
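The trade-off can be illustrated with a toy append-only store. This is a sketch of the general technique, not the actual pysos implementation or file format: the class AppendOnlyStore and its one-JSON-record-per-line layout are assumptions for illustration. Each write appends one record and remembers its byte offset, so the cost of a write does not grow with the file, but overwritten records stay behind as dead space.

```python
import json
import os
import tempfile

class AppendOnlyStore:
    """Toy key/value file: values live on disk, only byte offsets in memory."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of the latest record for that key
        open(path, "ab").close()  # create the file if it does not exist
        with open(path, "rb") as f:
            offset = 0
            for line in f:
                key, _ = json.loads(line)
                self.index[key] = offset  # later records win
                offset += len(line)

    def __setitem__(self, key, value):
        record = (json.dumps([key, value]) + "\n").encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # where this record will start
            f.write(record)
        self.index[key] = offset

    def __getitem__(self, key):
        with open(self.path, "rb") as f:
            f.seek(self.index[key])
            return json.loads(f.readline())[1]

path = os.path.join(tempfile.mkdtemp(), "store.jsonl")
store = AppendOnlyStore(path)
store["hello"] = "persistence!"
store["hello"] = "updated"  # appended; the first record is now dead space

reopened = AppendOnlyStore(path)  # rebuilds the index by scanning the file
print(reopened["hello"])  # prints: updated
```

After the two writes the file holds two records for a single live key, which is the fragmentation mentioned above.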

F.A.Q.

Is it thread safe?

No. It's not thread safe. In practice, synchronization mechanisms are typically desired on a higher level anyway.
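If you do need to share one instance between threads, one simple option is to guard every access with a lock. The wrapper below is a minimal sketch, not something pysos provides; LockedMapping is a hypothetical name, and a plain dict stands in for the persistent one.

```python
import threading

class LockedMapping:
    """Serialize access to a dict-like object with a single lock."""

    def __init__(self, mapping):
        self._mapping = mapping
        self._lock = threading.Lock()

    def __getitem__(self, key):
        with self._lock:
            return self._mapping[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._mapping[key] = value

    def __delitem__(self, key):
        with self._lock:
            del self._mapping[key]

    def __len__(self):
        with self._lock:
            return len(self._mapping)

shared = LockedMapping({})  # a plain dict stands in for the persistent one

def worker(n):
    for i in range(100):
        shared[f"{n}-{i}"] = i

threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # prints: 400
```

A lock per operation only protects single reads and writes; a sequence such as "read, modify, write back" still needs synchronization at a higher level.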

Why not make writes async?

In the original version, there was a switch to choose between sync and async mode. However, it turned out to have only a relatively small impact on overall performance: less than 25% on the hardware/OS/data I tested, if I remember right. Since the benefit seemed rather low, I removed the flag and the associated code altogether, in order to ensure safety by default. IMHO, it's preferable to lose a few microseconds rather than data upon a crash.
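For readers unfamiliar with the distinction, a synchronous write in Python generally means flushing Python's buffer and then asking the OS to flush its own cache before returning. The sketch below shows that general pattern with standard library calls only; write_durably is a hypothetical helper and not necessarily how pysos does it internally.

```python
import os
import tempfile

def write_durably(path, data: bytes):
    """Append data and return only once it has been handed to the disk."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push its cache to the device

path = os.path.join(tempfile.mkdtemp(), "durable.log")
write_durably(path, b"first record\n")
write_durably(path, b"second record\n")

with open(path, "rb") as f:
    print(f.read())
```

An async mode would skip the os.fsync call (or defer it), trading the durability guarantee for some speed.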

Why not use memory mapped files?

I experimented with that too. In my experience, with the hardware/OS/data I tested, it turned out to... suck. Using memory mapped files led to inconsistent and unpredictable performance, often much slower than direct file access.
