Concurrent appendable key-value storage

Minimal key-value byte storage with appendable values

Partd stores key-value pairs. The values are raw bytes. We append onto existing values.

Partd is useful for shuffling operations.

API

  1. Create a Partd:

    >>> import partd
    >>> p = partd.File('/path/to/new/dataset/')
  2. Append key-byte pairs to dataset:

    >>> p.append({'x': b'Hello ', 'y': b'123'})
    >>> p.append({'x': b'world!', 'y': b'456'})
  3. Get all bytes associated with a set of keys:

    >>> p.get(['y', 'x'])
    [b'123456', b'Hello world!']
  4. Idempotently set a single key-value pair (no append, no update):

    >>> p.iset('z', b'metadata')
  5. Destroy partd dataset:

    >>> p.drop()

That’s it.

There is no in-memory state.
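
To make the append semantics concrete, here is a minimal standard-library sketch of the same idea: one file per key, with each append written to the end of that file. The class name `TinyAppendStore` and its layout are hypothetical and are not partd's implementation; in particular, this sketch does no locking.

```python
import os
import tempfile


class TinyAppendStore:
    """Hypothetical stand-in (not partd): one file per key, values grow by appending."""

    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def _filename(self, key):
        return os.path.join(self.path, str(key))

    def append(self, data):
        # Append-binary mode puts the new bytes after whatever is already stored.
        for key, value in data.items():
            with open(self._filename(key), "ab") as f:
                f.write(value)

    def get(self, keys):
        # In this sketch, a key that was never written reads back as empty bytes.
        result = []
        for key in keys:
            try:
                with open(self._filename(key), "rb") as f:
                    result.append(f.read())
            except FileNotFoundError:
                result.append(b"")
        return result


store = TinyAppendStore(tempfile.mkdtemp())
store.append({"x": b"Hello ", "y": b"123"})
store.append({"x": b"world!", "y": b"456"})
print(store.get(["y", "x"]))  # [b'123456', b'Hello world!']
```

All state lives in the files, so a second `TinyAppendStore` pointed at the same directory sees the same values.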

Implementations

The reference implementation uses file-based locks. This works surprisingly well as long as you don’t do many small writes.

If you do many small writes, you probably want to cache in memory; this is hard to do in parallel while also maintaining consistency. For this we have a centralized server (see partd.Shared) that caches data in memory and writes only large chunks to disk when necessary.

  • Server Process:

    >>> server = partd.Server('/path/to/dataset', 'ipc://server')
  • Worker processes:

    >>> p = partd.Shared('ipc://server')
    >>> p.append(...)
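
The caching idea can be sketched in a single process as a buffer that collects small appends and forwards them to a slower store only once enough bytes have accumulated. The names `BufferedAppender`, `DictSink`, and `max_bytes` are hypothetical; this is not how partd's server is written, and it does not address the cross-process consistency problem described above.

```python
class DictSink:
    """Hypothetical slow store: counts how many writes it receives."""

    def __init__(self):
        self.data = {}
        self.writes = 0

    def append(self, data):
        self.writes += 1
        for key, value in data.items():
            self.data[key] = self.data.get(key, b"") + value


class BufferedAppender:
    """Hypothetical buffer: collect small appends, flush them as one large write."""

    def __init__(self, sink, max_bytes=1024):
        self.sink = sink            # any object with an append(dict) method
        self.max_bytes = max_bytes  # flush once this many bytes are pending
        self.pending = {}           # key -> list of byte chunks, in arrival order
        self.size = 0

    def append(self, data):
        for key, value in data.items():
            self.pending.setdefault(key, []).append(value)
            self.size += len(value)
        if self.size >= self.max_bytes:
            self.flush()

    def flush(self):
        # Join each key's chunks so the sink sees one write per flush.
        if self.pending:
            self.sink.append({k: b"".join(v) for k, v in self.pending.items()})
        self.pending = {}
        self.size = 0


sink = DictSink()
buf = BufferedAppender(sink, max_bytes=10)
for _ in range(3):
    buf.append({"x": b"abcd"})  # the third append reaches 12 bytes and flushes
print(sink.writes, sink.data["x"])
```

Data still in `pending` is lost if the process dies before `flush`, which is one reason a real design needs more care than this sketch.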

Encodings and Compression

Once we can robustly and efficiently append bytes, we move on to encoding various things as bytes, either with serialization systems like Pickle or MessagePack or with compression routines like zlib, snappy, or blosc. In principle we want to compose all of these choices together:

  1. Write policy: partd.File, partd.Shared

  2. Encoding: partd.Pickle, partd.Numpy

  3. Compression: partd.Blosc, partd.Snappy, …

Partd objects compose by nesting. For example, here we make a shared server that writes snappy-compressed numpy arrays:

>>> p = partd.Numpy(partd.Snappy(partd.Shared('foo')))

And here is a partd that writes pickle-encoded, BZ2-compressed bytes directly to disk:

>>> p = partd.Pickle(partd.BZ2(partd.File('foo')))
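
One way nesting like this can work with appendable bytes is for each layer to encode every appended chunk separately and prefix it with its length, so the chunks can be split apart and decoded again on read. The sketch below uses only the standard library (zlib stands in for the compressors named above). The classes `DictStore`, `Compressed`, and `Pickled` and the framing format are assumptions made for illustration, not partd's actual classes or on-disk layout.

```python
import pickle
import struct
import zlib


def frame(chunk):
    # Prefix a chunk with its length as an 8-byte little-endian integer.
    return struct.pack("<Q", len(chunk)) + chunk


def split_frames(blob):
    # Undo frame(): walk the blob and recover each length-prefixed chunk.
    chunks, offset = [], 0
    while offset < len(blob):
        (n,) = struct.unpack_from("<Q", blob, offset)
        offset += 8
        chunks.append(blob[offset:offset + n])
        offset += n
    return chunks


class DictStore:
    """Hypothetical innermost layer: appendable bytes held in a dict."""

    def __init__(self):
        self.data = {}

    def append(self, data):
        for key, value in data.items():
            self.data[key] = self.data.get(key, b"") + value

    def get(self, keys):
        return [self.data.get(key, b"") for key in keys]


class Compressed:
    """Hypothetical layer: zlib-compress each appended chunk of bytes."""

    def __init__(self, inner):
        self.inner = inner

    def append(self, data):
        self.inner.append({k: frame(zlib.compress(v)) for k, v in data.items()})

    def get(self, keys):
        return [b"".join(zlib.decompress(c) for c in split_frames(blob))
                for blob in self.inner.get(keys)]


class Pickled:
    """Hypothetical layer: values are lists; appended lists concatenate on read."""

    def __init__(self, inner):
        self.inner = inner

    def append(self, data):
        self.inner.append({k: frame(pickle.dumps(v)) for k, v in data.items()})

    def get(self, keys):
        out = []
        for blob in self.inner.get(keys):
            items = []
            for chunk in split_frames(blob):
                items.extend(pickle.loads(chunk))
            out.append(items)
        return out


p = Pickled(Compressed(DictStore()))
p.append({"x": [1, 2]})
p.append({"x": [3]})
print(p.get(["x"]))  # [[1, 2, 3]]
```

Because every layer exposes the same `append` and `get` methods, the layers can be stacked in any order that makes sense for the data.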
