Skip to main content

Serializable map of integers to bytes with near zero parsing.

Project description

mapbuffer

Serializable map of integers to bytes with near zero parsing.

from mapbuffer import MapBuffer

data = { 2848: b'abc', 12939: b'123' }
mb = MapBuffer(data)

with open("data.mb", "wb") as f:
    f.write(mb.tobytes())

with open("data.mb", "rb") as f:
    binary = f.read()

mb = MapBuffer(binary)
print(mb[2848]) # fast: almost zero parsing required

>>> b'abc'

# assume data are a set of gzipped utf8 encoded strings
mb = MapBuffer(binary, 
    compress="gzip",
    frombytesfn=lambda x: x.decode("utf8")
)
print(mb[2848])
>>> "abc" # bytes were automatically decoded

MapBuffer is designed to allow you to store dictionaries mapping integers to binary buffers in a serialized format and then read that back in and use it without requiring an expensive parse of the entire dictionary. Instead, if you have a dictionary containing thousands of keys, but only need a few items from it you can extract them rapidly.

This serialization format was designed to solve a performance problem with our pipeline for merging skeleton fragments from a large dense image segmentation. The 3D image was carved up into a grid and each gridpoint generated potentially thousands of skeletons which were written into a single pickle file. Since an individual segmentation could cross many gridpoints, fusion across many files is required, but each file contains many irrelevant skeleton fragments for a given operation. In one measurement, pickle.loads was taking 68% of the processing time for an operation that was taking two weeks to run on hundreds of cores.

Therefore, this method was developed to skip parsing the dictionaries and rapidly extract skeleton fragments.

Design

The byte string format consists of a 16 byte header, an index, and a series of (possibily individually compressed) serialized objects.

HEADER|INDEX|DATA_REGION

Header

b'mapbufr' (7b)|FORMAT_VERSION (uint8)|COMPRESSION_TYPE (4b)|INDEX_SIZE (uint32)

Valid compression types: b'none', b'gzip', b'00br', b'zstd', b'lzma'

Example: b'mapbufr\x00gzip\x00\x00\x04\x00' meaning version 0 format, gzip compressed, 1024 keys.

Index

<uint64*>[ label, offset, label, offset, label, offset, ... ]

The index is an array of label and offset pairs (both uint64) that tell you where in the byte stream to start reading. The read length can be determined by referencing the next offset which are guaranteed to be in ascending order. The labels however, are written in Eyztinger order to enable cache-aware binary search.

The index can be consulted by conducting an Eyztinger binary search over the labels to find the correct offset.

Data Region

The data objects are serialized to bytes and compressed individually if the header indicates they should be. They are then concatenated in the same order the index specifies.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mapbuffer-0.1.0.tar.gz (12.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mapbuffer-0.1.0-py3-none-any.whl (11.0 kB view details)

Uploaded Python 3

File details

Details for the file mapbuffer-0.1.0.tar.gz.

File metadata

  • Download URL: mapbuffer-0.1.0.tar.gz
  • Upload date:
  • Size: 12.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.2

File hashes

Hashes for mapbuffer-0.1.0.tar.gz
Algorithm Hash digest
SHA256 4164c0ef6149a95aecd6c345c3b456583e681b21e0d5e6649d92b8b4bee7ba69
MD5 606d0ca851998131c87c9199f6f371ae
BLAKE2b-256 11b057c9fd869a44e9bed96c0017bf56b7b0650c8800e09c1aa56f7ac4be1b42

See more details on using hashes here.

File details

Details for the file mapbuffer-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: mapbuffer-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 11.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.2

File hashes

Hashes for mapbuffer-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d445765127e1e758a133923cf546a45f2e5b798e9f8d29acd460d6d0d2e890c0
MD5 048ec539145db5aca99c28ede19d8e3f
BLAKE2b-256 1f79540415e00ef788a8bab96551be6dff5cbc973a13faf38709f5702f0f4665

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page