mapbuffer

Serializable map of integers to bytes with near zero parsing.

from mapbuffer import MapBuffer

data = { 2848: b'abc', 12939: b'123' }
mb = MapBuffer(data)

with open("data.mb", "wb") as f:
    f.write(mb.tobytes())

with open("data.mb", "rb") as f:
    binary = f.read()

mb = MapBuffer(binary)
print(mb[2848])  # fast: almost zero parsing required
# prints: b'abc'

# assume data are a set of gzipped utf8 encoded strings
mb = MapBuffer(binary, 
    compress="gzip",
    frombytesfn=lambda x: x.decode("utf8")
)
print(mb[2848])
# prints: abc (bytes were automatically decoded)

Installation

pip install mapbuffer

Motivation

MapBuffer lets you store a dictionary that maps integers to binary buffers in a serialized format, then read it back and use it without an expensive parse of the entire dictionary. If the dictionary contains thousands of keys but you only need a few items from it, you can extract just those items rapidly.

This serialization format was designed to solve a performance problem with our pipeline for merging skeleton fragments from a large dense image segmentation. The 3D image was carved up into a grid, and each gridpoint generated potentially thousands of skeletons, which were written into a single pickle file. Since an individual segmentation could cross many gridpoints, fusion across many files was required, but each file contained many skeleton fragments that were irrelevant to a given operation. In one measurement, pickle.loads was taking 68% of the processing time for an operation that was taking two weeks to run on hundreds of cores.

Therefore, this method was developed to skip parsing the dictionaries and rapidly extract skeleton fragments.

Design

The MapBuffer object is designed to translate dictionaries into a serialized byte buffer and extract objects directly from it by consulting an index. The index consists of a series of key-value pairs where the values are indices into the byte stream where each object's data stream starts.

This means that the format is best regarded as immutable once written. It can be easily converted into a standard dictionary at will. The main purpose is for reading just a few objects out of a larger stream of data.

Benchmark

The following benchmark was derived from running perf.py.

Format

The byte string format consists of a 16-byte header, an index, and a series of (possibly individually compressed) serialized objects.

HEADER|INDEX|DATA_REGION

Header

b'mapbufr' (7b)|FORMAT_VERSION (uint8)|COMPRESSION_TYPE (4b)|INDEX_SIZE (uint32)

Valid compression types: b'none', b'gzip', b'00br', b'zstd', b'lzma'

Example: b'mapbufr\x00gzip\x00\x00\x04\x00' meaning version 0 format, gzip compressed, 1024 keys.
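As a rough illustration of the header layout above, the following sketch unpacks the four fields with Python's struct module. The function name parse_header and the byteorder parameter are hypothetical and are not part of the mapbuffer API. The text does not state the byte order of INDEX_SIZE, so it is left as a parameter: the example bytes decode to 1024 when read as big-endian and to 262144 when read as little-endian.

```python
import struct

MAGIC = b"mapbufr"
HEADER_LENGTH = 16  # 7 (magic) + 1 (version) + 4 (compression) + 4 (index size)

def parse_header(buf, byteorder=">"):
    """Split the 16-byte header into (format_version, compression_type, index_size).

    byteorder is a struct prefix ("<" little-endian, ">" big-endian).
    Which one the real format uses is an assumption the caller must supply.
    """
    if len(buf) < HEADER_LENGTH:
        raise ValueError("buffer is shorter than the 16-byte header")
    magic, version, compression, index_size = struct.unpack(
        byteorder + "7sB4sI", buf[:HEADER_LENGTH]
    )
    if magic != MAGIC:
        raise ValueError("missing b'mapbufr' magic bytes")
    return version, compression, index_size

example = b"mapbufr\x00gzip\x00\x00\x04\x00"
print(parse_header(example, ">"))  # big-endian reading of the example bytes
print(parse_header(example, "<"))  # little-endian reading of the same bytes
```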

Index

<uint64*>[ label, offset, label, offset, label, offset, ... ]

The index is an array of label and offset pairs (both uint64) that tell you where in the byte stream to start reading. The read length can be determined by referencing the next offset, since offsets are guaranteed to be in ascending order. The labels, however, are written in Eytzinger order to enable cache-aware binary search.

The index can be consulted by conducting an Eytzinger binary search over the labels to find the correct offset.
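For readers unfamiliar with Eytzinger order, here is a minimal pure-Python sketch of the layout and the search over it. It works on a plain list of labels rather than the interleaved label/offset array, and the function names are made up for illustration; it is not the library's implementation.

```python
def eytzinger_order(sorted_labels):
    """Rearrange ascending labels into Eytzinger (breadth-first tree) order."""
    n = len(sorted_labels)
    out = [0] * n
    it = iter(sorted_labels)

    def fill(k):
        # In-order walk of the implicit tree whose node k has
        # children 2k+1 and 2k+2 (0-based indexing).
        if k < n:
            fill(2 * k + 1)
            out[k] = next(it)
            fill(2 * k + 2)

    fill(0)
    return out

def eytzinger_search(labels, target):
    """Return the position of target in an Eytzinger-ordered list, or -1."""
    k = 0
    while k < len(labels):
        if labels[k] == target:
            return k
        # Go to the left child if target is smaller, the right child otherwise.
        k = 2 * k + 1 + (labels[k] < target)
    return -1

layout = eytzinger_order([1, 2, 3, 4, 5, 6, 7])
print(layout)                       # [4, 2, 6, 1, 3, 5, 7]
print(eytzinger_search(layout, 5))  # 5
```

The root of the search tree sits at position 0 and each level is stored contiguously, so the first few comparisons of every lookup touch the same small region of memory. That is the property the cache-aware search relies on.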

Data Region

The data objects are serialized to bytes and compressed individually if the header indicates they should be. They are then concatenated in the same order the index specifies.
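The pieces described above can be put together in a toy writer and reader. This is a simplified sketch, not the library's code, and its output is not guaranteed to be byte-compatible with real MapBuffer files. It assumes little-endian integers and offsets measured from the start of the buffer, stores labels in ascending order instead of Eytzinger order, and scans the index linearly for brevity. The names toy_pack and toy_get are hypothetical.

```python
import gzip
import struct

HEADER = struct.Struct("<7sB4sI")  # magic, version, compression type, index size

def toy_pack(data, compress=False):
    """Serialize {int: bytes} as HEADER|INDEX|DATA_REGION."""
    labels = sorted(data)
    blobs = [gzip.compress(data[k]) if compress else data[k] for k in labels]
    offset = HEADER.size + 16 * len(labels)  # data region starts after the index
    index = []
    for label, blob in zip(labels, blobs):
        index += [label, offset]
        offset += len(blob)
    header = HEADER.pack(b"mapbufr", 0, b"gzip" if compress else b"none", len(labels))
    return header + struct.pack(f"<{len(index)}Q", *index) + b"".join(blobs)

def toy_get(buf, label):
    """Read one object without touching any of the others."""
    _, _, compression, n = HEADER.unpack_from(buf, 0)
    index = struct.unpack_from(f"<{2 * n}Q", buf, HEADER.size)
    for i in range(n):
        if index[2 * i] == label:
            start = index[2 * i + 1]
            # The next offset marks the end; the last object runs to the end of the buffer.
            end = index[2 * i + 3] if i + 1 < n else len(buf)
            blob = buf[start:end]
            return gzip.decompress(blob) if compression == b"gzip" else blob
    raise KeyError(label)

packed = toy_pack({2848: b"abc", 12939: b"123"}, compress=True)
print(toy_get(packed, 2848))  # b'abc'
```

Because each object is compressed on its own, a reader only decompresses the bytes between two neighboring offsets, which is what makes single-item reads cheap.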

Versus Flexbuffers

The concept here was inspired by Flatbuffers.Flexbuffers; however, the Python implementation there (not the C++ implementation) was a little slow as of this writing. MapBuffer also differs in a few ways:

  1. Eytzinger ordering of labels to potentially achieve even higher read speeds.
  2. Structure optimized for network range reads.
  3. Integer keys only.
  4. Compression is built into the structure.
  5. Interface has a lot of syntactic sugar to simulate dictionaries.

Link: https://google.github.io/flatbuffers/flexbuffers.html

