ESCODE binary serialization

Fast Serializing and Deserializing for Python.

escode is a very fast binary serialization/deserialization module for Python. It is written as a Python C extension and operates on most portable types. It is designed to be fast, to generate small encodings, and to offer an encoding variant that is indexable/sortable. This last feature is the primary motivation for escode, since most data retrieval happens via indexed range queries.

Performance

Below is the space and time performance of escode as compared to other major encodings. It was generally faster, and produced sizes comparable to dense formats like cbor or msgpack. The code used can be found in the benchmark directory.

Method    Encode (μs)  xFaster  Decode (μs)  xFaster  Size  xSmaller
escode            981     1.00         2411     1.00   185      1.00
pickle           1723     1.76         2697     1.12   233      1.26
json             5185     5.28         7685     3.19   338      1.82
cbor             1534     1.56         2456     1.02   180      0.97
ujson            2017     2.06         3687     1.53   330      1.78
msgpack          1329     1.36         3086     1.28   179      0.97

Installation

# Requires gcc and python-dev
# Requires Python2.7+ or Python3+
pip install escode

Usage

import escode

# <id> is a placeholder for the record key; db stands for any key-value store
data = {"id": <id>, "name": "James Maddison", ...}
blob = escode.encode(data)
db.put(<id>, blob)
...
dbdata = escode.decode(db.get(<id>))
assert dbdata == data

Most data retrieval happens via range queries that operate on data attributes. escode.encode_index produces an encoding that preserves the sort order of the input, i.e.

cmp(tup1, tup2) == cmp(encoded_tup1, encoded_tup2)

An index encoding is not decodable, because it omits some information (such as lengths for strings and collections). It is only meant to be stored and compared against index encodings of the same type.

from collections import namedtuple

City = namedtuple('City', ('id', 'name', 'country', 'pop'))

citylist = [
    City('city:1', 'Delhi', 'India', 19000000),
    City('city:2', 'Mumbai', 'India', 18000000),
    City('city:3', 'San Francisco', 'USA', 3000000),
    City('city:4', 'Paris', 'France', 5000000),
    # ...
]

INDEX_NAME = 'index:city:country_pop'
indextuples = []

for city in citylist:
    # store data
    db.put(city.id, city)

    # store index
    indextuple = (INDEX_NAME, city.country, city.pop)
    indextuples.append(indextuple)
    index = escode.encode_index(indextuple)
    db.put(index, city.id)


# retrieval: Indian cities with pop between 1M and 5M
rangestart = escode.encode_index((INDEX_NAME,'India',1000000))
rangeend = escode.encode_index((INDEX_NAME,'India',5000000))
cityids = db.getrange(rangestart, rangeend)
indiancities = db.multiget(cityids)


# also useful for sorting
tuple_encodings = [
    (indextup, escode.encode_index(indextup))
    for indextup in indextuples]

# sorting by tuples is the same as sorting by the encodings
assert (sorted(tuple_encodings, key=lambda tup_enc: tup_enc[0]) ==
        sorted(tuple_encodings, key=lambda tup_enc: tup_enc[1]))
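
The range query above relies on the store keeping keys in byte order. The following is a minimal, self-contained sketch of that idea which does not use escode: `toy_index_key` is a hypothetical stand-in for `escode.encode_index` (it is not escode's real format), and `SortedStore` is a made-up in-memory store whose `getrange` is assumed to be inclusive at both ends. The query here covers populations between 10M and 20M so that the sample cities produce a match.

```python
import bisect


def toy_index_key(index_name, country, pop):
    # Stand-in for escode.encode_index (NOT its real format): text fields are
    # terminated with \x00\x00 and the integer is stored as 8 big-endian bytes,
    # so byte order matches (index_name, country, pop) order for NUL-free text.
    return (index_name.encode() + b"\x00\x00" +
            country.encode() + b"\x00\x00" +
            pop.to_bytes(8, "big"))


class SortedStore:
    """Hypothetical in-memory store that keeps its keys in sorted byte order."""

    def __init__(self):
        self._keys, self._values = [], []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value          # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def getrange(self, start, end):
        """Return values whose keys fall in [start, end], in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return self._values[lo:hi]


INDEX_NAME = "index:city:country_pop"
db = SortedStore()
for city_id, country, pop in [
        ("city:1", "India", 19000000), ("city:2", "India", 18000000),
        ("city:3", "USA", 3000000), ("city:4", "France", 5000000)]:
    db.put(toy_index_key(INDEX_NAME, country, pop), city_id)

# Indian cities with population between 10M and 20M, ordered by population
hits = db.getrange(toy_index_key(INDEX_NAME, "India", 10000000),
                   toy_index_key(INDEX_NAME, "India", 20000000))
print(hits)
```

Because the keys sort the same way as the tuples they were built from, the range scan is two binary searches and a slice; no key has to be decoded.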

Format

Encodings use a 1-byte headbyte, which stores the data type and some additional info, followed by a variable-length number (up to 8 bytes). This number stores either the value of an integer type or the length of a collection.
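
As a rough illustration of a head byte followed by a variable-length number, here is a sketch in pure Python. The bit layout (type tag in the high nibble, byte count in the low nibble), the tag values, and the function names are all assumptions made for this example; escode's actual layout is not specified in this description.

```python
# Hypothetical layout, for illustration only: high nibble = type tag,
# low nibble = number of bytes (0..8) that follow the head byte.
TYPE_UINT = 0x1   # assumed tag: the number is the integer value itself
TYPE_BYTES = 0x2  # assumed tag: the number is the length of the payload


def pack_head(type_tag, number):
    """Return the head byte plus the minimal big-endian form of `number`."""
    if not 0 <= number < 1 << 64:
        raise ValueError("number must fit in 8 bytes")
    width = (number.bit_length() + 7) // 8   # zero needs no extra bytes
    head = (type_tag << 4) | width
    return bytes([head]) + number.to_bytes(width, "big")


def unpack_head(buf, pos=0):
    """Return (type_tag, number, next_pos) read from `buf` at `pos`."""
    head = buf[pos]
    type_tag, width = head >> 4, head & 0x0F
    number = int.from_bytes(buf[pos + 1:pos + 1 + width], "big")
    return type_tag, number, pos + 1 + width


def encode_uint(n):
    return pack_head(TYPE_UINT, n)


def encode_bytes(data):
    return pack_head(TYPE_BYTES, len(data)) + data


print(encode_uint(300))      # head byte, then 0x01 0x2c
print(encode_bytes(b"hi"))   # head byte, length 2, then the payload
```

Small values cost only one or two bytes under this kind of scheme, which is one way a format can stay compact while still supporting 64-bit numbers.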

Index encodings are tricky to implement because simply concatenating the tuple elements does not preserve sort order: ('a', 'z') < ('aa', 'z'), but 'az' > 'aaz'. escode solves this by using '\x00\x00' as the boundary between tuple elements and escaping \x00 bytes inside the elements themselves. Since elements such as 8-byte zeros are fairly common, runs of consecutive \x00 bytes inside elements are compressed as an optimization.
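
The boundary-and-escape idea can be sketched in a few lines of pure Python. This is not escode's implementation: the escape sequence `\x00\xff` is an assumption chosen for the example, the function name is hypothetical, elements are restricted to byte strings, and the compression of consecutive `\x00` bytes is left out.

```python
SEP = b"\x00\x00"          # element boundary, as described above
ESCAPED_NUL = b"\x00\xff"  # assumed escape for a literal \x00 inside an element


def encode_index_sketch(elements):
    """Join byte-string elements so that byte order matches tuple order."""
    return b"".join(e.replace(b"\x00", ESCAPED_NUL) + SEP for e in elements)


def naive_concat(elements):
    """Plain concatenation, which loses the element boundaries."""
    return b"".join(elements)


t1, t2 = (b"a", b"z"), (b"aa", b"z")
assert t1 < t2
assert naive_concat(t1) > naive_concat(t2)                 # wrong order
assert encode_index_sketch(t1) < encode_index_sketch(t2)   # order preserved
```

The boundary sorts below any continuation of an element: a real byte is either nonzero or, once escaped, starts with `\x00\xff`, and both compare greater than `\x00\x00`. That is why a shorter element always sorts before any element it is a prefix of.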

Format Table

Download files

Source Distribution

escode-3.0.1.tar.gz (32.2 kB)

Uploaded Source

Built Distribution

escode-3.0.1-cp37-cp37m-macosx_10_16_x86_64.whl (12.5 kB)

Uploaded CPython 3.7m macOS 10.16+ x86-64

File details

Details for the file escode-3.0.1.tar.gz.

File metadata

  • Download URL: escode-3.0.1.tar.gz
  • Size: 32.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.7.3

File hashes

Hashes for escode-3.0.1.tar.gz
Algorithm Hash digest
SHA256 3e72284852c02ee3b9bcda0e5e1555a56b779dae75adec897621af7d41772ffc
MD5 aac02f5105a6cc50e55940e1541f9965
BLAKE2b-256 bb3c9d01506c6ecd6f81591a02615828082b5093d1290ecb986f7804828da761

File details

Details for the file escode-3.0.1-cp37-cp37m-macosx_10_16_x86_64.whl.

File hashes

Hashes for escode-3.0.1-cp37-cp37m-macosx_10_16_x86_64.whl
Algorithm Hash digest
SHA256 b3c7f4bf9a5f5ef970915eae2724ed6d81be28fe4ab2db9eccd411d546582795
MD5 ff6abb2e1e17dd14433ea777806a105a
BLAKE2b-256 f2c4e72eea8fbe299f2d74f8b87afc934a9da3ede96291fcabc8f6a224bac6de
