Skip to main content

Hopefully safe and deterministic serializer to binary format, including Pandas data

Project description

test codecov github Python version license: GPL v3 API documentation

safeserializer - Serialization of nested objects to binary format

An alternative to pickle, but may use pickle if safety is not needed. Principle: Start from the simplest and safest possible and try to be fast. Serialization is attempted in the following order:

  • try orjson
    • dict, str, int, etc
  • try bson
    • standard types accepted by mongodb
  • convert bigints to str
  • try to serialize as raw numpy bytes
    • ndarray, pandas homogeneous Series/DataFrame
  • try parquet
    • pandas ill-typed Series/DataFrame
  • resort to pickle if allowed (unsafe_fallback=True)
  • resort to dill if allowed (ensure_determinism=False).

Top level tuples are preserved, insted of converted to lists (e.g., by bson).

Python installation

from package

# Set up a virtualenv. 
python3 -m venv venv
source venv/bin/activate

# Install from PyPI
pip install safeserializer[full]

from source

git clone https://github.com/safeserializer/safeserializer
cd safeserializer
poetry install --extras full

Examples

Packing and safe unpacking

from pandas import DataFrame as DF

from safeserializer import unpack, pack

df = DF({"a": ["5", "6", "7"], "b": [1, 2, 3]}, index=["x", "y", "z"])
complex_data = {"a": b"Some binary content", ("mixed-types tuple as a key", 4): 123, "df": df}
print(complex_data)
"""
{'a': b'Some binary content', ('mixed-types tuple as a key', 4): 123, 'df':    a  b
x  5  1
y  6  2
z  7  3}
"""
dump = pack(complex_data, ensure_determinism=True, unsafe_fallback=False)
print(dump)
"""
b'00lz4__\x04"M\x18h@h\x0b\x00\x00\x00\x00\x00\x00\x94\x95\x06\x00\x00\xf1*00dicB_a\x0b\x00\x00\x0530306a736f6e5f226122\x00\x13\x00\x00\x00\x00Some binary content.\x00\xd27475706c5f480\x01\x00b45f004\x0c\x00\x8000530002\x05\x00\x01\x02\x00\rb\x00\xf5\x07d697865642d74797065732X\x00`652061\x12\x00\xf4\x0461206b6579220531000r\x00\x0bV\x00\x113}\x00 \x00\n\xb8\x00\xa100json_123\xaf\x00\t\xdd\x00p46622\x00b(\x00\xf0\x1100prqd_PAR1\x15\x04\x15\x1e\x15"L\x15\x06\x15\x00\x12\x00\x00\x0f8\x01\x00\x00\x005\x05\x00\x106\x05\x00\xf667\x15\x00\x15\x14\x15\x18,\x15\x06\x15\x10\x15\x06\x15\x06\x1c6\x00(\x017\x18\x015\x00\x00\x00\n$\x02\x00\x00\x00\x06\x01\x02\x03$\x00&\x94\x01\x1c\x15\x0c\x195\x10\x00\x06\x19\x18\x01a\x15\x02\x16\x06\x16\x84\x01\x16\x8c\x01&F&\x085\x00\xe0\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15?\x00d\x15\x04\x150\x15.\x7f\x00p\x18\x04\x01\x00\t\x01<\x19\x00\x00\x02\x00\x10\x03\x05\x00 \x00\x00.\x00\t\x85\x00$\x18\x08\x1a\x00 \x18\x08\xa6\x00\x00\x02\x00?\x16\x00(\x16\x00\x00\x0c\xa7\x00T\xe2\x03\x1c\x15\x04\xa7\x00\x11b\xa7\x00\xbf\xda\x01\x16\xdc\x01&\xd0\x02&\x86\x02Y\x00\x19\x0f\xcb\x00\x02\rJ\x01\x10x\x9f\x00\x10y\x05\x00\x1fzJ\x01\x01Lz\x18\x01x\xa3\x00&\xa8\x06J\x01\xf1\x03\x11__index_level_0__\xb3\x00\x02Z\x01Q\xda\x05&\x9c\x05\x91\x01\x01G\x00\x0f\x91\x00\x01\xf0\x0b\x19L5\x00\x18\x06schema\x15\x06\x00\x15\x0c%\x02\x18\x01a%\x00L\x1cV\x01\x10\x04\x0e\x00\x12b\x16\x00\x0ej\x00\x03&\x00o\x16\x06\x19\x1c\x19<\xe0\x01&\x0fr\x01J\x0f,\x018\xb0\x16\xe2\x03\x16\x06&\x08\x16\xf4\x03\x14\xdc\x01\xd2\x18\x06pandas\x18\xd5\x04{"\x83\x01\xcdcolumns": ["\x97\x01R"], ""\x00\x02\xb2\x01\x11e)\x00\xfa\x07{"name": null, "field_\x14\x00\x02i\x00@_typ)\x00\xf5\x02"unicode", "numpy\x19\x00`object\x18\x00\xf6\x12metadata": {"encoding": "UTF-8"}}\x8d\x00\n\x86\x00 "a?\x00\t\x85\x00\x02\x13\x00\x0f\x84\x00*\x00\xdc\x005}, \xec\x00."bf\x00\x01\x13\x00\x0bf\x00Pint64+\x00\n\xe8\x00\x05\x17\x00\x07\xe7\x00\x0cc\x00\x00\x10\x00\x0cO\x01\x0f\x95\x01\x00\x0et\x00\x0f^\x01\x1b\x00g\x00\x02M\x01areatorq\x01plibraryp\x01ppyarrow\xc4\x00pversion\x16\x00\x8611.0.0"}~\x00\x08\x1d\x00\xf2\x00.5.3"}\x00\x18\x0cARROW:\x9c\x03@\x18\x98\t/\x01\x00\x822gDAAAQA\x01\x00\xf1\x00KAA4ABgAFAAgACg\x15\x00)BB \x00\x10w\x15\x00\x15E \x002IwC\x10\x00\x04F\x00\x01 \x00\x10I\x08\x00\x11B\x08\x00\x01E\x00\x10I \x00\x10E\x05\x00\xf4GAYAAABwYW5kYXMAAFUCAAB7ImluZGV4X2NvbHVtbnMiOiBbIl9faW5kZXhfbGV2ZWxfMF9fIl0sICJjb2x1bW5$\x00\xf0\x01lcyI6IFt7Im5hbWUD\x00\xf0\x11udWxsLCAiZmllbGRfbmFtZSI6IG51bGwH\x00\x03\x90\x00aNfdHlw\x1c\x00\xf0\x0eCJ1bmljb2RlIiwgIm51bXB5X3R5cGX\x00\xa2Aib2JqZWN0 \x00\xf0\x0c1ldGFkYXRhIjogeyJlbmNvZGluZ\x94\x00\xe0CJVVEYtOCJ9fV0t\x00\x04\xbc\x00\x10z0\x00EW3si\x98\x002CJhT\x00\x84ZpZWxkX2\xcc\x00PAiYSI<\x00\x0f\xb0\x00>\xa9bnVsbH0sIH\x88\x00\x1fi\x88\x00\x06\x1fi\x88\x00\x06qpbnQ2NC \x00\xd0udW1weV90eXBl\xe8\x00\x00\xe8\x01?dDY4\x01\x02\x0f\x84\x00\x02\x06\xa4\x01\xc2maWVsZF9uYW1L\x00\x0f\x1c\x02\x05\x00\xb0\x01\x97nBhbmRhc1|\x00PnVuaW\x94\x01 Ui\x10\x02xbnVtcHl\xf4\x01\x82vYmplY3Q \x00\xa5WV0YWRhdGEH\x02\x04\xbc\x01\x80cmVhdG9y\xd4\x00\xc0eyJsaWJyYXJ5\x10\x00\xb1InB5YXJyb3cH\x00\xf9\x0bdmVyc2lvbiI6ICIxMS4wLjAifS\xa8\x00\x912ZXJzaW9uD\x00\xa0jEuNS4zIn06\x03\x00\xa4\x03!Ah\n\x00\x01`\x03\x01K\x03PmP///\x0f\x00!QU\xc0\x03\x10J \x00\x05\xcb\x030AAE\x16\x00\x1fF$\x01\x03\x00 \x00\x01@\x00`8z///8\xc3\x03\x11CU\x00\x10BQ\x00\x11A\x0b\x00\x02\x02\x00\x00\x0b\x00 Bi\x0c\x00@CAAM\xd0\x03"Bw\xd0\x03\x01\x02\x00\x11U\x06\x00\xc2QABQACAAGAAc\xad\x00\x12B:\x00\x01\x02\x00\x03\xa0\x00\x12G\r\x00\x01\x06\x00\x01\x02\x00\x00\xa0\x00\x10G`\x00\x00p\x001QAB\x15\x00\xf1\x02==\x00\x18 parquet-cpp-\xf1\x04\x13 \xd1\x04\x12 \xeb\x04R\x19<\x1c\x00\x00\x03\x00\xa0\x00t\x08\x00\x00PAR1\x00\x00\x00\x00\x00'
"""
obj = unpack(dump)
print(obj)
"""
{'a': b'Some binary content', ('mixed-types tuple as a key', 4): 123, 'df':    a  b
x  5  1
y  6  2
z  7  3}
"""

Packing and unsafe unpacking

from pandas import DataFrame as DF

from safeserializer import unpack, pack

# Packing a function.
df = DF({"a": [print, 1, 2], "b": [1, 2, 3]}, index=["x", "y", "z"])
print(df)
"""
                           a  b
x  <built-in function print>  1
y                          1  2
z                          2  3
"""
dump = pack(df, ensure_determinism=True, unsafe_fallback=True)
print(dump)
"""
b'00lz4__\x04"M\x18h@\x07\x03\x00\x00\x00\x00\x00\x00Vg\x02\x00\x00\xd105pckl_\x80\x05\x95\xf5\x02\x00\x01\x00\xf1\x0c\x8c\x11pandas.core.frame\x94\x8c\tDataF\x0c\x00\xf8\x02\x93\x94)\x81\x94}\x94(\x8c\x04_mgr\x94\x8c\x1e/\x00\xf2\x0cinternals.managers\x94\x8c\x0cBlockM\x10\x00S\x94\x93\x94\x8c\x162\x00V_libs3\x00\xe0\x94\x8c\x0f_unpickle_b4\x00\x00-\x00b\x15numpy\x8d\x00\xf0\nmultiarray\x94\x8c\x0c_reconstruct)\x00\x11\x05)\x00R\x94\x8c\x07nd#\x00\xf0\x10\x93\x94K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x01K\x03\x86\x94h\x0f\x8c\x05dtyp\xca\x00r\x8c\x02O8\x94\x89\x88 \x00\xd1\x03\x8c\x01|\x94NNNJ\xff\xff\xff\xff\x05\x00\xf0\x0bK?t\x94b\x89]\x94(\x8c\x08builtins\x94\x8c\x05prinr\x00\x89K\x01K\x02et\x94b\x1d\x00@slicZ\x00 K\x00p\x00\x00Y\x00 K\x02\x06\x00Ah\x0b\x8c\x12\xa1\x00\x02\xca\x00\xf1\x04numeric\x94\x8c\x0b_frombuff\x1c\x011(\x96\x18\x85\x010\x00\x00\x01\x05\x003\x00\x00\x00\x96\x01!\x00\x03\x0e\x00\x89\x00\x00\x94h\x18\x8c\x02i\xb6\x00\x1b<\xb6\x00B\x00t\x94b\xec\x00\xa0\x8c\x01C\x94t\x94R\x94h%\xad\x00\x08\x90\x00 \x86\x94\xd7\x00\x13\x18\x8a\x01\x01\xeb\x01\xf0\x06indexes.base\x94\x8c\n_new_I\x14\x00t\x94\x93\x94h=\x8c\x05\x0c\x00\x01\xfc\x01\x90data\x94h\x0eh\x11d\x01 h\x13\xe3\x00\x00b\x01Q\x02\x85\x94h\x1b2\x01q\x01a\x94\x8c\x01b\x94!\x01 \x04nE\x020Nu\x86\x8b\x00\x8e?hA}\x94(hC=\x00\x16\x03=\x00\x91x\x94\x8c\x01y\x94\x8c\x01zA\x00"hL<\x00\x10eA\x00\x90\x8c\x04_typ\x94\x8c\t\x83\x00\x04\x9f\x02P_meta\x11\x00\xf1\x05\x94]\x94\x8c\x05attrs\x94}\x94\x8c\x06_flag\x0b\x00\xf0\x0f\x17allows_duplicate_labels\x94\x88sub.\x00\x00\x00\x00'
"""
obj = unpack(dump)
print(obj)
"""
                           a  b
x  <built-in function print>  1
y                          1  2
z                          2  3
"""

Grants

This work was partially supported by Fapesp under supervision of Prof. André C. P. L. F. de Carvalho at CEPID-CeMEAI (Grants 2013/07375-0 – 2019/01735-0).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

safeserializer-0.230202.1.tar.gz (28.2 kB view details)

Uploaded Source

Built Distribution

safeserializer-0.230202.1-py3-none-any.whl (25.8 kB view details)

Uploaded Python 3

File details

Details for the file safeserializer-0.230202.1.tar.gz.

File metadata

  • Download URL: safeserializer-0.230202.1.tar.gz
  • Upload date:
  • Size: 28.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.10.6 Linux/5.15.0-58-generic

File hashes

Hashes for safeserializer-0.230202.1.tar.gz
Algorithm Hash digest
SHA256 9ca7fe466bd7f7b10b25c3b3d797f1e58b43c04fdd4ef4eab82c83b797c1a05e
MD5 5a36a52fc7b65383594118c92dd4f89b
BLAKE2b-256 eeb8f31fea9713cbf16d1ea621e9804830638fcf97a6d729077bda6949757764

See more details on using hashes here.

File details

Details for the file safeserializer-0.230202.1-py3-none-any.whl.

File metadata

File hashes

Hashes for safeserializer-0.230202.1-py3-none-any.whl
Algorithm Hash digest
SHA256 4c6cb41ae069d9510ccb115b680574a8f3bd5b6d31a9c03edca7c0cce1b2b08e
MD5 38b4e622baf571c383a4f5f6ac0ebe3f
BLAKE2b-256 134e80e9a56838176020188e6c64e25d76bdb2ee84069e0743a35922b113209e

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page