Skip to main content

Hopefully safe and deterministic serializer to binary format, including Pandas data

Project description

test codecov github Python version license: GPL v3 API documentation

safeserializer - Serialization of nested objects to binary format

An alternative to pickle, but may use pickle if safety is not needed. Principle: Start from the simplest and safest possible and try to be fast. Serialization is attempted in the following order:

  • try orjson
    • dict, str, int, etc
  • try bson
    • standard types accepted by mongodb
  • convert bigints to str
  • try to serialize as raw numpy bytes
    • ndarray, pandas homogeneous Series/DataFrame
  • try parquet
    • pandas ill-typed Series/DataFrame
  • resort to pickle if allowed (unsafe_fallback=True)
  • resort to dill if allowed (ensure_determinism=False).

Top level tuples are preserved, insted of converted to lists (e.g., by bson).

Python installation

from package

# Set up a virtualenv. 
python3 -m venv venv
source venv/bin/activate

# Install from PyPI
pip install safeserializer[full]

from source

git clone https://github.com/safeserializer/safeserializer
cd safeserializer
poetry install --extras full

Examples

Packing and safe unpacking

from pandas import DataFrame as DF

from safeserializer import unpack, pack

df = DF({"a": ["5", "6", "7"], "b": [1, 2, 3]}, index=["x", "y", "z"])
complex_data = {"a": b"Some binary content", ("mixed-types tuple as a key", 4): 123, "df": df}
print(complex_data)
"""
{'a': b'Some binary content', ('mixed-types tuple as a key', 4): 123, 'df':    a  b
x  5  1
y  6  2
z  7  3}
"""
dump = pack(complex_data, ensure_determinism=True, unsafe_fallback=False)
print(dump)
"""
b'00lz4__\x04"M\x18h@h\x0b\x00\x00\x00\x00\x00\x00\x94\x95\x06\x00\x00\xf1*00dicB_a\x0b\x00\x00\x0530306a736f6e5f226122\x00\x13\x00\x00\x00\x00Some binary content.\x00\xd27475706c5f480\x01\x00b45f004\x0c\x00\x8000530002\x05\x00\x01\x02\x00\rb\x00\xf5\x07d697865642d74797065732X\x00`652061\x12\x00\xf4\x0461206b6579220531000r\x00\x0bV\x00\x113}\x00 \x00\n\xb8\x00\xa100json_123\xaf\x00\t\xdd\x00p46622\x00b(\x00\xf0\x1100prqd_PAR1\x15\x04\x15\x1e\x15"L\x15\x06\x15\x00\x12\x00\x00\x0f8\x01\x00\x00\x005\x05\x00\x106\x05\x00\xf667\x15\x00\x15\x14\x15\x18,\x15\x06\x15\x10\x15\x06\x15\x06\x1c6\x00(\x017\x18\x015\x00\x00\x00\n$\x02\x00\x00\x00\x06\x01\x02\x03$\x00&\x94\x01\x1c\x15\x0c\x195\x10\x00\x06\x19\x18\x01a\x15\x02\x16\x06\x16\x84\x01\x16\x8c\x01&F&\x085\x00\xe0\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15?\x00d\x15\x04\x150\x15.\x7f\x00p\x18\x04\x01\x00\t\x01<\x19\x00\x00\x02\x00\x10\x03\x05\x00 \x00\x00.\x00\t\x85\x00$\x18\x08\x1a\x00 \x18\x08\xa6\x00\x00\x02\x00?\x16\x00(\x16\x00\x00\x0c\xa7\x00T\xe2\x03\x1c\x15\x04\xa7\x00\x11b\xa7\x00\xbf\xda\x01\x16\xdc\x01&\xd0\x02&\x86\x02Y\x00\x19\x0f\xcb\x00\x02\rJ\x01\x10x\x9f\x00\x10y\x05\x00\x1fzJ\x01\x01Lz\x18\x01x\xa3\x00&\xa8\x06J\x01\xf1\x03\x11__index_level_0__\xb3\x00\x02Z\x01Q\xda\x05&\x9c\x05\x91\x01\x01G\x00\x0f\x91\x00\x01\xf0\x0b\x19L5\x00\x18\x06schema\x15\x06\x00\x15\x0c%\x02\x18\x01a%\x00L\x1cV\x01\x10\x04\x0e\x00\x12b\x16\x00\x0ej\x00\x03&\x00o\x16\x06\x19\x1c\x19<\xe0\x01&\x0fr\x01J\x0f,\x018\xb0\x16\xe2\x03\x16\x06&\x08\x16\xf4\x03\x14\xdc\x01\xd2\x18\x06pandas\x18\xd5\x04{"\x83\x01\xcdcolumns": ["\x97\x01R"], ""\x00\x02\xb2\x01\x11e)\x00\xfa\x07{"name": null, "field_\x14\x00\x02i\x00@_typ)\x00\xf5\x02"unicode", "numpy\x19\x00`object\x18\x00\xf6\x12metadata": {"encoding": "UTF-8"}}\x8d\x00\n\x86\x00 "a?\x00\t\x85\x00\x02\x13\x00\x0f\x84\x00*\x00\xdc\x005}, \xec\x00."bf\x00\x01\x13\x00\x0bf\x00Pint64+\x00\n\xe8\x00\x05\x17\x00\x07\xe7\x00\x0cc\x00\x00\x10\x00\x0cO\x01\x0f\x95\x01\x00\x0et\x00\x0f^\x01\x1b\x00g\x00\x02M\x01areatorq\x01plibraryp\x01ppyarrow\xc4\x00pversion\x16\x00\x8611.0.0"}~\x00\x08\x1d\x00\xf2\x00.5.3"}\x00\x18\x0cARROW:\x9c\x03@\x18\x98\t/\x01\x00\x822gDAAAQA\x01\x00\xf1\x00KAA4ABgAFAAgACg\x15\x00)BB \x00\x10w\x15\x00\x15E \x002IwC\x10\x00\x04F\x00\x01 \x00\x10I\x08\x00\x11B\x08\x00\x01E\x00\x10I \x00\x10E\x05\x00\xf4GAYAAABwYW5kYXMAAFUCAAB7ImluZGV4X2NvbHVtbnMiOiBbIl9faW5kZXhfbGV2ZWxfMF9fIl0sICJjb2x1bW5$\x00\xf0\x01lcyI6IFt7Im5hbWUD\x00\xf0\x11udWxsLCAiZmllbGRfbmFtZSI6IG51bGwH\x00\x03\x90\x00aNfdHlw\x1c\x00\xf0\x0eCJ1bmljb2RlIiwgIm51bXB5X3R5cGX\x00\xa2Aib2JqZWN0 \x00\xf0\x0c1ldGFkYXRhIjogeyJlbmNvZGluZ\x94\x00\xe0CJVVEYtOCJ9fV0t\x00\x04\xbc\x00\x10z0\x00EW3si\x98\x002CJhT\x00\x84ZpZWxkX2\xcc\x00PAiYSI<\x00\x0f\xb0\x00>\xa9bnVsbH0sIH\x88\x00\x1fi\x88\x00\x06\x1fi\x88\x00\x06qpbnQ2NC \x00\xd0udW1weV90eXBl\xe8\x00\x00\xe8\x01?dDY4\x01\x02\x0f\x84\x00\x02\x06\xa4\x01\xc2maWVsZF9uYW1L\x00\x0f\x1c\x02\x05\x00\xb0\x01\x97nBhbmRhc1|\x00PnVuaW\x94\x01 Ui\x10\x02xbnVtcHl\xf4\x01\x82vYmplY3Q \x00\xa5WV0YWRhdGEH\x02\x04\xbc\x01\x80cmVhdG9y\xd4\x00\xc0eyJsaWJyYXJ5\x10\x00\xb1InB5YXJyb3cH\x00\xf9\x0bdmVyc2lvbiI6ICIxMS4wLjAifS\xa8\x00\x912ZXJzaW9uD\x00\xa0jEuNS4zIn06\x03\x00\xa4\x03!Ah\n\x00\x01`\x03\x01K\x03PmP///\x0f\x00!QU\xc0\x03\x10J \x00\x05\xcb\x030AAE\x16\x00\x1fF$\x01\x03\x00 \x00\x01@\x00`8z///8\xc3\x03\x11CU\x00\x10BQ\x00\x11A\x0b\x00\x02\x02\x00\x00\x0b\x00 Bi\x0c\x00@CAAM\xd0\x03"Bw\xd0\x03\x01\x02\x00\x11U\x06\x00\xc2QABQACAAGAAc\xad\x00\x12B:\x00\x01\x02\x00\x03\xa0\x00\x12G\r\x00\x01\x06\x00\x01\x02\x00\x00\xa0\x00\x10G`\x00\x00p\x001QAB\x15\x00\xf1\x02==\x00\x18 parquet-cpp-\xf1\x04\x13 \xd1\x04\x12 \xeb\x04R\x19<\x1c\x00\x00\x03\x00\xa0\x00t\x08\x00\x00PAR1\x00\x00\x00\x00\x00'
"""
obj = unpack(dump)
print(obj)
"""
{'a': b'Some binary content', ('mixed-types tuple as a key', 4): 123, 'df':    a  b
x  5  1
y  6  2
z  7  3}
"""

Packing and unsafe unpacking

from pandas import DataFrame as DF

from safeserializer import unpack, pack

# Packing a function.
df = DF({"a": [print, 1, 2], "b": [1, 2, 3]}, index=["x", "y", "z"])
print(df)
"""
                           a  b
x  <built-in function print>  1
y                          1  2
z                          2  3
"""
dump = pack(df, ensure_determinism=True, unsafe_fallback=True)
print(dump)
"""
b'00lz4__\x04"M\x18h@\x07\x03\x00\x00\x00\x00\x00\x00Vg\x02\x00\x00\xd105pckl_\x80\x05\x95\xf5\x02\x00\x01\x00\xf1\x0c\x8c\x11pandas.core.frame\x94\x8c\tDataF\x0c\x00\xf8\x02\x93\x94)\x81\x94}\x94(\x8c\x04_mgr\x94\x8c\x1e/\x00\xf2\x0cinternals.managers\x94\x8c\x0cBlockM\x10\x00S\x94\x93\x94\x8c\x162\x00V_libs3\x00\xe0\x94\x8c\x0f_unpickle_b4\x00\x00-\x00b\x15numpy\x8d\x00\xf0\nmultiarray\x94\x8c\x0c_reconstruct)\x00\x11\x05)\x00R\x94\x8c\x07nd#\x00\xf0\x10\x93\x94K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x01K\x03\x86\x94h\x0f\x8c\x05dtyp\xca\x00r\x8c\x02O8\x94\x89\x88 \x00\xd1\x03\x8c\x01|\x94NNNJ\xff\xff\xff\xff\x05\x00\xf0\x0bK?t\x94b\x89]\x94(\x8c\x08builtins\x94\x8c\x05prinr\x00\x89K\x01K\x02et\x94b\x1d\x00@slicZ\x00 K\x00p\x00\x00Y\x00 K\x02\x06\x00Ah\x0b\x8c\x12\xa1\x00\x02\xca\x00\xf1\x04numeric\x94\x8c\x0b_frombuff\x1c\x011(\x96\x18\x85\x010\x00\x00\x01\x05\x003\x00\x00\x00\x96\x01!\x00\x03\x0e\x00\x89\x00\x00\x94h\x18\x8c\x02i\xb6\x00\x1b<\xb6\x00B\x00t\x94b\xec\x00\xa0\x8c\x01C\x94t\x94R\x94h%\xad\x00\x08\x90\x00 \x86\x94\xd7\x00\x13\x18\x8a\x01\x01\xeb\x01\xf0\x06indexes.base\x94\x8c\n_new_I\x14\x00t\x94\x93\x94h=\x8c\x05\x0c\x00\x01\xfc\x01\x90data\x94h\x0eh\x11d\x01 h\x13\xe3\x00\x00b\x01Q\x02\x85\x94h\x1b2\x01q\x01a\x94\x8c\x01b\x94!\x01 \x04nE\x020Nu\x86\x8b\x00\x8e?hA}\x94(hC=\x00\x16\x03=\x00\x91x\x94\x8c\x01y\x94\x8c\x01zA\x00"hL<\x00\x10eA\x00\x90\x8c\x04_typ\x94\x8c\t\x83\x00\x04\x9f\x02P_meta\x11\x00\xf1\x05\x94]\x94\x8c\x05attrs\x94}\x94\x8c\x06_flag\x0b\x00\xf0\x0f\x17allows_duplicate_labels\x94\x88sub.\x00\x00\x00\x00'
"""
obj = unpack(dump)
print(obj)
"""
                           a  b
x  <built-in function print>  1
y                          1  2
z                          2  3
"""

Grants

This work was partially supported by Fapesp under supervision of Prof. André C. P. L. F. de Carvalho at CEPID-CeMEAI (Grants 2013/07375-0 – 2019/01735-0).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

safeserializer-0.230202.1.tar.gz (28.2 kB view hashes)

Uploaded Source

Built Distribution

safeserializer-0.230202.1-py3-none-any.whl (25.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page