Hopefully safe and deterministic serializer to binary format, including Pandas data
safeserializer - Serialization of nested objects to binary format
An alternative to pickle that may still use pickle when safety is not needed. Principle: start from the simplest and safest format possible, and try to be fast. Serialization is attempted in the following order:
- try orjson
  - dict, str, int, etc.
- try bson
  - standard types accepted by mongodb
  - convert bigints to str
- try to serialize as raw numpy bytes
  - ndarray, pandas homogeneous Series/DataFrame
- try parquet
  - pandas ill-typed Series/DataFrame
- resort to pickle if allowed (unsafe_fallback=True)
- resort to dill if allowed (ensure_determinism=False).
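The "try the safest format first, fall back only if allowed" idea can be sketched with the standard library alone. This is not safeserializer's implementation: `pack_sketch`, `unpack_sketch` and the four-byte tags are hypothetical names made up for illustration, and only two stages (JSON, then pickle) are shown instead of the full chain listed above.

```python
import json
import pickle


def pack_sketch(obj, unsafe_fallback=False):
    """Try the safest format first; use pickle only when explicitly allowed."""
    try:
        # A short tag records which format was used, so unpacking can dispatch on it.
        return b"json" + json.dumps(obj).encode()
    except TypeError:
        pass  # e.g. bytes or sets are not JSON serializable
    if unsafe_fallback:
        return b"pckl" + pickle.dumps(obj)
    raise ValueError("no safe serializer accepted this object")


def unpack_sketch(blob):
    """Dispatch on the tag written by pack_sketch."""
    tag, payload = blob[:4], blob[4:]
    if tag == b"json":
        return json.loads(payload)
    if tag == b"pckl":
        return pickle.loads(payload)
    raise ValueError(f"unknown tag {tag!r}")
```

With the default `unsafe_fallback=False`, an object that JSON rejects raises an error instead of silently reaching pickle, which is the property the ordering above is meant to give.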
Top-level tuples are preserved instead of being converted to lists (as bson, for example, would do).
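To see why tuples need special handling, note that JSON-like formats have only one sequence type, so a tuple comes back as a list. The sketch below shows one possible way to remember the top-level type; the envelope layout and the function names `pack_top_level` and `unpack_top_level` are assumptions for illustration, not the format safeserializer uses.

```python
import json


def pack_top_level(obj):
    # Record whether the outermost object was a tuple before JSON flattens it.
    envelope = {"tuple": isinstance(obj, tuple), "value": obj}
    return json.dumps(envelope).encode()


def unpack_top_level(blob):
    envelope = json.loads(blob)
    value = envelope["value"]
    # Restore the tuple only when the flag says the original was one.
    return tuple(value) if envelope["tuple"] else value


plain = json.loads(json.dumps((1, 2, 3)))  # the tuple silently becomes a list
restored = unpack_top_level(pack_top_level((1, 2, 3)))
```

Only the outermost object is tracked here; nested tuples would still come back as lists.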
Python installation
from package
# Set up a virtualenv.
python3 -m venv venv
source venv/bin/activate
# Install from PyPI
pip install safeserializer[full]
from source
git clone https://github.com/safeserializer/safeserializer
cd safeserializer
poetry install --extras full
Examples
Packing and safe unpacking
from pandas import DataFrame as DF
from safeserializer import unpack, pack
df = DF({"a": ["5", "6", "7"], "b": [1, 2, 3]}, index=["x", "y", "z"])
complex_data = {"a": b"Some binary content", ("mixed-types tuple as a key", 4): 123, "df": df}
print(complex_data)
"""
{'a': b'Some binary content', ('mixed-types tuple as a key', 4): 123, 'df':    a  b
x  5  1
y  6  2
z  7  3}
"""
dump = pack(complex_data, ensure_determinism=True, unsafe_fallback=False)
print(dump)
"""
b'00lz4__\x04"M\x18h@h\x0b\x00\x00\x00\x00\x00\x00\x94\x95\x06\x00\x00\xf1*00dicB_a\x0b\x00\x00\x0530306a736f6e5f226122\x00\x13\x00\x00\x00\x00Some binary content.\x00\xd27475706c5f480\x01\x00b45f004\x0c\x00\x8000530002\x05\x00\x01\x02\x00\rb\x00\xf5\x07d697865642d74797065732X\x00`652061\x12\x00\xf4\x0461206b6579220531000r\x00\x0bV\x00\x113}\x00 \x00\n\xb8\x00\xa100json_123\xaf\x00\t\xdd\x00p46622\x00b(\x00\xf0\x1100prqd_PAR1\x15\x04\x15\x1e\x15"L\x15\x06\x15\x00\x12\x00\x00\x0f8\x01\x00\x00\x005\x05\x00\x106\x05\x00\xf667\x15\x00\x15\x14\x15\x18,\x15\x06\x15\x10\x15\x06\x15\x06\x1c6\x00(\x017\x18\x015\x00\x00\x00\n$\x02\x00\x00\x00\x06\x01\x02\x03$\x00&\x94\x01\x1c\x15\x0c\x195\x10\x00\x06\x19\x18\x01a\x15\x02\x16\x06\x16\x84\x01\x16\x8c\x01&F&\x085\x00\xe0\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15?\x00d\x15\x04\x150\x15.\x7f\x00p\x18\x04\x01\x00\t\x01<\x19\x00\x00\x02\x00\x10\x03\x05\x00 \x00\x00.\x00\t\x85\x00$\x18\x08\x1a\x00 \x18\x08\xa6\x00\x00\x02\x00?\x16\x00(\x16\x00\x00\x0c\xa7\x00T\xe2\x03\x1c\x15\x04\xa7\x00\x11b\xa7\x00\xbf\xda\x01\x16\xdc\x01&\xd0\x02&\x86\x02Y\x00\x19\x0f\xcb\x00\x02\rJ\x01\x10x\x9f\x00\x10y\x05\x00\x1fzJ\x01\x01Lz\x18\x01x\xa3\x00&\xa8\x06J\x01\xf1\x03\x11__index_level_0__\xb3\x00\x02Z\x01Q\xda\x05&\x9c\x05\x91\x01\x01G\x00\x0f\x91\x00\x01\xf0\x0b\x19L5\x00\x18\x06schema\x15\x06\x00\x15\x0c%\x02\x18\x01a%\x00L\x1cV\x01\x10\x04\x0e\x00\x12b\x16\x00\x0ej\x00\x03&\x00o\x16\x06\x19\x1c\x19<\xe0\x01&\x0fr\x01J\x0f,\x018\xb0\x16\xe2\x03\x16\x06&\x08\x16\xf4\x03\x14\xdc\x01\xd2\x18\x06pandas\x18\xd5\x04{"\x83\x01\xcdcolumns": ["\x97\x01R"], ""\x00\x02\xb2\x01\x11e)\x00\xfa\x07{"name": null, "field_\x14\x00\x02i\x00@_typ)\x00\xf5\x02"unicode", "numpy\x19\x00`object\x18\x00\xf6\x12metadata": {"encoding": "UTF-8"}}\x8d\x00\n\x86\x00 "a?\x00\t\x85\x00\x02\x13\x00\x0f\x84\x00*\x00\xdc\x005}, 
\xec\x00."bf\x00\x01\x13\x00\x0bf\x00Pint64+\x00\n\xe8\x00\x05\x17\x00\x07\xe7\x00\x0cc\x00\x00\x10\x00\x0cO\x01\x0f\x95\x01\x00\x0et\x00\x0f^\x01\x1b\x00g\x00\x02M\x01areatorq\x01plibraryp\x01ppyarrow\xc4\x00pversion\x16\x00\x8611.0.0"}~\x00\x08\x1d\x00\xf2\x00.5.3"}\x00\x18\x0cARROW:\x9c\x03@\x18\x98\t/\x01\x00\x822gDAAAQA\x01\x00\xf1\x00KAA4ABgAFAAgACg\x15\x00)BB \x00\x10w\x15\x00\x15E \x002IwC\x10\x00\x04F\x00\x01 \x00\x10I\x08\x00\x11B\x08\x00\x01E\x00\x10I \x00\x10E\x05\x00\xf4GAYAAABwYW5kYXMAAFUCAAB7ImluZGV4X2NvbHVtbnMiOiBbIl9faW5kZXhfbGV2ZWxfMF9fIl0sICJjb2x1bW5$\x00\xf0\x01lcyI6IFt7Im5hbWUD\x00\xf0\x11udWxsLCAiZmllbGRfbmFtZSI6IG51bGwH\x00\x03\x90\x00aNfdHlw\x1c\x00\xf0\x0eCJ1bmljb2RlIiwgIm51bXB5X3R5cGX\x00\xa2Aib2JqZWN0 \x00\xf0\x0c1ldGFkYXRhIjogeyJlbmNvZGluZ\x94\x00\xe0CJVVEYtOCJ9fV0t\x00\x04\xbc\x00\x10z0\x00EW3si\x98\x002CJhT\x00\x84ZpZWxkX2\xcc\x00PAiYSI<\x00\x0f\xb0\x00>\xa9bnVsbH0sIH\x88\x00\x1fi\x88\x00\x06\x1fi\x88\x00\x06qpbnQ2NC \x00\xd0udW1weV90eXBl\xe8\x00\x00\xe8\x01?dDY4\x01\x02\x0f\x84\x00\x02\x06\xa4\x01\xc2maWVsZF9uYW1L\x00\x0f\x1c\x02\x05\x00\xb0\x01\x97nBhbmRhc1|\x00PnVuaW\x94\x01 Ui\x10\x02xbnVtcHl\xf4\x01\x82vYmplY3Q \x00\xa5WV0YWRhdGEH\x02\x04\xbc\x01\x80cmVhdG9y\xd4\x00\xc0eyJsaWJyYXJ5\x10\x00\xb1InB5YXJyb3cH\x00\xf9\x0bdmVyc2lvbiI6ICIxMS4wLjAifS\xa8\x00\x912ZXJzaW9uD\x00\xa0jEuNS4zIn06\x03\x00\xa4\x03!Ah\n\x00\x01`\x03\x01K\x03PmP///\x0f\x00!QU\xc0\x03\x10J \x00\x05\xcb\x030AAE\x16\x00\x1fF$\x01\x03\x00 \x00\x01@\x00`8z///8\xc3\x03\x11CU\x00\x10BQ\x00\x11A\x0b\x00\x02\x02\x00\x00\x0b\x00 Bi\x0c\x00@CAAM\xd0\x03"Bw\xd0\x03\x01\x02\x00\x11U\x06\x00\xc2QABQACAAGAAc\xad\x00\x12B:\x00\x01\x02\x00\x03\xa0\x00\x12G\r\x00\x01\x06\x00\x01\x02\x00\x00\xa0\x00\x10G`\x00\x00p\x001QAB\x15\x00\xf1\x02==\x00\x18 parquet-cpp-\xf1\x04\x13 \xd1\x04\x12 \xeb\x04R\x19<\x1c\x00\x00\x03\x00\xa0\x00t\x08\x00\x00PAR1\x00\x00\x00\x00\x00'
"""
obj = unpack(dump)
print(obj)
"""
{'a': b'Some binary content', ('mixed-types tuple as a key', 4): 123, 'df':    a  b
x  5  1
y  6  2
z  7  3}
"""
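One reason to ask for deterministic output is that equal objects then produce equal bytes, so a hash of the dump can serve as a content identifier or cache key. The sketch below illustrates that idea with the standard library only; `content_id` is a hypothetical helper, and sorting JSON keys is merely one way to obtain stable bytes, not a description of how safeserializer does it.

```python
import hashlib
import json


def content_id(obj):
    # sort_keys removes the dependence on dict insertion order,
    # and fixed separators avoid whitespace differences.
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}  # equal to `a`, but built in a different order

same_id = content_id(a) == content_id(b)
differs_without_sorting = json.dumps(a) != json.dumps(b)
```

Without the sorting step, the two equal dicts serialize to different strings, so their hashes would differ.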
Packing and unsafe unpacking
from pandas import DataFrame as DF
from safeserializer import unpack, pack
# Packing a function.
df = DF({"a": [print, 1, 2], "b": [1, 2, 3]}, index=["x", "y", "z"])
print(df)
"""
                           a  b
x  <built-in function print>  1
y                          1  2
z                          2  3
"""
dump = pack(df, ensure_determinism=True, unsafe_fallback=True)
print(dump)
"""
b'00lz4__\x04"M\x18h@\x07\x03\x00\x00\x00\x00\x00\x00Vg\x02\x00\x00\xd105pckl_\x80\x05\x95\xf5\x02\x00\x01\x00\xf1\x0c\x8c\x11pandas.core.frame\x94\x8c\tDataF\x0c\x00\xf8\x02\x93\x94)\x81\x94}\x94(\x8c\x04_mgr\x94\x8c\x1e/\x00\xf2\x0cinternals.managers\x94\x8c\x0cBlockM\x10\x00S\x94\x93\x94\x8c\x162\x00V_libs3\x00\xe0\x94\x8c\x0f_unpickle_b4\x00\x00-\x00b\x15numpy\x8d\x00\xf0\nmultiarray\x94\x8c\x0c_reconstruct)\x00\x11\x05)\x00R\x94\x8c\x07nd#\x00\xf0\x10\x93\x94K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x01K\x03\x86\x94h\x0f\x8c\x05dtyp\xca\x00r\x8c\x02O8\x94\x89\x88 \x00\xd1\x03\x8c\x01|\x94NNNJ\xff\xff\xff\xff\x05\x00\xf0\x0bK?t\x94b\x89]\x94(\x8c\x08builtins\x94\x8c\x05prinr\x00\x89K\x01K\x02et\x94b\x1d\x00@slicZ\x00 K\x00p\x00\x00Y\x00 K\x02\x06\x00Ah\x0b\x8c\x12\xa1\x00\x02\xca\x00\xf1\x04numeric\x94\x8c\x0b_frombuff\x1c\x011(\x96\x18\x85\x010\x00\x00\x01\x05\x003\x00\x00\x00\x96\x01!\x00\x03\x0e\x00\x89\x00\x00\x94h\x18\x8c\x02i\xb6\x00\x1b<\xb6\x00B\x00t\x94b\xec\x00\xa0\x8c\x01C\x94t\x94R\x94h%\xad\x00\x08\x90\x00 \x86\x94\xd7\x00\x13\x18\x8a\x01\x01\xeb\x01\xf0\x06indexes.base\x94\x8c\n_new_I\x14\x00t\x94\x93\x94h=\x8c\x05\x0c\x00\x01\xfc\x01\x90data\x94h\x0eh\x11d\x01 h\x13\xe3\x00\x00b\x01Q\x02\x85\x94h\x1b2\x01q\x01a\x94\x8c\x01b\x94!\x01 \x04nE\x020Nu\x86\x8b\x00\x8e?hA}\x94(hC=\x00\x16\x03=\x00\x91x\x94\x8c\x01y\x94\x8c\x01zA\x00"hL<\x00\x10eA\x00\x90\x8c\x04_typ\x94\x8c\t\x83\x00\x04\x9f\x02P_meta\x11\x00\xf1\x05\x94]\x94\x8c\x05attrs\x94}\x94\x8c\x06_flag\x0b\x00\xf0\x0f\x17allows_duplicate_labels\x94\x88sub.\x00\x00\x00\x00'
"""
obj = unpack(dump)
print(obj)
"""
                           a  b
x  <built-in function print>  1
y                          1  2
z                          2  3
"""
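The pickle fallback is called unsafe because loading a pickle can call arbitrary callables chosen by whoever produced the bytes. The harmless sketch below uses only the standard library: the hypothetical class `Payload` tells pickle to call the built-in `len` at load time, where an attacker would pick something destructive instead.

```python
import pickle


class Payload:
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call len('abc')".
        # Any importable callable and arguments could be placed here.
        return (len, ("abc",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # calls len("abc") instead of rebuilding a Payload
```

The value returned by `pickle.loads` is whatever the callable returned, not a `Payload` instance, which is why dumps that may contain pickled data should only be unpacked when they come from a trusted source.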
Grants
This work was partially supported by Fapesp under supervision of Prof. André C. P. L. F. de Carvalho at CEPID-CeMEAI (Grants 2013/07375-0 – 2019/01735-0).