
Like pickle. But different


antipickle

when you want to use pickle, but you shouldn't

Why? Because pickle isn't the right way to persist or share data, and we all know that.

In practice, replacing pickle takes time and effort.
'Hmm, I can use json here', I thought on many occasions, and I was usually wrong.

Something small but annoying was always in the way:
a datetime that can't be stored, or an np.array that serializers don't know how to deal with. Or even bytes! And many smaller things.

At that point I either had to give up and pickle it
OR spend time figuring out 'how do I make this right'.

antipickle solves this for me.

antipickle is a restricted format for safe, persistent, and platform-independent storage.

Also, it is very convenient:

import antipickle
antipickle.dump(data, 'data.antipickle')
antipickle.dump(data, 's3://mybucket/data.antipickle')     # stores in s3 
antipickle.dump(data, 's3://mybucket/data.antipickle.gz')  # will additionally gzip


loaded_data = antipickle.load('s3://mybucket/data.antipickle.gz') # or local file

To read from or write to remote storage, install the matching additional dependency:

  • s3: pip install s3fs
  • gcs: pip install gcsfs
  • ssh: pip install sshfs

Batteries included:

Here is a simple example of what antipickle can save/load:

import numpy as np
import antipickle

data = {
    'constants': [3.1415, 2.718, True, False, 42],
    'with nones': [1, None, 0],
    b'bytes': b"can be stored too!",
    'nested lists and tuples': [[1, [2]], (1, 2, None), {'nested': 'dict'}],
    ('tuple', 'as', 'key'): {'is_ok': True},
    'numpy nd': np.zeros([3, 4], dtype='uint32'),
}
antipickle.dump(data, 'data.antipickle')

More formally, antipickle supports the Python types commonly used in computations:

  • bytes, str, int, float, complex, bool, and None
  • list, tuple, set, frozenset (all of them are stored as different entities)
  • dict (including integer keys and tuple keys)
  • PosixPath
  • numpy arrays (native .npy format used; dtype=O not supported)
  • pandas series and dataframe (using parquet serialization via pyarrow)
  • polars series and dataframe (using parquet)
  • Any tree-formed structure of the above (no loops allowed)

Configurable support: dataclasses, pydantic classes, and torch tensors.

For reference, other non-pythonic formats (json and its binary relatives) have problems with native types (no distinction between list and tuple), encodings (no way to store bytes), or collections (no integers, bytes, or tuples as dict keys).
antipickle is Python-centric and solves all of these.
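
The json limitations mentioned above are easy to reproduce with the standard library alone. This is a small illustration, not part of antipickle; the variable names are arbitrary.

```python
import json

# 1. Native types: a tuple silently comes back as a list.
roundtrip = json.loads(json.dumps({"point": (1, 2)}))

# 2. Encodings: bytes values are rejected outright.
bytes_error = None
try:
    json.dumps({"blob": b"raw"})
except TypeError as exc:
    bytes_error = exc

# 3. Collections: integer keys are converted to strings...
int_keys = json.loads(json.dumps({1: "one"}))

# ...and tuple keys are rejected.
tuple_key_error = None
try:
    json.dumps({(1, 2): "pair"})
except TypeError as exc:
    tuple_key_error = exc

print(roundtrip)   # {'point': [1, 2]}
print(int_keys)    # {'1': 'one'}
print(type(bytes_error).__name__, type(tuple_key_error).__name__)
```

None of these failures is dramatic on its own, but each one forces a manual conversion step on both the saving and the loading side.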

Installation

pip install antipickle

What is it for

Let's set the expectation bar. antipickle is

  • not fast, but isn't slow either
  • not super-compact, but quite ok
  • restricted: antipickle wasn't designed to serialize just anything, it focuses on common python types and cases for data folks

At the same time, antipickle is

  • safe
  • persistent
  • very convenient
  • modular and easy to extend

and thus suitable for data sharing and data preservation.

When to (not) use pickle

pickle is designed for interprocess communication and temporary storage. It offers a good tradeoff between space and time efficiency and can serialize almost anything, including graphs with cycles.

The name pickle suggests you could use it for long-term preservation of data, but that's not true: pickle's serialization is tied to the internal object representation, which is not guaranteed to be preserved in the next release (or even on a different OS). Developers of some packages (notably scikit-learn) provide some guarantees about being able to load models saved with the previous 1-2 minor package releases, but that's the exception, not the rule.
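
One facet of this fragility can be shown with the standard library: a pickle stores a reference to the class (module and name) plus the instance attributes, not the class itself. The sketch below builds a hypothetical in-memory module called fakelib with a hypothetical Model class, then simulates a later release that renames the class.

```python
import pickle
import sys
import types

# A hypothetical library module, created in memory just for this demo.
fakelib = types.ModuleType("fakelib")
sys.modules["fakelib"] = fakelib


class Model:
    def __init__(self, weights):
        self.weights = weights


# Make the class look as if it had been defined inside fakelib.
Model.__module__ = "fakelib"
Model.__qualname__ = "Model"
fakelib.Model = Model

payload = pickle.dumps(Model([1.0, 2.0]))

# The payload refers to the class by module and name only.
stores_reference = b"fakelib" in payload and b"Model" in payload

# Simulate the "next release": the class is renamed.
fakelib.TrainedModel = fakelib.Model
del fakelib.Model

load_error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    load_error = exc  # the old name can no longer be resolved
```

The data in the payload is intact, yet it cannot be loaded because the code it points to has moved.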

Second, pickle is insecure. And unreadable. And pickles can be large. During unpickling, a pickle can do anything Python can, i.e. anything at all. The Python docs say it clearly: only unpickle data you trust!
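
To make the security point concrete, here is a deliberately harmless sketch using only the standard library. The class name Innocent is made up; the expression passed to eval stands in for whatever an attacker would choose to run.

```python
import pickle


class Innocent:
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('6 * 7')".
        # A malicious payload would name a far less friendly callable.
        return (eval, ("6 * 7",))


payload = pickle.dumps(Innocent())

# Loading runs the callable chosen by whoever produced the payload.
result = pickle.loads(payload)
print(result)  # 42, not an Innocent instance
```

The loader never asked for eval to be called; the payload alone decided that.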

That said, pickle is extremely convenient and simple to use, and works as a short-term solution in many cases, so we all (Python data folks) kind of do the wrong pickling thing from time to time, because it is convenient. And because very few of us are ready to spend time figuring out proper serialization.

All of the comments above also apply to pickle-like libraries such as joblib, dill, and cloudpickle.

Changes

0.2.1 - adds pathlib.PosixPath, introduces support for CPU torch tensors, and is numpy 2.0-compatible. Deserialization of torch is on by default, and serialization requires first calling:

antipickle.adapters_default.allow_serialization_of_flat_torch_cpu_tensors()

0.2.2 - adds frozenset and uuid.

License

antipickle is distributed under the terms of the MIT license.

Other

antipickle builds upon msgpack-python (the only dependency).

antipickle supports all maintained Python versions (currently Python 3.10+; earlier antipickle releases support Python 3.7+).
