Seamless serialization and deserialization of complex Python objects — a more portable and readable alternative to pickle for data-heavy workflows.

DataZip

DataZip is a Python library that extends zipfile.ZipFile to provide seamless serialization and deserialization of complex Python objects — a more portable and readable alternative to pickle for data science workflows.

Why DataZip?

  • Human-inspectable archives: DataZip files are standard .zip files. You can open them with any archive tool and inspect the contents.
  • Broad type support: Works out of the box with pandas DataFrames/Series, NumPy arrays, Polars DataFrames, datetimes, paths, sets, frozensets, complex numbers, and custom classes.
  • Efficient storage: Tabular data is stored as Parquet; arrays as .npy. JSON is used for metadata and simple types.
  • Lazy loading: Objects and data are deserialized only when they are accessed, so individual objects can be loaded efficiently from very large files. Nested access avoids deserializing unnecessary enclosing objects.
  • No pickle by default: Most types are serialized without pickle, making files safer and more portable.
  • Custom class integration: Any class that implements __getstate__/__setstate__ (the standard pickle protocol) works automatically. The IOMixin makes it even simpler.
  • Pluggable type support: Teach DataZip how to handle any third-party or stdlib type by registering encoder/decoder pairs with DataZip.register_coders. The bundled NumPy, pandas, Polars, and Plotly integrations are themselves built on this hook. See the User Guide for details.
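
The custom class bullet above relies on the standard __getstate__/__setstate__ protocol. The following is a minimal sketch of a class that implements it, using only the standard library. The class name Threshold and its fields are hypothetical, and the JSON round trip merely stands in for an on-disk format; it is not a description of how DataZip stores such objects.

```python
import json


class Threshold:
    """Hypothetical example class; not part of DataZip."""

    def __init__(self, value, labels):
        self.value = value
        self.labels = list(labels)

    def __getstate__(self):
        # Return a plain dict of JSON-friendly values describing the object.
        return {"value": self.value, "labels": self.labels}

    def __setstate__(self, state):
        # Rebuild the instance from the dict produced by __getstate__.
        self.value = state["value"]
        self.labels = list(state["labels"])


original = Threshold(0.5, ["a", "b"])

# Round-trip the state through JSON (an assumption made for illustration).
payload = json.dumps(original.__getstate__())

# Create an empty instance without calling __init__, then restore its state.
restored = Threshold.__new__(Threshold)
restored.__setstate__(json.loads(payload))
```

Keeping the state a plain dict of simple values is what makes a class like this easy to store without pickle.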

Quick Example

from io import BytesIO
import pandas as pd
from datazip import DataZip

# Write
buffer = BytesIO()
with DataZip(buffer, "w") as z:
    z["df"] = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
    z["config"] = {"threshold": 0.5, "labels": ["a", "b"]}
    z["values"] = {1, 2, frozenset([3, 4])}

# Read
with DataZip(buffer, "r") as z:
    df = z["df"]
    config = z["config"]
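
Because DataZip files are standard .zip files, the standard library's zipfile module should be able to list and read their members. The sketch below demonstrates the idea on a stand-in archive built with zipfile itself, so that it runs without DataZip installed. The member name config.json is hypothetical and is not DataZip's documented internal layout.

```python
import json
import zipfile
from io import BytesIO

buffer = BytesIO()

# Build a small stand-in archive. The member name "config.json" is a
# hypothetical choice for illustration, not DataZip's actual layout.
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("config.json", json.dumps({"threshold": 0.5, "labels": ["a", "b"]}))

# Any zip-aware tool can now list and read the members.
with zipfile.ZipFile(buffer, "r") as zf:
    names = zf.namelist()
    config = json.loads(zf.read("config.json"))
```

The same namelist() call on a real DataZip archive would show whichever members the library actually wrote.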

Supported Types

Category | Types
Primitives | str, int, float, bool, None, complex
Collections | dict, list, tuple, set, frozenset, deque, defaultdict
Date/Time | datetime, pandas.Timestamp
Paths | pathlib.Path
Custom | Any class with __getstate__/__setstate__
Optional | numpy.ndarray, pandas.DataFrame, pandas.Series, polars.DataFrame, polars.LazyFrame, polars.Series, xarray.Dataset, Plotly figures
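
Several of the types listed here, such as set, frozenset, and complex, have no native JSON representation. One common way to handle them is an encoder/decoder pair that wraps each value in a tagged object. The sketch below shows that general technique with the standard json module. The "__type__" tag, the encode and decode function names, and the layout are assumptions made for illustration; they are neither DataZip's actual file format nor the signature expected by DataZip.register_coders.

```python
import json


def encode(obj):
    """`default` hook for json.dumps: called only for non-JSON-native objects."""
    if isinstance(obj, complex):
        return {"__type__": "complex", "value": [obj.real, obj.imag]}
    if isinstance(obj, frozenset):
        return {"__type__": "frozenset", "value": list(obj)}
    if isinstance(obj, set):
        return {"__type__": "set", "value": list(obj)}
    raise TypeError(f"cannot encode {type(obj).__name__}")


def decode(mapping):
    """`object_hook` for json.loads: called for each JSON object, innermost first."""
    tag = mapping.get("__type__")
    if tag == "complex":
        return complex(*mapping["value"])
    if tag == "frozenset":
        return frozenset(mapping["value"])
    if tag == "set":
        return set(mapping["value"])
    return mapping


data = {"z": 1 + 2j, "values": {1, 2, frozenset([3, 4])}}

text = json.dumps(data, default=encode)
restored = json.loads(text, object_hook=decode)
```

This sketch does not preserve tuples, which JSON turns into lists, so a set of tuples would need an additional tagged encoding.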

Installation

pip install datazip

See the Installation page for full details including optional dependencies.
