
Save pytrees efficiently in hdf5 files


Jaxon

Jaxon is a Python library that implements saving and loading of pytrees to the Hierarchical Data Format HDF5. HDF5 is an open format that natively supports multidimensional array objects and metadata in a single file, resulting in high efficiency. Jaxon embeds all information that is necessary to reconstruct the pytree in a human-readable and self-describing way, so that the output file can still be understood when the original code is no longer available, or when the data is to be processed with an external tool.

Jaxon is well suited for machine learning and scientific tasks. It is especially suited for machine learning packages that rely on Python dataclasses and JAX, e.g. Equinox.

Installation

pip install jaxon

Example Usage

from jaxon import save, load
import numpy as np
import jax.numpy as jnp 

pytree = {
    "mylist": ["foo", "bar", 42],
    "myset": {"a", "b", "z", (42, b"blob")},
    "numpy_array": np.arange(3),
    "jax_array": jnp.arange(3),
}
save("data.hdf5", pytree)
print(load("data.hdf5"))

Will produce

{'mylist': ['foo', 'bar', 42], 'myset': {'z', 'a', 'b', (42, b'blob')}, 'numpy_array': array([0, 1, 2]), 'jax_array': Array([0, 1, 2], dtype=int32)}

which is exactly what was sent in (the printed order of the set elements may vary). Refer to the tests folder for more examples. To inspect the HDF5 file, external tools like h5dump or HDFView can be used.

Supported Types

Overview

The pytree can consist of the following types:

Datatype | Stored As
list, tuple, dict, set, frozenset | HDF5 group
np.int8, np.int16, np.int32, np.int64, np.uint8, np.uint16, np.uint32, np.uint64, np.float16, np.float32, np.float64, np.float128, np.complex64, np.complex128, np.bool | HDF5 attribute
int, float, bool, complex | String representation, or one of the numpy types above if requested
None, slice, range, Ellipsis | String representation
str | HDF5 UTF-8 (fixed length) string
np.ndarray, jax.Array, bytes, bytearray, memoryview | HDF5 attribute (or dataset on user request)
Any Python dataclass | HDF5 group that contains all fields

Note that dictionary keys can also be of any of these types or of a custom type (provided it is hashable, of course).

Notes on JAX and NumPy arrays

Jaxon supports JAX and NumPy arrays as indicated in the table above. However, special attributes such as titles are not stored. Jaxon only stores the contents of the array.

Notes on dataclasses

Jaxon stores the package name, the class name and all fields, including the field names. During loading, the class is instantiated (without calling __init__) and the field values are set (even if the dataclass is frozen). Note that machine learning packages like Equinox automatically turn all modules into Python dataclasses. Therefore, Jaxon is fully compatible with models implemented with such packages. Jaxon can deal to some extent with dataclasses that have been changed (fields added or removed) between saving and loading. Please refer to the documentation of the load function for more information.
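
The loading behaviour described above (creating the instance without running __init__, then setting the fields even on a frozen dataclass) can be reproduced in plain Python. The following is a minimal sketch of that idea and not Jaxon's actual code; the Point class and the rebuild helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Point:
    x: float
    y: float

    def __post_init__(self):
        # Runs only when the generated __init__ is called.
        print("__init__ was called")


def rebuild(cls, values):
    """Create a dataclass instance from a dict without calling __init__."""
    obj = object.__new__(cls)  # allocate the object, skip __init__
    for f in fields(cls):
        if f.name in values:
            # object.__setattr__ bypasses the frozen-dataclass guard.
            object.__setattr__(obj, f.name, values[f.name])
    return obj


# Nothing is printed by __post_init__ here, because __init__ never runs.
p = rebuild(Point, {"x": 1.0, "y": 2.0})
print(p)  # Point(x=1.0, y=2.0)
```

Fields that are missing from the stored values are simply left unset in this sketch; how Jaxon itself treats added or removed fields is described in the documentation of its load function.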

Supported Data Structures

Jaxon can save pytrees without circular references that consist of the supported types listed above, with the extension that dictionary keys can be pytrees as well. Note that Jaxon recovers all references as they were in the saved pytree. For example, if pytree={"a": a, "b": a} where a=np.array([1]), then pytree["a"] is pytree["b"] is guaranteed to remain True after pytree has been saved and loaded again.
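
To make the reference guarantee more concrete, here is a small sketch of one way shared references can be preserved across a save/load round trip: every distinct object is stored once and the structure only records which stored object each key points to. This is an illustration under simplifying assumptions (a flat dict with list leaves) and is not how Jaxon is necessarily implemented; encode and decode are hypothetical helper names.

```python
def encode(pytree):
    """Store each distinct leaf once and record which key refers to which leaf."""
    leaves = []     # every distinct object appears exactly once
    index_of = {}   # id(obj) -> position in leaves
    refs = {}       # key -> position in leaves
    for key, value in pytree.items():
        if id(value) not in index_of:
            index_of[id(value)] = len(leaves)
            leaves.append(value)
        refs[key] = index_of[id(value)]
    return leaves, refs


def decode(leaves, refs):
    """Rebuild the dict; keys that shared a leaf share the restored object."""
    # Copy each leaf once to emulate reading fresh objects from a file.
    restored = [list(leaf) for leaf in leaves]
    return {key: restored[i] for key, i in refs.items()}


a = [1]
pytree = {"a": a, "b": a, "c": [1]}  # "c" is equal to a, but a different object
restored = decode(*encode(pytree))

print(restored["a"] is restored["b"])  # True
print(restored["a"] is restored["c"])  # False
```

Objects that were merely equal before saving ("a" and "c" here) stay distinct, while objects that were identical ("a" and "b") remain identical.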

Custom Types

The to_jaxon/from_jaxon interface

If during saving a type is encountered in the pytree that is not in the table above, Jaxon first checks whether it has a to_jaxon method. If so, it does not matter whether the type is a dataclass or not: the to_jaxon method is called, and it must return a supported Python container or another custom object. Jaxon remembers the package and class name. During loading, Jaxon instantiates the class (without calling __init__) and then calls the from_jaxon method to initialize the instance with the object that was returned by the to_jaxon method during saving.
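
A custom type following this interface might look like the sketch below. The Temperature class is hypothetical, and the exact method signatures (to_jaxon taking no arguments, from_jaxon taking the saved object as its only argument) are assumptions inferred from the description above, so please check them against the Jaxon documentation. The round trip is emulated by hand here instead of going through save and load.

```python
class Temperature:
    """Hypothetical custom type that is not a dataclass."""

    def __init__(self, celsius):
        self.celsius = celsius

    def to_jaxon(self):
        # Return a supported container that describes this object.
        return {"celsius": self.celsius}

    def from_jaxon(self, state):
        # Called on an instance that was created without __init__.
        self.celsius = state["celsius"]


original = Temperature(21.5)
state = original.to_jaxon()  # what would be written to the file

# Emulate the loading steps described above: instantiate without
# calling __init__, then hand the saved object to from_jaxon.
restored = object.__new__(Temperature)
restored.from_jaxon(state)
print(restored.celsius)  # 21.5
```

Because __init__ is skipped during loading, from_jaxon has to set every attribute the object needs.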

Adding custom marshaler/unmarshaler functions

It is possible to provide Jaxon with a list of custom marshaler/unmarshaler functions, which can be used to convert arbitrary types to other types that are understood by Jaxon. As opposed to the to_jaxon/from_jaxon interface, these functions allow additional control over how the type is named in the HDF5 file produced by Jaxon.

Serialization with dill

As a last resort, Jaxon can serialize unsupported types using the dill library (basically an enhanced pickle) and store the result as a binary blob. This feature must be enabled by setting allow_dill=True. Note that human readability (through an HDF5 viewer) is lost for the pickled objects.

Acknowledgements

Jaxon is built on the following amazing libraries.

The author expresses gratitude to the contributors of the open source community.
