Save pytrees efficiently in hdf5 files
Jaxon
Jaxon is a Python library that saves pytrees to, and loads them from, the Hierarchical Data Format HDF5. HDF5 is an open format that natively supports multidimensional array objects and metadata in a single file, resulting in high efficiency. Jaxon embeds all information needed to reconstruct the pytree in a human-readable and self-describing way, so that the output file can still be understood when the original code is no longer available, or when the data is to be processed with an external tool.
Jaxon is well suited for machine learning and scientific tasks. It is especially suited for machine learning packages that rely on Python dataclasses and JAX, e.g. Equinox.
Installation
pip install jaxon
Example Usage
from jaxon import save, load
import numpy as np
import jax.numpy as jnp
pytree = {
"mylist": ["foo", "bar", 42],
"myset": {"a", "b", "z", (42, b"blob")},
"numpy_array": np.arange(3),
"jax_array": jnp.arange(3),
}
save("data.hdf5", pytree)
print(load("data.hdf5"))
This prints
{'mylist': ['foo', 'bar', 42], 'myset': {'z', 'a', 'b', (42, b'blob')}, 'numpy_array': array([0, 1, 2]), 'jax_array': Array([0, 1, 2], dtype=int32)}
which is exactly what was passed in (the order in which the set elements are printed may vary). Refer to the tests folder for more examples.
To inspect the HDF5 file, an external tool such as h5dump or HDFView can be used.
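The file can also be walked from Python instead of an external viewer. The following is a minimal sketch, assuming the h5py package is installed; `describe_hdf5` is a hypothetical helper that is not part of Jaxon, and it makes no assumption about the group or attribute names Jaxon writes.

```python
import h5py


def describe_hdf5(path):
    """Return one text line per group, dataset, and attribute in an HDF5 file."""
    lines = []

    def add_attrs(obj):
        # Attributes hang off groups and datasets alike.
        for key, value in obj.attrs.items():
            lines.append(f"  @{key} = {value}")

    def visit(name, obj):
        # h5py calls this once for every object below the root.
        if isinstance(obj, h5py.Dataset):
            lines.append(f"Dataset /{name} shape={obj.shape} dtype={obj.dtype}")
        else:
            lines.append(f"Group /{name}")
        add_attrs(obj)

    with h5py.File(path, "r") as f:
        lines.append("Group /")
        add_attrs(f)
        f.visititems(visit)
    return lines
```

Calling `print("\n".join(describe_hdf5("data.hdf5")))` on the file written in the example above should list whatever groups and attributes Jaxon stored.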
Supported Types
The pytree can consist of the following types:
| Datatype | Stored As |
|---|---|
| list, tuple, dict, set, frozenset | HDF5 Group |
| np.int8, np.int16, np.int32, np.int64, np.uint8, np.uint16, np.uint32, np.uint64, np.float16, np.float32, np.float64, np.float128, np.complex64, np.complex128, np.bool | HDF5 Attribute |
| int, float, bool, complex | String representation, or one of the numpy types above if requested |
| None, slice, range, Ellipsis | String representation |
| str | HDF5 UTF-8 (fixed length) string |
| np.ndarray, jax.Array, bytes, bytearray, memoryview | HDF5 Attribute (or Dataset on user request) |
| Any Python dataclass | HDF5 Group that contains all fields |
Note that dictionary keys can also be of any of these types, or of a custom type, provided the key is hashable.
Custom Types: Dataclasses
The most straightforward way to add a custom type is to make it a Python dataclass. The package
name, the class name, and all fields, including the field names, are saved. During loading,
the class is instantiated (without calling __init__) and the field values are set
(even if the dataclass is frozen). Note that machine learning packages like
Equinox automatically make every module a Python
dataclass. Therefore, Jaxon is fully compatible with models implemented with this package.
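The loading step described above can be illustrated with the standard library alone. This is a minimal sketch of the general technique (allocate with `__new__`, then set fields with `object.__setattr__`), not Jaxon's actual implementation; `Point`, `rebuild`, and `init_calls` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, fields

init_calls = []  # records every regular construction, to show __init__ is skipped


@dataclass(frozen=True)
class Point:
    x: float
    y: float

    def __post_init__(self):
        # Runs at the end of the generated __init__ only.
        init_calls.append((self.x, self.y))


def rebuild(cls, field_values):
    """Create an instance of a dataclass without running its __init__."""
    obj = cls.__new__(cls)  # allocate only; __init__ is not called
    for f in fields(cls):
        # object.__setattr__ bypasses the frozen dataclass's own __setattr__.
        object.__setattr__(obj, f.name, field_values[f.name])
    return obj


original = Point(1.0, 2.0)
stored = {f.name: getattr(original, f.name) for f in fields(original)}
restored = rebuild(Point, stored)
```

Here `restored == original` holds, while `init_calls` contains a single entry from the construction of `original`. One consequence of this approach is that any side effects or validation in `__init__` or `__post_init__` do not run when an object is rebuilt.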
Custom Types: The to_jaxon and from_jaxon methods
If a type that is not in the table above is encountered in the pytree during saving, Jaxon first
checks whether it has a to_jaxon method. If it does, that method is used regardless of whether
the type is a dataclass. The to_jaxon method is called, and it must return a supported Python
container or another custom object. Jaxon remembers the package and class name. During loading,
Jaxon instantiates the class (without calling __init__) and then calls the from_jaxon method to
initialize the instance with the object that to_jaxon returned during saving.
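A class following this protocol might look like the sketch below. The method names come from the description above, but the exact signatures are an assumption: `to_jaxon` is taken to be an instance method without arguments, and `from_jaxon` an instance method that receives the saved object. The `Polynomial` class is hypothetical, and the round trip is simulated by hand rather than performed through Jaxon.

```python
class Polynomial:
    """Hypothetical custom type with state that should not be stored verbatim."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.degree = len(self.coeffs) - 1  # derived value, cheap to recompute

    def to_jaxon(self):
        # Return only supported containers; the derived degree is left out.
        return {"coeffs": self.coeffs}

    def from_jaxon(self, data):
        # Assumed signature: called on an instance whose __init__ did not run.
        self.coeffs = list(data["coeffs"])
        self.degree = len(self.coeffs) - 1


# Simulated round trip (assumption: mirrors the steps described in the text).
p = Polynomial([1, 0, 3])
payload = p.to_jaxon()                  # what would be written to the file
clone = Polynomial.__new__(Polynomial)  # instantiate without calling __init__
clone.from_jaxon(payload)               # restore state from the saved object
```

Because `__init__` is skipped during loading, `from_jaxon` has to set every attribute the class relies on, including derived ones such as `degree` here.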
Custom Types: Serialization with dill
As a last resort, Jaxon can serialize unsupported types using the dill library (essentially an
enhanced pickle) and store the result as a binary blob. This feature must be enabled by setting
allow_dill=True. Note that human readability (through an HDF5 viewer) is lost for such objects.
Acknowledgements
Jaxon is built on the following amazing libraries.
The author expresses gratitude to the contributors of the open source community.