Collection of helper tools for reading or writing to h5 files using the h5py library.
Ch5mpy
Pronounced "champy". This library provides a set of helper tools for easily reading or writing even complex objects to h5 files using the h5py library. It implements wrappers around h5py objects providing APIs identical to regular Python lists and dicts and to numpy ndarrays.
See the complete documentation at https://ch5mpy.readthedocs.io/en/latest/ for more details.
Description
Ch5mpy provides a set of abstractions over h5py (https://docs.h5py.org/en/stable/) objects so that they can be handled like more commonly used objects:
- H5Dict: an object behaving like a regular Python dictionary, for exploring Files and Groups.
- H5List: an object behaving like a regular Python list, for storing any set of objects.
- H5Array: an object behaving like a Numpy ndarray, for dealing effortlessly with Datasets while keeping the memory usage low. This works by applying numpy functions to small chunks of the whole Dataset at a time.
- AttributeManager: a dict-like object for accessing an h5 object's metadata.
- read/write utility functions for effortlessly storing any object to an h5 file.
Pickling has also been added to base h5 objects.
Pickling
Ch5mpy provides Dataset, Group and File objects that wrap their h5py equivalents to allow pickling. These objects can be imported directly from ch5mpy:
>>> from ch5mpy import File
>>> from ch5mpy import Group
>>> from ch5mpy import Dataset
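One common way to make a handle-like object picklable is to pickle only the information needed to reopen it. The sketch below illustrates that idea with a plain text file; ReopenableFile and roundtrip_demo are hypothetical names, and this is an assumption about the general technique, not a description of how ch5mpy implements it.

```python
import os
import pickle
import tempfile


class ReopenableFile:
    """Hypothetical wrapper that pickles as (path, mode) and reopens on load."""

    def __init__(self, path: str, mode: str = "r") -> None:
        self.path = path
        self.mode = mode
        # The live handle itself cannot be pickled.
        self._handle = open(path, mode)

    def __getstate__(self) -> dict:
        # Keep only what is needed to reopen the file later.
        return {"path": self.path, "mode": self.mode}

    def __setstate__(self, state: dict) -> None:
        # Rebuild the object by reopening the file.
        self.__init__(state["path"], state["mode"])

    def read(self) -> str:
        self._handle.seek(0)
        return self._handle.read()

    def close(self) -> None:
        self._handle.close()


def roundtrip_demo() -> str:
    """Pickle and unpickle a wrapper, then read through the clone."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        with open(path, "w") as fh:
            fh.write("hello")
        original = ReopenableFile(path)
        clone = pickle.loads(pickle.dumps(original))
        content = clone.read()
        original.close()
        clone.close()
        return content
```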
The H5Mode enum lists valid modes for opening an h5 file:
>>> from ch5mpy import H5Mode
class H5Mode(str, Enum):
    READ = "r"  # Readonly, file must exist
    READ_WRITE = "r+"  # Read/write, file must exist
    WRITE_TRUNCATE = "w"  # Create file, truncate if exists
    WRITE = "w-"  # Create file, fail if exists
    READ_WRITE_CREATE = "a"  # Read/write if exists, create otherwise
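Because H5Mode subclasses str, its members compare equal to the plain mode strings and can be looked up from them. The sketch below redefines the enum locally so that it runs without ch5mpy installed; is_writable is a hypothetical helper added for illustration only.

```python
from enum import Enum


class H5Mode(str, Enum):
    # Local copy of the enum shown above, for illustration.
    READ = "r"
    READ_WRITE = "r+"
    WRITE_TRUNCATE = "w"
    WRITE = "w-"
    READ_WRITE_CREATE = "a"


def is_writable(mode: H5Mode) -> bool:
    """Hypothetical helper: every mode except READ allows writing."""
    return mode is not H5Mode.READ


# Members behave like their string values.
same_as_string = H5Mode.READ == "r"
# A member can be recovered from its string value.
from_string = H5Mode("a")
```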
Attributes
Metadata on Datasets, Groups and Files can be read and modified through the .attrs attribute, which returns an AttributeManager object. AttributeManagers behave like Python dictionaries for getting and setting any value.
>>> from ch5mpy import File
>>> f = File('some/file.h5')
>>> f.attrs
AttributeManager{value: 1,
creation: '02/08/2021',
parent: None}
>>> f.attrs['value']
1
AttributeManagers correctly handle None values.
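A dict-like wrapper can support None by translating it to a storable marker on write and back on read. The sketch below shows one possible approach with a plain dict as the backend; NoneAwareAttrs and the "__none__" marker are hypothetical and are not taken from ch5mpy's actual encoding.

```python
from collections.abc import MutableMapping
from typing import Any, Iterator

# Hypothetical marker, not ch5mpy's actual encoding of None.
_NONE_SENTINEL = "__none__"


class NoneAwareAttrs(MutableMapping):
    """Dict-like view that stores None as a sentinel string."""

    def __init__(self, backend: dict) -> None:
        self._backend = backend

    def __getitem__(self, key: str) -> Any:
        value = self._backend[key]
        if isinstance(value, str) and value == _NONE_SENTINEL:
            return None
        return value

    def __setitem__(self, key: str, value: Any) -> None:
        self._backend[key] = _NONE_SENTINEL if value is None else value

    def __delitem__(self, key: str) -> None:
        del self._backend[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._backend)

    def __len__(self) -> int:
        return len(self._backend)


backend: dict = {}
attrs = NoneAwareAttrs(backend)
attrs["parent"] = None
attrs["value"] = 1
```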
H5Dict
An H5Dict
allows to explore the content of an H5 File or Group as if it was a regular Python dict. Any value can be set in an H5Dict
. However, keys in an H5Dict
are not loaded into memory until they are directly requested. Datasets
are wrapped and accessed as H5Arrays
(see section H5Arrays).
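The lazy-loading behaviour described above can be pictured with a small read-only mapping that knows its keys up front but only produces a value when that key is requested. This is a conceptual sketch only; LazyDict and its load_count attribute are hypothetical and do not reflect H5Dict's internals.

```python
from collections.abc import Mapping
from typing import Any, Callable, Dict, Iterator


class LazyDict(Mapping):
    """Mapping whose values are produced only when a key is requested."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]) -> None:
        self._loaders = loaders
        self.load_count = 0  # number of values loaded so far

    def __getitem__(self, key: str) -> Any:
        self.load_count += 1
        return self._loaders[key]()

    def __iter__(self) -> Iterator[str]:
        # Listing keys does not load any value.
        return iter(self._loaders)

    def __len__(self) -> int:
        return len(self._loaders)


lazy = LazyDict({"a": lambda: 1, "b": lambda: [1, 2, 3]})
keys = list(lazy)
loads_after_listing = lazy.load_count
value_a = lazy["a"]
```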
To create an H5Dict, a File or Group object must be provided as argument:
>>> from ch5mpy import File
>>> from ch5mpy import H5Dict
>>> from ch5mpy import H5Mode
>>>
>>> dct = H5Dict(File("dict.h5", H5Mode.READ_WRITE))
>>> dct
H5Dict{
a: 1,
b: H5Array([1, 2, 3], shape=(3,), dtype=int64),
c: {...}
}
Here, dct is an H5Dict with 3 keys a, b and c, where:
- a maps to the value 1
- b maps to a 1D Dataset
- c maps to a sub H5Dict with keys and values not loaded yet
Alternatively, an H5Dict can be created directly from a path to an h5 file:
>>> H5Dict.read("dict.h5")
H5Dict{
a: 1,
b: H5Array([1, 2, 3], shape=(3,), dtype=int64),
c: {...}
}
H5List
An H5List behaves like a regular Python list, allowing any kind of object to be stored in and accessed from an h5 file. H5Lists are usually created when regular lists are stored in an h5 file.
As for H5Dicts, H5Lists can be created by providing a File or by calling the .read() method:
>>> from ch5mpy import File
>>> from ch5mpy import H5List
>>> from ch5mpy import H5Mode
>>>
>>> lst = H5List(File("backed_list.h5", H5Mode.READ_WRITE))
>>> lst
H5List[1.0, 2, '4.']
class O:
    def __init__(self, v: float):
        self._v = v

    def __repr__(self) -> str:
        return f"O({self._v})"
>>> lst.append(O(5.0))
>>> lst
H5List[1.0, 2, '4.', O(5.0)]
H5Lists can store regular integers, floats and strings, but can also store any object (such as the O object at index 3 in this example).
H5Array
H5Arrays wrap Datasets and implement the numpy ndarray interface, so they behave like numpy ndarrays while controlling the amount of RAM used. The maximum amount of RAM available for performing operations can be set with the function set_options(max_memory_usage=...), using the suffixes B, K, M and G to express amounts in bytes.
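To make the suffix notation concrete, the sketch below converts a string such as "250M" into a number of bytes. parse_memory is a hypothetical helper, and treating K, M and G as multiples of 1024 is an assumption; the text above does not say which convention ch5mpy uses.

```python
# Assumption: binary multiples (1K = 1024 bytes). The library may differ.
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_memory(amount: str) -> int:
    """Hypothetical helper: convert e.g. '250M' to a byte count."""
    amount = amount.strip().upper()
    if not amount or amount[-1] not in _UNITS:
        raise ValueError(f"expected a number followed by B, K, M or G: {amount!r}")
    number, suffix = amount[:-1], amount[-1]
    return int(float(number) * _UNITS[suffix])
```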
H5Arrays can be created by passing a Dataset as argument.
>>> from ch5mpy import File
>>> from ch5mpy import H5Mode
>>> from ch5mpy import H5Array
>>> h5_array = H5Array(File("h5_arrays", H5Mode.READ_WRITE)["integers"])
>>> h5_array
H5Array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]], shape=(3, 3), dtype=int64)
>>> h5_array = H5Array(File("h5_arrays", H5Mode.READ_WRITE)["strings"])
>>> h5_array
H5Array(['blue', 'red', 'yellow'], shape=(3,), dtype='<U6')
Then, all usual numpy indexing and methods can be used. To keep the memory footprint small, those methods will be applied repeatedly on small chunks of the underlying Dataset.
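The chunk-by-chunk idea can be illustrated with a mean computed over slices of a sequence, so that only one slice is processed at a time. This is a generic sketch in plain Python, not ch5mpy's implementation; iter_chunks and chunked_mean are hypothetical names, and any sliceable sequence stands in for a Dataset.

```python
from typing import Iterator, Sequence


def iter_chunks(data: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most chunk_size elements."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def chunked_mean(data: Sequence[float], chunk_size: int) -> float:
    """Compute the mean while touching only one chunk at a time."""
    total = 0.0
    count = 0
    for chunk in iter_chunks(data, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("cannot compute the mean of an empty sequence")
    return total / count


# A range object stands in for a large on-disk Dataset.
result = chunked_mean(range(10), chunk_size=3)
```

Only the running total and count are kept between chunks, which is why the memory footprint stays bounded by the chunk size rather than by the size of the whole sequence.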
To load an H5Array into memory as a numpy array, simply run:
np.array(h5_array)
Read/write utilities
Functions
To store any array-like object (an object that can be converted to a numpy ndarray), the functions write_dataset() and write_datasets() store one or many such objects, respectively.
To store any other object, call the functions write_object() and write_objects().
To determine how the object will be stored in the h5 file, the following rules are applied:
- objects implementing the Storing API will be stored by calling the __h5_write__() method
- objects that can be converted to numpy arrays will be saved by calling write_dataset()
- numbers and strings will be stored directly
- all other objects will be stored as binary data by first pickling them
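The rules above can be read as a dispatch on the object's type. The sketch below mirrors them in the order listed; storage_strategy is a hypothetical function, and checking for list or tuple is only a stand-in for "can be converted to a numpy array" so that the example runs without numpy.

```python
import numbers
from typing import Any


def storage_strategy(obj: Any) -> str:
    """Hypothetical dispatcher returning the name of the rule that applies."""
    if hasattr(obj, "__h5_write__"):
        return "storing_api"
    # Assumption: list/tuple approximates "convertible to a numpy array".
    if isinstance(obj, (list, tuple)):
        return "dataset"
    if isinstance(obj, (numbers.Number, str)):
        return "direct"
    return "pickle"


class Custom:
    def __h5_write__(self, values: dict) -> None:
        values["marker"] = 1
```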
Storing API
To define by hand how an object is stored in and read from an h5 file, you can implement the __h5_write__() and __h5_read__() methods:
from typing import Any
import ch5mpy

class YourObject:
    ...

    def __h5_write__(self, values: ch5mpy.H5Dict[Any]) -> None:
        ...

    @classmethod
    def __h5_read__(cls, values: ch5mpy.H5Dict[Any]) -> "YourObject":
        ...
Both __h5_write__() and __h5_read__() receive as input an H5Dict in which to store or from which to retrieve your object. Please note that __h5_read__() is a classmethod, called as YourObject.__h5_read__(), which is responsible for both reading data from the H5Dict and reconstructing an instance of YourObject.
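As an illustration of the two methods working together, the sketch below round-trips a small object through a plain dict, which stands in for the H5Dict that ch5mpy would pass in. Point is a hypothetical class, and using a dict instead of an H5Dict is an assumption made so that the example runs on its own.

```python
from typing import Any, Dict


class Point:
    """Hypothetical class implementing both methods of the Storing API."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def __h5_write__(self, values: Dict[str, Any]) -> None:
        # Store each field under its own key.
        values["x"] = self.x
        values["y"] = self.y

    @classmethod
    def __h5_read__(cls, values: Dict[str, Any]) -> "Point":
        # Read the fields back and rebuild an instance.
        return cls(values["x"], values["y"])


store: Dict[str, Any] = {}  # stand-in for an H5Dict
Point(1.0, 2.0).__h5_write__(store)
restored = Point.__h5_read__(store)
```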
Roadmap
Numpy methods to implement for H5Arrays:
Logic functions
- np.all
- np.any
- np.isfinite
- np.isinf
- np.isnan
- np.isnat
- np.isneginf
- np.isposinf
- np.iscomplex
- np.iscomplexobj
- np.isfortran
- np.isreal
- np.isrealobj
- np.isscalar
- np.logical_and
- np.logical_or
- np.logical_not
- np.logical_xor
- np.allclose
- np.isclose
- np.array_equal
- np.array_equiv
- np.greater
- np.greater_equal
- np.less
- np.less_equal
- np.equal
- np.not_equal
Binary operations
- np.bitwise_and
- np.bitwise_or
- np.bitwise_xor
- np.invert
- np.left_shift
- np.right_shift
- np.packbits
- np.unpackbits
- np.binary_repr
String operations
- np.char.add
- np.char.multiply
- np.char.mod
- np.char.capitalize
- np.char.center
- np.char.decode
- np.char.encode
- np.char.expandtabs
- np.char.join
- np.char.ljust
- np.char.lower
- np.char.lstrip
- np.char.partition
- np.char.replace
- np.char.rjust
- np.char.rpartition
- np.char.rsplit
- np.char.rstrip
- np.char.split
- np.char.splitlines
- np.char.strip
- np.char.swapcase
- np.char.title
- np.char.translate
- np.char.upper
- np.char.zfill
- np.char.equal
- np.char.not_equal
- np.char.greater_equal
- np.char.less_equal
- np.char.greater
- np.char.less
- np.char.compare_chararrays
- np.char.count
- np.char.endswith
- np.char.find
- np.char.index
- np.char.isalpha
- np.char.isalnum
- np.char.isdecimal
- np.char.isdigit
- np.char.islower
- np.char.isnumeric
- np.char.isspace
- np.char.istitle
- np.char.isupper
- np.char.rfind
- np.char.rindex
- np.char.startswith
- np.char.str_len
- np.char.array
- np.char.asarray
- np.char.chararray
Mathematical functions
- np.sin
- np.cos
- np.tan
- np.arcsin
- np.arccos
- np.arctan
- np.hypot
- np.arctan2
- np.degrees
- np.radians
- np.unwrap
- np.deg2rad
- np.rad2deg
- np.sinh
- np.cosh
- np.tanh
- np.arcsinh
- np.arccosh
- np.arctanh
- np.around
- np.rint
- np.fix
- np.floor
- np.ceil
- np.trunc
- np.prod
- np.sum
- np.nanprod
- np.nansum
- np.cumprod
- np.cumsum
- np.nancumprod
- np.nancumsum
- np.diff
- np.ediff1d
- np.gradient
- np.cross
- np.trapz
- np.exp
- np.expm1
- np.exp2
- np.log
- np.log10
- np.log2
- np.log1p
- np.logaddexp
- np.logaddexp2
- np.i0
- np.sinc
- np.signbit
- np.copysign
- np.frexp
- np.ldexp
- np.nextafter
- np.spacing
- np.lcm
- np.gcd
- np.add
- np.reciprocal
- np.positive
- np.negative
- np.multiply
- np.divide
- np.power
- np.subtract
- np.true_divide
- np.floor_divide
- np.float_power
- np.fmod
- np.mod
- np.modf
- np.remainder
- np.divmod
- np.angle
- np.real
- np.imag
- np.conj
- np.conjugate
- np.maximum
- np.fmax
- np.amax
- np.nanmax
- np.minimum
- np.fmin
- np.amin
- np.nanmin
- np.convolve
- np.clip
- np.sqrt
- np.cbrt
- np.square
- np.absolute
- np.fabs
- np.sign
- np.heaviside
- np.nan_to_num
- np.real_if_close
- np.interp
Set routines
- np.unique
- np.in1d
- np.intersect1d
- np.isin
- np.setdiff1d
- np.setxor1d
- np.union1d
Array creation routines
- np.empty
- ch5mpy.empty
- np.empty_like
- np.eye
- np.identity
- np.ones
- ch5mpy.ones
- np.ones_like
- np.zeros
- ch5mpy.zeros
- np.zeros_like
- np.full
- ch5mpy.full
- np.full_like
- np.array
- np.asarray
- np.asanyarray
- np.ascontiguousarray
- np.asmatrix
- np.copy
- np.frombuffer
- np.from_dlpack
- np.fromfile
- np.fromfunction
- np.fromiter
- np.fromstring
- np.loadtxt
- np.core.records.array
- np.core.records.fromarrays
- np.core.records.fromrecords
- np.core.records.fromstring
- np.core.records.fromfile
- np.core.defchararray.array
- np.core.defchararray.asarray
- np.arange
- np.linspace
- np.logspace
- np.geomspace
- np.meshgrid
- np.mgrid
- np.ogrid
- np.diag
- np.diagflat
- np.tri
- np.tril
- np.triu
- np.vander
- np.mat
- np.bmat
Array manipulation routines
- np.copyto
- np.shape
- np.reshape
- np.ravel
- np.ndarray.flat
- np.ndarray.flatten
- np.moveaxis
- np.rollaxis
- np.swapaxes
- np.ndarray.T
- np.transpose
- np.atleast_1d
- np.atleast_2d
- np.atleast_3d
- np.broadcast
- np.broadcast_to
- np.broadcast_arrays
- np.expand_dims
- np.squeeze
- np.asarray
- np.asanyarray
- np.asmatrix
- np.asfarray
- np.asfortranarray
- np.ascontiguousarray
- np.asarray_chkfinite
- np.require
- np.concatenate
- np.stack
- np.block
- np.vstack
- np.hstack
- np.dstack
- np.column_stack
- np.row_stack
- np.split
- np.array_split
- np.dsplit
- np.hsplit
- np.vsplit
- np.tile
- np.repeat
- np.delete
- np.insert
- np.append
- np.resize
- np.trim_zeros
- np.unique
- np.flip
- np.fliplr
- np.flipud
- np.reshape
- np.roll
- np.rot90
Sorting, searching, and counting
- np.sort
- np.lexsort
- np.argsort
- np.ndarray.sort
- np.sort_complex
- np.partition
- np.argpartition
- np.argmax
- np.nanargmax
- np.argmin
- np.nanargmin
- np.argwhere
- np.nonzero
- np.flatnonzero
- np.where
- np.searchsorted
- np.extract
- np.count_nonzero
Random
- beta
- binomial
- bytes
- chisquare
- choice
- dirichlet
- exponential
- f
- gamma
- get_state
- geometric
- gumbel
- hypergeometric
- laplace
- logistic
- lognormal
- logseries
- multinomial
- multivariate_normal
- negative_binomial
- noncentral_chisquare
- noncentral_f
- normal
- pareto
- permutation
- poisson
- power
- rand
- randint
- randn
- random
- random_integers
- random_sample
- rayleigh
- seed
- set_state
- shuffle
- standard_cauchy
- standard_exponential
- standard_gamma
- standard_normal
- standard_t
- triangular
- uniform
- vonmises
- wald
- weibull
- zipf
Misc
- np.ndim