Generate and design netCDF4 files using a YAML configuration file.
H5YAML
Description
This package lets you design the layout of HDF5/netCDF4 files.
The layout of a netCDF4 file is defined by its Groups (defining the structure), Dimensions,
Variables (its datasets) and Attributes (its metadata), the latter for both the file and its variables.
As of version 0.4, you can create netCDF4 files based on a Python dictionary.
This dictionary can be constructed from a YAML file.
Alternatively, you could also use JSON or even XML
(support for these may be added in future releases of h5yaml).
In the design phase, you can quickly generate very small netCDF4 files, because the variables are still empty. These products can be shared with colleagues for review, or checked for metadata compliance against the CF conventions or the ACDD.
Finally, you can implement the file structure in Java, C++ or Fortran. But of course you can also simply generate the empty products and fill the datasets using Python.
In short, this approach has the following advantages:
- you define the layout of your HDF5/netCDF4 file using YAML which is human-readable and has intuitive syntax.
- you can reuse the YAML configuration file to give all your products a consistent layout.
- you can make updates by only changing the YAML configuration file.
- you can have the layout of your HDF5/netCDF4 file as a Python dictionary, thus without accessing any HDF5/netCDF4 file.
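To illustrate that last point, here is a hypothetical layout expressed directly as a Python dictionary. The section names mirror the YAML configuration described in the Usage section; the concrete values (group and variable names) are invented for this sketch and the exact schema is defined by the package:

```python
# Hypothetical file layout as a plain Python dictionary.
# The top-level sections mirror the YAML configuration:
# 'groups', 'dimensions', 'variables' and 'attrs_global'.
layout = {
    "groups": ["science_data"],
    "dimensions": {
        "number_of_images": {"_dtype": "u2", "_size": 0},  # unlimited dimension
    },
    "variables": {
        "/science_data/detector_images": {
            "_dtype": "u2",
            "_dims": ["number_of_images"],
            "long_name": "Detector pixel values",
        },
    },
    "attrs_global": {"title": "example product"},
}

# The layout can be inspected or validated without touching any HDF5 file:
assert "/science_data/detector_images" in layout["variables"]
```

Such a dictionary can be built in code, loaded from YAML, or generated from any other source, and only turned into an actual file at the very end.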
The H5YAML package provides the classes H5Create and NcCreate to generate a HDF5/netCDF4 formatted file from a Python dictionary.
- The class H5Create uses the h5py package, which is a Pythonic interface to the HDF5 binary data format. The generated HDF5 file should be compatible with the netCDF4 format. H5Create is faster than the netCDF4 implementation and generates smaller files.
- The class NcCreate uses the netCDF4 package, which provides an object-oriented Python interface to the netCDF version 4 library. You should use this class when strict conformance with the netCDF4 format is required. However, the netCDF4 package has some limitations that h5py does not; for example, it does not allow variable-length variables to have a compound data-type.
Installation
The package h5yaml is available from PyPI. To install it use pip:
```shell
$ pip install [--user] h5yaml
```
The module h5yaml requires Python 3.10+ and the Python modules h5py (v3.14+), netCDF4 (v1.7+) and numpy (v2.0+).
Note: the packages h5py and netCDF4 ship with their own HDF5 libraries. If these versions differ, they may
collide and result in an "HDF5 error".
In that case, install the development packages of HDF5 and netCDF4 (or compile them from source),
and reinstall h5py and netCDF4 using the commands:

```shell
$ pip uninstall h5py; pip install --no-binary=h5py h5py
$ pip uninstall netCDF4; pip install --no-binary=netCDF4 netCDF4
```
Usage
The class NcFromYaml can be used to generate netCDF4 files using the Python package h5py (default) or netCDF4,
where a YAML file defines the layout of the netCDF4 file.
```python
from importlib.resources import files

from h5yaml.nc_from_yaml import NcFromYaml

res = NcFromYaml(files("h5yaml.Data") / "nc_testing.yaml")

# show the YAML configuration as a Python dictionary using pprint
print(res)

# generate an in-memory HDF5 file
fid = res.diskless()

# write data to datasets of the file
# ...

# write the HDF5 file to disk ('filename' is the destination path)
res.to_disk(fid, filename)
```
In the next example, we use the Python package netCDF4 and write the file directly to disk:
```python
from importlib.resources import files

from netCDF4 import Dataset

from h5yaml.nc_from_yaml import NcFromYaml

res = NcFromYaml(files("h5yaml.Data") / "nc_testing.yaml")

# use package `netCDF4` and write the file to disk
res.use_netcdf4().create(filename)

with Dataset(filename, "r+") as fid:
    # write data to variables of the file
    ...
```
The YAML file should be structured as follows:
- The top-level keys are: 'groups', 'dimensions', 'compounds', 'variables', 'attrs_global' and 'attrs_groups'.
- The keys 'attrs_global' and 'attrs_groups' were added in version 0.3.0.
- The names of the attributes, groups, dimensions, compounds and variables should be specified as POSIX paths, however:
  - the names of groups should never start with a slash (they are always relative to the root);
  - all other elements which are stored in the root should also not start with a slash;
  - however, non-group elements require a leading slash (an absolute path) when they are not stored in the root.
- The section 'groups' is optional, but you should list each group you want to use in your file. The 'groups' section in the YAML file may look like this:
```yaml
groups:
  - engineering_data
  - image_attributes
  - navigation_data
  - science_data
  - processing_control/input_data
```
- The section 'dimensions' is mandatory; you should define the dimensions for each variable in your file. The 'dimensions' section may look like this:
```yaml
dimensions:
  days:
    _dtype: u4
    _size: 0
    long_name: days since 2024-01-01 00:00:00Z
  number_of_images:          # an unlimited dimension
    _dtype: u2
    _size: 0
  samples_per_image:         # a fixed dimension
    _dtype: u4
    _size: 307200
  /navigation_data/att_time: # an unlimited dimension in a group, with attributes
    _dtype: f8
    _size: 0
    _FillValue: -32767
    long_name: Attitude sample time (seconds of day)
    calendar: proleptic_gregorian
    units: seconds since %Y-%m-%d %H:%M:%S
    valid_min: 0
    valid_max: 92400
  n_viewport:                # a fixed dimension with fixed values and attributes
    _dtype: i2
    _size: 5
    _values: [-50, -20, 0, 20, 50]
    long_name: along-track view angles at sensor
    units: degrees
```
- The section 'compounds' is optional, but you should provide each compound data-type that you want to use in your file. For each compound element you have to provide its data-type and two attributes: units and long_name. The 'compounds' section may look like this:
```yaml
compounds:
  stats_dtype:
    time: [u8, seconds since 1970-01-01T00:00:00, timestamp]
    index: [u2, '1', index]
    tbl_id: [u1, '1', binning id]
    saa: [u1, '1', saa-flag]
    coad: [u1, '1', co-addings]
    texp: [f4, ms, exposure time]
    lat: [f4, degree, latitude]
    lon: [f4, degree, longitude]
    avg: [f4, '1', '$S - S_{ref}$']
    unc: [f4, '1', '\u03c3($S - S_{ref}$)']
    dark_offs: [f4, '1', dark-offset]
```
- The 'variables' are defined by their data-type ('_dtype') and dimensions ('_dims'), and optionally chunk sizes ('_chunks'), compression ('_compression') and variable length ('_vlen'). In addition, each variable can have as many attributes as you like, each defined by its name and value. The 'variables' section may look like this:
```yaml
variables:
  /science_data/detector_images:
    _dtype: u2
    _dims: [number_of_images, samples_per_image]
    _compression: 3
    _FillValue: 65535
    long_name: Detector pixel values
    coverage_content_type: image
    units: '1'
    valid_min: 0
    valid_max: 65534
  /image_attributes/nr_coadditions:
    _dtype: u2
    _dims: [number_of_images]
    _FillValue: 0
    long_name: Number of coadditions
    units: '1'
    valid_min: 1
  /image_attributes/exposure_time:
    _dtype: f8
    _dims: [number_of_images]
    _FillValue: -32767
    long_name: Exposure time
    units: seconds
  stats_163:
    _dtype: stats_dtype
    _dims: [days]
    _vlen: True
    comment: detector map statistics (MPS=163)
```
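The path conventions described above (no leading slash for root-level elements, an absolute path for elements inside a group) can be made concrete with a small helper. This is an illustrative sketch only, not part of the h5yaml API:

```python
from posixpath import split


def resolve(name: str) -> tuple[str, str]:
    """Return (group, element) for a name following the YAML conventions.

    Illustrative helper, not part of h5yaml: a name without a leading
    slash lives in the root; a name with a leading slash is an absolute
    path into a group.
    """
    if not name.startswith("/"):
        return "/", name          # root-level element
    group, element = split(name)  # e.g. "/navigation_data/att_time"
    return group, element


print(resolve("number_of_images"))           # ('/', 'number_of_images')
print(resolve("/navigation_data/att_time"))  # ('/navigation_data', 'att_time')
```

A sanity check like this is cheap to run on every name in the YAML file before any HDF5/netCDF4 file is created.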
Notes and ToDo
- The layout of an HDF5 or netCDF4 file can be complex. From version 0.3, you can split the file definition over several YAML files and provide a list with the names of the YAML files as input to H5Yaml and NcYaml.
- From version 0.4, the classes H5Yaml and NcYaml are replaced by NcFromYaml. You can select the module used to write the netCDF4 file, h5py or netCDF4, using the keyword 'module'. The classes H5Create or NcCreate then perform the dict to netCDF4 conversion.
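Splitting a definition over several YAML files conceptually amounts to merging the resulting dictionaries section by section. The sketch below shows one way this could work; it is a hypothetical illustration with a shallow per-section merge, and the actual merge strategy used by h5yaml may differ:

```python
# Illustrative sketch: combine layout dictionaries, e.g. as read from
# several YAML files (later files win for duplicate keys).
def merge_layouts(*layouts: dict) -> dict:
    """Merge per-file layout dicts section by section."""
    merged: dict = {}
    for layout in layouts:
        for section, content in layout.items():
            if isinstance(content, dict):
                # mapping sections: 'dimensions', 'variables', ...
                merged.setdefault(section, {}).update(content)
            else:
                # list sections, e.g. 'groups'
                merged.setdefault(section, []).extend(content)
    return merged


base = {"groups": ["science_data"], "dimensions": {"days": {"_dtype": "u4"}}}
extra = {"dimensions": {"n_viewport": {"_dtype": "i2", "_size": 5}}}
print(merge_layouts(base, extra))
```

With such a merge, shared dimensions and global attributes can live in one YAML file while each product adds its own variables in another.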
Support [TBW]
Road map
- Release v0.1 : stable API to read your YAML files and generate the HDF5/netCDF4 file
Authors and acknowledgment
The code is developed by R.M. van Hees (SRON)
License
- Copyright: Richard van Hees (SRON) (https://www.sron.nl).
- License: Apache-2.0
File details
Details for the file h5yaml-0.4.2.tar.gz.
File metadata
- Download URL: h5yaml-0.4.2.tar.gz
- Upload date:
- Size: 18.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 227dd89d1c095ad9829d5c2d5b8088a985d325d6abb5cbc1fc33d09850060db0 |
| MD5 | efda605e983651fdb8e20431dbc88f65 |
| BLAKE2b-256 | 0ffac25d76da5b5b573d5ffbb6122dca131cbefbdf28599cf07aa10bc48b26bc |
File details
Details for the file h5yaml-0.4.2-py3-none-any.whl.
File metadata
- Download URL: h5yaml-0.4.2-py3-none-any.whl
- Upload date:
- Size: 25.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 5956bf784ec3db20b7c41c63554fba5f83f11d8162e2cc971144fbedada19869 |
| MD5 | 8f84531ebc1593735d6bc9819611049e |
| BLAKE2b-256 | b73ecdd55e54e195856aa207dc4395365674020364f5c48e8e6d52cb241b4d4a |