Supporting a FAIR Research Data lifecycle using Python and HDF5.
HDF5 Research Data Management Toolbox
Note that the project is still under development!
The "HDF5 Research Data Management Toolbox" (h5RDMtoolbox) is a Python package supporting everybody who is working with HDF5 to achieve a sustainable data lifecycle which follows the FAIR (Findable, Accessible, Interoperable, Reusable) principles. It specifically supports the five main steps of
- Planning (defining an internal layout for HDF5 files and a metadata convention for attribute usage)
- Collecting data (creating HDF5 files or converting to HDF5 files from other sources)
- Analyzing and processing data (Plotting, deriving data, ...)
- Sharing data (publishing, archiving, ... e.g. to databases like mongoDB or repositories like Zenodo)
- Reusing data (Searching data in databases, local file structures or online repositories like Zenodo).
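The planning step can be pictured with a small, library-independent sketch. It does not use the h5RDMtoolbox API. It only illustrates what a "metadata convention for attribute usage" could mean in practice. The attribute names `units` and `long_name`, the helper `missing_attributes`, and the example values are assumptions chosen for illustration.

```python
# Hypothetical convention: every dataset must carry these attributes.
# The attribute names are assumptions made for this sketch only.
REQUIRED_ATTRIBUTES = ("units", "long_name")


def missing_attributes(attrs, required=REQUIRED_ATTRIBUTES):
    """Return the required attribute names that are absent from ``attrs``."""
    return [name for name in required if name not in attrs]


# Attributes as they might be read from two HDF5 datasets (made-up values).
velocity_attrs = {"units": "m/s", "long_name": "streamwise velocity"}
pressure_attrs = {"units": "Pa"}

print(missing_attributes(velocity_attrs))  # []
print(missing_attributes(pressure_attrs))  # ['long_name']
```

Agreeing on such a list before collecting data is what makes files searchable and reusable later on.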
Quickstart
A quickstart notebook can be tested by clicking on the following badge:
Documentation
Comprehensive documentation with many examples is available here or by clicking on the image, which shows the research data lifecycle in the center and the respective toolbox features on the outside:
Installation
Use Python 3.8 or higher (tested up to 3.10). If you are a regular user, you can install the package via pip:
pip install h5RDMtoolbox
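Before installing, it may help to check the interpreter against the version range stated above. The following is a minimal sketch using only the standard library. The function name `check_python` and the three status strings are assumptions made for this example.

```python
import sys

MIN_VERSION = (3, 8)   # minimum version stated in this README
MAX_TESTED = (3, 10)   # newest version stated as tested


def check_python(version_info=sys.version_info):
    """Classify an interpreter version as unsupported, tested or untested."""
    current = tuple(version_info[:2])
    if current < MIN_VERSION:
        return "unsupported"
    if current <= MAX_TESTED:
        return "tested"
    return "untested"


print(check_python())
```

A result of "untested" does not mean the package will fail, only that the version lies outside the range named above.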
Install from source:
Developers may clone the repository and install the package from source. Clone the repository first:
git clone https://github.com/matthiasprobst/h5RDMtoolbox.git
Then, run
pip install h5RDMtoolbox/
Add --user if you do not have root access.
For a development (editable) installation, run
pip install -e h5RDMtoolbox/
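Whether the installation succeeded can be checked from Python with the standard library module `importlib.metadata` (available from Python 3.8). This is a sketch, assuming the distribution is registered under the name used in the pip command above. The helper `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(distribution="h5RDMtoolbox"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version if version is not None else "h5RDMtoolbox is not installed")
```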
Dependencies
The core functionality depends on the following packages. Some of them serve general purposes, others are specific to the features of the package:
General dependencies are ...
- numpy>=1.20,<1.23.0: Scientific computing, handling of arrays
- matplotlib>=3.5.2: Plotting
- appdirs>=1.4.4: Managing user and application directories
- packaging: Version handling
- IPython>=8.4.0: Pretty display of data in notebooks
- regex>=2020.7.9: Working with regular expressions
Specific to the package are ...
- h5py=3.7.0: HDF5 file interface
- xarray>=2022.3.0: Working with scientific arrays in combination with attributes. Allows carrying metadata from HDF5 to the user
- pint>=0.19.2: Allows working with units
- pint_xarray>=0.2.1: Working with units for usage with xarray
- python-forge==18.6.0: Used to update function signatures when using the standard attributes
- pyyaml: Reading and writing of yaml files, e.g. metadata definitions (conventions)
- requests: Used to download files from the internet or validate URLs, e.g. metadata definitions (conventions)
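A pin such as numpy>=1.20,<1.23.0 describes a half-open version range: the lower bound is included, the upper bound is excluded. The sketch below shows that logic with plain tuples. It handles only simple numeric versions, and the helpers `parse` and `satisfies` are hypothetical. For real checks, the `packaging` library listed above implements the full version rules.

```python
def parse(version, width=3):
    """Turn '1.22' into (1, 22, 0); only plain numeric versions are handled."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (width - len(parts))  # pad so that 1.23 equals 1.23.0
    return tuple(parts)


def satisfies(version, lower, upper):
    """Return True if lower <= version < upper."""
    return parse(lower) <= parse(version) < parse(upper)


print(satisfies("1.22.4", "1.20", "1.23.0"))  # True
print(satisfies("1.23.0", "1.20", "1.23.0"))  # False
```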
Optional dependencies
To run unit tests or to enable certain features, additional dependencies must be installed.
Install optional dependencies by specifying them in square brackets after the package name, e.g.:
pip install h5RDMtoolbox[mongodb]
[mongodb]
- pymongo>=4.2.0: Database solution for HDF5 files
[io]
- pco_tools>=1.0.0: Reading of pco image files
- opencv-python>=4.5.3.56: Reading of image files (other than pco)
- pandas>=1.4.3: Mainly used for reading csv and pretty printing
[snt]
- xmltodict: Reading of xml files
- tabulate>=0.8.10: Pretty printing of tables
- python-gitlab: Access to gitlab repositories
- pandoc>=2.3: Conversion of markdown files to html
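To see which optional features are usable in the current environment, one can check whether the corresponding modules are importable. This is a sketch using `importlib.util.find_spec` from the standard library, not a feature of the toolbox itself. The mapping `EXTRAS` is an assumption: it lists import names, which can differ from distribution names (opencv-python is imported as `cv2`, python-gitlab as `gitlab`), and it omits packages whose import name is not stated here.

```python
from importlib.util import find_spec

# Import names per extra. This mapping is an assumption for illustration;
# pco_tools and pandoc are left out because their import names are not given.
EXTRAS = {
    "mongodb": ["pymongo"],
    "io": ["cv2", "pandas"],
    "snt": ["xmltodict", "tabulate", "gitlab"],
}


def missing_modules(extra):
    """Return the modules of an extra that cannot be found for import."""
    return [name for name in EXTRAS[extra] if find_spec(name) is None]


for extra in EXTRAS:
    missing = missing_modules(extra)
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"[{extra}] {status}")
```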
Contribution
Feel free to contribute. Make sure to write docstrings for your methods and classes and follow PEP 8 (https://peps.python.org/pep-0008/).
Please write tests for your code and put them into the test/ folder. Visit the README file in the test folder for more information.
Please also add a Jupyter notebook to the docs/ folder in order to document your code. Visit the README file in the docs folder for more information on how to compile the documentation.
Please use the numpy style for the docstrings: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html#example-numpy
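As an illustration of these guidelines, a contribution could look roughly like the following. The function `rescale` and its test are hypothetical and not part of the package. They only show a numpy-style docstring together with a matching test.

```python
def rescale(values, factor=1.0):
    """Multiply every value by a constant factor.

    Parameters
    ----------
    values : list of float
        Input values.
    factor : float, optional
        Multiplier applied to each value. Default is 1.0.

    Returns
    -------
    list of float
        The rescaled values, in the original order.
    """
    return [value * factor for value in values]


def test_rescale():
    """Test that would live in the test/ folder."""
    assert rescale([1.0, 2.0], factor=2.0) == [2.0, 4.0]
    assert rescale([]) == []


test_rescale()
```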