
Supporting a FAIR Research Data lifecycle using Python and HDF5.


HDF5 Research Data Management Toolbox


Note that the project is still under development!

The "HDF5 Research Data Management Toolbox" (h5RDMtoolbox) is a Python package for everyone who works with HDF5 and wants to achieve a sustainable data lifecycle that follows the FAIR (Findable, Accessible, Interoperable, Reusable) principles. It specifically supports the five main steps of the research data lifecycle:

  1. Planning (defining an internal layout for HDF5 files and a metadata convention for attribute usage)
  2. Collecting data (creating HDF5 files or converting other sources to HDF5 files)
  3. Analyzing and processing data (plotting, deriving data, ...)
  4. Sharing data (publishing, archiving, ..., e.g. to databases like MongoDB or repositories like Zenodo)
  5. Reusing data (searching data in databases, local file structures or online repositories like Zenodo).
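
To make the planning step more concrete, here is a minimal sketch of what a metadata convention for attribute usage could look like. It is not the toolbox API: the attribute names `units` and `long_name`, the `CONVENTION` mapping and the `check_attributes` helper are all assumptions made for illustration, and plain dictionaries stand in for HDF5 attributes.

```python
# Hypothetical convention (not the h5RDMtoolbox API): each required
# attribute name maps to a validator that returns True for valid values.
CONVENTION = {
    "units": lambda value: isinstance(value, str) and value != "",
    "long_name": lambda value: isinstance(value, str) and value[:1].isalpha(),
}


def check_attributes(attrs, convention=CONVENTION):
    """Return a list of problems found in a dictionary of attributes."""
    problems = []
    for name, is_valid in convention.items():
        if name not in attrs:
            problems.append(f"missing attribute: {name}")
        elif not is_valid(attrs[name]):
            problems.append(f"invalid value for {name}: {attrs[name]!r}")
    return problems


# A plain dict stands in for the attributes of an HDF5 dataset.
print(check_attributes({"units": "m/s", "long_name": "velocity"}))
print(check_attributes({"units": ""}))
```

Agreeing on such rules before data is collected lets files be checked automatically later, which is what makes them easier to find and reuse.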

Quickstart

A quickstart notebook can be tested by clicking on the following badge:

Open Quickstart Notebook

Documentation

Comprehensive documentation with many examples is available here or by clicking on the image, which shows the research data lifecycle in the center and the respective toolbox features on the outside:

RDM lifecycle

Installation

Use Python 3.8 or higher (tested up to 3.10). If you are a regular user, you can install the package via pip:

pip install h5RDMtoolbox
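
Before installing, it can help to check whether the running interpreter falls inside the stated range. The following is a small sketch using only the standard library; the function name `check_python` and the three status strings are assumptions for illustration, and the bounds are taken from the sentence above.

```python
import sys

MIN_VERSION = (3, 8)   # lowest supported version stated above
MAX_TESTED = (3, 10)   # highest version stated as tested


def check_python(version_info=None):
    """Classify a (major, minor, ...) version against the stated range."""
    current = tuple((version_info or sys.version_info)[:2])
    if current < MIN_VERSION:
        return "unsupported"
    if current > MAX_TESTED:
        return "untested"
    return "supported"


print(check_python())  # status of the interpreter running this script
```

An "untested" result does not mean the package will fail, only that the version lies outside the range the README names.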

Install from source:

Developers may clone the repository and install the package from source. Clone the repository first:

git clone https://github.com/matthiasprobst/h5RDMtoolbox.git

Then run

pip install h5RDMtoolbox/

Add --user if you do not have root access.

For a development (editable) installation, run

pip install -e h5RDMtoolbox/

Dependencies

The core functionality depends on the following packages. Some are for general management; others are specific to the features of the package:

General dependencies are ...

  • numpy>=1.20,<1.23.0: Scientific computing, handling of arrays
  • matplotlib>=3.5.2: Plotting
  • appdirs>=1.4.4: Managing user and application directories
  • packaging: Version handling
  • IPython>=8.4.0: Pretty display of data in notebooks
  • regex>=2020.7.9: Working with regular expressions

Specific to the package are ...

  • h5py==3.7.0: HDF5 file interface
  • xarray>=2022.3.0: Working with scientific arrays in combination with attributes; carries metadata from HDF5 to the user
  • pint>=0.19.2: Allows working with units
  • pint_xarray>=0.2.1: Working with units for usage with xarray
  • python-forge==18.6.0: Used to update function signatures when using the standard attributes
  • pyyaml: Reading and writing of yaml files, e.g. metadata definitions (conventions)
  • requests: Used to download files from the internet or validate URLs, e.g. metadata definitions (conventions)
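
The version constraints above combine several clauses, for example `numpy>=1.20,<1.23.0`. As a rough illustration of how such a specifier is evaluated, here is a simplified sketch. It handles only purely numeric versions and the operators shown, so it is not a full PEP 440 implementation; the helper names `_as_tuple` and `satisfies` are hypothetical. For real checks, the `packaging` library listed above provides `packaging.specifiers.SpecifierSet`.

```python
import operator
import re

# Comparison operators supported by this simplified sketch.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def _as_tuple(version, width=6):
    """Turn '1.20' into (1, 20, 0, 0, 0, 0) so tuples compare like versions."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(installed, specifier):
    """Return True if the installed version meets every comma-separated clause."""
    for clause in specifier.split(","):
        match = re.fullmatch(r"\s*(>=|<=|==|!=|>|<)\s*([0-9.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](_as_tuple(installed), _as_tuple(bound)):
            return False
    return True


print(satisfies("1.22.4", ">=1.20,<1.23.0"))
print(satisfies("1.23.0", ">=1.20,<1.23.0"))
```

Padding each version with zeros matters: without it, the tuple `(1, 23)` would compare as smaller than `(1, 23, 0)` even though both denote the same release.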

Optional dependencies

To run unit tests or to enable certain features, additional dependencies must be installed.

Install optional dependencies by specifying them in square brackets after the package name, e.g.:

pip install h5RDMtoolbox[mongodb]

[mongodb]

  • pymongo>=4.2.0: Database solution for HDF5 files

[io]

  • pco_tools>=1.0.0: Reading of pco image files
  • opencv-python>=4.5.3.56: Reading of image files (other than pco)
  • pandas>=1.4.3: Mainly used for reading csv and pretty printing

[snt]

  • xmltodict: Reading of xml files
  • tabulate>=0.8.10: Pretty printing of tables
  • python-gitlab: Access to gitlab repositories
  • pandoc>=2.3: Conversion of markdown files to html
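
To see which groups of optional dependencies are usable in the current environment, one can probe for the modules without importing them. The sketch below is an assumption-laden illustration: the `EXTRAS` mapping and `available_extras` helper are hypothetical, and the import names are assumed (for instance, opencv-python is imported as `cv2` and python-gitlab as `gitlab`). Packages whose import name is less certain are left out.

```python
import importlib.util

# Assumed import names for some of the optional packages listed above.
EXTRAS = {
    "mongodb": ["pymongo"],
    "io": ["cv2", "pandas"],
    "snt": ["xmltodict", "tabulate", "gitlab"],
}


def available_extras(extras=EXTRAS):
    """Map each group name to True if all of its modules can be found."""
    return {
        name: all(importlib.util.find_spec(module) is not None for module in modules)
        for name, modules in extras.items()
    }


for group, is_available in available_extras().items():
    print(f"[{group}]: {'available' if is_available else 'missing'}")
```

`importlib.util.find_spec` only locates a top-level module, so this check is cheap and does not trigger side effects of importing the packages.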

Planned, future developments

  • Using JSON schema definitions for layouts and conventions

Contribution

Feel free to contribute. Make sure to write docstrings for your methods and classes, and follow PEP 8 (https://peps.python.org/pep-0008/).

Please write tests for your code and put them into the test/ folder. See the README file in the test folder for more information.

Please also add a Jupyter notebook to the docs/ folder to document your code. See the README file in the docs folder for more information on how to compile the documentation.

Please use the numpy style for the docstrings: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html#example-numpy
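
As a short illustration of the numpy docstring style, here is a sketch of a documented function. The function `celsius_to_kelvin` is a made-up example and not part of the package.

```python
def celsius_to_kelvin(temperature):
    """Convert a temperature from degrees Celsius to kelvin.

    Parameters
    ----------
    temperature : float
        Temperature in degrees Celsius.

    Returns
    -------
    float
        Temperature in kelvin.

    Raises
    ------
    ValueError
        If the temperature is below absolute zero.
    """
    if temperature < -273.15:
        raise ValueError("temperature is below absolute zero")
    return temperature + 273.15
```

Each section header is underlined with dashes of the same length, and every parameter is given as `name : type` followed by an indented description.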

