Handle, transform, and visualize hierarchically structured data
dantro – from data and dentro (Greek for tree) – is a Python package that provides a uniform interface for hierarchically structured and semantically heterogeneous data. It is built around three main features:
- data handling: loading heterogeneous data into a tree-like data structure, providing a uniform interface to it
- data transformation: performing arbitrary operations on the data, if necessary using lazy evaluation
- data visualization: creating a visual representation of the processed data
Together, these stages constitute a data processing pipeline: an automated sequence of predefined, configurable operations. Akin to a Continuous Integration pipeline, a data processing pipeline provides a uniform, consistent, and easily extensible infrastructure that contributes to more efficient and reproducible workflows. This can be especially beneficial in a scientific context, for instance when handling data that was generated by computer simulations.
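The three stages can be illustrated with a minimal sketch in plain Python. It does not use dantro's API: the names `tree`, `get_path`, `Lazy`, and `render_bars` are hypothetical and exist only for this illustration, and the text-based bar chart merely stands in for a real plot.

```python
from functools import reduce

# Stage 1 (data handling): heterogeneous data held in one tree,
# addressed uniformly by path. The content is made up for illustration.
tree = {
    "sim": {
        "cfg": {"seed": 42},
        "population": [10, 12, 15, 19],
    }
}


def get_path(node, path):
    """Follow a '/'-separated path through nested dicts."""
    return reduce(lambda n, key: n[key], path.split("/"), node)


# Stage 2 (data transformation): a deferred operation that is only
# evaluated when compute() is called.
class Lazy:
    def __init__(self, func, *args):
        self.func, self.args = func, args

    def compute(self):
        # Resolve nested Lazy arguments first, then apply the function
        args = [a.compute() if isinstance(a, Lazy) else a for a in self.args]
        return self.func(*args)


population = Lazy(get_path, tree, "sim/population")
growth = Lazy(lambda xs: [b - a for a, b in zip(xs, xs[1:])], population)


# Stage 3 (data visualization): a plain-text bar chart as a stand-in
# for an actual plot.
def render_bars(values):
    return "\n".join("#" * v for v in values)


chart = render_bars(growth.compute())
print(chart)
```

Nothing is read from the tree until `growth.compute()` is called, which is the point of the lazy stage: the transformation is described first and evaluated only when a result is needed.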
dantro is meant to be integrated into projects and used to set up such a data processing pipeline. It is designed to be easily customizable to the requirements of the project it is integrated in, even if the involved data is hierarchically structured or semantically heterogeneous. Furthermore, it allows all operations to be specified via YAML configuration files; the resulting pipeline can then be controlled entirely through these files, without requiring code changes.
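The idea of a configuration-controlled pipeline can be sketched as follows. This is an assumption-laden illustration, not dantro's implementation: the `OPERATIONS` registry, the `run_pipeline` function, and the configuration keys (`steps`, `operation`) are hypothetical, and a plain dict stands in for the result of parsing a YAML file.

```python
# Hypothetical registry of named operations (not dantro's actual set)
OPERATIONS = {
    "select": lambda data, key: data[key],
    "scale": lambda data, factor: [x * factor for x in data],
    "total": lambda data: sum(data),
}


def run_pipeline(data, steps):
    """Apply each configured step to the result of the previous one."""
    for step in steps:
        # Everything except the operation name is passed on as a parameter
        params = {k: v for k, v in step.items() if k != "operation"}
        data = OPERATIONS[step["operation"]](data, **params)
    return data


# What a YAML loader would return for a configuration file such as:
#   steps:
#     - operation: select
#       key: population
#     - operation: scale
#       factor: 2
#     - operation: total
config = {
    "steps": [
        {"operation": "select", "key": "population"},
        {"operation": "scale", "factor": 2},
        {"operation": "total"},
    ]
}

result = run_pipeline({"population": [10, 12, 15, 19]}, config["steps"])
print(result)
```

Changing the behaviour of the pipeline, for example replacing the scale factor or dropping the final step, then only requires editing the configuration, not the code.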
The dantro package is open source software released under the LGPLv3+ license. It was developed alongside the Utopia project (a modelling framework for complex and adaptive systems), but is an independent package.