dsch

Structured, metadata-enhanced data storage.

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 3 - Alpha
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Programming Language
- Python :: 3
Topic
- Scientific/Engineering

Project description

Introduction

Dsch stores data together with its metadata in a structured, reliable way. It is built upon well-known data storage engines, such as the HDF5 file format, which provide performance and long-term stability.

The core feature is the schema-based approach to data storage, which means that a pre-defined schema specification is used to determine:

which data fields are available
the (hierarchical) structure of data fields
metadata of the stored values (e.g. physical units)
expected data types and constraints for the stored values

In fact, this is similar to an API specification, but it can be attached to and stored with the data. Programs writing datasets benefit from data validation and the high-level interface. Reading programs can determine the given data’s schema up front and process the data accordingly. This is especially useful when schemas evolve over time.
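To make the four points above concrete, here is a minimal, library-independent sketch of schema-based validation in plain Python. It does not use dsch's actual API: the `Field` class, the `validate` function and the example field names are all hypothetical and only illustrate how a schema can fix the available fields, their hierarchy, their units and their type constraints.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical schema leaf: expected type, unit and value bounds."""
    dtype: type
    unit: Optional[str] = None
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def validate(schema: dict, data: dict, path: str = "") -> list:
    """Return a list of problems; an empty list means the data is valid."""
    problems = []
    for name, spec in schema.items():
        here = f"{path}/{name}"
        if name not in data:
            problems.append(f"{here}: missing")
        elif isinstance(spec, dict):
            # A nested dict in the schema describes a group of fields.
            if isinstance(data[name], dict):
                problems.extend(validate(spec, data[name], here))
            else:
                problems.append(f"{here}: expected a group")
        else:
            value = data[name]
            if not isinstance(value, spec.dtype):
                problems.append(f"{here}: expected {spec.dtype.__name__}")
            elif spec.minimum is not None and value < spec.minimum:
                problems.append(f"{here}: below minimum {spec.minimum}")
            elif spec.maximum is not None and value > spec.maximum:
                problems.append(f"{here}: above maximum {spec.maximum}")
    # Fields that the schema does not name are reported as well.
    for name in data.keys() - schema.keys():
        problems.append(f"{path}/{name}: not in schema")
    return problems


# Hypothetical schema for a small measurement record.
SCHEMA = {
    "operator": Field(str),
    "sensor": {
        "temperature": Field(float, unit="K", minimum=0.0),
        "samples": Field(int, minimum=1),
    },
}

good = {"operator": "alice", "sensor": {"temperature": 293.15, "samples": 100}}
bad = {"operator": "alice", "sensor": {"temperature": -5.0, "samples": "many"}}

print(validate(SCHEMA, good))
print(validate(SCHEMA, bad))
```

A writing program would run such a check before persisting a record, while a reading program could inspect the schema (for example the `unit` of each field) before touching any values.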

For persistent storage, dsch supports multiple storage engines via its backends, all through a single, transparent interface. Usually, no client code changes are required to support a new backend, and custom backends can easily be added to dsch. Currently, backends exist for these storage engines:

HDF5 files (through h5py)
NumPy .npz files
MATLAB .mat files (through SciPy)

Note that dsch is only a thin layer, so that users can still benefit from the performance of the underlying storage engine. Also, files created with dsch can always be opened directly (i.e. without dsch) and still provide all relevant information, even the metadata!
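The idea of several backends behind one interface can be sketched as a small registry. This is an illustration only and not dsch's implementation: the `register` decorator, the `save` and `load` helpers, and the choice to select a backend by file extension are assumptions made for this example, and JSON and pickle stand in for the real storage engines so that the sketch runs with the standard library alone.

```python
import json
import os
import pickle
import tempfile

# Hypothetical registry mapping a file extension to a backend class.
BACKENDS = {}


def register(extension):
    """Class decorator that adds a backend to the registry."""
    def decorator(cls):
        BACKENDS[extension] = cls
        return cls
    return decorator


@register(".json")
class JsonBackend:
    def save(self, path, data):
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(data, handle)

    def load(self, path):
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)


@register(".pkl")
class PickleBackend:
    def save(self, path, data):
        with open(path, "wb") as handle:
            pickle.dump(data, handle)

    def load(self, path):
        with open(path, "rb") as handle:
            return pickle.load(handle)


def backend_for(path):
    """Choose a backend from the file extension (an assumption of this sketch)."""
    extension = os.path.splitext(path)[1].lower()
    try:
        return BACKENDS[extension]()
    except KeyError:
        raise ValueError(f"no backend registered for {extension!r}") from None


def save(path, data):
    backend_for(path).save(path, data)


def load(path):
    return backend_for(path).load(path)


# Client code is identical for every backend; only the file name differs.
record = {"operator": "alice", "samples": [1, 2, 3]}
with tempfile.TemporaryDirectory() as folder:
    for name in ("run.json", "run.pkl"):
        target = os.path.join(folder, name)
        save(target, record)
        assert load(target) == record
```

Adding a further backend in this sketch means registering one more class; the `save` and `load` calls in client code stay unchanged.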

Reasoning

Dsch is a response to the challenges of low-level data acquisition scenarios, which are common in labs at universities and R&D departments. Hardware and software change frequently in these environments, and since those changes are often made by different people, the data acquisition hardware, the acquisition software and the data consumption software tend to get out of sync. At the same time, datasets are often stored (and used!) for many years, which makes backward compatibility a significant issue.

Dsch aims to counteract these problems by making the data exchange process more explicit. Pre-defined schemas preserve backward compatibility for as long as possible and, once it can no longer be retained, provide a clear way to detect (and properly handle) multiple schema versions. Schema-based validation also detects possible errors up front, so that most non-security-related checks do not have to be re-implemented in data-consuming applications.
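One possible way to detect and handle multiple schema versions is to fingerprint the schema that is stored with the data and dispatch on that fingerprint. The sketch below assumes this approach for illustration; it is not taken from dsch's code, and `schema_hash`, the two example schemas and the reader functions are hypothetical.

```python
import hashlib
import json


def schema_hash(schema: dict) -> str:
    """SHA-256 over a canonical JSON rendering of the schema specification."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two hypothetical versions of the same schema; only the unit changed.
SCHEMA_V1 = {"temperature": {"dtype": "float", "unit": "degC"}}
SCHEMA_V2 = {"temperature": {"dtype": "float", "unit": "K"}}


def read_v1(data: dict) -> float:
    # Old files store degrees Celsius, so convert to kelvin.
    return data["temperature"] + 273.15


def read_v2(data: dict) -> float:
    # Current files already store kelvin.
    return data["temperature"]


# One reader per known schema version, keyed by the schema's hash.
READERS = {
    schema_hash(SCHEMA_V1): read_v1,
    schema_hash(SCHEMA_V2): read_v2,
}


def temperature_kelvin(dataset: dict) -> float:
    """Pick the reader that matches the schema stored alongside the data."""
    reader = READERS.get(schema_hash(dataset["schema"]))
    if reader is None:
        raise ValueError("unknown schema version")
    return reader(dataset["data"])


old_file = {"schema": SCHEMA_V1, "data": {"temperature": 20.0}}
new_file = {"schema": SCHEMA_V2, "data": {"temperature": 300.0}}
print(temperature_kelvin(old_file), temperature_kelvin(new_file))
```

Because the hash is computed from a canonical rendering with sorted keys, two schemas that differ only in key order map to the same version, while any change to a unit or data type produces a new one.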

Note that dsch is targeted primarily at these low-level applications. In high-level data processing, or in data science and machine learning, data is often pre-processed and aggregated for a specific application, which often eliminates the need for some of dsch’s features, such as metadata storage. One might think of dsch as the tool that handles data before it is loaded into something like pandas.

Changelog

This project follows the guidelines of Keep a Changelog and adheres to Semantic Versioning.

0.3.2 - 2024-06-25

Fixed

Fix issue with h5py version 3 string handling

0.3.1 - 2024-06-25

(withdrawn)

0.3.0 - 2021-02-12

Added

New data_tree method for exporting data as nested dict/list structures.

Changed

Improve documentation.
Improve tests.

Fixed

Minor updates to handle h5py deprecations.

0.2.1 - 2018-02-02

Changed

h5py and scipy, needed for HDF5 and MAT file support, respectively, are now listed as extras / optional dependencies in setup.py.

Fixed

Fix missing type conversion for Scalar in inmem backend that causes validation to incorrectly fail in some cases.

0.2.0 - 2018-02-01

Added

New node type for bytes data.
In-memory backend, for handling data without needing e.g. a file on disk.
Support for copying data between different storages.
Support for creating new storages from existing ones, a.k.a. “save as”.
PseudoStorage abstraction class for unified data access in libraries.
Human-readable tree-representation of data nodes for use in interactive sessions.
Support == operator for schema nodes.

Changed

Data nodes in Compilations and Lists can no longer be overwritten accidentally when trying to overwrite their stored value.
Improve structure and conciseness of docs.
Change List to evaluate emptiness recursively.
Replace generic exceptions like TypeError by custom dsch exceptions.

0.1.3 - 2018-01-11

Changed

Attempting to open a non-existent file now shows a sensible error message.
Attempting to create an existing file now shows a sensible error message.

Fixed

Fix error when handling partially filled compilations.
Fix typo in documentation.

0.1.2 - 2017-08-25

Fixed

Fix incorrect ordering of list items.

0.1.1 - 2017-06-09

Added

Cover additional topics in documentation.

Fixed

Fix error when handling single-element lists with mat backend.

0.1.0 - 2017-05-18

Added

First preview release.

Release history

0.3.2 - Jun 25, 2024 (this version)
0.3.0 - Feb 12, 2021
0.2.1 - Feb 2, 2018
0.2.0 - Feb 1, 2018
0.1.3 - Jan 11, 2018
0.1.2 - Aug 25, 2017
0.1.1 - Jun 12, 2017
0.1.0 - May 23, 2017

Download files

Source Distribution

dsch-0.3.2.tar.gz (55.3 kB)

Uploaded Jun 25, 2024 Source

Built Distribution

dsch-0.3.2-py3-none-any.whl (35.8 kB)

Uploaded Jun 25, 2024 Python 3

File details

Details for the file dsch-0.3.2.tar.gz.

File metadata

Download URL: dsch-0.3.2.tar.gz
Upload date: Jun 25, 2024
Size: 55.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.0 CPython/3.10.9

File hashes

Hashes for dsch-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`dea1e56ef5d6e2739243784ead47505326379516c9b6dd2acf89b1e640407d1b`
MD5	`ba2a2aa39056310fb69e11d04ce5a8c4`
BLAKE2b-256	`0021b68bf7efe50aa007e6894a50ae029114b21d7cccbee3ba50bc2a2a9e79e8`

File details

Details for the file dsch-0.3.2-py3-none-any.whl.

File metadata

Download URL: dsch-0.3.2-py3-none-any.whl
Upload date: Jun 25, 2024
Size: 35.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.0 CPython/3.10.9

File hashes

Hashes for dsch-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6bd8ed9f1be393474bbba465c4538b8ab3bfde435b76dcbddb0bbd89f2a039f1`
MD5	`ccd01c24a77d62c9cc4d7cd542654bf3`
BLAKE2b-256	`3f4384349457862315adeeb034c2cfa1a50c6eb23694014157b1be6a5c11e64c`
