Datalad Metadata Model

This software implements the metadata model that datalad and datalad-metalad (from version 0.3.0) use to store metadata.

Model Elements (the model layer)

The metadata model is defined by the API of the top-level classes. Those are:

  • MetadataRootRecord -- holds top-level metadata information for a single version of a datalad dataset

  • UUIDSet -- holds metadata root records for a set of datasets that are identified by their UUIDs and their version.

  • TreeVersionList -- holds metadata root records and a sub-dataset tree for a dataset version and its sub-datasets

  • Metadata -- represents metadata for a single item, i.e. dataset or file. Metadata is associated with extractor names and extraction parameters.

  • DatasetTree -- a representation of the sub-dataset hierarchy of a dataset

  • FileTree -- a representation of the file-tree of a dataset

  • ...

Because some datalad datasets are very large, e.g. tens of thousands of sub-datasets and hundreds of millions of files, the implementation supports focus-based operations on individual parts of the potentially very large metadata model. It uses the proxy pattern: it loads, modifies, and saves only those model elements that are needed to operate on the metadata that the user is interested in.
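The proxy pattern described above can be illustrated with a minimal sketch. This is not the package's actual API: the class `LazyProxy` and its `loader` and `saver` callables are hypothetical names chosen for illustration. The sketch only shows the general idea that an element is read from storage on first access and written back only if it was loaded and modified.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyProxy(Generic[T]):
    """Hypothetical stand-in for a model element that lives on storage.

    Not part of datalad-metadata-model; illustration only.
    """

    def __init__(self, loader: Callable[[], T], saver: Callable[[T], None]):
        self._loader = loader      # reads the element from storage
        self._saver = saver        # writes the element back to storage
        self._obj: Optional[T] = None
        self._loaded = False
        self._modified = False

    @property
    def loaded(self) -> bool:
        return self._loaded

    def get(self) -> T:
        # Load the real element on first access only.
        if not self._loaded:
            self._obj = self._loader()
            self._loaded = True
        return self._obj  # type: ignore[return-value]

    def mark_modified(self) -> None:
        # The caller signals that the loaded element was changed.
        self._modified = True

    def save(self) -> bool:
        """Write back only if loaded and modified; return True if written."""
        if not (self._loaded and self._modified):
            return False
        self._saver(self._obj)  # type: ignore[arg-type]
        self._modified = False
        return True
```

With one such proxy per sub-dataset, an operation that touches a single sub-dataset triggers one load and one save, and all other elements stay on storage.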

Storage layer

The model elements have to be persisted on a storage backend. How the model is mapped onto storage backends is defined by the storage layer, which is largely independent of the model layer. The intention is to support multiple storage backends in the future.

Currently, only one storage backend is supported:

  • git-mapping -- a storage backend that stores a metadata model in a git repository. The model objects are stored outside of existing branches. They are referenced by datalad-specific git-references under refs/datalad/*
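To check which datalad-specific references exist in a repository, the standard `git for-each-ref` command can be used. The following is a small sketch, not part of this package. The helper names `parse_ref_listing` and `list_datalad_refs` are hypothetical, and the sketch makes no assumption about which reference names the git-mapping backend actually creates under refs/datalad/.

```python
import subprocess
from typing import Dict


def parse_ref_listing(text: str) -> Dict[str, str]:
    """Parse lines of the form '<object id> <ref name>' into a dict."""
    refs: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        object_id, ref_name = line.split(" ", 1)
        refs[ref_name] = object_id
    return refs


def list_datalad_refs(repo_path: str) -> Dict[str, str]:
    """Return all references below refs/datalad/ in the given repository.

    Hypothetical helper: it calls the git command line tool, which must
    be installed, and raises CalledProcessError if git fails.
    """
    result = subprocess.run(
        [
            "git", "-C", repo_path, "for-each-ref",
            "--format=%(objectname) %(refname)",
            "refs/datalad/",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ref_listing(result.stdout)
```

Calling `list_datalad_refs(".")` inside a dataset returns a mapping from reference name to object id. An empty mapping means that no references exist under refs/datalad/ in that repository.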

Acknowledgements

This DataLad extension was developed with support from the German Federal Ministry of Education and Research (BMBF 01GQ1905), and the US National Science Foundation (NSF 1912266).
