MAGDI data

These details have not been verified by PyPI

Project links

Gitlab

Operating System
- OS Independent
Programming Language

Project description

MAGDI Data

This python package named magdi_data is responsible for data loading to be used for AI training. It also contains the definition of a specific dataset structure.

Setup

pip install -r requirements.txt to run AI experiments
pip install -r requirements-tests.txt to run pytest
Copy .env-example as .env file and set your credentials and other env variables there

dataset_meta.json

The dataset_meta.json describes a dataset in the Occurrence Instance Format. Its properties have to follow a strict format to be readable by humans and machines.

How to create a dataset

Read README_create_occ_inst_dataset.md

Terms

Occurrence

An occurrence is a real scanned object. It can have one or more instances (replicates or measurements).

Instance

An instance is a digitalized object.

property	description	json type	example
id	Identifier of the Dataset. Is to be used as folder name. Normalized string costing of "name", "release" and "uuid". Is automatically overwritten by each validation by pydantic model.	string	Maize
uuid	Generated UUID string for this dataset.
name	Name of the dataset.
short_name	Abbreviation of the "name".
release	Version of release.
descriptive_metadata	Nested object that holds further optional fields to describe the dataset.
entity	Type of real-world object the dataset pertains.	string	maize
string	2025.07.02
description	Short description of the dataset.	string	NMR dataset with all maize samples. Test set dispersed among sets 7,8, 10, 11, tree set7 samples discarded.
reference	Description or URL where the data was published.	string	Leibniz Institute of Plant Genetics and Crop Plant Research (IPK)
license	Name which license is used for this dataset.	string	IPK proprietary
tags	List of modalities to describe this dataset. On Hugging Face an overview of exsisting modalities can be found: https://huggingface.co/datasets	array of string	["MRI","3D","Image"]
task_categories	List of Machine Learning tasks for which the dataset is intended. Hugging Face gives an overview of possible tasks: https://huggingface.co/tasks	array of string	["image-segmentation"]
labels	OPTIONAL (Needed if labeled data exists). Class label mapping made of key value pairs. Keys must be ascending positive integers starting from zero and are formatted as strings. They cannot have gaps. If a key has not a value, the value must be set as empty string e.g. "1": "".	object	{ "0": "background", "1": "embryo" "2":"endosperm", "3":"aleuron"}
image_format	File ending of the image files.	string	.nii.gz
annotation_format	OPTIONAL (Needed if labeled data exists). File ending of the annotation files.	string	.nii.gz
num_instances_total	Total number of instances.	integer	4
num_instances_training	Number of instances of the training split.	integer	2
num_instances_test	OPTIONAL (Needed if test split exists). Number of instances of the test split.	integer	2
num_instances_validation	OPTIONAL (Needed if validation split exists). Number of instances of the validation split.	integer	2
num_occurrences_total	Total number of occurrences.	integer	4
num_occurrences_training	Number of occurrences of the training split.	integer	2
num_occurrences_test	OPTIONAL (Needed if test split exists). Number of occurrences of the test split.	integer	2
num_occurrences_validation	OPTIONAL (Needed if validation split exists). Number of occurrences of the validation split.	integer	2
image_data_type	Defines the type of data compatible to numpy (np.dtype) how the image information is stored.	string	int16
annotation_data_type	OPTIONAL (Needed if labeled data exists). Defines the type of data compatible to numpy (np.dtype) how the annotation information is stored.	string	uint8
value_channels	Number of channels how the image information is stored, e.g. 1 for MRI, 3 for RGB images.	integer	1
range	Possible value range of voxels across all instances for the entity. This field is set manually.
range_max	Maximum value range across all instances. This field is calculated during validation by pydantic model.
range_min	Minimum value range across all instances. This field is calculated during validation by pydantic model.
range_avg	Average value range across all instances. This field is calculated during validation by pydantic model.
dimensions_max	Dimensions of the largest image.	array of integer	[96, 128, 150]
dimensions_min	Dimensions of the smallest image. Must have the same shape as dimensionsMax.	array of integer	[41, 64, 78]
dimensions_avg	Average of the dimensions over all images. Must have the same shape as dimensionsMax.	array of number	[57.49618320610687, 92.01526717557252, 115.6030534351145]
resolution_unit	Unit for resolution_voxel_size. Voxel size: relative to the real world for each image dimension	string	mm
resolution_voxel_size	OPTIONAL Size of a voxel for each instance. This field is set manually.	array of number	[0.1,0.1,0.1]
resolution_voxel_size_max	Maximum voxel size across all instances. This field is calculated during validation by pydantic model.
resolution_voxel_size_min	Minimum voxel size across all instances. This field is calculated during validation by pydantic model.
resolution_voxel_size_avg	Average voxel size across all instances. This field is calculated during validation by pydantic model.
training	Training split. Contains one json object for each instance of the training split. Each json object holds the path to the image file. For annotated data the path to the annotation file is set. "image" key is mandatory. "annotations" key is only mandatory for annotated data.	array of object	[ { "annotations": "occurrence-0000/instance-0000/annotations.nii.gz", "image": "occurrence-0001/instance-0000/image.nii.gz" } ]
test	OPTIONAL (Needed if test split exists). Test split. Contains one json object for each instance of the test split. Each json object holds the path to the image file. For annotated data the path to the annotation file is set. "image" key is mandatory. "annotations" key is only needed for annotated data.	array of object	[ { "annotations": "occurrence-0004/instance-0000/annotations.nii.gz", "image": "occurrence-0005/instance-0000/image.nii.gz" } ]
validation	OPTIONAL (Needed if validation split exists). Validation split. Contains one json object for each instance of the validation split. Each json object holds the path to the image file. For annotated data the path to the annotation file is set. "image" key is mandatory. "annotations" key is only needed for annotated data.	array of object	[ { "annotations": "occurrence-0002/instance-0000/annotations.nii.gz", "image": "occurrence-0003/instance-0000/image.nii.gz" } ]
data_references	Holds key-value pairs to map directory or files of this dataset to a reference. It is not used for AI training. It is necessary to associate digital instances with its origins of creation. E.g. Occurrence is mapped to the ID of a real world object.	object	{ "occurrence-0000/instance-0000/annotations.nii.gz": "example1", "occurrence-0001/instance-0000/annotations.nii.gz": "example3" }

Example:

{
  "id": "mais_karyopse_2025-07-02_806fd055-b90d-46d5-a1ce-dd7e61b286ee",
  "uuid": "806fd055-b90d-46d5-a1ce-dd7e61b286ee",
  "name": "Mais Karyopse",
  "short_name": "MK2025",
  "entity": "maize",
  "descriptive_metadata": {
    "latin_name": "Frumentum",
    "line_name": "Mais ",
    "structure": "seed",
    "dap": "10 DAP",
    "device": "NMR Device Name",
    "coil": "CRP 13C/1H 5mm 400MHz",
    "measurement_channel": "structure"
  },
  "release": "2021.7.5",
  "description": "Short description",
  "reference": "Reference to teh author",
  "license": "license",
  "tags": [
    "MRI",
    "3D",
    "Image"
  ],
  "task_categories": [
    "image-segmentation"
  ],
  "labels": {
    "0": "background",
    "1": "embryo",
    "2": "endosperm",
    "3": "aleuron"
  },
  "image_format": ".nii.gz",
  "annotation_format": ".nii.gz",
  "num_instances_total": 6,
  "num_instances_training": 2,
  "num_instances_test": 2,
  "num_instances_validation": 2,
  "num_occurrences_total": 6,
  "num_occurrences_training": 2,
  "num_occurrences_test": 2,
  "num_occurrences_validation": 2,
  "image_data_type": "int16",
  "annotation_data_type": "uint8",
  "value_channels": 1,
  "range": [
    0,
    16000
  ],
  "range_max": 1002,
  "range_min": 0,
  "range_avg": 167.57250248655913,
  "dimensions_max": [
    96,
    128,
    150
  ],
  "dimensions_min": [
    41,
    64,
    78
  ],
  "dimensions_avg": [
    57.49618320610687,
    92.01526717557252,
    115.6030534351145
  ],
  "resolution_unit": "mm",
  "resolution_voxel_size": [
    0.1,
    0.1,
    0.1
  ],
  "resolution_voxel_size_max": [
    0.1,
    0.1,
    0.1
  ],
  "resolution_voxel_size_min": [
    0.1,
    0.1,
    0.1
  ],
  "resolution_voxel_size_avg": [
    0.1,
    0.1,
    0.1
  ],
  "training": [
    {
      "annotations": "occurrence-0000/instance-0000/annotations.nii.gz",
      "image": "occurrence-0000/instance-0000/image.nii.gz"
    },
    {
      "annotations": "occurrence-0001/instance-0000/annotations.nii.gz",
      "image": "occurrence-0001/instance-0000/image.nii.gz"
    }
  ],
  "validation": [
    {
      "annotations": "occurrence-0002/instance-0000/annotations.nii.gz",
      "image": "occurrence-0002/instance-0000/image.nii.gz"
    },
    {
      "annotations": "occurrence-0003/instance-0000/annotations.nii.gz",
      "image": "occurrence-0003/instance-0000/image.nii.gz"
    }
  ],
  "test": [
    {
      "annotations": "occurrence-0004/instance-0000/annotations.nii.gz",
      "image": "occurrence-0004/instance-0000/image.nii.gz"
    },
    {
      "annotations": "occurrence-0005/instance-0000/annotations.nii.gz",
      "image": "occurrence-0005/instance-0000/image.nii.gz"
    }
  ],
  "data_references": {
    "occurrence-0000/instance-0000/image.nii.gz": "example1",
    "occurrence-0001/instance-0000/image.nii.gz": "example2",
    "occurrence-0002/instance-0000/image.nii.gz": "example3",
    "occurrence-0003/instance-0000/image.nii.gz": "folder_xyz/exampleA",
    "occurrence-0004/instance-0000/image.nii.gz": "folder_xyz/exampleB",
    "occurrence-0005/instance-0000/image.nii.gz": "folder_xyz/exampleC"
  }
}

Project details

These details have not been verified by PyPI

Project links

Gitlab

Operating System
- OS Independent
Programming Language

Release history Release notifications | RSS feed

This version

0.5.0a260 pre-release

May 5, 2026

0.5.0a257 pre-release

May 5, 2026

0.5.0a255 pre-release

May 5, 2026

0.5.0a253 pre-release

May 5, 2026

0.5.0a251 pre-release

Apr 24, 2026

0.5.0a249 pre-release

Apr 23, 2026

0.5.0a247 pre-release

Apr 22, 2026

0.5.0a245 pre-release

Apr 22, 2026

0.5.0a243 pre-release

Apr 22, 2026

0.5.0a241 pre-release

Apr 21, 2026

0.5.0a239 pre-release

Apr 21, 2026

0.5.0a237 pre-release

Apr 20, 2026

0.5.0a235 pre-release

Apr 20, 2026

0.5.0a233 pre-release

Apr 20, 2026

0.5.0a231 pre-release

Apr 20, 2026

0.5.0a228 pre-release

Apr 20, 2026

0.5.0a219 pre-release

Apr 8, 2026

0.5.0a218 pre-release

Apr 8, 2026

0.5.0a209 pre-release

Feb 27, 2026

0.5.0a207 pre-release

Feb 27, 2026

0.5.0a199 pre-release

Feb 13, 2026

0.4.1a1 pre-release

Apr 20, 2026

0.4.1a0 pre-release

Apr 20, 2026

0.4.0

Feb 13, 2026

0.4.0a198 pre-release

Feb 13, 2026

0.4.0a196 pre-release

Feb 13, 2026

0.4.0a194 pre-release

Feb 12, 2026

0.4.0a174 pre-release

Jan 28, 2026

0.3.1

Jan 28, 2026

0.3.0

Jan 22, 2026

0.3.0a169 pre-release

Jan 22, 2026

0.3.0a168 pre-release

Jan 22, 2026

0.3.0a163 pre-release

Jan 15, 2026

0.2.0

Jan 14, 2026

0.2.0a158 pre-release

Jan 14, 2026

0.1.6

Jan 12, 2026

0.1.5

Jan 9, 2026

0.1.0a153 pre-release

Jan 12, 2026

0.1.0a145 pre-release

Jan 9, 2026

0.0.0

Dec 12, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

magdi_data-0.5.0a260.tar.gz (21.6 kB view details)

Uploaded May 5, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

magdi_data-0.5.0a260-py3-none-any.whl (23.2 kB view details)

Uploaded May 5, 2026 Python 3

File details

Details for the file magdi_data-0.5.0a260.tar.gz.

File metadata

Download URL: magdi_data-0.5.0a260.tar.gz
Upload date: May 5, 2026
Size: 21.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for magdi_data-0.5.0a260.tar.gz
Algorithm	Hash digest
SHA256	`f72295edc01b5834d2ce4d4b938375478bc51c191f70a83f0e9f59879db6359b`
MD5	`448132e414882195e88672baecc31cc7`
BLAKE2b-256	`0845ff0d7afaa9693bea90f44982223a86ad5046390588c4d0e3aa1781bc3914`

See more details on using hashes here.

File details

Details for the file magdi_data-0.5.0a260-py3-none-any.whl.

File metadata

Download URL: magdi_data-0.5.0a260-py3-none-any.whl
Upload date: May 5, 2026
Size: 23.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for magdi_data-0.5.0a260-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a77dda65e216162a3662736efddb94f9375e64bf91b706ec414ebae870840cee`
MD5	`507eeaf851317c38308e7e9988d6d3d8`
BLAKE2b-256	`2296d56f8fff8167634a47acd139e6c2ab152dfa841df4d3033e2de73afd4ea0`

See more details on using hashes here.

magdi-data 0.5.0a260

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

MAGDI Data

Setup

dataset_meta.json

How to create a dataset

Terms

Occurrence

Instance

Contents

Example:

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes