MAGDI data
Project description
MAGDI Data
This python package named magdi_data is responsible for data loading to be used for
AI training.
It also contains the definition of a specific dataset structure.
Setup
pip install -r requirements.txtto run AI experimentspip install -r requirements-tests.txtto run pytest- Copy
.env-exampleas.envfile and set your credentials and other env variables there
dataset_meta.json
The dataset_meta.json describes a dataset in the Occurrence Instance Format. Its properties have to follow a strict format to be readable by humans and machines.
How to create a dataset
Read README_create_occ_inst_dataset.md
Terms
Occurrence
- An occurrence is a real scanned object. It can have one or more instances (replicates or measurements).
Instance
- An instance is a digitalized object.
Contents
| property | description | json type | example |
|---|---|---|---|
| id | Identifier of the Dataset. Is to be used as folder name. Normalized string costing of "name", "release" and "uuid". Is automatically overwritten by each validation by pydantic model. | string | Maize |
| uuid | Generated UUID string for this dataset. | ||
| name | Name of the dataset. | ||
| short_name | Abbreviation of the "name". | ||
| release | Version of release. | ||
| descriptive_metadata | Nested object that holds further optional fields to describe the dataset. | ||
| entity | Type of real-world object the dataset pertains. | string | maize |
| string | 2025.07.02 | ||
| description | Short description of the dataset. | string | NMR dataset with all maize samples. Test set dispersed among sets 7,8, 10, 11, tree set7 samples discarded. |
| reference | Description or URL where the data was published. | string | Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) |
| license | Name which license is used for this dataset. | string | IPK proprietary |
| tags | List of modalities to describe this dataset. On Hugging Face an overview of exsisting modalities can be found: https://huggingface.co/datasets | array of string | ["MRI","3D","Image"] |
| task_categories | List of Machine Learning tasks for which the dataset is intended. Hugging Face gives an overview of possible tasks: https://huggingface.co/tasks | array of string | ["image-segmentation"] |
| labels | OPTIONAL (Needed if labeled data exists). Class label mapping made of key value pairs. Keys must be ascending positive integers starting from zero and are formatted as strings. They cannot have gaps. If a key has not a value, the value must be set as empty string e.g. "1": "". |
object | { "0": "background", "1": "embryo" "2":"endosperm", "3":"aleuron"} |
| image_format | File ending of the image files. | string | .nii.gz |
| annotation_format | OPTIONAL (Needed if labeled data exists). File ending of the annotation files. | string | .nii.gz |
| num_instances_total | Total number of instances. | integer | 4 |
| num_instances_training | Number of instances of the training split. | integer | 2 |
| num_instances_test | OPTIONAL (Needed if test split exists). Number of instances of the test split. | integer | 2 |
| num_instances_validation | OPTIONAL (Needed if validation split exists). Number of instances of the validation split. | integer | 2 |
| num_occurrences_total | Total number of occurrences. | integer | 4 |
| num_occurrences_training | Number of occurrences of the training split. | integer | 2 |
| num_occurrences_test | OPTIONAL (Needed if test split exists). Number of occurrences of the test split. | integer | 2 |
| num_occurrences_validation | OPTIONAL (Needed if validation split exists). Number of occurrences of the validation split. | integer | 2 |
| image_data_type | Defines the type of data compatible to numpy (np.dtype) how the image information is stored. | string | int16 |
| annotation_data_type | OPTIONAL (Needed if labeled data exists). Defines the type of data compatible to numpy (np.dtype) how the annotation information is stored. | string | uint8 |
| value_channels | Number of channels how the image information is stored, e.g. 1 for MRI, 3 for RGB images. | integer | 1 |
| range | Possible value range of voxels across all instances for the entity. This field is set manually. | ||
| range_max | Maximum value range across all instances. This field is calculated during validation by pydantic model. | ||
| range_min | Minimum value range across all instances. This field is calculated during validation by pydantic model. | ||
| range_avg | Average value range across all instances. This field is calculated during validation by pydantic model. | ||
| dimensions_max | Dimensions of the largest image. | array of integer | [96, 128, 150] |
| dimensions_min | Dimensions of the smallest image. Must have the same shape as dimensionsMax. | array of integer | [41, 64, 78] |
| dimensions_avg | Average of the dimensions over all images. Must have the same shape as dimensionsMax. | array of number | [57.49618320610687, 92.01526717557252, 115.6030534351145] |
| resolution_unit | Unit for resolution_voxel_size. Voxel size: relative to the real world for each image dimension | string | mm |
| resolution_voxel_size | OPTIONAL Size of a voxel for each instance. This field is set manually. | array of number | [0.1,0.1,0.1] |
| resolution_voxel_size_max | Maximum voxel size across all instances. This field is calculated during validation by pydantic model. | ||
| resolution_voxel_size_min | Minimum voxel size across all instances. This field is calculated during validation by pydantic model. | ||
| resolution_voxel_size_avg | Average voxel size across all instances. This field is calculated during validation by pydantic model. | ||
| training | Training split. Contains one json object for each instance of the training split. Each json object holds the path to the image file. For annotated data the path to the annotation file is set. "image" key is mandatory. "annotations" key is only mandatory for annotated data. |
array of object | [ { "annotations": "occurrence-0000/instance-0000/annotations.nii.gz", "image": "occurrence-0001/instance-0000/image.nii.gz" } ] |
| test | OPTIONAL (Needed if test split exists). Test split. Contains one json object for each instance of the test split. Each json object holds the path to the image file. For annotated data the path to the annotation file is set. "image" key is mandatory. "annotations" key is only needed for annotated data. |
array of object | [ { "annotations": "occurrence-0004/instance-0000/annotations.nii.gz", "image": "occurrence-0005/instance-0000/image.nii.gz" } ] |
| validation | OPTIONAL (Needed if validation split exists). Validation split. Contains one json object for each instance of the validation split. Each json object holds the path to the image file. For annotated data the path to the annotation file is set. "image" key is mandatory. "annotations" key is only needed for annotated data. |
array of object | [ { "annotations": "occurrence-0002/instance-0000/annotations.nii.gz", "image": "occurrence-0003/instance-0000/image.nii.gz" } ] |
| data_references | Holds key-value pairs to map directory or files of this dataset to a reference. It is not used for AI training. It is necessary to associate digital instances with its origins of creation. E.g. Occurrence is mapped to the ID of a real world object. |
object | { "occurrence-0000/instance-0000/annotations.nii.gz": "example1", "occurrence-0001/instance-0000/annotations.nii.gz": "example3" } |
Example:
{
"id": "mais_karyopse_2025-07-02_806fd055-b90d-46d5-a1ce-dd7e61b286ee",
"uuid": "806fd055-b90d-46d5-a1ce-dd7e61b286ee",
"name": "Mais Karyopse",
"short_name": "MK2025",
"entity": "maize",
"descriptive_metadata": {
"latin_name": "Frumentum",
"line_name": "Mais ",
"structure": "seed",
"dap": "10 DAP",
"device": "NMR Device Name",
"coil": "CRP 13C/1H 5mm 400MHz",
"measurement_channel": "structure"
},
"release": "2021.7.5",
"description": "Short description",
"reference": "Reference to teh author",
"license": "license",
"tags": [
"MRI",
"3D",
"Image"
],
"task_categories": [
"image-segmentation"
],
"labels": {
"0": "background",
"1": "embryo",
"2": "endosperm",
"3": "aleuron"
},
"image_format": ".nii.gz",
"annotation_format": ".nii.gz",
"num_instances_total": 6,
"num_instances_training": 2,
"num_instances_test": 2,
"num_instances_validation": 2,
"num_occurrences_total": 6,
"num_occurrences_training": 2,
"num_occurrences_test": 2,
"num_occurrences_validation": 2,
"image_data_type": "int16",
"annotation_data_type": "uint8",
"value_channels": 1,
"range": [
0,
16000
],
"range_max": 1002,
"range_min": 0,
"range_avg": 167.57250248655913,
"dimensions_max": [
96,
128,
150
],
"dimensions_min": [
41,
64,
78
],
"dimensions_avg": [
57.49618320610687,
92.01526717557252,
115.6030534351145
],
"resolution_unit": "mm",
"resolution_voxel_size": [
0.1,
0.1,
0.1
],
"resolution_voxel_size_max": [
0.1,
0.1,
0.1
],
"resolution_voxel_size_min": [
0.1,
0.1,
0.1
],
"resolution_voxel_size_avg": [
0.1,
0.1,
0.1
],
"training": [
{
"annotations": "occurrence-0000/instance-0000/annotations.nii.gz",
"image": "occurrence-0000/instance-0000/image.nii.gz"
},
{
"annotations": "occurrence-0001/instance-0000/annotations.nii.gz",
"image": "occurrence-0001/instance-0000/image.nii.gz"
}
],
"validation": [
{
"annotations": "occurrence-0002/instance-0000/annotations.nii.gz",
"image": "occurrence-0002/instance-0000/image.nii.gz"
},
{
"annotations": "occurrence-0003/instance-0000/annotations.nii.gz",
"image": "occurrence-0003/instance-0000/image.nii.gz"
}
],
"test": [
{
"annotations": "occurrence-0004/instance-0000/annotations.nii.gz",
"image": "occurrence-0004/instance-0000/image.nii.gz"
},
{
"annotations": "occurrence-0005/instance-0000/annotations.nii.gz",
"image": "occurrence-0005/instance-0000/image.nii.gz"
}
],
"data_references": {
"occurrence-0000/instance-0000/image.nii.gz": "example1",
"occurrence-0001/instance-0000/image.nii.gz": "example2",
"occurrence-0002/instance-0000/image.nii.gz": "example3",
"occurrence-0003/instance-0000/image.nii.gz": "folder_xyz/exampleA",
"occurrence-0004/instance-0000/image.nii.gz": "folder_xyz/exampleB",
"occurrence-0005/instance-0000/image.nii.gz": "folder_xyz/exampleC"
}
}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file magdi_data-0.5.0a260.tar.gz.
File metadata
- Download URL: magdi_data-0.5.0a260.tar.gz
- Upload date:
- Size: 21.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f72295edc01b5834d2ce4d4b938375478bc51c191f70a83f0e9f59879db6359b
|
|
| MD5 |
448132e414882195e88672baecc31cc7
|
|
| BLAKE2b-256 |
0845ff0d7afaa9693bea90f44982223a86ad5046390588c4d0e3aa1781bc3914
|
File details
Details for the file magdi_data-0.5.0a260-py3-none-any.whl.
File metadata
- Download URL: magdi_data-0.5.0a260-py3-none-any.whl
- Upload date:
- Size: 23.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a77dda65e216162a3662736efddb94f9375e64bf91b706ec414ebae870840cee
|
|
| MD5 |
507eeaf851317c38308e7e9988d6d3d8
|
|
| BLAKE2b-256 |
2296d56f8fff8167634a47acd139e6c2ab152dfa841df4d3033e2de73afd4ea0
|