MAGDI Data

This Python package, magdi_data, loads data for AI training. It also defines a specific dataset structure.

Setup

  • pip install -r requirements.txt to run AI experiments
  • pip install -r requirements-tests.txt to run pytest
  • Copy .env-example to .env and set your credentials and other environment variables there

dataset_meta.json

The dataset_meta.json file describes a dataset in the Occurrence Instance Format. Its properties must follow a strict format so that both humans and machines can read them.
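Because dataset_meta.json is plain JSON, it can also be inspected without this package. The following is a minimal sketch, not the magdi_data API: load_meta and resolve_split are hypothetical helper names, and the sketch assumes that the paths stored in the training, validation and test properties (described in the table below) are relative to the dataset folder.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical helpers for illustration; they are not part of the magdi_data API.


def load_meta(dataset_dir: Path) -> dict:
    """Read dataset_meta.json from a dataset folder using only the standard library."""
    with (dataset_dir / "dataset_meta.json").open(encoding="utf-8") as f:
        return json.load(f)


def resolve_split(meta: dict, dataset_dir: Path, split: str) -> list[tuple[Path, Optional[Path]]]:
    """Return one (image, annotations) path pair per instance of a split.

    Assumes the paths stored in dataset_meta.json are relative to dataset_dir.
    The annotation path is None when an entry has no "annotations" key.
    """
    pairs = []
    # Optional splits (test, validation) may be missing entirely.
    for entry in meta.get(split, []):
        image = dataset_dir / entry["image"]  # "image" is mandatory
        annotations = entry.get("annotations")  # only present for annotated data
        pairs.append((image, dataset_dir / annotations if annotations else None))
    return pairs
```

For a dataset folder root, resolve_split(load_meta(root), root, "training") returns one tuple per training instance; a missing optional split simply yields an empty list.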

How to create a dataset

Read README_create_occ_inst_dataset.md

Terms

Occurrence

  • An occurrence is a real-world object that has been scanned. It can have one or more instances (replicates or measurements).

Instance

  • An instance is a digitized object, i.e. one replicate or measurement of an occurrence.
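Because one occurrence can have several instances, the num_occurrences_* and num_instances_* counts can differ. A minimal sketch, assuming the occurrence-NNNN/instance-NNNN folder scheme seen in the example paths further down, and assuming the first path component of each "image" entry identifies its occurrence; instance_folder and count_split are hypothetical names:

```python
from pathlib import PurePosixPath

# Hypothetical helpers; the folder scheme is inferred from the example paths.


def instance_folder(occurrence: int, instance: int) -> str:
    """Folder of one instance, e.g. instance_folder(1, 0) -> 'occurrence-0001/instance-0000'."""
    return f"occurrence-{occurrence:04d}/instance-{instance:04d}"


def count_split(entries: list[dict]) -> tuple[int, int]:
    """Return (occurrences, instances) for the entries of one split.

    Assumes the first path component of "image" names the occurrence.
    """
    occurrences = {PurePosixPath(entry["image"]).parts[0] for entry in entries}
    return len(occurrences), len(entries)


# Two scans (instances) of occurrence 0 and one scan of occurrence 1.
entries = [
    {"image": instance_folder(0, 0) + "/image.nii.gz"},
    {"image": instance_folder(0, 1) + "/image.nii.gz"},
    {"image": instance_folder(1, 0) + "/image.nii.gz"},
]
print(count_split(entries))  # (2, 3)
```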

Contents

| Property | Description | JSON type | Example |
|---|---|---|---|
| id | Identifier of the dataset, also used as its folder name. Normalized string consisting of "name", "release" and "uuid". It is overwritten automatically each time the pydantic model validates the file. | string | Maize |
| uuid | Generated UUID for this dataset. | string | 806fd055-b90d-46d5-a1ce-dd7e61b286ee |
| name | Name of the dataset. | string | Mais Karyopse |
| short_name | Abbreviation of "name". | string | MK2025 |
| release | Release version. | string | 2025.07.02 |
| descriptive_metadata | Nested object holding further optional fields that describe the dataset. | object | |
| entity | Type of real-world object the dataset pertains to. | string | maize |
| description | Short description of the dataset. | string | NMR dataset with all maize samples. Test set dispersed among sets 7, 8, 10, 11, three set7 samples discarded. |
| reference | Description or URL of where the data was published. | string | Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) |
| license | Name of the license used for this dataset. | string | IPK proprietary |
| tags | List of modalities describing this dataset. An overview of existing modalities is available on Hugging Face: https://huggingface.co/datasets | array of string | ["MRI", "3D", "Image"] |
| task_categories | List of machine learning tasks the dataset is intended for. Hugging Face gives an overview of possible tasks: https://huggingface.co/tasks | array of string | ["image-segmentation"] |
| labels | OPTIONAL (needed if labeled data exists). Class label mapping made of key-value pairs. Keys are non-negative integers formatted as strings, in ascending order starting from "0", without gaps. If a class has no name, set its value to an empty string, e.g. "1": "". | object | {"0": "background", "1": "embryo", "2": "endosperm", "3": "aleuron"} |
| image_format | File extension of the image files. | string | .nii.gz |
| annotation_format | OPTIONAL (needed if labeled data exists). File extension of the annotation files. | string | .nii.gz |
| num_instances_total | Total number of instances. | integer | 6 |
| num_instances_training | Number of instances in the training split. | integer | 2 |
| num_instances_test | OPTIONAL (needed if a test split exists). Number of instances in the test split. | integer | 2 |
| num_instances_validation | OPTIONAL (needed if a validation split exists). Number of instances in the validation split. | integer | 2 |
| num_occurrences_total | Total number of occurrences. | integer | 6 |
| num_occurrences_training | Number of occurrences in the training split. | integer | 2 |
| num_occurrences_test | OPTIONAL (needed if a test split exists). Number of occurrences in the test split. | integer | 2 |
| num_occurrences_validation | OPTIONAL (needed if a validation split exists). Number of occurrences in the validation split. | integer | 2 |
| image_data_type | NumPy-compatible data type (np.dtype) in which the image data is stored. | string | int16 |
| annotation_data_type | OPTIONAL (needed if labeled data exists). NumPy-compatible data type (np.dtype) in which the annotation data is stored. | string | uint8 |
| value_channels | Number of channels in which the image data is stored, e.g. 1 for MRI, 3 for RGB images. | integer | 1 |
| range | Possible value range of the voxels across all instances of the entity. This field is set manually. | array of number | [0, 16000] |
| range_max | Maximum voxel value across all instances. Calculated by the pydantic model during validation. | number | 1002 |
| range_min | Minimum voxel value across all instances. Calculated by the pydantic model during validation. | number | 0 |
| range_avg | Average voxel value across all instances. Calculated by the pydantic model during validation. | number | 167.57250248655913 |
| dimensions_max | Dimensions of the largest image. | array of integer | [96, 128, 150] |
| dimensions_min | Dimensions of the smallest image. Must have the same length as dimensions_max. | array of integer | [41, 64, 78] |
| dimensions_avg | Average dimensions over all images. Must have the same length as dimensions_max. | array of number | [57.49618320610687, 92.01526717557252, 115.6030534351145] |
| resolution_unit | Unit of resolution_voxel_size, i.e. of the real-world voxel size along each image dimension. | string | mm |
| resolution_voxel_size | OPTIONAL. Voxel size of each instance, one value per image dimension. This field is set manually. | array of number | [0.1, 0.1, 0.1] |
| resolution_voxel_size_max | Maximum voxel size across all instances. Calculated by the pydantic model during validation. | array of number | [0.1, 0.1, 0.1] |
| resolution_voxel_size_min | Minimum voxel size across all instances. Calculated by the pydantic model during validation. | array of number | [0.1, 0.1, 0.1] |
| resolution_voxel_size_avg | Average voxel size across all instances. Calculated by the pydantic model during validation. | array of number | [0.1, 0.1, 0.1] |
| training | Training split. Contains one JSON object per instance of the training split. Each object holds the path to the image file under the mandatory "image" key; for annotated data it also holds the path to the annotation file under the "annotations" key. | array of object | [{"annotations": "occurrence-0000/instance-0000/annotations.nii.gz", "image": "occurrence-0000/instance-0000/image.nii.gz"}] |
| test | OPTIONAL (needed if a test split exists). Test split. Contains one JSON object per instance of the test split. "image" is mandatory; "annotations" is only needed for annotated data. | array of object | [{"annotations": "occurrence-0004/instance-0000/annotations.nii.gz", "image": "occurrence-0004/instance-0000/image.nii.gz"}] |
| validation | OPTIONAL (needed if a validation split exists). Validation split. Contains one JSON object per instance of the validation split. "image" is mandatory; "annotations" is only needed for annotated data. | array of object | [{"annotations": "occurrence-0002/instance-0000/annotations.nii.gz", "image": "occurrence-0002/instance-0000/image.nii.gz"}] |
| data_references | Key-value pairs that map directories or files of this dataset to a reference. Not used for AI training, but needed to link digital instances to their origin, e.g. to map an occurrence to the ID of a real-world object. | object | {"occurrence-0000/instance-0000/annotations.nii.gz": "example1", "occurrence-0001/instance-0000/annotations.nii.gz": "example3"} |
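The rules for labels and the split counts above can be checked mechanically. The following is a minimal sketch with hypothetical function names; magdi_data performs its own validation through a pydantic model, which may differ. It additionally assumes that num_instances_total is the sum of the instance counts of the splits that are present, as in the example below.

```python
SPLITS = ("training", "validation", "test")

# Hypothetical checks for illustration; magdi_data validates with its own pydantic model.


def check_labels(labels: dict) -> None:
    """Keys must be "0", "1", ... in ascending order without gaps; values must be strings."""
    expected = [str(i) for i in range(len(labels))]
    if list(labels) != expected:  # dicts keep the key order of the JSON file
        raise ValueError(f"label keys must be {expected}, got {list(labels)}")
    if not all(isinstance(name, str) for name in labels.values()):
        raise ValueError('label values must be strings; use "" for an unnamed class')


def check_instance_counts(meta: dict) -> None:
    """Compare num_instances_* with the number of entries in each split.

    Assumes num_instances_total is the sum over the splits that are present.
    """
    total = 0
    for split in SPLITS:
        if split not in meta:  # validation and test are optional
            continue
        n = len(meta[split])
        declared = meta.get(f"num_instances_{split}")
        if declared != n:
            raise ValueError(f"num_instances_{split} is {declared}, but {split} has {n} entries")
        total += n
    if meta.get("num_instances_total") != total:
        raise ValueError(f"num_instances_total should be {total}")
```

Comparing list(labels) with the expected key list enforces order, the absence of gaps and the canonical string form ("01" is rejected) in a single check.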

Example:

{
  "id": "mais_karyopse_2025-07-02_806fd055-b90d-46d5-a1ce-dd7e61b286ee",
  "uuid": "806fd055-b90d-46d5-a1ce-dd7e61b286ee",
  "name": "Mais Karyopse",
  "short_name": "MK2025",
  "entity": "maize",
  "descriptive_metadata": {
    "latin_name": "Frumentum",
    "line_name": "Mais ",
    "structure": "seed",
    "dap": "10 DAP",
    "device": "NMR Device Name",
    "coil": "CRP 13C/1H 5mm 400MHz",
    "measurement_channel": "structure"
  },
  "release": "2025.07.02",
  "description": "Short description",
  "reference": "Reference to the author",
  "license": "license",
  "tags": [
    "MRI",
    "3D",
    "Image"
  ],
  "task_categories": [
    "image-segmentation"
  ],
  "labels": {
    "0": "background",
    "1": "embryo",
    "2": "endosperm",
    "3": "aleuron"
  },
  "image_format": ".nii.gz",
  "annotation_format": ".nii.gz",
  "num_instances_total": 6,
  "num_instances_training": 2,
  "num_instances_test": 2,
  "num_instances_validation": 2,
  "num_occurrences_total": 6,
  "num_occurrences_training": 2,
  "num_occurrences_test": 2,
  "num_occurrences_validation": 2,
  "image_data_type": "int16",
  "annotation_data_type": "uint8",
  "value_channels": 1,
  "range": [
    0,
    16000
  ],
  "range_max": 1002,
  "range_min": 0,
  "range_avg": 167.57250248655913,
  "dimensions_max": [
    96,
    128,
    150
  ],
  "dimensions_min": [
    41,
    64,
    78
  ],
  "dimensions_avg": [
    57.49618320610687,
    92.01526717557252,
    115.6030534351145
  ],
  "resolution_unit": "mm",
  "resolution_voxel_size": [
    0.1,
    0.1,
    0.1
  ],
  "resolution_voxel_size_max": [
    0.1,
    0.1,
    0.1
  ],
  "resolution_voxel_size_min": [
    0.1,
    0.1,
    0.1
  ],
  "resolution_voxel_size_avg": [
    0.1,
    0.1,
    0.1
  ],
  "training": [
    {
      "annotations": "occurrence-0000/instance-0000/annotations.nii.gz",
      "image": "occurrence-0000/instance-0000/image.nii.gz"
    },
    {
      "annotations": "occurrence-0001/instance-0000/annotations.nii.gz",
      "image": "occurrence-0001/instance-0000/image.nii.gz"
    }
  ],
  "validation": [
    {
      "annotations": "occurrence-0002/instance-0000/annotations.nii.gz",
      "image": "occurrence-0002/instance-0000/image.nii.gz"
    },
    {
      "annotations": "occurrence-0003/instance-0000/annotations.nii.gz",
      "image": "occurrence-0003/instance-0000/image.nii.gz"
    }
  ],
  "test": [
    {
      "annotations": "occurrence-0004/instance-0000/annotations.nii.gz",
      "image": "occurrence-0004/instance-0000/image.nii.gz"
    },
    {
      "annotations": "occurrence-0005/instance-0000/annotations.nii.gz",
      "image": "occurrence-0005/instance-0000/image.nii.gz"
    }
  ],
  "data_references": {
    "occurrence-0000/instance-0000/image.nii.gz": "example1",
    "occurrence-0001/instance-0000/image.nii.gz": "example2",
    "occurrence-0002/instance-0000/image.nii.gz": "example3",
    "occurrence-0003/instance-0000/image.nii.gz": "folder_xyz/exampleA",
    "occurrence-0004/instance-0000/image.nii.gz": "folder_xyz/exampleB",
    "occurrence-0005/instance-0000/image.nii.gz": "folder_xyz/exampleC"
  }
}
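In the example above, dimensions_max, dimensions_min and dimensions_avg are calculated by the pydantic model during validation. The following is a minimal sketch of how these fields could be derived from a list of image shapes; it is not the package's implementation. It takes the per-axis maximum and minimum, which matches "dimensions of the largest/smallest image" only when one image is largest (or smallest) along every axis. For .nii.gz files, the shapes could be read with an imaging library such as nibabel (not shown).

```python
def dimension_stats(shapes: list[tuple[int, ...]]) -> dict:
    """Per-axis max, min and mean of the image shapes (hypothetical helper).

    Assumes every image has the same number of dimensions.
    """
    if not shapes:
        raise ValueError("need at least one image shape")
    if any(len(shape) != len(shapes[0]) for shape in shapes):
        raise ValueError("all images must have the same number of dimensions")
    axes = list(zip(*shapes))  # one tuple of sizes per image axis
    return {
        "dimensions_max": [max(sizes) for sizes in axes],
        "dimensions_min": [min(sizes) for sizes in axes],
        "dimensions_avg": [sum(sizes) / len(sizes) for sizes in axes],
    }


print(dimension_stats([(40, 60, 80), (60, 100, 120)]))
# {'dimensions_max': [60, 100, 120], 'dimensions_min': [40, 60, 80], 'dimensions_avg': [50.0, 80.0, 100.0]}
```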

Download files

Source Distribution

magdi_data-0.5.0a218.tar.gz (19.2 kB)

Uploaded Source

Built Distribution

magdi_data-0.5.0a218-py3-none-any.whl (19.5 kB)

Uploaded Python 3

File details

Details for the file magdi_data-0.5.0a218.tar.gz.

File metadata

  • Download URL: magdi_data-0.5.0a218.tar.gz
  • Size: 19.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for magdi_data-0.5.0a218.tar.gz
Algorithm Hash digest
SHA256 d414a2125b79957a1e6f8ffe18a2fa4b974cf01a2acefd98353c09cc09374ff4
MD5 e4f65c2b37e380556cacf4662a932344
BLAKE2b-256 f9f6a98a3402fef366a2a2242185d3e22d6735283265bccb2fa7fd8744f4b237

File details

Details for the file magdi_data-0.5.0a218-py3-none-any.whl.

File metadata

  • Download URL: magdi_data-0.5.0a218-py3-none-any.whl
  • Size: 19.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for magdi_data-0.5.0a218-py3-none-any.whl
Algorithm Hash digest
SHA256 2933905054daf9e877b4eff08e0cf7454ec961190cf72f08f6d6cca9a92e7041
MD5 9ec7f4167e9900a343a39e56d0c13d6b
BLAKE2b-256 a7959065080acbc62730115fa138e8400d112e43fad4901934fb7337af90173d
