MAGDI Data

The Python package magdi_data loads data for AI training. It also defines a specific dataset structure.

dataset_meta.json

The dataset_meta.json file describes a dataset in the Occurrence Instance Format. Its properties must follow a strict format so that both humans and machines can read them.

Terms

Occurrence

  • An occurrence is a real-world object that has been scanned. It can have one or more instances (replicates or measurements).

Instance

  • An instance is a digitized representation of an occurrence, i.e. one replicate or measurement of it.
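The relationship between the two terms can be illustrated with a small sketch. It assumes the path layout occurrence-XXXX/instance-YYYY/<file> that appears in the examples of this document; the helper name group_by_occurrence is hypothetical and not part of magdi_data.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_occurrence(paths):
    """Map each occurrence folder to the sorted instance folders found under it."""
    grouped = defaultdict(set)
    for path in paths:
        parts = PurePosixPath(path).parts
        # Assumed layout: <occurrence>/<instance>/<file>
        occurrence, instance = parts[0], parts[1]
        grouped[occurrence].add(instance)
    return {occ: sorted(instances) for occ, instances in sorted(grouped.items())}


# One occurrence scanned twice, a second occurrence scanned once.
paths = [
    "occurrence-0000/instance-0000/image.nii.gz",
    "occurrence-0000/instance-0001/image.nii.gz",
    "occurrence-0001/instance-0000/image.nii.gz",
]
print(group_by_occurrence(paths))
```

Here the first occurrence has two instances and the second has one, which is why a dataset can contain more instances than occurrences.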

Contents

| property | description | JSON type | example |
| --- | --- | --- | --- |
| id | Identifier of the dataset, to be used as the folder name. Normalized string consisting of "name", "release" and "uuid". Automatically overwritten on each validation by the pydantic model. | string | Maize |
| uuid | Generated UUID string for this dataset. | | |
| name | Name of the dataset. | | |
| short_name | Abbreviation of "name". | | |
| release | Release version. | string | 2025.07.02 |
| descriptive_metadata | Nested object that holds further optional fields describing the dataset. | | |
| entity | Type of real-world object the dataset pertains to. | string | maize |
| description | Short description of the dataset. | string | NMR dataset with all maize samples. Test set dispersed among sets 7,8, 10, 11, tree set7 samples discarded. |
| reference | Description or URL of where the data was published. | string | Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) |
| license | Name of the license under which this dataset is released. | string | IPK proprietary |
| tags | List of modalities describing this dataset. Hugging Face provides an overview of existing modalities: https://huggingface.co/datasets | array of string | ["MRI", "3D", "Image"] |
| task_categories | List of machine learning tasks for which the dataset is intended. Hugging Face provides an overview of possible tasks: https://huggingface.co/tasks | array of string | ["image-segmentation"] |
| labels | OPTIONAL (Needed if labeled data exists). Class label mapping made of key-value pairs. Keys must be consecutive non-negative integers in ascending order, starting from zero, and formatted as strings. They cannot have gaps. If a key has no value, the value must be set to an empty string, e.g. "1": "". | object | {"0": "background", "1": "embryo", "2": "endosperm", "3": "aleuron"} |
| image_format | File extension of the image files. | string | .nii.gz |
| annotation_format | OPTIONAL (Needed if labeled data exists). File extension of the annotation files. | string | .nii.gz |
| num_instances_total | Total number of instances. | integer | 6 |
| num_instances_training | Number of instances in the training split. | integer | 2 |
| num_instances_test | OPTIONAL (Needed if test split exists). Number of instances in the test split. | integer | 2 |
| num_instances_validation | OPTIONAL (Needed if validation split exists). Number of instances in the validation split. | integer | 2 |
| num_occurrences_total | Total number of occurrences. | integer | 6 |
| num_occurrences_training | Number of occurrences in the training split. | integer | 2 |
| num_occurrences_test | OPTIONAL (Needed if test split exists). Number of occurrences in the test split. | integer | 2 |
| num_occurrences_validation | OPTIONAL (Needed if validation split exists). Number of occurrences in the validation split. | integer | 2 |
| image_data_type | NumPy-compatible data type (np.dtype) in which the image data is stored. | string | int16 |
| annotation_data_type | OPTIONAL (Needed if labeled data exists). NumPy-compatible data type (np.dtype) in which the annotation data is stored. | string | uint8 |
| value_channels | Number of channels in which the image data is stored, e.g. 1 for MRI, 3 for RGB images. | integer | 1 |
| range | Possible value range of voxels across all instances for the entity. This field is set manually. | | |
| range_max | Maximum voxel value across all instances. Calculated during validation by the pydantic model. | | |
| range_min | Minimum voxel value across all instances. Calculated during validation by the pydantic model. | | |
| range_avg | Average voxel value across all instances. Calculated during validation by the pydantic model. | | |
| dimensions_max | Dimensions of the largest image. | array of integer | [96, 128, 150] |
| dimensions_min | Dimensions of the smallest image. Must have the same shape as dimensions_max. | array of integer | [41, 64, 78] |
| dimensions_avg | Average of the dimensions over all images. Must have the same shape as dimensions_max. | array of number | [57.49618320610687, 92.01526717557252, 115.6030534351145] |
| resolution_unit | Unit of resolution_voxel_size, i.e. of the real-world size of a voxel along each image dimension. | string | mm |
| resolution_voxel_size | Size of a voxel for each instance. This field is set manually. | array of number | [0.1, 0.1, 0.1] |
| resolution_voxel_size_max | Maximum voxel size across all instances. Calculated during validation by the pydantic model. | | |
| resolution_voxel_size_min | Minimum voxel size across all instances. Calculated during validation by the pydantic model. | | |
| resolution_voxel_size_avg | Average voxel size across all instances. Calculated during validation by the pydantic model. | | |
| training | Training split. Contains one JSON object for each instance of the training split. Each JSON object holds the path to the image file under the mandatory "image" key. For annotated data it also holds the path to the annotation file under the "annotations" key, which is mandatory only for annotated data. | array of object | [{"annotations": "occurrence-0000/instance-0000/annotations.nii.gz", "image": "occurrence-0000/instance-0000/image.nii.gz"}] |
| test | OPTIONAL (Needed if test split exists). Test split. Contains one JSON object for each instance of the test split. Each JSON object holds the path to the image file under the mandatory "image" key. For annotated data it also holds the path to the annotation file under the "annotations" key, which is needed only for annotated data. | array of object | [{"annotations": "occurrence-0004/instance-0000/annotations.nii.gz", "image": "occurrence-0004/instance-0000/image.nii.gz"}] |
| validation | OPTIONAL (Needed if validation split exists). Validation split. Contains one JSON object for each instance of the validation split. Each JSON object holds the path to the image file under the mandatory "image" key. For annotated data it also holds the path to the annotation file under the "annotations" key, which is needed only for annotated data. | array of object | [{"annotations": "occurrence-0002/instance-0000/annotations.nii.gz", "image": "occurrence-0002/instance-0000/image.nii.gz"}] |
| data_references | Key-value pairs that map directories or files of this dataset to a reference. Not used for AI training, but needed to associate digital instances with their origin, e.g. an occurrence is mapped to the ID of a real-world object. | object | {"occurrence-0000/instance-0000/annotations.nii.gz": "example1", "occurrence-0001/instance-0000/annotations.nii.gz": "example3"} |
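
The rule for the "labels" property (string keys that count up from zero without gaps) can be checked in a few lines. The following is a minimal sketch of such a check, not the validation that magdi_data itself performs; the function name validate_labels is hypothetical.

```python
def validate_labels(labels):
    """Raise ValueError unless keys are the strings "0", "1", ... in order, without gaps."""
    for key, value in labels.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise ValueError(f"label keys and values must be strings: {key!r}: {value!r}")
    # With n labels, the only valid key sequence is "0" .. "n-1" in ascending order.
    expected = [str(i) for i in range(len(labels))]
    if list(labels) != expected:
        raise ValueError(f"label keys must be {expected}, got {list(labels)}")
    return labels


validate_labels({"0": "background", "1": "embryo", "2": "endosperm", "3": "aleuron"})
validate_labels({"0": "background", "1": ""})  # an empty string marks a key without a value

try:
    validate_labels({"0": "background", "2": "endosperm"})  # key "1" is missing
except ValueError as err:
    print(err)
```

Comparing against the full expected key list catches gaps, wrong ordering and non-canonical keys such as "01" in a single step.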

Example:

{
  "id": "mais_karyopse_2025-07-02_806fd055-b90d-46d5-a1ce-dd7e61b286ee",
  "uuid": "806fd055-b90d-46d5-a1ce-dd7e61b286ee",
  "name": "Mais Karyopse",
  "short_name": "MK2025",
  "entity": "maize",
  "descriptive_metadata": {
    "latin_name": "Frumentum",
    "line_name": "Mais",
    "structure": "seed",
    "dap": "10 DAP",
    "device": "NMR Device Name",
    "coil": "CRP 13C/1H 5mm 400MHz",
    "measurement_channel": "structure"
  },
  "release": "2025.07.02",
  "description": "Short description",
  "reference": "Reference to the author",
  "license": "license",
  "tags": [
    "MRI",
    "3D",
    "Image"
  ],
  "task_categories": [
    "image-segmentation"
  ],
  "labels": {
    "0": "background",
    "1": "embryo",
    "2": "endosperm",
    "3": "aleuron"
  },
  "image_format": ".nii.gz",
  "annotation_format": ".nii.gz",
  "num_instances_total": 6,
  "num_instances_training": 2,
  "num_instances_test": 2,
  "num_instances_validation": 2,
  "num_occurrences_total": 6,
  "num_occurrences_training": 2,
  "num_occurrences_test": 2,
  "num_occurrences_validation": 2,
  "image_data_type": "int16",
  "annotation_data_type": "uint8",
  "value_channels": 1,
  "range": [
    0,
    16000
  ],
  "range_max": 1002,
  "range_min": 0,
  "range_avg": 167.57250248655913,
  "dimensions_max": [
    96,
    128,
    150
  ],
  "dimensions_min": [
    41,
    64,
    78
  ],
  "dimensions_avg": [
    57.49618320610687,
    92.01526717557252,
    115.6030534351145
  ],
  "resolution_unit": "mm",
  "resolution_voxel_size": [
    0.1,
    0.1,
    0.1
  ],
  "resolution_voxel_size_max": [
    0.1,
    0.1,
    0.1
  ],
  "resolution_voxel_size_min": [
    0.1,
    0.1,
    0.1
  ],
  "resolution_voxel_size_avg": [
    0.1,
    0.1,
    0.1
  ],
  "training": [
    {
      "annotations": "occurrence-0000/instance-0000/annotations.nii.gz",
      "image": "occurrence-0000/instance-0000/image.nii.gz"
    },
    {
      "annotations": "occurrence-0001/instance-0000/annotations.nii.gz",
      "image": "occurrence-0001/instance-0000/image.nii.gz"
    }
  ],
  "validation": [
    {
      "annotations": "occurrence-0002/instance-0000/annotations.nii.gz",
      "image": "occurrence-0002/instance-0000/image.nii.gz"
    },
    {
      "annotations": "occurrence-0003/instance-0000/annotations.nii.gz",
      "image": "occurrence-0003/instance-0000/image.nii.gz"
    }
  ],
  "test": [
    {
      "annotations": "occurrence-0004/instance-0000/annotations.nii.gz",
      "image": "occurrence-0004/instance-0000/image.nii.gz"
    },
    {
      "annotations": "occurrence-0005/instance-0000/annotations.nii.gz",
      "image": "occurrence-0005/instance-0000/image.nii.gz"
    }
  ],
  "data_references": {
    "occurrence-0000/instance-0000/image.nii.gz": "example1",
    "occurrence-0001/instance-0000/image.nii.gz": "example2",
    "occurrence-0002/instance-0000/image.nii.gz": "example3",
    "occurrence-0003/instance-0000/image.nii.gz": "folder_xyz/exampleA",
    "occurrence-0004/instance-0000/image.nii.gz": "folder_xyz/exampleB",
    "occurrence-0005/instance-0000/image.nii.gz": "folder_xyz/exampleC"
  }
}
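
A consumer of such a file may want to confirm that the declared instance counts match the split lists before training. The sketch below assumes, based on the example above, that num_instances_total equals the sum of all split sizes; the function name check_split_counts is hypothetical and not part of magdi_data.

```python
def check_split_counts(meta):
    """Return a list of problems; an empty list means counts and splits agree."""
    problems = []
    total = 0
    for split in ("training", "validation", "test"):
        entries = meta.get(split)
        if entries is None:
            continue  # validation and test splits are optional
        total += len(entries)
        declared = meta.get(f"num_instances_{split}")
        if declared != len(entries):
            problems.append(f"{split}: declared {declared}, found {len(entries)}")
        for entry in entries:
            if "image" not in entry:
                problems.append(f"{split}: entry without 'image' key")
    # Assumption: the total is the sum of all splits, as in the example above.
    if meta.get("num_instances_total") != total:
        problems.append(f"total: declared {meta.get('num_instances_total')}, found {total}")
    return problems


# A reduced metadata dictionary with only the fields this check needs.
meta = {
    "num_instances_total": 2,
    "num_instances_training": 1,
    "num_instances_test": 1,
    "training": [{"image": "occurrence-0000/instance-0000/image.nii.gz"}],
    "test": [{"image": "occurrence-0004/instance-0000/image.nii.gz"}],
}
print(check_split_counts(meta))
```

For a real file, the dictionary would come from json.load on dataset_meta.json instead of being written inline.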
