
MAGDI data

The Python package magdi_data handles data loading for AI training. It also defines a specific dataset structure.

dataset_meta.json

The file dataset_meta.json describes a dataset in the Occurrence Instance Format. Its properties must follow a strict format so that the file is readable by both humans and machines.
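Because the file is plain JSON, it can be read with the standard library alone. The sketch below is illustrative only: load_split is a hypothetical helper, not a function of magdi_data, and it assumes that the paths listed in a split are relative to the dataset folder that contains dataset_meta.json.

```python
import json
from pathlib import Path


def load_split(dataset_dir, split="training"):
    """Return (image_path, annotation_path_or_None) pairs for one split.

    Hypothetical helper for illustration; not part of the magdi_data API.
    Assumes split paths are relative to the folder holding dataset_meta.json.
    """
    dataset_dir = Path(dataset_dir)
    with open(dataset_dir / "dataset_meta.json", encoding="utf-8") as f:
        meta = json.load(f)

    pairs = []
    # "test" and "validation" are optional, so a missing split yields an empty list.
    for entry in meta.get(split, []):
        image = dataset_dir / entry["image"]  # "image" is mandatory
        annotation = entry.get("annotations")  # only present for annotated data
        pairs.append((image, dataset_dir / annotation if annotation else None))
    return pairs
```

For example, load_split("path/to/dataset", "validation") would return the validation image/annotation pairs, or an empty list if that optional split is absent.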

Terms

Occurrence

  • An occurrence is a real-world object that has been scanned. It can have one or more instances (replicates or measurements).

Instance

  • An instance is a digitized object, i.e. one digital replicate or measurement of an occurrence.
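In the path examples of this document, every file path starts with an occurrence folder followed by an instance folder (e.g. occurrence-0000/instance-0000/image.nii.gz). Assuming that layout holds in general, the occurrence/instance relationship can be recovered from the paths alone. parse_path and group_instances are hypothetical helpers written for this illustration, not part of magdi_data.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the example paths: occurrence-XXXX/instance-XXXX/<file>
PATH_PATTERN = re.compile(r"^occurrence-(\d+)/instance-(\d+)/(.+)$")


def parse_path(path):
    """Split a relative path into (occurrence index, instance index, file name)."""
    match = PATH_PATTERN.match(path)
    if match is None:
        raise ValueError(f"not an occurrence/instance path: {path!r}")
    occurrence, instance, filename = match.groups()
    return int(occurrence), int(instance), filename


def group_instances(paths):
    """Map each occurrence index to the sorted list of its instance indices."""
    groups = defaultdict(set)
    for path in paths:
        occurrence, instance, _ = parse_path(path)
        groups[occurrence].add(instance)
    return {occ: sorted(inst) for occ, inst in sorted(groups.items())}
```

With the hypothetical input ["occurrence-0000/instance-0000/image.nii.gz", "occurrence-0000/instance-0001/image.nii.gz", "occurrence-0001/instance-0000/image.nii.gz"], group_instances returns {0: [0, 1], 1: [0]}: two occurrences, one of which has two instances.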

Contents

property description json type example
id Identifier of the dataset, used as the folder name. Normalized string consisting of "name", "release" and "uuid". It is automatically overwritten by the pydantic model on each validation. string mais_karyopse_2025_07_02_806fd055-b90d-46d5-a1ce-dd7e61b286ee
uuid Generated UUID for this dataset. string 806fd055-b90d-46d5-a1ce-dd7e61b286ee
name Name of the dataset. string Mais Karyopse
short_name Abbreviation of the "name". string MK2025
release Version of the release. string 2025.07.02
descriptive_metadata Nested object that holds further optional fields describing the dataset. object
entity Type of real-world object the dataset pertains to. string maize
description Short description of the dataset. string NMR dataset with all maize samples. Test set dispersed among sets 7, 8, 10 and 11; three set 7 samples discarded.
reference Description or URL where the data was published. string Leibniz Institute of Plant Genetics and Crop Plant Research (IPK)
license Name of the license that applies to this dataset. string IPK proprietary
tags List of modalities describing this dataset. An overview of existing modalities is available on Hugging Face: https://huggingface.co/datasets array of string ["MRI","3D","Image"]
task_categories List of machine learning tasks the dataset is intended for. Hugging Face gives an overview of possible tasks: https://huggingface.co/tasks array of string ["image-segmentation"]
labels OPTIONAL (Needed if labeled data exists). Class label mapping made of key-value pairs. Keys are non-negative integers in ascending order, starting from zero, and are formatted as strings.
They cannot have gaps. If a key has no value, the value must be set to an empty string, e.g. "1": "".
object {
"0": "background",
"1": "embryo",
"2": "endosperm",
"3": "aleuron"}
image_format File extension of the image files. string .nii.gz
annotation_format OPTIONAL (Needed if labeled data exists). File extension of the annotation files. string .nii.gz
num_instances_total Total number of instances. integer 6
num_instances_training Number of instances of the training split. integer 2
num_instances_test OPTIONAL (Needed if test split exists). Number of instances of the test split. integer 2
num_instances_validation OPTIONAL (Needed if validation split exists). Number of instances of the validation split. integer 2
num_occurrences_total Total number of occurrences. integer 6
num_occurrences_training Number of occurrences of the training split. integer 2
num_occurrences_test OPTIONAL (Needed if test split exists). Number of occurrences of the test split. integer 2
num_occurrences_validation OPTIONAL (Needed if validation split exists). Number of occurrences of the validation split. integer 2
image_data_type NumPy-compatible data type (np.dtype) in which the image data is stored. string int16
annotation_data_type OPTIONAL (Needed if labeled data exists). NumPy-compatible data type (np.dtype) in which the annotation data is stored. string uint8
value_channels Number of channels in which the image data is stored, e.g. 1 for MRI, 3 for RGB images. integer 1
range Possible range of voxel values across all instances of the entity. This field is set manually. array of number [0, 16000]
range_max Maximum voxel value across all instances. This field is calculated during validation by the pydantic model. number 1002
range_min Minimum voxel value across all instances. This field is calculated during validation by the pydantic model. number 0
range_avg Average voxel value across all instances. This field is calculated during validation by the pydantic model. number 167.57250248655913
dimensions_max Dimensions of the largest image. array of integer [96, 128, 150]
dimensions_min Dimensions of the smallest image. Must have the same length as dimensions_max. array of integer [41, 64, 78]
dimensions_avg Average of the dimensions over all images. Must have the same length as dimensions_max. array of number [57.49618320610687, 92.01526717557252, 115.6030534351145]
resolution_unit Unit of resolution_voxel_size, i.e. of the real-world size of a voxel along each image dimension. string mm
resolution_voxel_size Size of a voxel for each instance. This field is set manually. array of number [0.1,0.1,0.1]
resolution_voxel_size_max Maximum voxel size across all instances. This field is calculated during validation by the pydantic model. array of number [0.1, 0.1, 0.1]
resolution_voxel_size_min Minimum voxel size across all instances. This field is calculated during validation by the pydantic model. array of number [0.1, 0.1, 0.1]
resolution_voxel_size_avg Average voxel size across all instances. This field is calculated during validation by the pydantic model. array of number [0.1, 0.1, 0.1]
training Training split.
Contains one JSON object for each instance of the training split.
Each JSON object holds the path to the image file and, for annotated data, the path to the annotation file.
The "image" key is mandatory; the "annotations" key is only required for annotated data.
array of object [
{
"annotations": "occurrence-0000/instance-0000/annotations.nii.gz",
"image": "occurrence-0000/instance-0000/image.nii.gz"
}
]
test OPTIONAL (Needed if test split exists). Test split.
Contains one JSON object for each instance of the test split.
Each JSON object holds the path to the image file and, for annotated data, the path to the annotation file.
The "image" key is mandatory; the "annotations" key is only required for annotated data.
array of object [
{
"annotations": "occurrence-0004/instance-0000/annotations.nii.gz",
"image": "occurrence-0004/instance-0000/image.nii.gz"
}
]
validation OPTIONAL (Needed if validation split exists). Validation split.
Contains one JSON object for each instance of the validation split.
Each JSON object holds the path to the image file and, for annotated data, the path to the annotation file.
The "image" key is mandatory; the "annotations" key is only required for annotated data.
array of object [
{
"annotations": "occurrence-0002/instance-0000/annotations.nii.gz",
"image": "occurrence-0002/instance-0000/image.nii.gz"
}
]
data_references Key-value pairs that map directories or files of this dataset to a reference.
It is not used for AI training; it associates digital instances with their origin.
For example, an occurrence is mapped to the ID of a real-world object.
object {
"occurrence-0000/instance-0000/annotations.nii.gz": "example1",
"occurrence-0001/instance-0000/annotations.nii.gz": "example3"
}
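Several fields above (range_max, range_min, range_avg and resolution_voxel_size_max/_min/_avg) are calculated during validation by the pydantic model, and the dimensions_* fields are likewise aggregates over all images. The following standard-library sketch shows one plausible way to compute them. It is not the package's implementation: derived_fields is a hypothetical name, each instance is represented by a dict with assumed keys "shape", "voxel_size" and "values" (flattened voxel values), and range_avg is assumed to be the mean over all voxels of all instances.

```python
def mean(values):
    """Arithmetic mean of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def per_axis(vectors, reducer):
    """Apply reducer separately to each axis of equally long vectors."""
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("vectors must be non-empty and have the same length")
    return [reducer(axis) for axis in zip(*vectors)]


def derived_fields(instances):
    """Aggregate per-instance information into dataset-level fields.

    instances: list of dicts with the hypothetical keys
      "shape"      - image dimensions, e.g. (96, 128, 150)
      "voxel_size" - voxel size per dimension, e.g. (0.1, 0.1, 0.1)
      "values"     - flattened voxel values of the image
    """
    shapes = [inst["shape"] for inst in instances]
    voxel_sizes = [inst["voxel_size"] for inst in instances]

    # Voxel statistics over all voxels of all instances (assumed definition).
    total, count = 0, 0
    lo, hi = float("inf"), float("-inf")
    for inst in instances:
        for v in inst["values"]:
            total += v
            count += 1
            lo = min(lo, v)
            hi = max(hi, v)
    if count == 0:
        raise ValueError("no voxel values given")

    return {
        "range_max": hi,
        "range_min": lo,
        "range_avg": total / count,
        "dimensions_max": per_axis(shapes, max),
        "dimensions_min": per_axis(shapes, min),
        "dimensions_avg": per_axis(shapes, mean),
        "resolution_voxel_size_max": per_axis(voxel_sizes, max),
        "resolution_voxel_size_min": per_axis(voxel_sizes, min),
        "resolution_voxel_size_avg": per_axis(voxel_sizes, mean),
    }
```

With NumPy arrays, the shape would come from arr.shape and the voxel statistics from arr.min(), arr.max() and arr.sum() / arr.size instead of the explicit loop.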

Example:

{
  "id": "mais_karyopse_2025_07_02_806fd055-b90d-46d5-a1ce-dd7e61b286ee",
  "uuid": "806fd055-b90d-46d5-a1ce-dd7e61b286ee",
  "name": "Mais Karyopse",
  "short_name": "MK2025",
  "entity": "maize",
  "descriptive_metadata": {
    "latin_name": "Zea mays",
    "line_name": "Mais",
    "structure": "seed",
    "dap": "10 DAP",
    "device": "NMR Device Name",
    "coil": "CRP 13C/1H 5mm 400MHz",
    "measurement_channel": "structure"
  },
  "release": "2025.07.02",
  "description": "Short description",
  "reference": "Reference to the author",
  "license": "license",
  "tags": [
    "MRI",
    "3D",
    "Image"
  ],
  "task_categories": [
    "image-segmentation"
  ],
  "labels": {
    "0": "background",
    "1": "embryo",
    "2": "endosperm",
    "3": "aleuron"
  },
  "image_format": ".nii.gz",
  "annotation_format": ".nii.gz",
  "num_instances_total": 6,
  "num_instances_training": 2,
  "num_instances_test": 2,
  "num_instances_validation": 2,
  "num_occurrences_total": 6,
  "num_occurrences_training": 2,
  "num_occurrences_test": 2,
  "num_occurrences_validation": 2,
  "image_data_type": "int16",
  "annotation_data_type": "uint8",
  "value_channels": 1,
  "range": [
    0,
    16000
  ],
  "range_max": 1002,
  "range_min": 0,
  "range_avg": 167.57250248655913,
  "dimensions_max": [
    96,
    128,
    150
  ],
  "dimensions_min": [
    41,
    64,
    78
  ],
  "dimensions_avg": [
    57.49618320610687,
    92.01526717557252,
    115.6030534351145
  ],
  "resolution_unit": "mm",
  "resolution_voxel_size": [
    0.1,
    0.1,
    0.1
  ],
  "resolution_voxel_size_max": [
    0.1,
    0.1,
    0.1
  ],
  "resolution_voxel_size_min": [
    0.1,
    0.1,
    0.1
  ],
  "resolution_voxel_size_avg": [
    0.1,
    0.1,
    0.1
  ],
  "training": [
    {
      "annotations": "occurrence-0000/instance-0000/annotations.nii.gz",
      "image": "occurrence-0000/instance-0000/image.nii.gz"
    },
    {
      "annotations": "occurrence-0001/instance-0000/annotations.nii.gz",
      "image": "occurrence-0001/instance-0000/image.nii.gz"
    }
  ],
  "validation": [
    {
      "annotations": "occurrence-0002/instance-0000/annotations.nii.gz",
      "image": "occurrence-0002/instance-0000/image.nii.gz"
    },
    {
      "annotations": "occurrence-0003/instance-0000/annotations.nii.gz",
      "image": "occurrence-0003/instance-0000/image.nii.gz"
    }
  ],
  "test": [
    {
      "annotations": "occurrence-0004/instance-0000/annotations.nii.gz",
      "image": "occurrence-0004/instance-0000/image.nii.gz"
    },
    {
      "annotations": "occurrence-0005/instance-0000/annotations.nii.gz",
      "image": "occurrence-0005/instance-0000/image.nii.gz"
    }
  ],
  "data_references": {
    "occurrence-0000/instance-0000/image.nii.gz": "example1",
    "occurrence-0001/instance-0000/image.nii.gz": "example2",
    "occurrence-0002/instance-0000/image.nii.gz": "example3",
    "occurrence-0003/instance-0000/image.nii.gz": "folder_xyz/exampleA",
    "occurrence-0004/instance-0000/image.nii.gz": "folder_xyz/exampleB",
    "occurrence-0005/instance-0000/image.nii.gz": "folder_xyz/exampleC"
  }
}
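The rules from the table can also be checked on a parsed file such as the example above. The sketch below uses only the standard library and is not the pydantic model shipped with magdi_data; check_meta, expected_id and normalize are hypothetical names. The id rule is an assumption inferred from the example id: name and release are lowercased, runs of other characters become underscores, and the uuid is appended unchanged. Occurrences are counted by the first path component (e.g. occurrence-0000), and the totals are assumed to equal the sum over all splits.

```python
import re

SPLITS = ("training", "validation", "test")


def normalize(text):
    """Lowercase text and replace each run of non-alphanumeric characters with "_"."""
    return re.sub(r"[^0-9a-z]+", "_", text.lower()).strip("_")


def expected_id(meta):
    # Assumed rule, inferred from the example: normalized name and release,
    # followed by the unmodified uuid.
    return f"{normalize(meta['name'])}_{normalize(meta['release'])}_{meta['uuid']}"


def check_meta(meta):
    """Return a list of rule violations found in a parsed dataset_meta.json."""
    problems = []

    if meta.get("id") != expected_id(meta):
        problems.append("id does not match name, release and uuid")

    labels = meta.get("labels")
    if labels is not None and list(labels) != [str(i) for i in range(len(labels))]:
        problems.append('label keys must be "0", "1", ... in ascending order without gaps')

    all_instances, all_occurrences = 0, set()
    for split in SPLITS:
        entries = meta.get(split)
        if entries is None:
            if split == "training":  # only test and validation are optional
                problems.append("training split is missing")
            continue
        for entry in entries:
            if "image" not in entry:
                problems.append(f"{split}: entry without 'image'")
            if labels is not None and "annotations" not in entry:
                problems.append(f"{split}: labeled data entry without 'annotations'")
        # The occurrence is the first path component, e.g. "occurrence-0000".
        occurrences = {e["image"].split("/")[0] for e in entries if "image" in e}
        if meta.get(f"num_instances_{split}") != len(entries):
            problems.append(f"num_instances_{split} != {len(entries)}")
        if meta.get(f"num_occurrences_{split}") != len(occurrences):
            problems.append(f"num_occurrences_{split} != {len(occurrences)}")
        all_instances += len(entries)
        all_occurrences |= occurrences

    if meta.get("num_instances_total") != all_instances:
        problems.append(f"num_instances_total != {all_instances}")
    if meta.get("num_occurrences_total") != len(all_occurrences):
        problems.append(f"num_occurrences_total != {len(all_occurrences)}")

    n_dims = len(meta.get("dimensions_max", []))
    for key in ("dimensions_min", "dimensions_avg"):
        if len(meta.get(key, [])) != n_dims:
            problems.append(f"{key} must have the same length as dimensions_max")

    return problems
```

Used on a file loaded with json.load, check_meta returns an empty list when none of these rules is violated.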

Download files

Source Distribution

magdi_data-0.2.0a158.tar.gz (16.9 kB)

Uploaded Source

Built Distribution

magdi_data-0.2.0a158-py3-none-any.whl (16.9 kB)

Uploaded Python 3

File details

Details for the file magdi_data-0.2.0a158.tar.gz.

File metadata

  • Download URL: magdi_data-0.2.0a158.tar.gz
  • Upload date:
  • Size: 16.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for magdi_data-0.2.0a158.tar.gz
Algorithm Hash digest
SHA256 9559df7397332b7fd019778bd03321e07f5945a20bf2509dc7b235b0e58910bf
MD5 ce23ef483f42ec9c4e03e1ce2cb4d68c
BLAKE2b-256 05b56409a6dcab070c73334e5ff655aa3ef6f54d7a2a4fd5720478ba9c0d89f7

File details

Details for the file magdi_data-0.2.0a158-py3-none-any.whl.

File metadata

  • Download URL: magdi_data-0.2.0a158-py3-none-any.whl
  • Upload date:
  • Size: 16.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for magdi_data-0.2.0a158-py3-none-any.whl
Algorithm Hash digest
SHA256 c46b48e8b2da529e173a2c6cf5eb712e23622af454190c582d59396bf56434ec
MD5 29d70d8f1a1f3c8b20ae96b8cc4b7edb
BLAKE2b-256 8c88a14f49d1795084700aba9a11fd84b70760e27ba052c68368b2805d5a101d
