MAGDI Data

The Python package magdi_data loads data for AI training. It also defines a specific dataset structure.

dataset_meta.json

The dataset_meta.json describes a dataset in the Occurrence Instance Format. Its properties have to follow a strict format to be readable by humans and machines.

Terms

Occurrence

  • An occurrence is a real scanned object. It can have one or more instances (replicates or measurements).

Instance

  • An instance is a digitized object, i.e. one replicate or measurement of an occurrence.
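The relationship between the two terms can be sketched in code. This is a minimal illustration, not part of magdi_data: it assumes the `occurrence-XXXX/instance-XXXX/...` path layout used in the examples of this document, and the helper names `parse_ids` and `group_instances` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the example paths in this document.
PATH_PATTERN = re.compile(r"^occurrence-(\d+)/instance-(\d+)/")


def parse_ids(path):
    """Return (occurrence_id, instance_id) for a dataset-relative path."""
    match = PATH_PATTERN.match(path)
    if match is None:
        raise ValueError(f"unexpected path layout: {path!r}")
    return int(match.group(1)), int(match.group(2))


def group_instances(paths):
    """Map each occurrence id to the sorted list of its instance ids."""
    grouped = defaultdict(set)
    for path in paths:
        occurrence_id, instance_id = parse_ids(path)
        grouped[occurrence_id].add(instance_id)
    return {occ: sorted(ids) for occ, ids in sorted(grouped.items())}


paths = [
    "occurrence-0000/instance-0000/image.nii.gz",
    "occurrence-0000/instance-0001/image.nii.gz",
    "occurrence-0001/instance-0000/image.nii.gz",
]
print(group_instances(paths))  # {0: [0, 1], 1: [0]}
```

Here occurrence 0 was measured twice (two instances), while occurrence 1 has a single instance.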

Contents

| property | description | json type | example |
|---|---|---|---|
| name | Name of the dataset. | string | Maize |
| entity | Type of real-world object the dataset pertains to. | string | maize |
| release | Version of the release. | string | 2025.07.02 |
| description | Short description of the dataset. | string | NMR dataset with all maize samples. Test set dispersed among sets 7,8, 10, 11, tree set7 samples discarded. |
| reference | Description or URL where the data was published. | string | Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) |
| license | Name of the license under which this dataset is released. | string | IPK proprietary |
| tags | List of modalities that describe this dataset. Hugging Face provides an overview of existing modalities: https://huggingface.co/datasets | array of string | ["MRI", "3D", "Image"] |
| task_categories | List of machine learning tasks for which the dataset is intended. Hugging Face provides an overview of possible tasks: https://huggingface.co/tasks | array of string | ["image-segmentation"] |
| labels | OPTIONAL (needed if labeled data exists). Class label mapping made of key-value pairs. Keys must be non-negative integers, ascending from zero without gaps, and formatted as strings. If a key has no value, the value must be set to an empty string, e.g. "1": "". | object | {"0": "background", "1": "embryo", "2": "endosperm", "3": "aleuron"} |
| image_format | File ending of the image files. | string | .nii.gz |
| annotation_format | OPTIONAL (needed if labeled data exists). File ending of the annotation files. | string | .nii.gz |
| num_instances_total | Total number of instances. | integer | 6 |
| num_instances_training | Number of instances in the training split. | integer | 2 |
| num_instances_test | OPTIONAL (needed if test split exists). Number of instances in the test split. | integer | 2 |
| num_instances_validation | OPTIONAL (needed if validation split exists). Number of instances in the validation split. | integer | 2 |
| num_occurrences_total | Total number of occurrences. | integer | 6 |
| num_occurrences_training | Number of occurrences in the training split. | integer | 2 |
| num_occurrences_test | OPTIONAL (needed if test split exists). Number of occurrences in the test split. | integer | 2 |
| num_occurrences_validation | OPTIONAL (needed if validation split exists). Number of occurrences in the validation split. | integer | 2 |
| image_data_type | NumPy-compatible data type (np.dtype) in which the image information is stored. | string | int16 |
| annotation_data_type | OPTIONAL (needed if labeled data exists). NumPy-compatible data type (np.dtype) in which the annotation information is stored. | string | uint8 |
| value_channels | Number of channels in which the image information is stored, e.g. 1 for MRI, 3 for RGB images. | integer | 1 |
| dimensions_max | Dimensions of the largest image. | array of integer | [96, 128, 150] |
| dimensions_min | Dimensions of the smallest image. Must have the same shape as dimensions_max. | array of integer | [41, 64, 78] |
| dimensions_avg | Average of the dimensions over all images. Must have the same shape as dimensions_max. | array of number | [57.49618320610687, 92.01526717557252, 115.6030534351145] |
| resolution_unit | Unit for resolution_voxel_size. | string | mm |
| resolution_voxel_size | Size of a voxel relative to the real world for each image dimension. | array of number | [0.1, 0.1, 0.1] |
| training | Training split. Contains one JSON object for each instance of the training split. Each JSON object holds the path to the image file and, for annotated data, the path to the annotation file. The "image" key is mandatory. The "annotations" key is mandatory only for annotated data. | array of object | [{"annotations": "occurrence-0000/instance-0000/annotations.nii.gz", "image": "occurrence-0000/instance-0000/image.nii.gz"}] |
| validation | OPTIONAL (needed if validation split exists). Validation split. Contains one JSON object for each instance of the validation split. Each JSON object holds the path to the image file and, for annotated data, the path to the annotation file. The "image" key is mandatory. The "annotations" key is mandatory only for annotated data. | array of object | [{"annotations": "occurrence-0002/instance-0000/annotations.nii.gz", "image": "occurrence-0002/instance-0000/image.nii.gz"}] |
| test | OPTIONAL (needed if test split exists). Test split. Contains one JSON object for each instance of the test split. Each JSON object holds the path to the image file and, for annotated data, the path to the annotation file. The "image" key is mandatory. The "annotations" key is mandatory only for annotated data. | array of object | [{"annotations": "occurrence-0004/instance-0000/annotations.nii.gz", "image": "occurrence-0004/instance-0000/image.nii.gz"}] |
| data_references | Key-value pairs that map directories or files of this dataset to a reference. Not used for AI training, but needed to associate digital instances with their origin, e.g. an occurrence is mapped to the ID of a real-world object. | object | {"occurrence-0000/instance-0000/annotations.nii.gz": "example1", "occurrence-0001/instance-0000/annotations.nii.gz": "example3"} |
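The rule for the labels keys lends itself to a mechanical check. The following is a minimal sketch, not the validation that magdi_data itself performs: the function name `check_labels` is hypothetical, and it interprets "ascending" as applying to the order in which the keys appear in the JSON object.

```python
def check_labels(labels):
    """Return a list of problems found in a `labels` mapping (empty if none)."""
    problems = []
    for key, value in labels.items():
        # Keys must be plain decimal strings such as "0" or "12" (no "01", no "-1").
        is_plain_int = (
            isinstance(key, str)
            and key.isascii()
            and key.isdigit()
            and str(int(key)) == key
        )
        if not is_plain_int:
            problems.append(f"key {key!r} is not a plain integer string")
        if not isinstance(value, str):
            problems.append(f"value for key {key!r} is not a string")
    if problems:
        return problems
    # Keys must read 0, 1, 2, ... in order, with no gaps.
    keys = [int(key) for key in labels]
    if keys != list(range(len(keys))):
        problems.append(f"keys {keys} are not ascending from zero without gaps")
    return problems


print(check_labels({"0": "background", "1": "", "2": "endosperm"}))  # []
print(check_labels({"0": "background", "2": "endosperm"}))
```

The second call reports a problem because key "1" is missing; an unused class should be kept with an empty string as its value instead.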

Example:

{
  "name": "Maize Example",
  "short_name": "ME",
  "entity": "maize",
  "meta_data": {
    "latin name": "Frumentum",
    "line name": "Mais ",
    "structure": "seed",
    "dap": "10 DAP",
    "device": "NMR Device",
    "coil": "CRP 13C/1H 5mm 400MHz",
    "measurement channel": "structure"
  },
  "release": "2026.01.08",
  "description": "NMR dataset with all maize samples. Test set dispersed among sets 7,8, 10, 11, tree set7 samples discarded.",
  "reference": "Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) ",
  "license": "IPK proprietary",
  "tags": [
    "MRI",
    "3D",
    "Image"
  ],
  "task_categories": [
    "image-segmentation"
  ],
  "labels": {
    "0": "background",
    "1": "embryo",
    "2": "endosperm",
    "3": "aleuron"
  },
  "image_format": ".nii.gz",
  "annotation_format": ".nii.gz",
  "num_instances_total": 6,
  "num_instances_training": 2,
  "num_instances_test": 2,
  "num_instances_validation": 2,
  "num_occurrences_total": 6,
  "num_occurrences_training": 2,
  "num_occurrences_test": 2,
  "num_occurrences_validation": 2,
  "image_data_type": "int16",
  "annotation_data_type": "uint8",
  "value_channels": 1,
  "dimensions_max": [
    96,
    128,
    150
  ],
  "dimensions_min": [
    41,
    64,
    78
  ],
  "dimensions_avg": [
    57.49618320610687,
    92.01526717557252,
    115.6030534351145
  ],
  "resolution_unit": "mm",
  "resolution_voxel_size_max": [
    0.1,
    0.1,
    0.1
  ],
  "resolution_voxel_size_min": [
    0.1,
    0.1,
    0.1
  ],
  "resolution_voxel_size_avg": [
    0.1,
    0.1,
    0.1
  ],
  "training": [
    {
      "annotations": "occurrence-0000/instance-0000/annotations.nii.gz",
      "image": "occurrence-0000/instance-0000/image.nii.gz"
    },
    {
      "annotations": "occurrence-0001/instance-0000/annotations.nii.gz",
      "image": "occurrence-0001/instance-0000/image.nii.gz"
    }
  ],
  "validation": [
    {
      "annotations": "occurrence-0002/instance-0000/annotations.nii.gz",
      "image": "occurrence-0002/instance-0000/image.nii.gz"
    },
    {
      "annotations": "occurrence-0003/instance-0000/annotations.nii.gz",
      "image": "occurrence-0003/instance-0000/image.nii.gz"
    }
  ],
  "test": [
    {
      "annotations": "occurrence-0004/instance-0000/annotations.nii.gz",
      "image": "occurrence-0004/instance-0000/image.nii.gz"
    },
    {
      "annotations": "occurrence-0005/instance-0000/annotations.nii.gz",
      "image": "occurrence-0005/instance-0000/image.nii.gz"
    }
  ],
  "data_references": {
    "occurrence-0000/instance-0000/image.nii.gz": "example1",
    "occurrence-0001/instance-0000/image.nii.gz": "example2",
    "occurrence-0002/instance-0000/image.nii.gz": "example3",
    "occurrence-0003/instance-0000/image.nii.gz": "example4",
    "occurrence-0004/instance-0000/image.nii.gz": "example3",
    "occurrence-0005/instance-0000/image.nii.gz": "example4"
  }
}
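A file like the one above can be read with the standard library alone. The sketch below is not the magdi_data API: the function names are hypothetical, and it assumes that dataset_meta.json sits in the dataset root, that the paths in the splits are relative to that root, and that num_instances_total equals the number of entries across all splits that are present (as it does in the example).

```python
import json
from pathlib import Path

SPLITS = ("training", "validation", "test")


def load_meta(dataset_dir):
    """Read dataset_meta.json, assumed to sit in the dataset root directory."""
    meta_path = Path(dataset_dir) / "dataset_meta.json"
    with open(meta_path, encoding="utf-8") as handle:
        return json.load(handle)


def check_instance_counts(meta):
    """Return mismatches between the num_instances_* fields and the split lists."""
    problems = []
    total = 0
    for split in SPLITS:
        entries = meta.get(split)
        if entries is None:
            continue  # validation and test are optional
        total += len(entries)
        declared = meta.get(f"num_instances_{split}")
        if declared != len(entries):
            problems.append(
                f"num_instances_{split} is {declared}, "
                f"but {split} lists {len(entries)} entries"
            )
    # Assumption: the total equals the sum over all splits that are present.
    if meta.get("num_instances_total") != total:
        problems.append(
            f"num_instances_total is {meta.get('num_instances_total')}, "
            f"but the splits list {total} entries"
        )
    return problems


def iter_samples(meta, dataset_dir, split="training"):
    """Yield (image_path, annotation_path or None) for each instance of a split."""
    root = Path(dataset_dir)
    for entry in meta.get(split, []):
        annotation = entry.get("annotations")
        yield root / entry["image"], (root / annotation) if annotation else None


# In-memory stand-in for load_meta("path/to/dataset"); the directory is hypothetical.
meta = {
    "num_instances_total": 2,
    "num_instances_training": 2,
    "training": [
        {
            "annotations": "occurrence-0000/instance-0000/annotations.nii.gz",
            "image": "occurrence-0000/instance-0000/image.nii.gz",
        },
        {"image": "occurrence-0001/instance-0000/image.nii.gz"},
    ],
}
print(check_instance_counts(meta))
for image_path, annotation_path in iter_samples(meta, "path/to/dataset"):
    print(image_path, annotation_path)
```

Unannotated instances yield `None` as the annotation path, which mirrors the rule that only the "image" key is mandatory.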
