CLI tool for creating hyperspectral image datasets for machine learning.

SpectralDatamaker

Python CLI tool designed to facilitate the creation of datasets with hyperspectral images for machine learning.

The dataset structure is organized as follows:

dataset_root/
├── images
│   ├── DATASET-01_image-name_0
│   ├── DATASET-01_image-name_1
│   ├── DATASET-01_image-name_2
│   └── DATASET-01_image-name_3
├── masks
│   ├── RoiMASK_image-name.csv
│   ├── PxMASK_image-name.npy
│   ├── DATASET-01_image-name_0
│   ├── DATASET-01_image-name_1
│   ├── DATASET-01_image-name_2
│   └── DATASET-01_image-name_3
├── source
│   ├── image-name.hdr
│   └── image-name.raw
└── metadata.json

The tool processes the source images, generates region-of-interest (ROI) masks, pixel masks, and labels, and crops the images based on the generated masks.

CLI Usage

After installing the package, you can use the console command:

spectral-datamaker --help

You can also invoke the package module directly:

python -m spectral_datamaker --help

The CLI provides five main commands:

Create a complete dataset:

spectral-datamaker create <config.yaml> <output_directory>

Options:

--dry-run: Validate configuration without executing
--skip-validation: Skip final dataset validation
--no-interactive: Skip interactive mask adjustment (not yet implemented)

Validate an existing dataset:

spectral-datamaker validate <dataset_directory>

Options:

--config <file>: Validate against a specific configuration file

Inspect dataset metadata:

spectral-datamaker inspect <dataset_directory>

Options:

--format [json|yaml|table]: Output format (default: table)
--show-images: List all processed images

Execute individual pipeline steps:

spectral-datamaker step <step_name> <config.yaml> <dataset_directory>

Available steps: structure, roi-mask, pixel-mask, crop, metadata

Compose a new dataset from existing ones:

spectral-datamaker compose <compose.yaml> <output_directory>

Options:

--dry-run: Validate configuration without copying files
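
For scripted runs, the create command shown above can be assembled from Python. This is a minimal sketch that only wraps the documented command line; the helper names build_create_command and run_create are hypothetical, and placing the options after the positional arguments is an assumption.

```python
import subprocess


def build_create_command(config, output_dir, dry_run=False, skip_validation=False):
    """Build the argument list for `spectral-datamaker create`."""
    cmd = ["spectral-datamaker", "create", str(config), str(output_dir)]
    if dry_run:
        cmd.append("--dry-run")          # validate the configuration only
    if skip_validation:
        cmd.append("--skip-validation")  # skip the final dataset validation
    return cmd


def run_create(config, output_dir, **flags):
    """Run the command; check=True raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_create_command(config, output_dir, **flags), check=True)
```

Calling run_create("dataset.yaml", "out", dry_run=True) would then be equivalent to typing the create command with --dry-run in a shell, provided the package is installed.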

Library usage (Python API)

Besides the CLI, SpectralDatamaker can be used as a Python library. The most useful classes for inspection and validation are:

DatasetStructure: infers canonical dataset locations (images/, masks/, source/, metadata.json) from a root directory.
Filenames: derives expected filenames and absolute paths for masks, labels, cropped outputs, and metadata.
DatasetValidator: validates an existing dataset either from a config file or from metadata.json.
DatasetManager: provides methods for retrieving dataset information, listing processed images, and accessing metadata details.
ComposeConfig / ComposeConfigLoader: dataclass and loader for compose configuration files.
ComposeProcessor: builds a composed dataset programmatically from a ComposeConfig.

from spectral_datamaker.config import DatasetStructure, Filenames
from spectral_datamaker.dataset import DatasetValidator

dataset_root = "/path/to/dataset_root"

# 1) Infer dataset structure from root directory
structure = DatasetStructure(dataset_root)
print(structure.images_dir)
print(structure.masks_dir)
print(structure.source_dir)
print(structure.metadata_file)

# 2) Derive expected file paths and names
names = Filenames(structure)
print(names.get_roi_mask("image_1.hdr", abs=True))
print(names.get_px_mask("image_1.hdr", abs=True))
print(names.get_dataset_metadata(abs=True))

# 3) Validate dataset contents
validator = DatasetValidator(structure)
validator.validate_dataset_from_config("/path/to/dataset.yaml")
# Or, if metadata already exists:
# validator.validate_dataset_from_metadata()

Dataset config file

The dataset configuration file (e.g., dataset.yaml) contains the information needed to create a dataset from ENVI images. The path resolution rules and the expected YAML structure are given below.

Path resolution rules:

Absolute paths are used as-is.
Relative paths in source-images[].path are resolved relative to the directory that contains the YAML file.

dataset:
  name: dataset-example
  description: An example dataset created with SpectralDatamaker.

  source-images:
    - path: ../images/source/image_1.hdr
      masking:
        shape: circle
        size: 35
        num: 6

    - path: ../images/source/image_2.hdr
      masking:
        shape: square
        size: 20
        num: 4

    - path: ../images/source/image_n.hdr
      masking:
        shape: triangle
        size: 50
        num: 2

  segmentation:
    enabled: true
    classes:
      - type_A
      - type_B

  classification:
    enabled: false
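
The path resolution rules above can be illustrated with pathlib. This is a sketch of the stated rules, not the package's own code; the function name resolve_source_path is hypothetical.

```python
from pathlib import Path


def resolve_source_path(config_file, raw_path):
    """Resolve a source-images[].path entry against the YAML file's directory."""
    raw = Path(raw_path)
    if raw.is_absolute():
        return raw  # absolute paths are used as-is
    # relative paths are anchored at the directory that contains the YAML file
    return (Path(config_file).parent / raw).resolve()
```

With a config at /project/configs/dataset.yaml, the entry ../images/source/image_1.hdr would point into /project/images/source/, regardless of the directory the command is run from.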

Segmentation mode

When segmentation mode is enabled, SpectralDatamaker will generate a dataset with segmentation masks for each source image. The steps are as follows:

Creates ROI masks based on the specified shape, size, and number of regions in the configuration file. A napari viewer is launched to allow the user to adjust the generated masks if necessary. Masks are saved when the user closes the viewer.
Generates pixel masks from the ROI masks, asking the user to label each region of interest (ROI) with the corresponding class from the configuration file.
Crops the source images based on the generated masks and saves the cropped images and masks in the appropriate directories.
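
The labelling step above can be pictured as a mapping from ROI indices to integer labels. The sketch below assumes that label 0 is reserved for background and that classes are numbered from 1 in configuration order, as the metadata example later in this document suggests; roi_labels is a hypothetical helper, not part of the package.

```python
def roi_labels(classes, assignments):
    """Map each ROI index to an integer label (0 is reserved for background)."""
    label_of = {name: i for i, name in enumerate(classes, start=1)}
    labels = {}
    for name, roi_indices in assignments.items():
        if name not in label_of:
            raise ValueError(f"unknown class: {name!r}")
        for roi in roi_indices:
            if roi in labels:
                raise ValueError(f"ROI {roi} assigned to more than one class")
            labels[roi] = label_of[name]
    return labels


# Six ROIs split between two classes
labels = roi_labels(["type_A", "type_B"], {"type_A": [0, 2, 3], "type_B": [1, 5, 4]})
```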

Classification mode

[!NOTE] The classification mode is currently in development and is not yet available for use. The following description is based on the intended functionality.

When classification mode is enabled, SpectralDatamaker will generate a dataset with class labels for each source image. The steps are as follows:

Creates ROI masks based on the specified shape, size, and number of regions in the configuration file. A napari viewer is launched to allow the user to adjust the generated masks if necessary. Masks are saved when the user closes the viewer.
Asks the user to label each ROI with the corresponding class from the configuration file. Saves the class labels in a CSV file.
Crops the source images based on the generated masks and saves the cropped images in the appropriate directories.

Compose mode

The compose command builds a new dataset by selecting ROI crops from one or more already-processed datasets, without re-annotating anything. It reads the metadata of the source datasets to locate the crops, copies them to the new dataset, remaps the class labels according to the new class list, and generates the metadata.json of the composed dataset.

Compose config file

Path resolution rules:

Absolute paths are used as-is.
Relative paths in sources[].dataset are resolved relative to the directory that contains the compose YAML file.

compose:
  name: composed-dataset
  description: Dataset composed from multiple source datasets.
  classes:
    - type_A
    - type_B
    - type_C

  sources:
    - dataset: ../datasets/source_dataset_1
      class: type_A

    - dataset: ../datasets/source_dataset_1
      class: type_B
      num: 4        # optional — limit to 4 crops; omit to use all available

    - dataset: ../datasets/source_dataset_2
      class: type_C

classes: defines the label mapping of the output dataset (classes[0] → label 1, classes[1] → label 2, etc.).
sources: each entry selects the ROI crops of a given class (variety) from a source dataset; all of them are used unless num sets a limit. The source dataset must have been created with spectral-datamaker create and must contain metadata.json.
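
The label mapping described above, and the remapping of source labels to the new class list, can be sketched as follows. Both helpers are hypothetical, and the "0": "background" entry is an assumption taken from the metadata examples in this document.

```python
def build_label_map(classes):
    """classes[0] -> label 1, classes[1] -> label 2, ...; 0 is background."""
    label_map = {"0": "background"}
    for label, name in enumerate(classes, start=1):
        label_map[str(label)] = name
    return label_map


def remap_table(source_label_map, target_classes):
    """Translate integer labels of a source dataset into labels of the composed one."""
    target = {name: i for i, name in enumerate(target_classes, start=1)}
    table = {0: 0}  # background stays background
    for old_label, name in source_label_map.items():
        if name in target:
            table[int(old_label)] = target[name]
    return table
```

For instance, a source dataset whose only class type_C has label 1 would have that label rewritten to 3 when composed into the class list type_A, type_B, type_C.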

Composed dataset structure

The output directory follows the same structure as a regular dataset:

output_dir/
├── images/
│   ├── COMPOSED_imageA_type_A_0.npy
│   ├── COMPOSED_imageA_type_A_1.npy
│   └── COMPOSED_imageB_type_C_0.npy
├── masks/
│   ├── COMPOSED_imageA_type_A_0.npy
│   ├── COMPOSED_imageA_type_A_1.npy
│   └── COMPOSED_imageB_type_C_0.npy
├── source/
└── metadata.json

Crops from different source images and varieties are grouped into virtual source image keys of the form <source_image>_<variety>. Within each group, crops are indexed sequentially from 0.
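
The grouping and indexing rule above can be sketched as a small naming function. The COMPOSED prefix and the .npy extension are taken from the example tree; whether they are fixed is an assumption, and composed_names is a hypothetical helper.

```python
from collections import defaultdict


def composed_names(crops, prefix="COMPOSED"):
    """crops: iterable of (source_image, variety) pairs, one per crop, in copy order."""
    counters = defaultdict(int)  # next free index per virtual source image key
    names = []
    for source_image, variety in crops:
        key = f"{source_image}_{variety}"
        names.append(f"{prefix}_{key}_{counters[key]}.npy")
        counters[key] += 1
    return names


names = composed_names([("imageA", "type_A"), ("imageA", "type_A"), ("imageB", "type_C")])
```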

Dataset metadata

SpectralDatamaker generates a metadata.json file containing information about the dataset, including the dataset name, description, source images, and the processing steps applied to each image. SpectralDatamaker reads this file to validate the dataset structure and contents. An example of the metadata.json structure is as follows:

{
    "name": "dataset-03",
    "description": "Dataset created with one hyperspectral image.",
    "last_update": "2026-04-08 13:52:01",
    "source_images": ["/path/to/image_1.hdr"],
    "types": ["segmentation"],
    "segmentation_masking": {
        "image_1": {
            "label_map": {"0": "background", "1": "type_A", "2": "type_B"},
            "num_classes": 3,
            "classes": ["type_A", "type_B"],
            "assignments": {
                "type_A": [0,2,3],
                "type_B": [1,5,4]
            },
            "source_image": "image_1.hdr",
            "source_dataset": "",
            "rois_file": "RoiMASK_image_1.csv",
            "mask_file": "PxMASK_image_1.npy",
            "created": "2026-04-08T13:51:33.931524",
            "format": "npy"
        }
    }
}

For composed datasets, source_images contains virtual group keys (one per source image × variety combination) and each segmentation_masking entry includes a source_dataset field pointing to the origin dataset:

{
    "name": "composed-dataset",
    "description": "Dataset composed from multiple source datasets.",
    "last_update": "2026-05-18 10:00:00",
    "source_images": ["image_1_type_A", "image_2_type_C"],
    "types": ["segmentation"],
    "segmentation_masking": {
        "image_1_type_A": {
            "label_map": {"0": "background", "1": "type_A", "2": "type_B", "3": "type_C"},
            "num_classes": 4,
            "classes": ["type_A", "type_B", "type_C"],
            "assignments": {
                "type_A": [0, 1, 2],
                "type_B": [],
                "type_C": []
            },
            "source_image": "image_1_type_A",
            "source_dataset": "/path/to/source_dataset_1",
            "rois_file": "",
            "mask_file": "",
            "created": "2026-05-18T10:00:00.000000",
            "format": "npy"
        }
    }
}
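
Both examples above share a few regularities: label_map is "background" followed by classes in order, and num_classes equals len(classes) + 1. The sketch below checks them with the standard json module. These are patterns observed in the examples, not documented guarantees, and the function names are hypothetical.

```python
import json


def check_masking_entry(entry):
    """Return a list of inconsistencies found in one segmentation_masking entry."""
    problems = []
    classes = entry["classes"]
    expected_map = {"0": "background"}
    expected_map.update({str(i): name for i, name in enumerate(classes, start=1)})
    if entry["label_map"] != expected_map:
        problems.append("label_map does not match classes")
    if entry["num_classes"] != len(classes) + 1:
        problems.append("num_classes is not len(classes) + 1")
    unknown = sorted(set(entry["assignments"]) - set(classes))
    if unknown:
        problems.append(f"assignments use unknown classes: {unknown}")
    return problems


def check_metadata(text):
    """Check every segmentation_masking entry of a metadata.json document."""
    meta = json.loads(text)
    entries = meta.get("segmentation_masking", {})
    return {key: check_masking_entry(entry) for key, entry in entries.items()}


sample = """
{
    "name": "dataset-03",
    "segmentation_masking": {
        "image_1": {
            "label_map": {"0": "background", "1": "type_A", "2": "type_B"},
            "num_classes": 3,
            "classes": ["type_A", "type_B"],
            "assignments": {"type_A": [0, 2, 3], "type_B": [1, 5, 4]}
        }
    }
}
"""
report = check_metadata(sample)  # maps each entry key to its list of problems
```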

Validations

SpectralDatamaker includes validation checks allowing users to verify the generated dataset structure and contents, as well as validate existing datasets. The validation includes checks for the presence of required directories and expected files.
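
A presence check of this kind can be sketched with pathlib. This is an illustration based on the dataset layout shown at the top of this document, not the implementation of DatasetValidator; missing_entries is a hypothetical helper.

```python
import tempfile
from pathlib import Path

REQUIRED_DIRS = ("images", "masks", "source")
REQUIRED_FILES = ("metadata.json",)


def missing_entries(dataset_root):
    """List required directories and files that are absent under dataset_root."""
    root = Path(dataset_root)
    missing = [d + "/" for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing


# Demo on a throwaway directory that contains only images/
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "images").mkdir()
    demo_missing = missing_entries(tmp)
```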

Release history

0.6.2

Jun 18, 2026

0.6.1

Jun 16, 2026

0.6.0

Jun 12, 2026

This version

0.5.2

Jun 1, 2026

0.5.1

May 18, 2026

0.5.0

May 18, 2026

0.4.0

Apr 20, 2026

0.3.0

Apr 20, 2026

0.2.1

Apr 17, 2026

Download files

Source Distribution

spectral_datamaker-0.5.2.tar.gz (21.9 kB)

Uploaded Jun 1, 2026 Source

Built Distribution

spectral_datamaker-0.5.2-py3-none-any.whl (27.0 kB)

Uploaded Jun 1, 2026 Python 3

File details

Details for the file spectral_datamaker-0.5.2.tar.gz.

File metadata

Download URL: spectral_datamaker-0.5.2.tar.gz
Upload date: Jun 1, 2026
Size: 21.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for spectral_datamaker-0.5.2.tar.gz
Algorithm	Hash digest
SHA256	`83d23c5657cdbe17c45b965e4c3e23477160fb495e86df4b55902e936e8d7de9`
MD5	`5ad0c3411c41692d772d0eabb44c358d`
BLAKE2b-256	`58b9aa5fc1bde5b412cd56dfe93fd78acd6ea0fa34e62d22298287f2de2190c4`

File details

Details for the file spectral_datamaker-0.5.2-py3-none-any.whl.

File metadata

Download URL: spectral_datamaker-0.5.2-py3-none-any.whl
Upload date: Jun 1, 2026
Size: 27.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for spectral_datamaker-0.5.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4e71393021d254a728b5fe2e6ff802b376f738bc3af7d8b1106c9e5c4b65195f`
MD5	`272916c9b6264042de5f4ed4b398b3ed`
BLAKE2b-256	`b990054a00ffe07f1873357af690252821ed265caef371eedd98d6f06f030372`
