CLI tool for creating hyperspectral image datasets for machine learning.
Project description
SpectralDatamaker
Python CLI tool designed to facilitate the creation of datasets with hyperspectral images for machine learning.
The dataset structure is organized as follows:
dataset_root/
├── images
│ ├── DATASET-01_image-name_0
│ ├── DATASET-01_image-name_1
│ ├── DATASET-01_image-name_2
│ └── DATASET-01_image-name_3
├── masks
│ ├── RoiMASK_image-name.csv
│ ├── PxMASK_image-name.npy
│ ├── DATASET-01_image-name_0
│ ├── DATASET-01_image-name_1
│ ├── DATASET-01_image-name_2
│ └── DATASET-01_image-name_3
├── source
│ ├── image-name.hdr
│ └── image-name.raw
└── metadata.json
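The layout above can be checked with a few lines of standard-library Python. This is an illustrative sketch only; the function name is hypothetical and not part of the package API:

```python
from pathlib import Path

def check_layout(root: str) -> list[str]:
    """Return the expected top-level entries missing from a dataset root."""
    root_path = Path(root)
    expected = ["images", "masks", "source", "metadata.json"]
    return [name for name in expected if not (root_path / name).exists()]
```

Running it on an empty directory reports every entry as missing until the pipeline has populated the dataset.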
This tool provides functionality for processing source images, generating region of interest (ROI) masks, pixel masks, and labels, and cropping the images based on the generated masks.
CLI Usage
After installing the package, you can use the console command:
spectral-datamaker --help
You can also invoke the package module directly:
python -m spectral_datamaker --help
The CLI provides four main commands:
Create a complete dataset:
spectral-datamaker create <config.yaml> <output_directory>
Options:
- --dry-run: Validate the configuration without executing
- --skip-validation: Skip final dataset validation
- --no-interactive: Skip interactive mask adjustment (not yet implemented)
Validate an existing dataset:
spectral-datamaker validate <dataset_directory>
Options:
--config <file>: Validate against a specific configuration file
Inspect dataset metadata:
spectral-datamaker inspect <dataset_directory>
Options:
- --format [json|yaml|table]: Output format (default: table)
- --show-images: List all processed images
Execute individual pipeline steps:
spectral-datamaker step <step_name> <config.yaml> <dataset_directory>
Available steps: structure, roi-mask, pixel-mask, crop, metadata
Library usage (Python API)
Besides the CLI, SpectralDatamaker can be used as a Python library. The most useful classes for inspection and validation are:
- DatasetStructure: infers canonical dataset locations (images/, masks/, source/, metadata.json) from a root directory.
- Filenames: derives expected filenames and absolute paths for masks, labels, cropped outputs, and metadata.
- DatasetValidator: validates an existing dataset either from a config file or from metadata.json.
from spectral_datamaker.config import DatasetStructure, Filenames
from spectral_datamaker.processors import DatasetValidator
dataset_root = "/path/to/dataset_root"
# 1) Infer dataset structure from root directory
structure = DatasetStructure(dataset_root)
print(structure.images_dir)
print(structure.masks_dir)
print(structure.source_dir)
print(structure.metadata_file)
# 2) Derive expected file paths and names
names = Filenames(structure)
print(names.get_roi_mask("image_1.hdr", abs=True))
print(names.get_px_mask("image_1.hdr", abs=True))
print(names.get_dataset_metadata(abs=True))
# 3) Validate dataset contents
validator = DatasetValidator(structure)
validator.validate_dataset_from_config("/path/to/dataset.yaml")
# Or, if metadata already exists:
# validator.validate_dataset_from_metadata()
Dataset config file
The dataset configuration file (e.g., dataset.yaml) contains the necessary information for creating a dataset from ENVI images. The YAML file should have the following structure:
dataset:
name: dataset-example
description: An example dataset created with SpectralDatamaker.
source-images:
- path: /path/to/source/image_1.hdr
masking:
shape: circle
size: 35
num: 6
- path: /path/to/source/image_2.hdr
masking:
shape: square
size: 20
num: 4
- path: /path/to/source/image_n.hdr
masking:
shape: triangle
size: 50
num: 2
segmentation:
enabled: true
classes:
- type_A
- type_B
classification:
enabled: false
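Assuming the YAML above has been parsed into a plain dictionary (for example with PyYAML's safe_load), the required keys can be checked with a sketch like the following. The function is illustrative and not part of the package:

```python
def validate_config(cfg: dict) -> list[str]:
    """Collect human-readable problems found in a parsed dataset config."""
    problems = []
    ds = cfg.get("dataset", {})
    for key in ("name", "source-images"):
        if key not in ds:
            problems.append(f"dataset.{key} is missing")
    for i, img in enumerate(ds.get("source-images", [])):
        if "path" not in img:
            problems.append(f"source-images[{i}].path is missing")
        masking = img.get("masking", {})
        for key in ("shape", "size", "num"):
            if key not in masking:
                problems.append(f"source-images[{i}].masking.{key} is missing")
    seg = ds.get("segmentation", {})
    if seg.get("enabled") and not seg.get("classes"):
        problems.append("segmentation.classes must be non-empty when enabled")
    return problems
```

An empty result means the structure matches the shape of the example configuration.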
Segmentation mode
When segmentation mode is enabled, SpectralDatamaker will generate a dataset with segmentation masks for each source image. The steps are as follows:
- Creates ROI masks based on the specified shape, size, and number of regions in the configuration file. A napari viewer is launched to allow the user to adjust the generated masks if necessary. Masks are saved when the user closes the viewer.
- Generates pixel masks from the ROI masks, asking the user to label each region of interest (ROI) with the corresponding class from the configuration file.
- Crops the source images based on the generated masks and saves the cropped images and masks in the appropriate directories.
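As an illustration of the kind of ROI mask produced in the first step, the sketch below builds a circular boolean mask in pure Python. The real package operates on full hyperspectral cubes and uses napari for interactive adjustment; this helper is purely illustrative:

```python
def circle_mask(height: int, width: int, cy: int, cx: int, radius: int) -> list[list[bool]]:
    """Boolean grid that is True inside a circle centred at (cy, cx)."""
    return [
        [(y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2 for x in range(width)]
        for y in range(height)
    ]
```

The shape, size, and num values from the configuration would drive how many such regions are placed and how large they are.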
Classification mode
[!NOTE] The classification mode is currently in development and is not yet available for use. The following description is based on the intended functionality.
When classification mode is enabled, SpectralDatamaker will generate a dataset with class labels for each source image. The steps are as follows:
- Creates ROI masks based on the specified shape, size, and number of regions in the configuration file. A napari viewer is launched to allow the user to adjust the generated masks if necessary. Masks are saved when the user closes the viewer.
- Asks the user to label each ROI with the corresponding class from the configuration file. Saves the class labels in a CSV file.
- Crops the source images based on the generated masks and saves the cropped images in the appropriate directories.
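The label-saving step could look roughly like this standard-library sketch. The column names and function are assumptions for illustration, not the package's actual CSV format:

```python
import csv

def save_labels(path: str, labels: dict[int, str]) -> None:
    """Write one (roi_id, class) row per labelled ROI."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["roi_id", "class"])
        for roi_id in sorted(labels):
            writer.writerow([roi_id, labels[roi_id]])
```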
Dataset metadata
SpectralDatamaker generates a metadata.json file containing information about the dataset, including the dataset name, description, source images, and the processing steps applied to each image. This metadata file is recognized by SpectralDatamaker and can be used to validate the dataset structure and contents. An example of the metadata.json structure is as follows:
{
"name": "dataset-03",
"description": "Dataset created with one hyperspectral image.",
"last_update": "2026-04-08 13:52:01",
"source_images": ["/path/to/image_1.hdr"],
"types": ["segmentation"],
"segmentation_masking": {
"image_1": {
"label_map": {"0": "background", "1": "type_A", "2": "type_B"},
"num_classes": 3,
"classes": ["type_A", "type_B"],
"assignments": {
"type_A": [0,2,3],
"type_B": [1,5,4]
},
"source_image": "image_1.hdr",
"rois_file": "RoiMASK_image_1.csv",
"mask_file": "PxMASK_image_1.npy",
"created": "2026-04-08T13:51:33.931524",
"format": "npy"
}
}
}
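Because the metadata records the expected mask filenames per image, a consistency check against the masks/ directory can be sketched with the standard library. This is illustrative only; DatasetValidator provides the package's real checks:

```python
import json
from pathlib import Path

def missing_mask_files(dataset_root: str) -> list[str]:
    """List mask files referenced by metadata.json but absent from masks/."""
    root = Path(dataset_root)
    meta = json.loads((root / "metadata.json").read_text())
    missing = []
    for entry in meta.get("segmentation_masking", {}).values():
        for key in ("rois_file", "mask_file"):
            name = entry.get(key)
            if name and not (root / "masks" / name).exists():
                missing.append(name)
    return missing
```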
Validations
SpectralDatamaker includes validation checks allowing users to verify the generated dataset structure and contents, as well as validate existing datasets. The validation includes checks for the presence of required directories and expected files.
Project details
File details
Details for the file spectral_datamaker-0.2.1.tar.gz.
File metadata
- Download URL: spectral_datamaker-0.2.1.tar.gz
- Upload date:
- Size: 16.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | edbed8b79d2ccbc59f403a0513650a47b4922399be15dc05dfab0ea0514317fa |
| MD5 | 6f4fb5d52c66e064adaffcab962e785f |
| BLAKE2b-256 | 58c7fa675748c6c195f27f9e11e5bd86ce0fb0d15985b0540a2f9f89d6ce7eb4 |
File details
Details for the file spectral_datamaker-0.2.1-py3-none-any.whl.
File metadata
- Download URL: spectral_datamaker-0.2.1-py3-none-any.whl
- Upload date:
- Size: 19.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8164260471b37d713c5a0af23d7fa8206f0d7952cfa0601ca1c8e92fa037666e |
| MD5 | aff5cd83f482af50cae48bfe156604bc |
| BLAKE2b-256 | b3f5bb148bca44f8974561b1148c3fb25381cf570142efeaeb13552da87da10b |