A comprehensive Python library for Near-Infrared Spectroscopy (NIRS) data analysis with ML/DL pipelines.

NIRS4ALL

A comprehensive Python library for Near-Infrared Spectroscopy data analysis

Python 3.11+ | License: CeCILL-2.1 | Code style: ruff


Overview

NIRS4ALL bridges the gap between spectroscopic data and machine learning by providing a unified framework for data loading, preprocessing, model training, and evaluation. It is built for researchers and practitioners working with Near-Infrared Spectroscopy data.

Key Features

  • NIRS-Specific Preprocessing — SNV, MSC, Savitzky-Golay, Norris-Williams, wavelet denoise, OSC/EPO, and 30+ spectral transforms
  • Advanced PLS Models — AOM-PLS, POP-PLS, OPLS, DiPLS, MBPLS, and 15+ PLS variants with automatic operator selection
  • Multi-Backend ML — Seamless integration with scikit-learn, TensorFlow, PyTorch, and JAX
  • Declarative Pipelines — Define complex workflows with simple, readable syntax
  • Parallel Execution — Multi-core pipeline variant execution via joblib
  • Hyperparameter Tuning — Built-in Optuna integration for automated optimization
  • Rich Visualizations — Performance heatmaps, candlestick plots, SHAP explanations
  • Model Deployment — Export trained pipelines as portable .n4a bundles
  • sklearn Compatible: NIRSPipeline wrapper for SHAP, cross-validation, and more

Figures: Performance Heatmap, Performance Distribution, Regression Scatter Plot.
Advanced visualization capabilities for model performance analysis

Installation

Basic Installation

pip install nirs4all

This installs the core library with scikit-learn support. Deep learning frameworks are optional.

With ML Backends

# Extras are quoted so that shells such as zsh do not expand the brackets

# TensorFlow
pip install "nirs4all[tensorflow]"

# PyTorch
pip install "nirs4all[torch]"

# JAX
pip install "nirs4all[jax]"

# All frameworks
pip install "nirs4all[all]"

# All frameworks with GPU support
pip install "nirs4all[all-gpu]"
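
Since the deep learning frameworks are optional, it can help to check which ones are importable before picking an extra. The following is a minimal standard-library sketch; the helper name `available_backends` is hypothetical and not part of nirs4all, and the module names are simply the usual import names of each framework.

```python
from importlib.util import find_spec


def available_backends(candidates=("sklearn", "tensorflow", "torch", "jax")):
    """Return {module_name: True/False} depending on whether it can be imported.

    find_spec only locates the module; it does not import it, so this is cheap
    even for heavy frameworks.
    """
    return {name: find_spec(name) is not None for name in candidates}


if __name__ == "__main__":
    for name, present in available_backends().items():
        print(f"{name:<12} {'installed' if present else 'missing'}")
```
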

Conda

Coming soon! We're working with conda-forge to make NIRS4ALL available through conda. In the meantime, install with pip (pip install nirs4all) or use the Docker image.

# Available soon:
# conda install -c conda-forge nirs4all

Docker

docker pull ghcr.io/gbeurier/nirs4all:latest
docker run -v "$(pwd)":/workspace ghcr.io/gbeurier/nirs4all python my_script.py

Development Installation

git clone https://github.com/GBeurier/nirs4all.git
cd nirs4all
pip install -e ".[dev]"

Verify Installation

nirs4all --test-install      # Check dependencies
nirs4all --test-integration  # Run integration tests
nirs4all --version           # Check version
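
The installed version can also be read from Python through the standard library's package metadata. This is a small sketch rather than a nirs4all feature; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("nirs4all")
    print(f"nirs4all: {found if found is not None else 'not installed'}")
```
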

Quick Start

Simple API (Recommended)

import nirs4all
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import ShuffleSplit
from sklearn.cross_decomposition import PLSRegression

# Define your pipeline
pipeline = [
    MinMaxScaler(),
    {"y_processing": MinMaxScaler()},
    ShuffleSplit(n_splits=3, test_size=0.25),
    {"model": PLSRegression(n_components=10)}
]

# Train and evaluate
result = nirs4all.run(
    pipeline=pipeline,
    dataset="path/to/your/data",
    name="MyPipeline",
    verbose=1
)

# Access results
print(f"Best RMSE: {result.best_rmse:.4f}")
print(f"Best R²: {result.best_r2:.4f}")

# Export for deployment
result.export("exports/best_model.n4a")

Session for Multiple Runs

import nirs4all
from sklearn.preprocessing import MinMaxScaler
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor

with nirs4all.session(verbose=1, save_artifacts=True) as s:
    # Compare models with shared configuration
    pls_result = nirs4all.run(
        pipeline=[MinMaxScaler(), PLSRegression(n_components=10)],
        dataset="data/wheat.csv",
        name="PLS",
        session=s
    )

    rf_result = nirs4all.run(
        pipeline=[MinMaxScaler(), RandomForestRegressor(n_estimators=100)],
        dataset="data/wheat.csv",
        name="RandomForest",
        session=s
    )

    print(f"PLS: {pls_result.best_rmse:.4f} | RF: {rf_result.best_rmse:.4f}")

sklearn Integration with SHAP

import nirs4all
from nirs4all.sklearn import NIRSPipeline
import shap

# `pipeline` and `dataset` are defined as in the Quick Start example above
# Train with nirs4all
result = nirs4all.run(pipeline, dataset)

# Wrap for sklearn compatibility
pipe = NIRSPipeline.from_result(result)

# Use with SHAP
# X_background and X_test are arrays of spectra that you supply
explainer = shap.Explainer(pipe.predict, X_background)
shap_values = explainer(X_test)
shap.summary_plot(shap_values)

Pipeline Syntax

NIRS4ALL uses a declarative syntax for defining pipelines:

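from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import MinMaxScaler

from nirs4all.operators.transforms import SNV, SavitzkyGolay, FirstDerivative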

pipeline = [
    # Preprocessing
    MinMaxScaler(),
    SNV(),
    SavitzkyGolay(window_length=11, polyorder=2),

    # Target scaling
    {"y_processing": MinMaxScaler()},

    # Cross-validation
    ShuffleSplit(n_splits=5, test_size=0.2),

    # Models to compare
    {"model": PLSRegression(n_components=10)},
    {"model": RandomForestRegressor(n_estimators=100)},

    # Neural network with training parameters
    # (`nicon` must be imported or defined beforehand; its import is not shown in this snippet)
    {
        "model": nicon,
        "name": "NICON-CNN",
        "train_params": {"epochs": 100, "patience": 20}
    }
]
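
The `window_length` and `polyorder` arguments used for `SavitzkyGolay` above carry the same names as those of `scipy.signal.savgol_filter`. Whether nirs4all wraps that function is an assumption here; the sketch below only illustrates what a Savitzky-Golay filter with these settings computes, using SciPy directly on a synthetic quadratic signal.

```python
import numpy as np
from scipy.signal import savgol_filter

# A noise-free quadratic "spectrum": a degree-2 local fit reproduces it exactly
t = np.linspace(0.0, 1.0, 101)
spectrum = t ** 2

# Smoothing: fit a degree-2 polynomial in each 11-point window
smoothed = savgol_filter(spectrum, window_length=11, polyorder=2)

# First derivative from the same local fits; delta is the sample spacing
first_deriv = savgol_filter(
    spectrum, window_length=11, polyorder=2, deriv=1, delta=t[1] - t[0]
)

print(np.allclose(smoothed, spectrum), np.allclose(first_deriv, 2 * t))
```

On real spectra the smoothed output differs from the input, since the local polynomial fit averages out noise; the quadratic is used here only because its exact result is easy to check.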

Advanced Features

# Feature augmentation - generate preprocessing combinations
{
    "feature_augmentation": {
        "_or_": [SNV, FirstDerivative, SavitzkyGolay],
        "size": [1, (1, 2)],
        "count": 5
    }
}

# Hyperparameter optimization
{
    "model": PLSRegression(),
    "finetune_params": {
        "n_trials": 50,
        "model_params": {"n_components": ("int", 1, 30)}
    }
}

# Branching for parallel preprocessing paths
{
    "branch": [
        [SNV(), PLSRegression(n_components=10)],
        [MSC(), RandomForestRegressor()]
    ]
}

# Merge branch outputs (stacking)
{"merge": "predictions"}

Available Transforms

NIRS-Specific Preprocessing

| Transform | Description |
|---|---|
| SNV / StandardNormalVariate | Standard Normal Variate normalization |
| RNV / RobustStandardNormalVariate | Robust Normal Variate (outlier-resistant) |
| MSC / MultiplicativeScatterCorrection | Multiplicative Scatter Correction |
| SavitzkyGolay | Smoothing and derivative computation |
| FirstDerivative / SecondDerivative | Spectral derivatives |
| NorrisWilliams | Gap derivative with segment smoothing |
| WaveletDenoise | Multi-level wavelet denoising with thresholding |
| OSC | Orthogonal Signal Correction (DOSC) |
| EPO | External Parameter Orthogonalization |
| Detrend | Remove linear/polynomial trends |
| Gaussian | Gaussian smoothing |
| Haar | Haar wavelet decomposition |
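
To make the first rows of this table concrete, here is a minimal NumPy sketch of the textbook definitions of SNV and MSC. These are illustrative functions, not the nirs4all classes; the names `snv` and `msc`, the `ddof` default, and the use of the mean spectrum as the MSC reference are assumptions of this example.

```python
import numpy as np


def snv(X, ddof=0):
    """Standard Normal Variate: center and scale each spectrum (row) on its own."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=ddof, keepdims=True)
    return (X - mean) / std


def msc(X, reference=None):
    """Multiplicative Scatter Correction against a reference spectrum.

    Each spectrum is regressed on the reference (spectrum ~ slope * ref + intercept)
    and corrected as (spectrum - intercept) / slope.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, spectrum in enumerate(X):
        slope, intercept = np.polyfit(ref, spectrum, deg=1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected


# Three copies of one base spectrum with different gain and offset (scatter effects)
base = np.sin(np.linspace(0.0, 3.0, 50)) + 2.0
X = np.vstack([a * base + b for a, b in [(1.0, 0.0), (1.5, 0.2), (0.8, -0.1)]])

X_snv = snv(X)
X_msc = msc(X)
print(np.allclose(X_snv, X_snv[0]), np.allclose(X_msc, X_msc[0]))
```

Both corrections collapse the three scaled and shifted copies onto a single curve, which is the scatter-removal effect the transforms are designed for.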

Signal Processing

| Transform | Description |
|---|---|
| Baseline | Baseline correction (ALS, AirPLS, ArPLS, IModPoly, SNIP, etc.) |
| ReflectanceToAbsorbance | Convert R to A using Beer-Lambert |
| ToAbsorbance / FromAbsorbance | Signal type conversion |
| KubelkaMunk | Kubelka-Munk transform |
| Resampler | Wavelength interpolation |
| CARS / MCUVE | Feature selection methods |
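
The two reflectance conversions in this table have simple closed forms: apparent absorbance is A = -log10(R), and the Kubelka-Munk function is F(R) = (1 - R)^2 / (2R). The NumPy sketch below implements those formulas directly. It assumes reflectance is given as a fraction in (0, 1] rather than a percentage, and the function names are illustrative, not the nirs4all API.

```python
import numpy as np


def reflectance_to_absorbance(R):
    """Apparent absorbance A = -log10(R) for fractional reflectance R in (0, 1]."""
    R = np.asarray(R, dtype=float)
    return -np.log10(R)


def kubelka_munk(R):
    """Kubelka-Munk function F(R) = (1 - R)^2 / (2 R) for R in (0, 1]."""
    R = np.asarray(R, dtype=float)
    return (1.0 - R) ** 2 / (2.0 * R)


R = np.array([1.0, 0.5, 0.1])
A = reflectance_to_absorbance(R)
F = kubelka_munk(R)
print(A, F)
```
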

Built-in NIRS Models

| Model | Description |
|---|---|
| AOMPLSRegressor / AOMPLSClassifier | Adaptive Operator-Mixture PLS (auto-selects best preprocessing) |
| POPPLSRegressor / POPPLSClassifier | Per-Operator-Per-component PLS via PRESS |
| PLSDA | PLS Discriminant Analysis |
| OPLS / OPLSDA | Orthogonal PLS |
| MBPLS | Multi-Block PLS |
| DiPLS | Domain-Invariant PLS |
| IKPLS | Improved Kernel PLS |
| FCKPLS | Fractional Convolution Kernel PLS |

Splitting Methods

| Splitter | Description |
|---|---|
| KennardStoneSplitter | Kennard-Stone algorithm |
| SPXYSplitter | Sample set Partitioning based on X and Y |
| SPXYFold / SPXYGFold | SPXY-based K-Fold cross-validation (with group support) |
| KMeansSplitter | K-means clustering based split |
| KBinsStratifiedSplitter | Binned stratification for continuous targets |
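
For readers unfamiliar with the Kennard-Stone algorithm listed above, it selects a subset that covers the feature space evenly: start from the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbour is farthest away. The function below is a compact NumPy sketch of that procedure using Euclidean distance; it is illustrative only and independent of `KennardStoneSplitter`, whose parameters are not documented here.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return indices of `n_select` samples chosen by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    if not 2 <= n_select <= n_samples:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Full pairwise Euclidean distance matrix (fine for small datasets)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Seed with the two samples that are farthest apart
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]

    # Distance from every sample to its nearest selected sample
    min_dist = np.minimum(dist[i], dist[j])
    min_dist[selected] = -np.inf  # never re-select a chosen sample

    while len(selected) < n_select:
        nxt = int(np.argmax(min_dist))  # farthest from the current selection
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
        min_dist[nxt] = -np.inf
    return selected


X = np.array([[0.0], [1.0], [2.0], [10.0]])
train_idx = kennard_stone(X, n_select=3)
test_idx = sorted(set(range(len(X))) - set(train_idx))
print(train_idx, test_idx)
```

The selected indices are typically used as the calibration (training) set, with the remaining samples held out for testing.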

See Preprocessing Guide for complete reference.


Examples

The examples/ directory is organized by topic:

User Examples (examples/user/)

| Category | Examples |
|---|---|
| Getting Started | Hello world, basic regression, classification, visualization |
| Data Handling | Multi-source, data loading, metadata |
| Preprocessing | SNV, MSC, derivatives, custom transforms |
| Models | Multi-model, hyperparameter tuning, stacking, PLS variants |
| Cross-Validation | KFold, group splits, nested CV |
| Deployment | Export, prediction, workspace management |
| Explainability | SHAP basics, sklearn integration, feature selection |

Reference Examples (examples/reference/)

Complete syntax reference and advanced pipeline patterns.

Run examples:

cd examples
./run.sh              # Run all
./run.sh -i 1         # Run by index
./run.sh -n "U01*"    # Run by pattern

Documentation

| Section | Description |
|---|---|
| User Guide | Preprocessing, API migration, augmentation |
| API Reference | Module-level API, sklearn integration, data handling |
| Specifications | Pipeline syntax, config format, metrics |
| Explanations | SHAP, resampling, SNV theory |

Full documentation: nirs4all.readthedocs.io


Research Applications

NIRS4ALL has been used in published research:

Houngbo, M. E., et al. (2024). Convolutional neural network allows amylose content prediction in yam (Dioscorea alata L.) flour using near infrared spectroscopy. Journal of the Science of Food and Agriculture, 104(8), 4915-4921. John Wiley & Sons, Ltd.


Citation

If you use NIRS4ALL in your research, please cite:

@software{beurier2025nirs4all,
  author = {Gregory Beurier and Denis Cornet and Lauriane Rouan},
  title = {NIRS4ALL: Open spectroscopy for everyone},
  url = {https://github.com/GBeurier/nirs4all},
  version = {0.7.1},
  year = {2026},
}

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.


License

This project is licensed under the CeCILL-2.1 License — a French free software license compatible with GPL.


Acknowledgments

  • CIRAD for supporting this research
  • The open-source scientific Python community

Made for the spectroscopy community
