A comprehensive Python library for Near-Infrared Spectroscopy (NIRS) data analysis with ML/DL pipelines.

NIRS4ALL

A comprehensive Python library for Near-Infrared Spectroscopy data analysis

Python 3.11+ | License: CeCILL-2.1 | Code style: ruff

Overview

NIRS4ALL bridges the gap between spectroscopic data and machine learning by providing a unified framework for data loading, preprocessing, model training, and evaluation. It is built for researchers and practitioners working with Near-Infrared Spectroscopy data.

Key Features

  • NIRS-Specific Preprocessing — SNV, MSC, Savitzky-Golay, Norris-Williams, wavelet denoise, OSC/EPO, and 30+ spectral transforms
  • Advanced PLS Models — AOM-PLS, POP-PLS, OPLS, DiPLS, MBPLS, and 15+ PLS variants with automatic operator selection
  • Multi-Backend ML — Seamless integration with scikit-learn, TensorFlow, PyTorch, and JAX
  • Declarative Pipelines — Define complex workflows with simple, readable syntax
  • Parallel Execution — Multi-core pipeline variant execution via joblib
  • Hyperparameter Tuning — Built-in Optuna integration for automated optimization
  • Rich Visualizations — Performance heatmaps, candlestick plots, SHAP explanations
  • Model Deployment — Export trained pipelines as portable .n4a bundles
  • sklearn Compatible: NIRSPipeline wrapper for SHAP, cross-validation, and more

[Figures: performance heatmap, performance distribution, regression scatter plot. Caption: Advanced visualization capabilities for model performance analysis]

Installation

Basic Installation

pip install nirs4all

This installs the core library with scikit-learn support. Deep learning frameworks are optional.

With ML Backends

# TensorFlow
pip install nirs4all[tensorflow]

# PyTorch
pip install nirs4all[torch]

# JAX
pip install nirs4all[jax]

# All frameworks
pip install nirs4all[all]

# All frameworks with GPU support
pip install nirs4all[all-gpu]
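
Because the deep learning frameworks are optional, it can help to check which of them are importable in the current environment before choosing a backend. The sketch below uses only the standard library; the helper name available_backends is hypothetical and not part of NIRS4ALL, and the module names are assumed to be the usual import names of the three frameworks.

```python
from importlib.util import find_spec

# Usual import names of the optional frameworks (assumed to correspond
# to the tensorflow, torch, and jax extras listed above).
OPTIONAL_BACKENDS = ("tensorflow", "torch", "jax")


def available_backends(names=OPTIONAL_BACKENDS):
    """Map each top-level module name to True if it can be found for import."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in available_backends().items():
        print(f"{name}: {'installed' if present else 'not installed'}")
```

This only checks that a module can be located; it does not verify GPU support or version compatibility.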

Conda

Coming soon! We're working with conda-forge to make NIRS4ALL available through conda. In the meantime, use pip install nirs4all or Docker.

# Available soon:
# conda install -c conda-forge nirs4all

Docker

docker pull ghcr.io/gbeurier/nirs4all:latest
docker run -v $(pwd):/workspace ghcr.io/gbeurier/nirs4all python my_script.py

Development Installation

git clone https://github.com/GBeurier/nirs4all.git
cd nirs4all
pip install -e ".[dev]"

Verify Installation

nirs4all --test-install      # Check dependencies
nirs4all --test-integration  # Run integration tests
nirs4all --version           # Check version

Quick Start

Simple API (Recommended)

import nirs4all
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import ShuffleSplit
from sklearn.cross_decomposition import PLSRegression

# Define your pipeline
pipeline = [
    MinMaxScaler(),
    {"y_processing": MinMaxScaler()},
    ShuffleSplit(n_splits=3, test_size=0.25),
    {"model": PLSRegression(n_components=10)}
]

# Train and evaluate
result = nirs4all.run(
    pipeline=pipeline,
    dataset="path/to/your/data",
    name="MyPipeline",
    verbose=1
)

# Access results
print(f"Best RMSE: {result.best_rmse:.4f}")
print(f"Best R²: {result.best_r2:.4f}")

# Export for deployment
result.export("exports/best_model.n4a")

Session for Multiple Runs

import nirs4all
from sklearn.preprocessing import MinMaxScaler
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor

with nirs4all.session(verbose=1, save_artifacts=True) as s:
    # Compare models with shared configuration
    pls_result = nirs4all.run(
        pipeline=[MinMaxScaler(), PLSRegression(n_components=10)],
        dataset="data/wheat.csv",
        name="PLS",
        session=s
    )

    rf_result = nirs4all.run(
        pipeline=[MinMaxScaler(), RandomForestRegressor(n_estimators=100)],
        dataset="data/wheat.csv",
        name="RandomForest",
        session=s
    )

    print(f"PLS: {pls_result.best_rmse:.4f} | RF: {rf_result.best_rmse:.4f}")

sklearn Integration with SHAP

import nirs4all
from nirs4all.sklearn import NIRSPipeline
import shap

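# Train with nirs4all
# (pipeline and dataset are assumed to be defined as in the Quick Start)
result = nirs4all.run(pipeline, dataset)

# Wrap for sklearn compatibility
pipe = NIRSPipeline.from_result(result)

# Use with SHAP
# (X_background and X_test are arrays of spectra that you provide)
explainer = shap.Explainer(pipe.predict, X_background)
shap_values = explainer(X_test)
shap.summary_plot(shap_values)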

Pipeline Syntax

NIRS4ALL uses a declarative syntax for defining pipelines:

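from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import MinMaxScaler
from nirs4all.operators.transforms import SNV, SavitzkyGolay, FirstDerivative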

pipeline = [
    # Preprocessing
    MinMaxScaler(),
    SNV(),
    SavitzkyGolay(window_length=11, polyorder=2),

    # Target scaling
    {"y_processing": MinMaxScaler()},

    # Cross-validation
    ShuffleSplit(n_splits=5, test_size=0.2),

    # Models to compare
    {"model": PLSRegression(n_components=10)},
    {"model": RandomForestRegressor(n_estimators=100)},

    # Neural network with training parameters
    # (nicon must be imported or defined beforehand; its import is not shown here)
    {
        "model": nicon,
        "name": "NICON-CNN",
        "train_params": {"epochs": 100, "patience": 20}
    }
]

Advanced Features

# Feature augmentation - generate preprocessing combinations
{
    "feature_augmentation": {
        "_or_": [SNV, FirstDerivative, SavitzkyGolay],
        "size": [1, (1, 2)],
        "count": 5
    }
}

# Hyperparameter optimization
{
    "model": PLSRegression(),
    "finetune_params": {
        "n_trials": 50,
        "model_params": {"n_components": ("int", 1, 30)}
    }
}

# Branching for parallel preprocessing paths
{
    "branch": [
        [SNV(), PLSRegression(n_components=10)],
        [MSC(), RandomForestRegressor()]
    ]
}

# Merge branch outputs (stacking)
{"merge": "predictions"}

Available Transforms

NIRS-Specific Preprocessing

Transform | Description
SNV / StandardNormalVariate | Standard Normal Variate normalization
RNV / RobustStandardNormalVariate | Robust Normal Variate (outlier-resistant)
MSC / MultiplicativeScatterCorrection | Multiplicative Scatter Correction
SavitzkyGolay | Smoothing and derivative computation
FirstDerivative / SecondDerivative | Spectral derivatives
NorrisWilliams | Gap derivative with segment smoothing
WaveletDenoise | Multi-level wavelet denoising with thresholding
OSC | Orthogonal Signal Correction (DOSC)
EPO | External Parameter Orthogonalization
Detrend | Remove linear/polynomial trends
Gaussian | Gaussian smoothing
Haar | Haar wavelet decomposition
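
To make the first few entries concrete, here is a minimal NumPy/SciPy sketch of the textbook definitions of SNV, MSC, and a Savitzky-Golay first derivative. These functions are illustrative stand-ins written for this example, not the NIRS4ALL classes, whose options and edge-case handling may differ. Spectra are assumed to be stored as rows.

```python
import numpy as np
from scipy.signal import savgol_filter


def snv(X):
    """Standard Normal Variate: center and scale each spectrum (row)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


def msc(X, reference=None):
    """Multiplicative Scatter Correction against a reference spectrum.

    Each spectrum is fitted as intercept + slope * reference, then
    corrected as (spectrum - intercept) / slope. The mean spectrum is
    used as the reference when none is given.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, spectrum in enumerate(X):
        slope, intercept = np.polyfit(ref, spectrum, deg=1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected


def sg_first_derivative(X, window_length=11, polyorder=2):
    """Savitzky-Golay first derivative along the wavelength axis (per index step)."""
    return savgol_filter(X, window_length=window_length, polyorder=polyorder, deriv=1, axis=1)


# Synthetic spectra (made-up data): one base curve with per-sample offset and gain.
wavelengths = np.linspace(1000, 2500, 50)
base = np.sin(wavelengths / 200.0) + 2.0
offsets = np.array([[0.10], [-0.20], [0.05]])
gains = np.array([[0.8], [1.0], [1.3]])
X = offsets + gains * base

X_snv = snv(X)                # every row now has mean 0 and standard deviation 1
X_msc = msc(X)                # offset and gain differences are removed
X_d1 = sg_first_derivative(X) # constant offsets vanish in the derivative
```

In this synthetic case the spectra differ only by an additive offset and a multiplicative gain, so MSC maps every row back onto the mean spectrum.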

Signal Processing

Transform | Description
Baseline | Baseline correction (ALS, AirPLS, ArPLS, IModPoly, SNIP, etc.)
ReflectanceToAbsorbance | Convert R to A using Beer-Lambert
ToAbsorbance / FromAbsorbance | Signal type conversion
KubelkaMunk | Kubelka-Munk transform
Resampler | Wavelength interpolation
CARS / MCUVE | Feature selection methods
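
The two reflectance conversions in this table follow standard formulas: apparent absorbance A = log10(1/R) and the Kubelka-Munk function F(R) = (1 - R)^2 / (2R). The sketch below implements these formulas directly in NumPy; the function names are hypothetical and the code is not taken from NIRS4ALL. R is assumed to be a reflectance fraction in the interval (0, 1].

```python
import numpy as np


def reflectance_to_absorbance(R):
    """Apparent absorbance A = log10(1 / R) for reflectance fractions R."""
    R = np.asarray(R, dtype=float)
    return np.log10(1.0 / R)


def kubelka_munk(R):
    """Kubelka-Munk function F(R) = (1 - R)^2 / (2 R)."""
    R = np.asarray(R, dtype=float)
    return (1.0 - R) ** 2 / (2.0 * R)


# A reflectance of 10% corresponds to an absorbance of 1, and 1% to 2.
reflectance = np.array([1.0, 0.5, 0.1, 0.01])
print(reflectance_to_absorbance(reflectance))
print(kubelka_munk(reflectance))
```

If reflectance is recorded in percent, divide by 100 before applying either function.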

Built-in NIRS Models

Model | Description
AOMPLSRegressor / AOMPLSClassifier | Adaptive Operator-Mixture PLS (auto-selects best preprocessing)
POPPLSRegressor / POPPLSClassifier | Per-Operator-Per-component PLS via PRESS
PLSDA | PLS Discriminant Analysis
OPLS / OPLSDA | Orthogonal PLS
MBPLS | Multi-Block PLS
DiPLS | Domain-Invariant PLS
IKPLS | Improved Kernel PLS
FCKPLS | Fractional Convolution Kernel PLS

Splitting Methods

Splitter | Description
KennardStoneSplitter | Kennard-Stone algorithm
SPXYSplitter | Sample set Partitioning based on X and Y
SPXYFold / SPXYGFold | SPXY-based K-Fold cross-validation (with group support)
KMeansSplitter | K-means clustering based split
KBinsStratifiedSplitter | Binned stratification for continuous targets
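
As an illustration of the first splitter, the classic Kennard-Stone rule picks the two most distant samples first, then repeatedly adds the sample whose distance to its nearest already-selected sample is largest. The function below is a minimal NumPy sketch of that rule with Euclidean distances; it is not the KennardStoneSplitter implementation, and the name kennard_stone is hypothetical.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return the indices of n_select samples chosen by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    if not 2 <= n_select <= n_samples:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Start with the two samples that are farthest apart.
    first, second = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(first), int(second)]

    # Distance from every sample to its nearest already-selected sample.
    min_dist = np.minimum(dist[first], dist[second])

    while len(selected) < n_select:
        min_dist[selected] = -1.0  # exclude samples that are already selected
        candidate = int(np.argmax(min_dist))
        selected.append(candidate)
        min_dist = np.minimum(min_dist, dist[candidate])

    return selected


# Four one-dimensional samples: the extremes are picked first, then the
# remaining sample that is farthest from both of them.
points = np.array([[0.0], [1.0], [2.0], [10.0]])
train_idx = kennard_stone(points, n_select=3)
print(train_idx)  # [0, 3, 2]
```

The selected indices would typically form the calibration set, with the remaining samples held out for testing. The full distance matrix costs memory quadratic in the number of samples, so this sketch suits small datasets only.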

See Preprocessing Guide for complete reference.


Examples

The examples/ directory is organized by topic:

User Examples (examples/user/)

Category | Examples
Getting Started | Hello world, basic regression, classification, visualization
Data Handling | Multi-source, data loading, metadata
Preprocessing | SNV, MSC, derivatives, custom transforms
Models | Multi-model, hyperparameter tuning, stacking, PLS variants
Cross-Validation | KFold, group splits, nested CV
Deployment | Export, prediction, workspace management
Explainability | SHAP basics, sklearn integration, feature selection

Reference Examples (examples/reference/)

Complete syntax reference and advanced pipeline patterns.

Run examples:

cd examples
./run.sh              # Run all
./run.sh -i 1         # Run by index
./run.sh -n "U01*"    # Run by pattern

Documentation

Section | Description
User Guide | Preprocessing, API migration, augmentation
API Reference | Module-level API, sklearn integration, data handling
Specifications | Pipeline syntax, config format, metrics
Explanations | SHAP, resampling, SNV theory

Full documentation: nirs4all.readthedocs.io


Research Applications

NIRS4ALL has been used in published research:

Houngbo, M. E., et al. (2024). Convolutional neural network allows amylose content prediction in yam (Dioscorea alata L.) flour using near infrared spectroscopy. Journal of the Science of Food and Agriculture, 104(8), 4915-4921. John Wiley & Sons, Ltd.


Citation

If you use NIRS4ALL in your research, please cite:

@software{beurier2025nirs4all,
  author = {Gregory Beurier and Denis Cornet and Lauriane Rouan},
  title = {NIRS4ALL: Open spectroscopy for everyone},
  url = {https://github.com/GBeurier/nirs4all},
  version = {0.7.1},
  year = {2026},
}

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.


License

This project is licensed under the CeCILL-2.1 License — a French free software license compatible with GPL.


Acknowledgments

  • CIRAD for supporting this research
  • The open-source scientific Python community

Made for the spectroscopy community
