
A comprehensive Python library for Near-Infrared Spectroscopy (NIRS) data analysis with ML/DL pipelines.



NIRS4ALL

A comprehensive Python library for Near-Infrared Spectroscopy data analysis

Python 3.11+ | License: CeCILL-2.1 | Code style: ruff



Overview

NIRS4ALL bridges the gap between spectroscopic data and machine learning by providing a unified framework for data loading, preprocessing, model training, and evaluation. It is built for researchers and practitioners working with Near-Infrared Spectroscopy data.


Key Features

  • NIRS-Specific Preprocessing — SNV, MSC, Savitzky-Golay, Norris-Williams, wavelet denoise, OSC/EPO, and 30+ spectral transforms
  • Advanced PLS Models — AOM-PLS, POP-PLS, OPLS, DiPLS, MBPLS, and 15+ PLS variants with automatic operator selection
  • Multi-Backend ML — Seamless integration with scikit-learn, TensorFlow, PyTorch, and JAX
  • Declarative Pipelines — Define complex workflows with simple, readable syntax
  • Parallel Execution — Multi-core pipeline variant execution via joblib
  • Hyperparameter Tuning — Built-in Optuna integration for automated optimization
  • Rich Visualizations — Performance heatmaps, candlestick plots, SHAP explanations
  • Model Deployment — Export trained pipelines as portable .n4a bundles
  • sklearn Compatible: NIRSPipeline wrapper for SHAP, cross-validation, and more
[Figures: Performance Heatmap, Performance Distribution, Regression Scatter Plot]
Advanced visualization capabilities for model performance analysis

Installation

Basic Installation

pip install nirs4all

This installs the core library with scikit-learn support. Deep learning frameworks are optional.

With ML Backends

# Quotes stop shells such as zsh from interpreting the square brackets

# TensorFlow
pip install "nirs4all[tensorflow]"

# PyTorch
pip install "nirs4all[torch]"

# JAX
pip install "nirs4all[jax]"

# All frameworks
pip install "nirs4all[all]"

# All frameworks with GPU support
pip install "nirs4all[all-gpu]"

Conda

Coming soon! We're working with conda-forge to make NIRS4ALL available through conda. In the meantime, install it with pip install nirs4all or use the Docker image.

# Available soon:
# conda install -c conda-forge nirs4all

Docker

docker pull ghcr.io/gbeurier/nirs4all:latest
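# Mount the current directory and run a script from it (my_script.py is a placeholder)
docker run --rm -v "$(pwd)":/workspace -w /workspace ghcr.io/gbeurier/nirs4all python my_script.py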

Development Installation

git clone https://github.com/GBeurier/nirs4all.git
cd nirs4all
pip install -e ".[dev]"

Verify Installation

nirs4all --test-install      # Check dependencies
nirs4all --test-integration  # Run integration tests
nirs4all --version           # Check version

Quick Start

Simple API (Recommended)

import nirs4all
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import ShuffleSplit
from sklearn.cross_decomposition import PLSRegression

# Define your pipeline
pipeline = [
    MinMaxScaler(),
    {"y_processing": MinMaxScaler()},
    ShuffleSplit(n_splits=3, test_size=0.25),
    {"model": PLSRegression(n_components=10)}
]

# Train and evaluate
result = nirs4all.run(
    pipeline=pipeline,
    dataset="path/to/your/data",
    name="MyPipeline",
    verbose=1
)

# Access results
print(f"Best RMSE: {result.best_rmse:.4f}")
print(f"Best R²: {result.best_r2:.4f}")

# Export for deployment
result.export("exports/best_model.n4a")

Session for Multiple Runs

import nirs4all
from sklearn.preprocessing import MinMaxScaler
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor

with nirs4all.session(verbose=1, save_artifacts=True) as s:
    # Compare models with shared configuration
    pls_result = nirs4all.run(
        pipeline=[MinMaxScaler(), PLSRegression(n_components=10)],
        dataset="data/wheat.csv",
        name="PLS",
        session=s
    )

    rf_result = nirs4all.run(
        pipeline=[MinMaxScaler(), RandomForestRegressor(n_estimators=100)],
        dataset="data/wheat.csv",
        name="RandomForest",
        session=s
    )

    print(f"PLS: {pls_result.best_rmse:.4f} | RF: {rf_result.best_rmse:.4f}")

sklearn Integration with SHAP

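import nirs4all
from nirs4all.sklearn import NIRSPipeline
import shap

# Assumes `pipeline` (as in the Quick Start), `dataset` (a path to your data),
# and X_background / X_test (2-D arrays of spectra, samples x wavelengths) are defined.

# Train with nirs4all
result = nirs4all.run(pipeline=pipeline, dataset=dataset)

# Wrap for sklearn compatibility
pipe = NIRSPipeline.from_result(result)

# Use with SHAP
explainer = shap.Explainer(pipe.predict, X_background)
shap_values = explainer(X_test)
shap.summary_plot(shap_values)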

Pipeline Syntax

NIRS4ALL uses a declarative syntax for defining pipelines:

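from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import ShuffleSplit
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
from nirs4all.operators.transforms import SNV, SavitzkyGolay, FirstDerivative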

pipeline = [
    # Preprocessing
    MinMaxScaler(),
    SNV(),
    SavitzkyGolay(window_length=11, polyorder=2),

    # Target scaling
    {"y_processing": MinMaxScaler()},

    # Cross-validation
    ShuffleSplit(n_splits=5, test_size=0.2),

    # Models to compare
    {"model": PLSRegression(n_components=10)},
    {"model": RandomForestRegressor(n_estimators=100)},

    # Neural network with training parameters
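    {
        # `nicon`: the NICON CNN model from nirs4all, assumed already imported (import path not shown here)
        "model": nicon,
        "name": "NICON-CNN",
        "train_params": {"epochs": 100, "patience": 20}
    }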
]
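`SavitzkyGolay(window_length=11, polyorder=2)` refers to the Savitzky-Golay filter, which fits a low-order polynomial by least squares inside a sliding window and keeps the fitted value (or one of its derivatives) at the window center. The NumPy sketch below shows that computation. It is not nirs4all's implementation: the function names are hypothetical, and it leaves the half-window at each edge as NaN instead of padding. In practice, `scipy.signal.savgol_filter` is the usual tool.

```python
import math

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def savgol_coeffs(window_length: int, polyorder: int, deriv: int = 0) -> np.ndarray:
    """Centered Savitzky-Golay weights from a least-squares polynomial fit."""
    if window_length % 2 == 0 or polyorder >= window_length:
        raise ValueError("window_length must be odd and larger than polyorder")
    half = window_length // 2
    offsets = np.arange(-half, half + 1)
    # Design matrix with columns offsets**0 ... offsets**polyorder
    A = np.vander(offsets, polyorder + 1, increasing=True)
    # Row k of pinv(A) maps a window to the fitted coefficient of offset**k;
    # the deriv-th derivative at the center is deriv! times that coefficient
    return np.linalg.pinv(A)[deriv] * math.factorial(deriv)


def savgol(spectra, window_length: int = 11, polyorder: int = 2, deriv: int = 0) -> np.ndarray:
    """Filter each spectrum (row); points without a full window are left as NaN."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    weights = savgol_coeffs(window_length, polyorder, deriv)
    half = window_length // 2
    windows = sliding_window_view(spectra, window_length, axis=1)  # (n, L - 2*half, window)
    out = np.full(spectra.shape, np.nan)
    out[:, half:spectra.shape[1] - half] = windows @ weights
    return out


t = np.arange(50, dtype=float)
quadratic = 3 + 2 * t - 0.5 * t**2
smoothed = savgol(quadratic, window_length=11, polyorder=2)
slope = savgol(quadratic, window_length=11, polyorder=2, deriv=1)
```

With a quadratic input and `polyorder=2`, the interior of the smoothed output reproduces the input exactly. This makes a quick sanity check for the weights.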

Advanced Features

# Each snippet below is one pipeline step: a dict placed inside the pipeline list.

# Feature augmentation: generate preprocessing combinations
{
    "feature_augmentation": {
        "_or_": [SNV, FirstDerivative, SavitzkyGolay],
        "size": [1, (1, 2)],
        "count": 5
    }
}

# Hyperparameter optimization
{
    "model": PLSRegression(),
    "finetune_params": {
        "n_trials": 50,
        "model_params": {"n_components": ("int", 1, 30)}
    }
}

# Branching for parallel preprocessing paths
{
    "branch": [
        [SNV(), PLSRegression(n_components=10)],
        [MSC(), RandomForestRegressor()]
    ]
}

# Merge branch outputs (stacking)
{"merge": "predictions"}

Available Transforms

NIRS-Specific Preprocessing

| Transform | Description |
|---|---|
| SNV / StandardNormalVariate | Standard Normal Variate normalization |
| RNV / RobustStandardNormalVariate | Robust Normal Variate (outlier-resistant) |
| MSC / MultiplicativeScatterCorrection | Multiplicative Scatter Correction |
| SavitzkyGolay | Smoothing and derivative computation |
| FirstDerivative / SecondDerivative | Spectral derivatives |
| NorrisWilliams | Gap derivative with segment smoothing |
| WaveletDenoise | Multi-level wavelet denoising with thresholding |
| OSC | Orthogonal Signal Correction (DOSC) |
| EPO | External Parameter Orthogonalization |
| Detrend | Remove linear/polynomial trends |
| Gaussian | Gaussian smoothing |
| Haar | Haar wavelet decomposition |
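Two of the simplest transforms in this table can be stated directly. The NumPy sketch below uses the common textbook definitions. SNV centers each spectrum on its own mean and divides by its own standard deviation. Detrend subtracts a least-squares polynomial fitted along the wavelength axis of each spectrum. The function names are hypothetical, and nirs4all's classes may differ in details such as the standard-deviation convention or the default polynomial order.

```python
import numpy as np


def snv(spectra) -> np.ndarray:
    """Standard Normal Variate: center and scale each spectrum (row) by its own mean and std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


def detrend(spectra, order: int = 1) -> np.ndarray:
    """Remove a per-spectrum polynomial baseline fitted by least squares."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    x = np.linspace(-1.0, 1.0, spectra.shape[1])  # scaled axis keeps the fit well conditioned
    coeffs = np.polynomial.polynomial.polyfit(x, spectra.T, order)  # (order + 1, n_spectra)
    trend = np.polynomial.polynomial.polyval(x, coeffs)  # (n_spectra, n_wavelengths)
    return spectra - trend


rng = np.random.default_rng(1)
raw = 3.0 * rng.normal(size=(5, 100)) + 10.0
corrected = snv(raw)
```

Each row of `corrected` has mean 0 and standard deviation 1, whatever its original offset and scale. This is why SNV removes much of the additive and multiplicative scatter between samples.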

Signal Processing

| Transform | Description |
|---|---|
| Baseline | Baseline correction (ALS, AirPLS, ArPLS, IModPoly, SNIP, etc.) |
| ReflectanceToAbsorbance | Convert reflectance (R) to absorbance (A) using Beer-Lambert |
| ToAbsorbance / FromAbsorbance | Signal type conversion |
| KubelkaMunk | Kubelka-Munk transform |
| Resampler | Wavelength interpolation |
| CARS / MCUVE | Feature selection methods |
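The signal-type conversions follow standard formulas. Assume reflectance R is a fraction in (0, 1], not a percentage. Then apparent absorbance is A = log10(1/R), and the Kubelka-Munk function is F(R) = (1 - R)^2 / (2R). The NumPy sketch below implements both; the helper names are hypothetical, not nirs4all's API, and the library's classes may handle percent reflectance or edge cases differently.

```python
import numpy as np


def reflectance_to_absorbance(R) -> np.ndarray:
    """Apparent absorbance A = log10(1 / R) for fractional reflectance R."""
    R = np.asarray(R, dtype=float)
    if np.any(R <= 0):
        raise ValueError("reflectance must be strictly positive")
    return -np.log10(R)


def absorbance_to_reflectance(A) -> np.ndarray:
    """Inverse conversion: R = 10**(-A)."""
    return 10.0 ** (-np.asarray(A, dtype=float))


def kubelka_munk(R) -> np.ndarray:
    """Kubelka-Munk remission function F(R) = (1 - R)**2 / (2 R)."""
    R = np.asarray(R, dtype=float)
    if np.any(R <= 0):
        raise ValueError("reflectance must be strictly positive")
    return (1.0 - R) ** 2 / (2.0 * R)


R = np.array([1.0, 0.5, 0.1, 0.01])
print(reflectance_to_absorbance(R))
print(kubelka_munk(R))
```

Each factor-of-ten drop in reflectance adds one absorbance unit, which is what makes log-transformed spectra approximately linear in concentration.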

Built-in NIRS Models

| Model | Description |
|---|---|
| AOMPLSRegressor / AOMPLSClassifier | Adaptive Operator-Mixture PLS (automatically selects the best preprocessing) |
| POPPLSRegressor / POPPLSClassifier | Per-Operator-Per-component PLS via PRESS |
| PLSDA | PLS Discriminant Analysis |
| OPLS / OPLSDA | Orthogonal PLS |
| MBPLS | Multi-Block PLS |
| DiPLS | Domain-Invariant PLS |
| IKPLS | Improved Kernel PLS |
| FCKPLS | Fractional Convolution Kernel PLS |

Splitting Methods

| Splitter | Description |
|---|---|
| KennardStoneSplitter | Kennard-Stone algorithm |
| SPXYSplitter | Sample set Partitioning based on X and Y |
| SPXYFold / SPXYGFold | SPXY-based K-Fold cross-validation (with group support) |
| KMeansSplitter | K-means clustering based split |
| KBinsStratifiedSplitter | Binned stratification for continuous targets |
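Kennard-Stone builds a calibration set that covers the X space evenly. It starts from the two most distant samples, then repeatedly adds the sample whose distance to its nearest already-selected sample is largest. SPXY runs the same procedure on the sum of the X and y distances, each divided by its maximum. The NumPy sketch below implements both. The function name and signature are hypothetical, not the splitter classes' API, and the full pairwise distance matrix limits it to modest sample counts.

```python
import numpy as np


def kennard_stone(X, n_select: int, y=None) -> np.ndarray:
    """Indices of n_select samples chosen by Kennard-Stone, or by SPXY when y is given."""
    X = np.asarray(X, dtype=float)
    if not 2 <= n_select <= len(X):
        raise ValueError("n_select must be between 2 and the number of samples")
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances
    if y is not None:
        y = np.asarray(y, dtype=float).reshape(len(X), -1)
        dist_y = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
        # SPXY: combine X and y distances, each normalized by its maximum
        dist = dist / dist.max() + dist_y / dist_y.max()
    # Start from the two most distant samples
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    # Distance from each sample to its nearest already-selected sample
    min_dist = np.minimum(dist[i], dist[j])
    while len(selected) < n_select:
        min_dist[selected] = -np.inf  # never pick a selected sample again
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
    return np.array(selected)


X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
train_idx = kennard_stone(X, n_select=4)
test_idx = np.setdiff1d(np.arange(len(X)), train_idx)
```

The samples that are not selected form the test set, so the calibration set always includes the extremes of the data.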

See the Preprocessing Guide for the complete reference.


Examples

The examples/ directory is organized by topic:

User Examples (examples/user/)

| Category | Examples |
|---|---|
| Getting Started | Hello world, basic regression, classification, visualization |
| Data Handling | Multi-source, data loading, metadata |
| Preprocessing | SNV, MSC, derivatives, custom transforms |
| Models | Multi-model, hyperparameter tuning, stacking, PLS variants |
| Cross-Validation | KFold, group splits, nested CV |
| Deployment | Export, prediction, workspace management |
| Explainability | SHAP basics, sklearn integration, feature selection |

Reference Examples (examples/reference/)

Complete syntax reference and advanced pipeline patterns.

To run the examples:

cd examples
./run.sh              # Run all
./run.sh -i 1         # Run by index
./run.sh -n "U01*"    # Run by pattern

Documentation

| Section | Description |
|---|---|
| User Guide | Preprocessing, API migration, augmentation |
| API Reference | Module-level API, sklearn integration, data handling |
| Specifications | Pipeline syntax, config format, metrics |
| Explanations | SHAP, resampling, SNV theory |

Full documentation: nirs4all.readthedocs.io


Research Applications

NIRS4ALL has been used in published research:

Houngbo, M. E., et al. (2024). Convolutional neural network allows amylose content prediction in yam (Dioscorea alata L.) flour using near infrared spectroscopy. Journal of the Science of Food and Agriculture, 104(8), 4915-4921. John Wiley & Sons, Ltd.


Citation

If you use NIRS4ALL in your research, please cite:

@software{beurier2025nirs4all,
  author = {Gregory Beurier and Denis Cornet and Lauriane Rouan},
  title = {NIRS4ALL: Open spectroscopy for everyone},
  url = {https://github.com/GBeurier/nirs4all},
  version = {0.7.1},
  year = {2026},
}

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.


License

This project is licensed under the CeCILL-2.1 License — a French free software license compatible with GPL.


Acknowledgments

  • CIRAD for supporting this research
  • The open-source scientific Python community

Made for the spectroscopy community



Download files

Download the file for your platform.

Source Distribution

nirs4all-0.8.5.tar.gz (1.6 MB)

Uploaded Source

Built Distribution


nirs4all-0.8.5-py3-none-any.whl (1.9 MB)

Uploaded Python 3

File details

Details for the file nirs4all-0.8.5.tar.gz.

File metadata

  • Download URL: nirs4all-0.8.5.tar.gz
  • Size: 1.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nirs4all-0.8.5.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | c4b058c50c23e3e363a6f255816bbcceac598750e5573cca3766021ca72e923e |
| MD5 | 4e7d753722e430105410d5e3a2d86640 |
| BLAKE2b-256 | a06fd1c49324eb31435cdd967919cea4bab3ce8d18984a22506c1ab39d3bc4bd |


Provenance

The following attestation bundles were made for nirs4all-0.8.5.tar.gz:

Publisher: publish.yml on GBeurier/nirs4all

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file nirs4all-0.8.5-py3-none-any.whl.

File metadata

  • Download URL: nirs4all-0.8.5-py3-none-any.whl
  • Size: 1.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nirs4all-0.8.5-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 755db18d6a88fc066847a018ecd9dbc3868aa315e844655c00a8d84d563fab51 |
| MD5 | f57bbe2f485319426df4412f1619b686 |
| BLAKE2b-256 | aa02bf653df584cdf1da68e4b96f2f2f46500162c251e8c406d817884d49a3a8 |


Provenance

The following attestation bundles were made for nirs4all-0.8.5-py3-none-any.whl:

Publisher: publish.yml on GBeurier/nirs4all

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
