A comprehensive Python library for Near-Infrared Spectroscopy (NIRS) data analysis with ML/DL pipelines.
NIRS4ALL
Overview
NIRS4ALL bridges the gap between spectroscopic data and machine learning by providing a unified framework for data loading, preprocessing, model training, and evaluation. Built for researchers and practitioners working with Near-Infrared Spectroscopy data.
Key Features
- NIRS-Specific Preprocessing — SNV, MSC, Savitzky-Golay, Norris-Williams, wavelet denoise, OSC/EPO, and 30+ spectral transforms
- Advanced PLS Models — AOM-PLS, POP-PLS, OPLS, DiPLS, MBPLS, and 15+ PLS variants with automatic operator selection
- Multi-Backend ML — Seamless integration with scikit-learn, TensorFlow, PyTorch, and JAX
- Declarative Pipelines — Define complex workflows with simple, readable syntax
- Parallel Execution — Multi-core pipeline variant execution via joblib
- Hyperparameter Tuning — Built-in Optuna integration for automated optimization
- Rich Visualizations — Performance heatmaps, candlestick plots, SHAP explanations
- Model Deployment — Export trained pipelines as portable .n4a bundles
- sklearn Compatible — NIRSPipeline wrapper for SHAP, cross-validation, and more
Advanced visualization capabilities for model performance analysis
Installation
Basic Installation
pip install nirs4all
This installs the core library with scikit-learn support. Deep learning frameworks are optional.
With ML Backends
# TensorFlow
pip install nirs4all[tensorflow]
# PyTorch
pip install nirs4all[torch]
# JAX
pip install nirs4all[jax]
# All frameworks
pip install nirs4all[all]
# All frameworks with GPU support
pip install nirs4all[all-gpu]
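Since the deep learning frameworks are optional, it can help to check which ones are importable in the current environment before choosing an extra. The snippet below is a small standard-library sketch, not part of nirs4all; the helper name `available_backends` is made up for illustration, and the module names are the usual import names of each framework.

```python
from importlib.util import find_spec


def available_backends(modules=("sklearn", "tensorflow", "torch", "jax")):
    """Map each import name to True if the module can be found, else False."""
    return {name: find_spec(name) is not None for name in modules}


status = available_backends()
for name, found in status.items():
    print(f"{name:<12} {'installed' if found else 'missing'}")
```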
Conda
Coming soon! We're working with conda-forge to make NIRS4ALL available through conda. In the meantime, install with pip install nirs4all or use Docker.
# Available soon:
# conda install -c conda-forge nirs4all
Docker
docker pull ghcr.io/gbeurier/nirs4all:latest
docker run -v $(pwd):/workspace ghcr.io/gbeurier/nirs4all python my_script.py
Development Installation
git clone https://github.com/GBeurier/nirs4all.git
cd nirs4all
pip install -e ".[dev]"
Verify Installation
nirs4all --test-install # Check dependencies
nirs4all --test-integration # Run integration tests
nirs4all --version # Check version
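The same version check can be done from inside Python, for example at the top of a notebook. This sketch uses only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not a nirs4all function.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="nirs4all"):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"nirs4all version: {found if found is not None else 'not installed'}")
```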
Quick Start
Simple API (Recommended)
import nirs4all
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import ShuffleSplit
from sklearn.cross_decomposition import PLSRegression
# Define your pipeline
pipeline = [
    MinMaxScaler(),
    {"y_processing": MinMaxScaler()},
    ShuffleSplit(n_splits=3, test_size=0.25),
    {"model": PLSRegression(n_components=10)},
]
# Train and evaluate
result = nirs4all.run(
    pipeline=pipeline,
    dataset="path/to/your/data",
    name="MyPipeline",
    verbose=1,
)
# Access results
print(f"Best RMSE: {result.best_rmse:.4f}")
print(f"Best R²: {result.best_r2:.4f}")
# Export for deployment
result.export("exports/best_model.n4a")
Session for Multiple Runs
import nirs4all
from sklearn.preprocessing import MinMaxScaler
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
with nirs4all.session(verbose=1, save_artifacts=True) as s:
    # Compare models with shared configuration
    pls_result = nirs4all.run(
        pipeline=[MinMaxScaler(), PLSRegression(n_components=10)],
        dataset="data/wheat.csv",
        name="PLS",
        session=s,
    )
    rf_result = nirs4all.run(
        pipeline=[MinMaxScaler(), RandomForestRegressor(n_estimators=100)],
        dataset="data/wheat.csv",
        name="RandomForest",
        session=s,
    )
    print(f"PLS: {pls_result.best_rmse:.4f} | RF: {rf_result.best_rmse:.4f}")
sklearn Integration with SHAP
import nirs4all
from nirs4all.sklearn import NIRSPipeline
import shap
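# Train with nirs4all (`pipeline` and `dataset` are defined as in the Quick Start)
result = nirs4all.run(pipeline, dataset)
# Wrap for sklearn compatibility
pipe = NIRSPipeline.from_result(result)
# Use with SHAP (`X_background` and `X_test` are your own 2-D feature arrays)
explainer = shap.Explainer(pipe.predict, X_background)
shap_values = explainer(X_test)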
shap.summary_plot(shap_values)
Pipeline Syntax
NIRS4ALL uses a declarative syntax for defining pipelines:
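from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import MinMaxScaler
from nirs4all.operators.transforms import SNV, SavitzkyGolay, FirstDerivative
# `nicon` (used below) is a neural-network model assumed to be imported elsewhere
pipeline = [
    # Preprocessing
    MinMaxScaler(),
    SNV(),
    SavitzkyGolay(window_length=11, polyorder=2),
    # Target scaling
    {"y_processing": MinMaxScaler()},
    # Cross-validation
    ShuffleSplit(n_splits=5, test_size=0.2),
    # Models to compare
    {"model": PLSRegression(n_components=10)},
    {"model": RandomForestRegressor(n_estimators=100)},
    # Neural network with training parameters
    {
        "model": nicon,
        "name": "NICON-CNN",
        "train_params": {"epochs": 100, "patience": 20},
    },
]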
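To make the idea of a declarative step list concrete, here is a toy sketch of how entries like those above could be sorted into roles by inspecting dictionary keys and object methods. The rules and the `classify_step` helper are purely illustrative assumptions, not nirs4all's actual dispatch logic, and the small stand-in classes exist only so the sketch runs without scikit-learn.

```python
# Hypothetical stand-ins that mimic the method names of sklearn-style objects.
class Scaler:
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class Splitter:
    def split(self, X, y=None):
        yield [], []


class Regressor:
    def fit(self, X, y):
        return self

    def predict(self, X):
        return X


def classify_step(step):
    """Return a role label for one pipeline entry (illustrative rules only)."""
    if isinstance(step, dict):
        for key in ("model", "y_processing"):
            if key in step:
                return key
        return "unknown"
    if hasattr(step, "split"):
        return "splitter"
    if hasattr(step, "predict"):
        return "model"
    if hasattr(step, "transform"):
        return "x_transform"
    return "unknown"


pipeline = [Scaler(), {"y_processing": Scaler()}, Splitter(), {"model": Regressor()}]
roles = [classify_step(step) for step in pipeline]
print(roles)
```

Checking `predict` before `transform` matters in this toy version because some estimators, such as scikit-learn's `PLSRegression`, expose both methods.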
Advanced Features
# Feature augmentation - generate preprocessing combinations
{
    "feature_augmentation": {
        "_or_": [SNV, FirstDerivative, SavitzkyGolay],
        "size": [1, (1, 2)],
        "count": 5
    }
}
# Hyperparameter optimization
{
    "model": PLSRegression(),
    "finetune_params": {
        "n_trials": 50,
        "model_params": {"n_components": ("int", 1, 30)}
    }
}
# Branching for parallel preprocessing paths
{
    "branch": [
        [SNV(), PLSRegression(n_components=10)],
        [MSC(), RandomForestRegressor()]
    ]
}
# Merge branch outputs (stacking)
{"merge": "predictions"}
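The feature augmentation step generates combinations of preprocessing operators. The exact meaning of `size` and `count` is not spelled out here, so the sketch below shows only one plausible reading: enumerate every combination of one or two operators and keep a random sample of at most five. The function `preprocessing_variants` and its parameters are hypothetical, and operator names are plain strings to keep the example self-contained.

```python
import random
from itertools import combinations


def preprocessing_variants(options, min_size, max_size, count, seed=0):
    """Enumerate operator combinations of the given sizes, then sample `count` of them."""
    all_sets = [
        combo
        for size in range(min_size, max_size + 1)
        for combo in combinations(options, size)
    ]
    if count >= len(all_sets):
        return all_sets
    return random.Random(seed).sample(all_sets, count)


variants = preprocessing_variants(
    ["SNV", "FirstDerivative", "SavitzkyGolay"], min_size=1, max_size=2, count=5
)
for combo in variants:
    print(" + ".join(combo))
```

With three operators there are three singletons and three pairs, so five variants are drawn from six candidates.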
Available Transforms
NIRS-Specific Preprocessing
| Transform | Description |
|---|---|
| SNV / StandardNormalVariate | Standard Normal Variate normalization |
| RNV / RobustStandardNormalVariate | Robust Normal Variate (outlier-resistant) |
| MSC / MultiplicativeScatterCorrection | Multiplicative Scatter Correction |
| SavitzkyGolay | Smoothing and derivative computation |
| FirstDerivative / SecondDerivative | Spectral derivatives |
| NorrisWilliams | Gap derivative with segment smoothing |
| WaveletDenoise | Multi-level wavelet denoising with thresholding |
| OSC | Orthogonal Signal Correction (DOSC) |
| EPO | External Parameter Orthogonalization |
| Detrend | Remove linear/polynomial trends |
| Gaussian | Gaussian smoothing |
| Haar | Haar wavelet decomposition |
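As a reminder of what the two most common scatter corrections in the table compute, here is a minimal NumPy sketch of textbook SNV and MSC. It is written independently of nirs4all, so the library's own classes may differ in details such as the standard deviation convention (sample standard deviation, `ddof=1`, is assumed here) or the choice of reference spectrum.

```python
import numpy as np


def snv(spectra):
    """Standard Normal Variate: center and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative Scatter Correction against a reference (default: mean spectrum)."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Fit row ~ slope * ref + intercept, then undo the scatter effect.
        slope, intercept = np.polyfit(ref, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected


# Two copies of one reference spectrum with different gain and offset.
reference = np.array([0.2, 0.5, 0.9, 0.4, 0.3])
spectra = np.vstack([1.5 * reference + 0.1, 0.8 * reference - 0.05])
snv_out = snv(spectra)
msc_out = msc(spectra, reference=reference)
```

Both corrections remove the artificial gain and offset: after SNV the two rows coincide, and after MSC each row matches the reference.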
Signal Processing
| Transform | Description |
|---|---|
| Baseline | Baseline correction (ALS, AirPLS, ArPLS, IModPoly, SNIP, etc.) |
| ReflectanceToAbsorbance | Convert R to A using Beer-Lambert |
| ToAbsorbance / FromAbsorbance | Signal type conversion |
| KubelkaMunk | Kubelka-Munk transform |
| Resampler | Wavelength interpolation |
| CARS / MCUVE | Feature selection methods |
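The two reflectance conversions in the table have simple closed forms: apparent absorbance is A = log10(1/R), and the Kubelka-Munk function is F(R) = (1 - R)^2 / (2R). The NumPy sketch below applies these standard formulas directly. The function names are illustrative and independent of the nirs4all classes, which may add options such as clipping or percent-scale input.

```python
import numpy as np


def reflectance_to_absorbance(reflectance):
    """Apparent absorbance A = log10(1 / R) for reflectance R in (0, 1]."""
    reflectance = np.asarray(reflectance, dtype=float)
    return np.log10(1.0 / reflectance)


def kubelka_munk(reflectance):
    """Kubelka-Munk function F(R) = (1 - R)^2 / (2 R) for R in (0, 1]."""
    reflectance = np.asarray(reflectance, dtype=float)
    return (1.0 - reflectance) ** 2 / (2.0 * reflectance)


R = np.array([1.0, 0.5, 0.1])
absorbance = reflectance_to_absorbance(R)
km = kubelka_munk(R)
print(absorbance)
print(km)
```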
Built-in NIRS Models
| Model | Description |
|---|---|
| AOMPLSRegressor / AOMPLSClassifier | Adaptive Operator-Mixture PLS — auto-selects best preprocessing |
| POPPLSRegressor / POPPLSClassifier | Per-Operator-Per-component PLS via PRESS |
| PLSDA | PLS Discriminant Analysis |
| OPLS / OPLSDA | Orthogonal PLS |
| MBPLS | Multi-Block PLS |
| DiPLS | Domain-Invariant PLS |
| IKPLS | Improved Kernel PLS |
| FCKPLS | Fractional Convolution Kernel PLS |
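All of the models above build on ordinary PLS regression. For orientation, here is a compact NumPy sketch of the classic single-target PLS1 algorithm (NIPALS-style deflation). It is a textbook baseline under the assumption of a single response, not the implementation of any model in the table, and `pls1_coefficients` is a name chosen for this example.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Regression coefficients of PLS1 on mean-centered X and y."""
    Xk = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    yk = np.asarray(y, dtype=float) - np.mean(y)
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # scores along that direction
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        c = yk @ t / tt                # y loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - c * t                # deflate y
        weights.append(w)
        loadings.append(p)
        y_loadings.append(c)
    W = np.array(weights).T
    P = np.array(loadings).T
    q = np.array(y_loadings)
    # Map the latent-space regression back to the original variables.
    return W @ np.linalg.solve(P.T @ W, q)


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=40)
coef = pls1_coefficients(X, y, n_components=3)
print(np.round(coef, 3))
```

When the number of components equals the number of features, as in this toy case, PLS1 coincides with ordinary least squares on the centered data; using fewer components gives the regularized fit that makes PLS useful for highly collinear spectra.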
Splitting Methods
| Splitter | Description |
|---|---|
| KennardStoneSplitter | Kennard-Stone algorithm |
| SPXYSplitter | Sample set Partitioning based on X and Y |
| SPXYFold / SPXYGFold | SPXY-based K-Fold cross-validation (with group support) |
| KMeansSplitter | K-means clustering based split |
| KBinsStratifiedSplitter | Binned stratification for continuous targets |
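The Kennard-Stone algorithm named in the table picks a set of samples that covers the feature space evenly: it starts from the two most distant samples, then repeatedly adds the sample whose nearest already-selected neighbor is farthest away. The NumPy sketch below implements that textbook procedure with Euclidean distances. It is independent of `KennardStoneSplitter`, whose options and distance metric may differ, and it assumes `n_select` is at least 2.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    # Full pairwise Euclidean distance matrix (fine for small datasets).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Seed with the two most distant samples.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select and remaining:
        # Distance from each candidate to its closest selected sample.
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return selected


points = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
train_idx = kennard_stone(points, n_select=3)
test_idx = [k for k in range(len(points)) if k not in train_idx]
print(train_idx, test_idx)
```

The selected indices are typically used as the calibration set and the rest as the test set.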
See Preprocessing Guide for complete reference.
Examples
The examples/ directory is organized by topic:
User Examples (examples/user/)
| Category | Examples |
|---|---|
| Getting Started | Hello world, basic regression, classification, visualization |
| Data Handling | Multi-source, data loading, metadata |
| Preprocessing | SNV, MSC, derivatives, custom transforms |
| Models | Multi-model, hyperparameter tuning, stacking, PLS variants |
| Cross-Validation | KFold, group splits, nested CV |
| Deployment | Export, prediction, workspace management |
| Explainability | SHAP basics, sklearn integration, feature selection |
Reference Examples (examples/reference/)
Complete syntax reference and advanced pipeline patterns.
Run examples:
cd examples
./run.sh # Run all
./run.sh -i 1 # Run by index
./run.sh -n "U01*" # Run by pattern
Documentation
| Section | Description |
|---|---|
| User Guide | Preprocessing, API migration, augmentation |
| API Reference | Module-level API, sklearn integration, data handling |
| Specifications | Pipeline syntax, config format, metrics |
| Explanations | SHAP, resampling, SNV theory |
Full documentation: nirs4all.readthedocs.io
Research Applications
NIRS4ALL has been used in published research:
Houngbo, M. E., et al. (2024). Convolutional neural network allows amylose content prediction in yam (Dioscorea alata L.) flour using near infrared spectroscopy. Journal of the Science of Food and Agriculture, 104(8), 4915-4921. John Wiley & Sons, Ltd.
Citation
If you use NIRS4ALL in your research, please cite:
@software{beurier2025nirs4all,
author = {Gregory Beurier and Denis Cornet and Lauriane Rouan},
title = {NIRS4ALL: Open spectroscopy for everyone},
url = {https://github.com/GBeurier/nirs4all},
version = {0.7.1},
year = {2026},
}
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
License
This project is licensed under the CeCILL-2.1 License — a French free software license compatible with GPL.
Acknowledgments
- CIRAD for supporting this research
- The open-source scientific Python community
Made for the spectroscopy community