
NIRS Analyses made easy.

Python 3.11+ | License: CECILL-2.1

NIRS4ALL is a comprehensive machine learning library specifically designed for Near-Infrared Spectroscopy (NIRS) data analysis. It bridges the gap between spectroscopic data and machine learning by providing a unified framework for data loading, preprocessing, model training, and evaluation.

What is Near-Infrared Spectroscopy (NIRS)?

Near-Infrared Spectroscopy (NIRS) is a rapid and non-destructive analytical technique that uses the near-infrared region of the electromagnetic spectrum (approximately 700-2500 nm). NIRS measures how near-infrared light interacts with the molecular bonds in materials, particularly C-H, N-H, and O-H bonds, providing information about the chemical composition of samples.
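Instruments often report this same range in wavenumbers (cm^-1) rather than wavelengths (nm). The short conversion below is only a worked illustration of the arithmetic; the helper name `nm_to_wavenumber` is hypothetical and not part of NIRS4ALL.

```python
def nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a wavelength in nanometres to a wavenumber in cm^-1."""
    # 1 cm = 1e7 nm, so wavenumber (cm^-1) = 1e7 / wavelength (nm)
    return 1e7 / wavelength_nm


for nm in (700, 2500):
    print(f"{nm} nm -> {nm_to_wavenumber(nm):.0f} cm^-1")
# 700 nm -> 14286 cm^-1
# 2500 nm -> 4000 cm^-1
```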

Key advantages of NIRS:

  • Non-destructive analysis
  • Minimal sample preparation
  • Rapid results (seconds to minutes)
  • Potential for on-line/in-line implementation
  • Simultaneous measurement of multiple parameters

Common applications:

  • Agriculture: soil analysis, crop quality assessment
  • Food industry: quality control, authenticity verification
  • Pharmaceutical: raw material verification, process monitoring
  • Medical: tissue monitoring, brain imaging
  • Environmental: pollutant detection, water quality monitoring

Notes:

NIRS4ALL is in active development and has not yet reached version 1.0: APIs and documentation may change without notice.

Features

NIRS4ALL offers a wide range of functionalities:

  1. Spectrum Preprocessing:

    • Baseline correction
    • Standard normal variate (SNV)
    • Robust normal variate
    • Savitzky-Golay filtering
    • Normalization
    • Detrending
    • Multiplicative scatter correction
    • Derivative computation
    • Gaussian filtering
    • Haar wavelet transformation
    • And more...
  2. Data Splitting Methods:

    • Kennard Stone
    • SPXY
    • Random sampling
    • Stratified sampling
    • K-means
    • And more...
  3. Model Integration:

    • Scikit-learn models
    • TensorFlow/Keras models
    • Pre-configured neural networks dedicated to NIRS: nicon & decon (see publication below)
    • PyTorch models (via extensions)
    • JAX models (via extensions)
  4. Model Fine-tuning:

    • Hyperparameter optimization with Optuna
    • Grid search and random search
    • Cross-validation strategies
  5. Visualization:

    • Preprocessing effect visualization
    • Model performance visualization
    • Feature importance analysis
    • Classification metrics
    • Residual analysis
Figures: Performance Heatmap and Performance Distribution (advanced visualization capabilities for model performance analysis).
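To give a feel for two of the preprocessing methods listed above, here is a minimal NumPy sketch of standard normal variate (SNV) and multiplicative scatter correction (MSC). It follows the textbook definitions and is not taken from the NIRS4ALL source; the function names `snv` and `msc` and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra: np.ndarray) -> np.ndarray:
    """Multiplicative scatter correction against the mean spectrum."""
    reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra, dtype=float)
    for i, spectrum in enumerate(spectra):
        # Fit spectrum ~ offset + slope * reference, then undo both effects
        slope, offset = np.polyfit(reference, spectrum, deg=1)
        corrected[i] = (spectrum - offset) / slope
    return corrected


def spread(spectra: np.ndarray) -> float:
    """Largest difference between samples at any single wavelength."""
    return float((spectra.max(axis=0) - spectra.min(axis=0)).max())


# Synthetic data: one absorption band, distorted per sample by a gain and an offset
rng = np.random.default_rng(0)
band = np.exp(-0.5 * ((np.linspace(0, 1, 200) - 0.5) / 0.1) ** 2)
gains = rng.uniform(0.5, 1.5, size=(5, 1))
offsets = rng.uniform(-0.2, 0.2, size=(5, 1))
raw = gains * band + offsets

snv_out = snv(raw)
msc_out = msc(raw)
print("raw spread:", spread(raw))
print("SNV spread:", spread(snv_out))
print("MSC spread:", spread(msc_out))
```

Because the simulated distortion is purely a gain plus an offset, both corrections collapse the five spectra onto a single curve. Real spectra also differ in chemistry, so the spread will shrink rather than vanish.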

Installation

Basic Installation

pip install nirs4all

This installs the core library with scikit-learn support. Deep learning frameworks are optional.

With Additional ML Frameworks

# With TensorFlow support (CPU)
pip install "nirs4all[tensorflow]"

# With TensorFlow support (GPU)
pip install "nirs4all[gpu]"

# With PyTorch support
pip install "nirs4all[torch]"

# With Keras support
pip install "nirs4all[keras]"

# With JAX support
pip install "nirs4all[jax]"

# With all ML frameworks
pip install "nirs4all[all]"

# With all ML frameworks and GPU support for TensorFlow
pip install "nirs4all[all-gpu]"

Development Installation

For developers who want to contribute:

git clone https://github.com/gbeurier/nirs4all.git
cd nirs4all
pip install -e ".[dev]"

Installation Testing

After installing nirs4all, you can verify your installation and environment using the built-in CLI test commands:

# Basic installation test: checks required dependencies and versions
nirs4all --test-install

# Integration test: runs sklearn, tensorflow, and optuna pipelines on sample data
nirs4all --test-integration

# Check version
nirs4all --version

Each command will print a summary of the test results and alert you to any missing dependencies or issues with your environment.
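If you prefer to check from inside Python which optional frameworks are importable, the standard library is enough. This is a small sketch, not the implementation behind `nirs4all --test-install`; the function name `available_frameworks` is hypothetical, and the module names are the import names of the optional dependencies listed in this README.

```python
from importlib.util import find_spec

# Import names of the optional frameworks mentioned in the installation section
OPTIONAL_FRAMEWORKS = ("tensorflow", "torch", "keras", "jax")


def available_frameworks(names=OPTIONAL_FRAMEWORKS):
    """Map each top-level module name to True if it can be found for import."""
    return {name: find_spec(name) is not None for name in names}


for name, found in available_frameworks().items():
    print(f"{name:<12} {'available' if found else 'not installed'}")
```

`find_spec` only locates a top-level module without importing it, so the check stays fast even when a heavy framework is installed.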

Quick Start

Basic Pipeline Example

from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import ShuffleSplit
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor

from nirs4all.data import DatasetConfigs
from nirs4all.pipeline import PipelineConfigs, PipelineRunner
from nirs4all.operators.transforms import (
    StandardNormalVariate, SavitzkyGolay, MultiplicativeScatterCorrection
)

# Define your processing pipeline
pipeline = [
    MinMaxScaler(),                    # Scale features
    StandardNormalVariate(),           # Apply SNV transformation
    ShuffleSplit(n_splits=3),         # 3 random train/test splits (not K-fold)
    {"y_processing": MinMaxScaler()}, # Scale target values
    {"model": PLSRegression(n_components=10)},
    {"model": RandomForestRegressor(n_estimators=100)},
]

# Create configurations
pipeline_config = PipelineConfigs(pipeline, name="MyPipeline")
dataset_config = DatasetConfigs("path/to/your/data")

# Run the pipeline
runner = PipelineRunner(save_files=False, verbose=1)
predictions, predictions_per_datasets = runner.run(pipeline_config, dataset_config)

# Analyze results
top_models = predictions.top(n=5, rank_metric='rmse')
print("Top 5 models by RMSE:")
for i, model in enumerate(top_models):
    print(f"{i+1}. {model['model_name']}: RMSE = {model['rmse']:.4f}")
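The ranking step above sorts models by RMSE, where lower is better. The plain-Python sketch below shows that idea on made-up numbers. It does not reproduce how `predictions.top` is implemented; only the `model_name` and `rmse` keys are borrowed from the example, and the model names and values are invented for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


y_true = [1.0, 2.0, 3.0, 4.0]
# Hypothetical predictions from three models (illustrative values only)
candidates = {
    "model_a": [1.1, 1.9, 3.2, 3.8],
    "model_b": [1.5, 2.5, 2.5, 4.5],
    "model_c": [1.0, 2.0, 3.0, 4.4],
}

records = [
    {"model_name": name, "rmse": rmse(y_true, y_pred)}
    for name, y_pred in candidates.items()
]

# Lower RMSE is better, so sort ascending and keep the first n entries
top = sorted(records, key=lambda r: r["rmse"])[:2]
for i, rec in enumerate(top):
    print(f"{i+1}. {rec['model_name']}: RMSE = {rec['rmse']:.4f}")
# 1. model_a: RMSE = 0.1581
# 2. model_c: RMSE = 0.2000
```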

Advanced Pipeline with Feature Augmentation

from nirs4all.operators.transforms import (
    Detrend, FirstDerivative, Gaussian, Haar
)

# Define multiple preprocessing options
preprocessors = [Detrend, FirstDerivative, Gaussian, StandardNormalVariate]

# Advanced pipeline with feature augmentation
pipeline = [
    "chart_2d",  # Generate visualization
    MinMaxScaler(),
    {"y_processing": MinMaxScaler()},
    {
        "feature_augmentation": {
            "_or_": preprocessors,
            "size": [1, (1, 2)],  # Single and paired transformations
            "count": 7           # Generate 7 different combinations
        }
    },
    ShuffleSplit(n_splits=3, test_size=0.25),
]

# Add multiple PLS models with different components
for n_comp in range(5, 31, 5):
    pipeline.append({
        "name": f"PLS-{n_comp}_components",
        "model": PLSRegression(n_components=n_comp)
    })

# Run and analyze
pipeline_config = PipelineConfigs(pipeline, "AdvancedPipeline")
runner = PipelineRunner(save_files=False)
predictions, _ = runner.run(pipeline_config, dataset_config)
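To make the `feature_augmentation` settings more concrete, the sketch below enumerates the single and paired preprocessing combinations that four candidates allow, then keeps seven of them. It is one reading of the comments in the example, not the library's generator: whether NIRS4ALL treats pairs as ordered, and how it picks the `count` combinations, are assumptions here, and the strings stand in for the transformer classes.

```python
import random
from itertools import combinations

# Stand-in names for the four preprocessors used in the example above
preprocessor_names = ["Detrend", "FirstDerivative", "Gaussian", "StandardNormalVariate"]


def candidate_combinations(options, sizes=(1, 2)):
    """List every unordered combination of `options` with the given sizes."""
    combos = []
    for size in sizes:
        combos.extend(combinations(options, size))
    return combos


all_combos = candidate_combinations(preprocessor_names)
print(len(all_combos))  # 4 singles + 6 pairs = 10

# Keep 7 of them, mirroring "count": 7 (random sampling is an assumption)
rng = random.Random(0)
selected = rng.sample(all_combos, 7)
for combo in selected:
    print(" + ".join(combo))
```

Separately, `range(5, 31, 5)` in the example yields 5, 10, 15, 20, 25 and 30, so the loop appends six PLS models to the pipeline.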

Neural Network Integration

from nirs4all.operators.models.tensorflow.nicon import nicon

# Pipeline with pre-configured neural network
pipeline = [
    MinMaxScaler(),
    StandardNormalVariate(),
    ShuffleSplit(n_splits=3),
    {"y_processing": MinMaxScaler()},
    {"model": PLSRegression(n_components=15)},
    {
        "model": nicon,  # Pre-configured convolutional neural network
        "name": "NICON-CNN",
        "train_params": {
            "epochs": 100,
            "patience": 20,
            "verbose": 1
        }
    }
]

pipeline_config = PipelineConfigs(pipeline, "NeuralNetworkPipeline")
runner = PipelineRunner(save_files=False, verbose=1)
predictions, _ = runner.run(pipeline_config, dataset_config)

# Compare neural network with traditional models
top_models = predictions.top(n=3, rank_metric='rmse')
for i, model in enumerate(top_models):
    print(f"{i+1}. {model['model_name']}: RMSE = {model['rmse']:.4f}")
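The `train_params` above combine an epoch budget with a `patience` value. Assuming `patience` has the usual early-stopping meaning (stop after that many consecutive epochs without a better validation loss, as in Keras), the logic can be sketched in plain Python. The README does not confirm this interpretation, and the function name and loss values below are illustrative.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (epoch at which training stops, epoch with the best loss), 1-based.

    Training stops once `patience` consecutive epochs pass without a new
    best validation loss; otherwise it runs through every epoch given.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Illustrative validation losses: improvement stalls after epoch 4
losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.59]
stopped_at, best = early_stopping_epoch(losses, patience=3)
print(stopped_at, best)  # 7 4
```

Under this reading, `"epochs": 100` with `"patience": 20` would cap training at 100 epochs while allowing it to end earlier once 20 epochs pass without improvement.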

Hyperparameter Optimization

# Pipeline with automated hyperparameter tuning
pipeline = [
    MinMaxScaler(),
    StandardNormalVariate(),
    ShuffleSplit(n_splits=3),
    {"y_processing": MinMaxScaler()},
    {
        "model": PLSRegression(),
        "name": "PLS-Optimized",
        "finetune_params": {
            "n_trials": 50,
            "verbose": 1,
            "approach": "single",  # "grouped" or "single"
            "model_params": {
                'n_components': ('int', 1, 30),
            },
        }
    }
]

pipeline_config = PipelineConfigs(pipeline, "OptimizedPipeline")
runner = PipelineRunner(save_files=False, verbose=1)
predictions, _ = runner.run(pipeline_config, dataset_config)

# Get the best optimized model
best_model = predictions.top(n=1, rank_metric='rmse')[0]
print(f"Best model: {best_model['model_name']} with RMSE: {best_model['rmse']:.4f}")
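The search-space entry `('int', 1, 30)` reads as "an integer between 1 and 30", which would correspond to something like Optuna's `trial.suggest_int("n_components", 1, 30)`. The sketch below illustrates the general idea with a plain random search over a toy objective. It is not Optuna's sampler and not NIRS4ALL's implementation; the helper names, the inclusive bounds, and the objective are all assumptions.

```python
import random


def sample_params(space, rng):
    """Draw one value per parameter from specs of the form ('int', low, high)."""
    params = {}
    for name, (kind, low, high) in space.items():
        if kind != "int":
            raise ValueError(f"unsupported spec kind: {kind!r}")
        params[name] = rng.randint(low, high)  # both bounds inclusive
    return params


def toy_objective(params):
    """Stand-in for a cross-validated RMSE; lowest at n_components = 12."""
    return 0.05 + 0.001 * (params["n_components"] - 12) ** 2


space = {"n_components": ("int", 1, 30)}
rng = random.Random(0)

best_params, best_score = None, float("inf")
for _ in range(50):  # mirrors "n_trials": 50
    params = sample_params(space, rng)
    score = toy_objective(params)
    if score < best_score:
        best_params, best_score = params, score

print(best_params, round(best_score, 4))
```

In a real run the objective would be the validation error of the model fitted with the sampled parameters, and Optuna would choose later trials based on earlier results rather than sampling uniformly.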

Visualization and Analysis

from nirs4all.data.prediction_analyzer import PredictionAnalyzer
import matplotlib.pyplot as plt

# Create analyzer for your predictions
analyzer = PredictionAnalyzer(predictions)

# Plot top performing models
fig1 = analyzer.plot_top_k_comparison(k=5, rank_metric='rmse')
plt.title('Top 5 Models Comparison')

# Create heatmap of model performance across preprocessing methods
fig2 = analyzer.plot_variable_heatmap(
    x_var="model_name",
    y_var="preprocessings",
    metric='rmse'
)
plt.title('Model Performance Heatmap')

# Candlestick plot for model variability
fig3 = analyzer.plot_variable_candlestick(
    filters={"partition": "test"},
    variable="model_name"
)
plt.title('Model Performance Variability')

plt.show(block=False)

Tutorials

NIRS4ALL provides comprehensive tutorials to help you master NIRS data analysis:

🚀 Tutorial 1: Beginner's Guide

Perfect for getting started with NIRS4ALL! This tutorial covers:

  • Basic PLS Regression - Your first NIRS pipeline
  • Enhanced Preprocessing - Spectral data preprocessing techniques
  • Classification - Random Forest classification examples
  • Model Persistence - Save and reuse trained models
  • Multiple Datasets - Cross-dataset validation and analysis
  • Data Visualization - Create meaningful plots and charts

Start here if you're new to NIRS analysis or the NIRS4ALL framework.

🔬 Tutorial 2: Advanced Analysis

For experienced users ready for sophisticated techniques:

  • Multi-Source Analysis - Multi-target regression with single datasets
  • Hyperparameter Optimization - Automated model tuning with Optuna
  • Custom Components - Build your own transformers and models
  • Configuration Generation - Dynamic pipeline customization
  • Advanced Visualizations - Professional-grade analysis dashboards
  • Neural Networks - Deep learning with pre-configured models (nicon, decon)
  • Complete Workflows - End-to-end professional analysis

These tutorials demonstrate real-world workflows and best practices for production-ready NIRS analysis.

Examples

Ready-to-run example scripts demonstrating common NIRS workflows:

Basic Examples

  • Q1_regression.py - Basic regression with PLS models and preprocessing combinations
  • Q1_classif.py - Classification pipeline with Random Forest and preprocessing
  • Q1_classif_tf.py - Classification with TensorFlow neural networks and confusion matrix visualization
  • Q1_groupsplit.py - Group-based data splitting for maintaining sample integrity

Advanced Pipeline Techniques

Model Deployment & Prediction

Data Processing & Analysis

Custom Models

  • custom_NN.py - Custom TensorFlow neural network architectures for NIRS
  • custom_nicon.py - Custom NICON (NIRS Convolutional Network) model implementations

Run any example with: python examples/<example_name>.py t

Documentation

User Guide

API Reference

Specifications

Explanations

Reference

Full documentation will be available at https://nirs4all.readthedocs.io/

Dependencies

  • numpy (>=1.20.0)
  • pandas (>=1.0.0)
  • scipy (>=1.5.0)
  • scikit-learn (>=0.24.0)
  • PyWavelets (>=1.1.0)
  • joblib (>=0.16.0)
  • jsonschema (>=3.2.0)
  • kennard-stone (>=0.5.0)
  • twinning (>=0.0.5)
  • optuna (>=2.0.0)

Optional Dependencies

  • tensorflow (>=2.10.0) - For TensorFlow models
  • torch (>=2.0.0) - For PyTorch models
  • keras (>=3.0.0) - For Keras models
  • jax (>=0.4.10) & jaxlib (>=0.4.10) - For JAX models

Research Applications

NIRS4ALL has been successfully used in published research:

Houngbo, M. E., Desfontaines, L., Diman, J. L., Arnau, G., Mestres, C., Davrieux, F., Rouan, L., Beurier, G., Marie‐Magdeleine, C., Meghar, K., Alamu, E. O., Otegbayo, B. O., & Cornet, D. (2024). Convolutional neural network allows amylose content prediction in yam (Dioscorea alata L.) flour using near infrared spectroscopy. Journal of the Science of Food and Agriculture, 104(8), 4915-4921. John Wiley & Sons, Ltd.

How to Cite

If you use NIRS4ALL in your research, please cite:

@software{beurier2025nirs4all,
  author = {Gregory Beurier and Denis Cornet and Camille Noûs and Lauriane Rouan},
  title = {nirs4all is all your nirs: Open spectroscopy for everyone},
  url = {https://github.com/gbeurier/nirs4all},
  version = {0.2.1},
  year = {2025},
}

License

This project is licensed under the CECILL-2.1 License - see the LICENSE file for details.

Acknowledgments

  • CIRAD for supporting this research
  • [LLMs] for providing fast documentation, nice charts, emojis in logs 😭, and plenty of useless tests, booby-trapped source code, and misleading specifications.
