A framework for designing and evaluating optimal preprocessing pipelines for FTIR spectral data used in classification tasks. It provides modular implementations of common preprocessing techniques and allows automated exploration of preprocessing combinations to enhance model performance.

FTIR-Prep: FTIR Preprocessing Framework

A modular and extensible framework for optimizing FTIR preprocessing pipelines for disease diagnosis.

🚀 Features

Modular: Component-based reusable architecture
Extensible: Easy addition of new preprocessing techniques
Automatic Optimization: Optuna integration for optimal preprocessing pipeline search
Robust Validation: Support for group-based cross-validation
Configurable: Flexible pipeline configuration system
Documented: Complete documentation with practical examples

📋 Supported Preprocessing Techniques

🔧 Baseline Correction

Rubberband: Automatic correction using rubberband algorithm
Polynomial: Correction using configurable order polynomials (1-6)
Whittaker: Penalized least squares smoothing with lambda parameter
ALS: Asymmetric Least Squares with lambda and p parameters
ArPLS: Adaptive reweighted penalized least squares
DrPLS: Doubly reweighted penalized least squares
GCV Spline: Generalized cross-validation spline smoothing
Gaussian Process: Baseline correction using Gaussian processes
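
To give a feel for what the rubberband correction does, the sketch below stretches a lower convex hull under a synthetic spectrum and subtracts it. It uses only NumPy. The function name `rubberband_baseline` and the synthetic data are assumptions made for this illustration, and FTIR-Prep's own implementation may differ.

```python
import numpy as np

def rubberband_baseline(x, y):
    """Interpolate the lower convex hull of (x, y); x must be strictly increasing."""
    hull = []  # indices of the hull vertices found so far
    for i in range(len(x)):
        # Drop the last vertex while it lies on or above the chord to point i
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross > 0:
                break
            hull.pop()
        hull.append(i)
    # The baseline is the piecewise-linear curve through the hull vertices
    return np.interp(x, x[hull], y[hull])

# Synthetic spectrum: sloped baseline plus one Gaussian band near 1650 cm^-1
wavenumbers = np.linspace(900.0, 1800.0, 451)
true_baseline = 0.2 + 0.0003 * (wavenumbers - 900.0)
band = np.exp(-0.5 * ((wavenumbers - 1650.0) / 15.0) ** 2)
spectrum = true_baseline + band

baseline = rubberband_baseline(wavenumbers, spectrum)
corrected = spectrum - baseline
```

If your wavenumber axis is stored in decreasing order, reverse both arrays before calling a function like this one.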

📊 Normalization

Min-Max: Individual Min-Max spectrum normalization
Vector: L1, L2, or maximum normalization
Amide I: Normalization based on the amide I band peak (1600-1700 cm⁻¹)
Area: Area under curve normalization
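
The four normalization options can be written in a few lines of NumPy each. The sketch below is an illustration under assumptions: the function names are hypothetical, spectra are stored one per row, and "maximum" normalization is taken to mean division by the largest absolute value. It is not FTIR-Prep's code.

```python
import numpy as np

def minmax_normalize(X):
    """Rescale each spectrum (row) to the range [0, 1]."""
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    return (X - lo) / (hi - lo)

def vector_normalize(X, norm="l2"):
    """Divide each spectrum by its L1 norm, L2 norm, or maximum absolute value."""
    if norm == "l1":
        scale = np.abs(X).sum(axis=1, keepdims=True)
    elif norm == "l2":
        scale = np.sqrt((X ** 2).sum(axis=1, keepdims=True))
    elif norm == "max":
        scale = np.abs(X).max(axis=1, keepdims=True)
    else:
        raise ValueError(f"unknown norm: {norm!r}")
    return X / scale

def amide_i_normalize(X, wavenumbers, band=(1600.0, 1700.0)):
    """Divide each spectrum by its maximum inside the amide I band."""
    in_band = (wavenumbers >= band[0]) & (wavenumbers <= band[1])
    return X / X[:, in_band].max(axis=1, keepdims=True)

def area_normalize(X, wavenumbers):
    """Divide each spectrum by its trapezoidal area under the curve."""
    widths = np.diff(wavenumbers)
    area = (0.5 * (X[:, 1:] + X[:, :-1]) * widths).sum(axis=1, keepdims=True)
    return X / area

# Four synthetic, strictly positive spectra on a 900-1800 cm^-1 grid
rng = np.random.default_rng(0)
wavenumbers = np.linspace(900.0, 1800.0, 451)
X = rng.random((4, wavenumbers.size)) + 0.1
X_amide = amide_i_normalize(X, wavenumbers)
```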

🎯 Smoothing

Savitzky-Golay: Polynomial filter with configurable parameters
Wavelets: Denoising using Daubechies wavelets (db2, db3, db4)
Local Polynomial: LOWESS smoothing with configurable bandwidth
Whittaker: Penalized least squares smoothing
GCV Spline: Generalized cross-validation spline smoothing
Flat: Flat window convolution smoothing
Hanning: Hanning window convolution smoothing
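
The flat and Hanning options both come down to convolving the spectrum with a normalized window. Here is a minimal NumPy sketch of that idea. The function name `window_smooth`, the reflect padding at the ends, and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def window_smooth(spectrum, window_len=11, window="hanning"):
    """Smooth a 1-D spectrum by convolving it with a normalized window."""
    if window_len < 3 or window_len % 2 == 0:
        raise ValueError("window_len must be an odd integer >= 3")
    if window == "flat":
        weights = np.ones(window_len)      # moving average
    elif window == "hanning":
        weights = np.hanning(window_len)   # raised-cosine taper
    else:
        raise ValueError(f"unknown window: {window!r}")
    weights /= weights.sum()
    half = window_len // 2
    # Reflect the ends so the output has the same length as the input
    padded = np.pad(spectrum, half, mode="reflect")
    return np.convolve(padded, weights, mode="valid")

# Synthetic broad band with additive Gaussian noise
rng = np.random.default_rng(0)
wavenumbers = np.linspace(900.0, 1800.0, 451)
clean = np.exp(-0.5 * ((wavenumbers - 1650.0) / 40.0) ** 2)
noisy = clean + rng.normal(scale=0.05, size=clean.size)

smooth_flat = window_smooth(noisy, window_len=11, window="flat")
smooth_hann = window_smooth(noisy, window_len=11, window="hanning")
```

Wider windows remove more noise but also flatten narrow bands, so the window length is usually worth tuning together with the other steps.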

📈 Derivatives

First Derivative: First derivative calculation via Savitzky-Golay (order 1)
Second Derivative: Second derivative calculation via Savitzky-Golay (order 2)
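
A Savitzky-Golay derivative fits a low-order polynomial inside a sliding window and reads the derivative off the fitted coefficients. The NumPy sketch below shows that mechanism on a uniformly sampled signal. The function name `savgol_derivative` and the simple edge padding are assumptions for this example. In practice `scipy.signal.savgol_filter` with its `deriv` and `delta` arguments is the usual tool, and FTIR-Prep may well rely on it.

```python
from math import factorial
import numpy as np

def savgol_derivative(y, window_length=11, polyorder=2, deriv=1, delta=1.0):
    """Savitzky-Golay derivative of a uniformly sampled 1-D signal."""
    if window_length % 2 == 0 or polyorder >= window_length or deriv > polyorder:
        raise ValueError("need odd window_length > polyorder >= deriv")
    half = window_length // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Least-squares polynomial fit in the window: columns are offsets**0, offsets**1, ...
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse maps the window samples to the fitted
    # coefficient of offsets**deriv; rescale it to the derivative at the centre
    weights = np.linalg.pinv(design)[deriv] * factorial(deriv) / delta ** deriv
    # Simple edge handling: repeat the end values (less accurate near the ends)
    padded = np.pad(y, half, mode="edge")
    return np.correlate(padded, weights, mode="valid")

# A quadratic test signal, for which an order-2 fit is exact away from the ends
wavenumbers = np.linspace(900.0, 1800.0, 451)
step = wavenumbers[1] - wavenumbers[0]
y = 1e-6 * (wavenumbers - 1350.0) ** 2

d1 = savgol_derivative(y, deriv=1, delta=step)   # first derivative
d2 = savgol_derivative(y, deriv=2, delta=step)   # second derivative
```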

✂️ Wavenumber Truncation

Fingerprint Region: Keep only fingerprint region (900-1800 cm⁻¹)
Fingerprint + Lipids: Keep fingerprint and lipids regions (900-1800, 2800-3050 cm⁻¹)
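
Truncation amounts to a boolean mask over the wavenumber axis. The sketch below applies the two region sets listed above to a dummy matrix. The constant names and the `truncate` helper are hypothetical and only illustrate the operation.

```python
import numpy as np

# Region boundaries in cm^-1, taken from the list above
FINGERPRINT = [(900.0, 1800.0)]
FINGERPRINT_LIPIDS = [(900.0, 1800.0), (2800.0, 3050.0)]

def truncate(X, wavenumbers, regions):
    """Keep only the columns whose wavenumber falls inside one of the regions."""
    keep = np.zeros(wavenumbers.shape, dtype=bool)
    for low, high in regions:
        keep |= (wavenumbers >= low) & (wavenumbers <= high)
    return X[:, keep], wavenumbers[keep]

wavenumbers = np.arange(400.0, 4001.0, 2.0)   # 400-4000 cm^-1 in 2 cm^-1 steps
X = np.ones((3, wavenumbers.size))            # three dummy spectra

X_fp, wn_fp = truncate(X, wavenumbers, FINGERPRINT)
X_fl, wn_fl = truncate(X, wavenumbers, FINGERPRINT_LIPIDS)
```

Returning the truncated wavenumber axis together with the data keeps later steps, such as band-based normalization, aligned with the remaining columns.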

🔍 Model Explainability

SHAP Analysis: Feature importance analysis using SHAP values

🏗️ Architecture

ftir_prep/
├── core/                    # Core functionalities
│   ├── pipeline.py         # Preprocessing pipeline
│   ├── evaluator.py        # Pipeline evaluation
│   └── explainer.py        # SHAP explainability analysis
├── preprocessing/           # Preprocessing techniques
│   ├── baseline.py         # Baseline correction
│   ├── normalization.py    # Normalization
│   ├── smoothing.py        # Smoothing
│   ├── derivatives.py      # Derivative calculation
│   └── truncation.py       # Wavenumber truncation
├── optimization/            # Automatic optimization
│   └── optuna_optimizer.py # Optuna integration
├── utils/                   # Utilities
│   └── data_loader.py      # Data loading
└── config/                  # Configurations
    └── settings.py         # Default parameters

🚀 Installation

Requirements

Python 3.8+
pip (usually included with Python)

Installation via PyPI (Recommended - Simplest)

pip install ftir-prep

📖 Basic Usage

1. Data Loading

1.1 Group spectra by patient so that all data from the same patient stays in the same cross-validation fold during classification

from ftir_prep import FTIRDataLoader

# Load FTIR data
data_loader = FTIRDataLoader(
    data_path="ftir_data.dat",
    wavenumbers_path="wavenumbers.dat"
)

X, y, wavenumbers = data_loader.load_data()

# Create the groups array used during cross-validation to keep each patient's spectra in the same fold
groups = data_loader.create_groups(instances_per_group=3)
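
To make the role of `groups` concrete, here is a small NumPy sketch. It assumes that `create_groups(instances_per_group=3)` gives every block of three consecutive rows the same group id. That is a guess about the behavior rather than a description of the library's code, and `group_folds` is a hypothetical helper that is not stratified.

```python
import numpy as np

n_patients, instances_per_group = 5, 3
# Rows are assumed to be ordered by patient: 3 consecutive spectra per patient
groups = np.repeat(np.arange(n_patients), instances_per_group)
# groups is [0 0 0 1 1 1 2 2 2 3 3 3 4 4 4]

def group_folds(groups, n_splits):
    """Yield (train_idx, test_idx) so that no group is split across folds."""
    unique = np.unique(groups)
    for k in range(n_splits):
        test_groups = unique[k::n_splits]          # every n_splits-th group
        test_mask = np.isin(groups, test_groups)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

folds = list(group_folds(groups, n_splits=2))
for train_idx, test_idx in folds:
    # A patient never appears on both sides of a split
    assert not set(groups[train_idx]) & set(groups[test_idx])
```

Without this constraint, replicate spectra from one patient could land in both the training and the test fold, which tends to inflate the measured accuracy.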

1.2 Slice the data to use only one spectrum per patient. Data must be ordered by patient

from ftir_prep import FTIRDataLoader

# Load FTIR data
data_loader = FTIRDataLoader(
    data_path="ftir_data.dat",
    wavenumbers_path="wavenumbers.dat"
)

X, y, wavenumbers = data_loader.load_data(slice_size=3)  # use one of the triplicate spectra per patient
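
The reason the data must be ordered by patient is easier to see with plain array slicing. The sketch below assumes that `slice_size=3` keeps the first row of every consecutive block of three. This is an assumption about the loader made for illustration only.

```python
import numpy as np

slice_size = 3
X = np.arange(12, dtype=float).reshape(6, 2)   # 6 spectra = 2 patients x 3 replicates
y = np.array([0, 0, 0, 1, 1, 1])               # one label per spectrum

X_single = X[::slice_size]   # rows 0 and 3: first replicate of each patient
y_single = y[::slice_size]   # labels stay aligned with the kept rows
```

If the rows were interleaved across patients, the same stride would pick several spectra from some patients and none from others.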

2. Pipeline Creation

from ftir_prep import FTIRPipeline, PipelineBuilder

# Using direct configuration
pipeline = FTIRPipeline()
pipeline.add_step('truncation', 'fingerprint_lipids')
pipeline.add_step('baseline', 'polynomial', polynomial_order=2)
pipeline.add_step('normalization', 'vector')


# Using PipelineBuilder (Fluent API)
pipeline = (PipelineBuilder()
            .add_truncation('fingerprint_lipids')
            .add_baseline('rubberband')
            .add_normalization('minmax')
            .add_smoothing('savgol', polyorder=2)
            .add_derivative('savgol', order=1)
            .build())
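
Both styles above describe the same thing: an ordered list of steps applied one after another. The toy class below shows why chaining works, since each method returns the object itself. `ToyPipeline` and the two step functions are hypothetical names for this illustration and are unrelated to FTIR-Prep's actual classes.

```python
class ToyPipeline:
    """Ordered list of (name, function) steps applied one after another."""

    def __init__(self):
        self.steps = []

    def add_step(self, name, func):
        self.steps.append((name, func))
        return self  # returning self is what makes method chaining possible

    def process(self, values):
        for _, func in self.steps:
            values = func(values)
        return values

def subtract_minimum(values):
    low = min(values)
    return [v - low for v in values]

def divide_by_maximum(values):
    high = max(values)
    return [v / high for v in values]

pipeline = (ToyPipeline()
            .add_step("baseline", subtract_minimum)
            .add_step("normalization", divide_by_maximum))

result = pipeline.process([2.0, 4.0, 6.0])   # -> [0.0, 0.5, 1.0]
```

The order of the steps matters: swapping the two calls would divide by the maximum first and give a different result.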

3. Execution and Evaluation

from ftir_prep import PipelineEvaluator

# Process data
X_processed, wavenumbers_processed = pipeline.process(X, wavenumbers)

# Evaluate pipeline
evaluator = PipelineEvaluator(
    classifier=None,                    # use the default Random Forest
    cv_method='StratifiedGroupKFold',   # cross-validation strategy
    cv_params={'n_splits': 3,           # number of folds
               'shuffler': False,
               'random_state': 42},
)
results = evaluator.evaluate_pipeline(
    pipeline,
    X, y,
    groups,                             # groups created in step 1.1
    wavenumbers=wavenumbers,
)

print(f"Accuracy: {results['mean_accuracy']:.4f} ± {results['std_accuracy']:.4f}")

4. Automatic Optimization

from ftir_prep import OptunaPipelineOptimizer

# Automatically optimize parameters
optimizer = OptunaPipelineOptimizer(X, y, 
                                    wavenumbers,
                                    groups,
                                    evaluator=evaluator, # previously configured PipelineEvaluator object
                                    metric='f1_macro')
study = optimizer.optimize(n_trials=30)

best_pipeline = optimizer.best_pipeline
best_pipeline.save_pipeline("best_pipeline_found.json")  # save the best pipeline found to a JSON file
print("Best pipeline saved to 'best_pipeline_found.json'")

# Save optimization metadata
metadata = optimizer.get_metadata()
metadata.to_csv("optimization_metadata.csv")
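
Conceptually, the optimizer samples a pipeline configuration, scores it, and keeps the best one. The sketch below shows that loop with plain random search from the standard library. This is a deliberately simpler technique than Optuna, whose default sampler is TPE. The search space, the `toy_score` placeholder, and all names are hypothetical, and the score it returns has no scientific meaning.

```python
import random

# Hypothetical search space: one option per preprocessing stage (None = skip the stage)
SEARCH_SPACE = {
    "truncation": ["fingerprint", "fingerprint_lipids"],
    "baseline": [None, "rubberband", "polynomial"],
    "normalization": [None, "minmax", "vector"],
    "smoothing": [None, "savgol"],
}

def toy_score(config):
    """Placeholder for a cross-validated metric. A real objective would build
    the pipeline, run the evaluator, and return e.g. the macro F1 score."""
    return sum(option is not None for option in config.values()) / len(config)

def random_search(n_trials, seed=0):
    """Sample n_trials configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {stage: rng.choice(options) for stage, options in SEARCH_SPACE.items()}
        score = toy_score(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = random_search(n_trials=30)
```

Fixing the seed makes the search reproducible, which helps when comparing runs with different trial budgets.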

5. Model Explainability

from ftir_prep import FTIRExplainer

# Create explainer (your_classifier is a placeholder for your own classifier object)
explainer = FTIRExplainer(classifier=your_classifier)

# Analyze feature importance with SHAP.
# Saves a CSV and a PNG with the feature-importance data to output_dir.
results = explainer.explain_model(
    X_processed, y, groups,
    split_method='stratified_group',
    feature_names=wavenumbers_processed,
    output_dir="shap_analysis"
)

🔬 Practical Examples

Pipeline Creation Examples

# Direct configuration example
python3 examples/create_pipeline/direct_configuration.py

# PipelineBuilder (Fluent API) example
python3 examples/create_pipeline/pipeline_builder.py

Pipeline Comparison Example

# Compare different preprocessing strategies
python3 examples/compare_pipelines/compare_pipelines.py

Pipeline Optimization Example

# Automatic pipeline optimization
python3 examples/pipeline_search/pipeline_search.py

Pipeline Loading Example

# Load and use saved pipelines
python3 examples/read_pipeline_from_file/read_pipeline_file.py

SHAP Explainability Example

# Feature importance analysis with SHAP
python3 examples/shap_analysis/explainer_example.py

🎯 Use Cases

Disease Diagnosis

Analysis of FTIR spectra from biological samples
Biomarker identification
Automatic sample classification

Scientific Research

Methodology comparison
Protocol optimization
Result validation

📚 Documentation

Docstrings: Complete inline documentation
Examples: Functional example code

👥 Authors

Lucas Mendonça - Initial development - GitHub

⭐ If this project was useful to you, consider giving it a star on GitHub!

Release history

0.3.1: Apr 20, 2026 (this version)
0.3.0: Apr 20, 2026
0.2.3: Apr 20, 2026
0.2.2: Apr 19, 2026
0.2.1: Apr 11, 2026
0.2.0: Apr 11, 2026
0.1.0: Dec 22, 2025

Download files

Source Distribution

ftir_prep-0.3.1.tar.gz (1.5 MB)

Uploaded Apr 20, 2026 Source

Built Distribution

ftir_prep-0.3.1-py3-none-any.whl (45.1 kB)

Uploaded Apr 20, 2026 Python 3

File details

Details for the file ftir_prep-0.3.1.tar.gz.

File metadata

Download URL: ftir_prep-0.3.1.tar.gz
Upload date: Apr 20, 2026
Size: 1.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ftir_prep-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`b78c1044358f8f67f0b6342d328aad424a783e3cce53a09104f95f8967dad84d`
MD5	`30a7c0c99969a9c405843d335fa9dfca`
BLAKE2b-256	`30b1bba3616a13d86eb436aacb7370dceb291ea0740717265f05bc993a52b32e`

File details

Details for the file ftir_prep-0.3.1-py3-none-any.whl.

File metadata

Download URL: ftir_prep-0.3.1-py3-none-any.whl
Upload date: Apr 20, 2026
Size: 45.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ftir_prep-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6890a2ad31808291483b01384334d1b72b85a05f99ef8ded41775dbdd2d83efe`
MD5	`fa5cfacbf33b29747ddf0a903c7fae51`
BLAKE2b-256	`a968de5e28a4ad3d641e7be7d4cf5b0457a556932b169c9d37c8b6161d5bd092`
