A comprehensive spectral preprocessing toolkit for chemometrics

Spectroscopy preprocessing using Bayesian Optimization

Overview

Spectoprep is a toolkit for optimizing spectroscopic preprocessing pipelines with Bayesian optimization. It searches over combinations of preprocessing techniques and their parameters to find the pipeline that gives the best model performance (reported as RMSE) on your spectroscopic data.

Features

  • Pipeline Optimization: Automate the discovery of optimal preprocessing pipelines using Bayesian optimization

  • Flexible Preprocessing: Choose from multiple preprocessing techniques (MSC, SNV, Savitzky-Golay, etc.)

  • Cross-Validation Support: Group-based cross-validation methods for robust evaluation

  • Configurable Pipeline Length: Control maximum preprocessing steps and allowed combinations
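Group-based cross-validation matters when several spectra come from the same physical sample, since replicates split across train and test folds give optimistic scores. The sketch below uses scikit-learn's `GroupShuffleSplit` directly. It is an assumption that `cv_method="group_shuffle_split"` behaves along these lines, and the replicate layout is hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical layout: 12 spectra from 4 samples, 3 replicates each.
groups = np.repeat(np.arange(4), 3)
X = np.zeros((12, 5))  # placeholder spectra; only the row count matters here

# test_size=0.25 holds out one of the four groups in every split.
splitter = GroupShuffleSplit(n_splits=3, test_size=0.25, random_state=21)
folds = list(splitter.split(X, groups=groups))

for train_idx, test_idx in folds:
    train_groups = set(groups[train_idx].tolist())
    test_groups = set(groups[test_idx].tolist())
    # No sample contributes replicates to both sides of a split.
    assert train_groups.isdisjoint(test_groups)
    print("train groups:", sorted(train_groups), "test groups:", sorted(test_groups))
```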

Installation

pip install spectoprep

Quick Start

# Import path assumed to match the installed package name (spectoprep)
from spectoprep.pipeline.optimizer import PipelineOptimizer
import numpy as np

# Prepare your data
X_train = np.array(...)  # Your spectral data matrix
y_train = np.array(...)  # Your target values
groups = np.array(...)   # Optional group labels for cross-validation

# Initialize the optimizer
optimizer = PipelineOptimizer(
    X_train=X_train,
    y_train=y_train,
    X_test=None,
    y_test=None,
    preprocessing_steps=['msc', 'savgol', 'detrend', 'scaler', 'snv',
                          'robust_scaler', 'emsc', 'meancn'],
    cv_method="group_shuffle_split",
    n_splits=3,
    random_state=21,
    groups=groups,
    max_pipeline_length=2,
    allowed_preprocess_combinations=[1, 2]
)

# Run Bayesian optimization to find the best pipeline
best_params, best_pipeline = optimizer.bayesian_optimize(
    init_points=50,
    n_iter=1000
)

# Extract preprocessing steps without the final model
custom_preprocessing = list(best_pipeline.steps[:-1])

# Print optimization summary
summary = optimizer.summarize_optimization()
print(f"Best pipeline configuration: {summary['best_pipeline']}")
print(f"Best RMSE: {summary['best_rmse']:.4f}")

# Make predictions with the optimized pipeline
predictions, rmse, r2 = optimizer.get_best_pipeline_predictions(best_pipeline)

Available Preprocessing Methods

  • msc: Multiplicative Scatter Correction

  • savgol: Savitzky-Golay filtering

  • detrend: Linear detrending

  • scaler: Standard scaling

  • snv: Standard Normal Variate

  • robust_scaler: Robust scaling

  • emsc: Extended Multiplicative Signal Correction

  • meancn: Mean centering

  • pca: Principal Component Analysis

  • select_k_best: Feature selection
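For readers unfamiliar with these transforms, here is a minimal NumPy/SciPy sketch of several of them (SNV, MSC, Savitzky-Golay, linear detrending, mean centering). These are textbook formulations written for illustration. The function names, the Savitzky-Golay window and polynomial order, and the synthetic spectra are assumptions, and spectoprep's own implementations and defaults may differ.

```python
import numpy as np
from scipy.signal import detrend, savgol_filter


def snv(X):
    """Standard Normal Variate: center and scale each spectrum (row)."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


def msc(X, reference=None):
    """Multiplicative Scatter Correction.

    Each spectrum is regressed on a reference (the mean spectrum by default)
    and the fitted offset and slope are removed.
    """
    ref = X.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(X, dtype=float)
    for i, spectrum in enumerate(X):
        slope, intercept = np.polyfit(ref, spectrum, deg=1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected


def mean_center(X):
    """Subtract the mean spectrum so every channel (column) has zero mean."""
    return X - X.mean(axis=0)


# Synthetic spectra: one Gaussian band with random gain and offset per row,
# mimicking multiplicative and additive scatter effects.
wavelengths = np.linspace(0, 1, 200)
band = np.exp(-((wavelengths - 0.5) ** 2) / 0.01)
rng = np.random.default_rng(0)
gain = rng.uniform(0.8, 1.2, size=(10, 1))
offset = rng.uniform(-0.1, 0.1, size=(10, 1))
X = gain * band + offset

X_snv = snv(X)
X_msc = msc(X)
# First-derivative Savitzky-Golay filter along the wavelength axis.
X_sg = savgol_filter(X, window_length=11, polyorder=2, deriv=1, axis=1)
# Remove a least-squares straight line from each spectrum.
X_dt = detrend(X, axis=1, type="linear")
X_mc = mean_center(X)
```

Because every synthetic spectrum here is an exact linear function of the mean spectrum, MSC maps all rows onto that mean spectrum, which makes it easy to see what the correction removes.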

Documentation

For detailed documentation, visit spectoprep.readthedocs.io.

Contributing

We welcome contributions! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Download files

Source Distribution

spectoprep-1.0.0a0.tar.gz (32.7 kB)

Uploaded Source

Built Distribution

spectoprep-1.0.0a0-py3-none-any.whl (29.9 kB)

Uploaded Python 3

File details

Details for the file spectoprep-1.0.0a0.tar.gz.

File metadata

  • Download URL: spectoprep-1.0.0a0.tar.gz
  • Upload date:
  • Size: 32.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for spectoprep-1.0.0a0.tar.gz
  • SHA256: e9af0827537657f99347791b935c5489af3dc8ad5dbc8bd4aa9dadd00d91456e
  • MD5: c590080c4be0bf31f3ebe68a0cf4c5b7
  • BLAKE2b-256: 1e4bcd8373bf81d65785ddf558477559d8de64b599c4a825c678d49c014d3e2f

File details

Details for the file spectoprep-1.0.0a0-py3-none-any.whl.

File metadata

  • Download URL: spectoprep-1.0.0a0-py3-none-any.whl
  • Upload date:
  • Size: 29.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for spectoprep-1.0.0a0-py3-none-any.whl
  • SHA256: 85e233b9286ae6d75a9639b3fc2b857ea249ef4b2df45ba4ab8872cc39f3a635
  • MD5: dc30a0b6b9258178dc83c9f2ba039539
  • BLAKE2b-256: a489a9f98c5a52964039f1ed61de318c861289c8a35ff770aa60c310273eb49a
