A comprehensive spectral preprocessing toolkit for chemometrics


Spectroscopy preprocessing using Bayesian Optimization

Overview

SpectoPrep is a toolkit for optimizing spectroscopic data preprocessing pipelines with Bayesian optimization. It searches over combinations of preprocessing techniques and their parameters for the pipeline that gives the best model performance on your spectral data.
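To give a feel for the size of the search space, the sketch below enumerates candidate pipelines built from the eight method keys used in the Quick Start, with one or two steps each. It assumes that step order matters and that a step is not repeated within a pipeline; SpectoPrep's own rules for building candidates may differ, and the per-step parameters (for example the Savitzky-Golay window) enlarge the space further.

```python
from itertools import permutations

# Method keys taken from the Quick Start example in this README.
steps = ["msc", "savgol", "detrend", "scaler", "snv",
         "robust_scaler", "emsc", "meancn"]

# Assumption: pipelines of length 1 or 2, ordered, without repeated steps.
allowed_lengths = [1, 2]

candidates = [
    combo
    for length in allowed_lengths
    for combo in permutations(steps, length)
]

# 8 single-step pipelines + 8 * 7 ordered two-step pipelines
print(len(candidates))  # 64
```

Even this small configuration yields dozens of candidate pipelines before any parameter is tuned, which is why a guided search is attractive compared with trying every combination by hand.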

Features

  • Pipeline Optimization: Automate the discovery of optimal preprocessing pipelines using Bayesian optimization

  • Flexible Preprocessing: Choose from multiple preprocessing techniques (MSC, SNV, Savitzky-Golay, etc.)

  • Cross-Validation Support: Group-based cross-validation methods for robust evaluation

  • Configurable Pipeline Length: Control maximum preprocessing steps and allowed combinations
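Group-based cross-validation matters when several spectra come from the same physical sample: replicates must not be split between training and validation folds. The sketch below illustrates the idea with scikit-learn's GroupShuffleSplit. Whether cv_method="group_shuffle_split" maps to this exact class inside SpectoPrep is an assumption, and the data here is synthetic.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(21)
X = rng.normal(size=(12, 5))          # 12 synthetic spectra, 5 variables
y = rng.normal(size=12)
groups = np.repeat(np.arange(4), 3)   # 4 samples, 3 replicate spectra each

# Hold out one whole group (25% of the groups) in each of 3 splits.
splitter = GroupShuffleSplit(n_splits=3, test_size=0.25, random_state=21)
folds = list(splitter.split(X, y, groups=groups))

for train_idx, test_idx in folds:
    train_groups = set(groups[train_idx].tolist())
    test_groups = set(groups[test_idx].tolist())
    # No group appears on both sides of a split.
    print(sorted(train_groups), sorted(test_groups))
```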

Installation

pip install spectoprep

Quick Start

from spectoprep.pipeline.optimizer import PipelineOptimizer
import numpy as np

# Prepare your data
X_train = np.array(...)  # Your spectral data matrix
y_train = np.array(...)  # Your target values
groups = np.array(...)   # Optional group labels for cross-validation

# Initialize the optimizer
optimizer = PipelineOptimizer(
    X_train=X_train,
    y_train=y_train,
    X_test=None,
    y_test=None,
    preprocessing_steps=['msc', 'savgol', 'detrend', 'scaler', 'snv',
                          'robust_scaler', 'emsc', 'meancn'],
    cv_method="group_shuffle_split",
    n_splits=3,
    random_state=21,
    groups=groups,
    max_pipeline_length=2,
    allowed_preprocess_combinations=[1, 2]
)

# Run Bayesian optimization to find the best pipeline
best_params, best_pipeline = optimizer.bayesian_optimize(
    init_points=50,
    n_iter=1000
)

# Extract the preprocessing steps, dropping the final model step
custom_preprocessing = list(best_pipeline.steps[:-1])

# Print optimization summary
summary = optimizer.summarize_optimization()
print(f"Best pipeline configuration: {summary['best_pipeline']}")
print(f"Best RMSE: {summary['best_rmse']:.4f}")

# Make predictions with the optimized pipeline
predictions, rmse, r2 = optimizer.get_best_pipeline_predictions(best_pipeline)
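
The steps[:-1] slice above suggests that best_pipeline behaves like a scikit-learn Pipeline whose last step is the model. Under that assumption, the sketch below shows how the fitted preprocessing steps can be reused on their own. The pipeline here is a stand-in built from plain scikit-learn components with synthetic data, not an object returned by SpectoPrep.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler, StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 30))   # synthetic "spectra"
y = X[:, :3].sum(axis=1)        # synthetic target

# Stand-in for best_pipeline: two preprocessing steps plus a final model.
pipeline = Pipeline([
    ("robust_scaler", RobustScaler()),
    ("scaler", StandardScaler()),
    ("model", Ridge(alpha=1.0)),
])
pipeline.fit(X, y)

# Slicing a Pipeline returns a new Pipeline that shares the fitted steps.
preprocessing_only = pipeline[:-1]
X_pre = preprocessing_only.transform(X)

print([name for name, _ in preprocessing_only.steps])
print(X_pre.shape)
```

Keeping the preprocessing as a Pipeline rather than a plain list means it can be applied to new spectra with a single transform call.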

Available Preprocessing Methods

  • msc: Multiplicative Scatter Correction

  • savgol: Savitzky-Golay filtering

  • detrend: Linear detrending

  • scaler: Standard scaling

  • snv: Standard Normal Variate

  • robust_scaler: Robust scaling

  • emsc: Extended Multiplicative Signal Correction

  • meancn: Mean centering

  • pca: Principal Component Analysis

  • select_k_best: Feature selection
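
As a rough illustration of what two of these methods do, here is a minimal NumPy sketch of Standard Normal Variate (snv) and Multiplicative Scatter Correction (msc) in their textbook forms. The function names and signatures are hypothetical and are not SpectoPrep's API; the package's implementations may differ in details such as the reference spectrum or the standard deviation convention.

```python
import numpy as np

def snv(X):
    """Standard Normal Variate: center and scale each spectrum (row)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std

def msc(X, reference=None):
    """Multiplicative Scatter Correction against a reference spectrum.

    Each spectrum is regressed on the reference (mean spectrum by default),
    then the fitted offset is subtracted and the fitted slope divided out.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, spectrum in enumerate(X):
        slope, intercept = np.polyfit(ref, spectrum, deg=1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected

# Synthetic example: one underlying spectrum with different offsets and gains.
base = np.sin(np.linspace(0, 3, 50))
offsets = np.array([0.1, 0.5, -0.2])[:, None]
gains = np.array([1.0, 1.5, 0.8])[:, None]
X = offsets + gains * base

X_snv = snv(X)
X_msc = msc(X)
# Both corrections collapse the three distorted copies onto a single curve.
print(np.allclose(X_snv, X_snv[0]), np.allclose(X_msc, X_msc[0]))
```

In this idealized case the spectra differ only by an additive offset and a multiplicative gain, which is exactly the distortion both methods are designed to remove; real spectra will only be brought closer together, not made identical.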

Documentation

For detailed documentation, visit spectoprep.readthedocs.io.

Contributing

We welcome contributions! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
