A comprehensive spectral preprocessing toolkit for chemometrics
Spectroscopy preprocessing using Bayesian Optimization
Overview
SpectoPrep is a toolkit for optimizing spectroscopic data preprocessing pipelines with Bayesian optimization. It searches over combinations of preprocessing techniques and their parameters to find the pipeline that gives the best model performance on your spectroscopic data.
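To give a feel for the size of the discrete part of that search, the sketch below enumerates candidate step sequences for pipelines of length 1 or 2. It is not how SpectoPrep searches (the Bayesian optimizer is not reproduced here); it assumes that step order matters and that a step is not repeated within a pipeline, which the package may or may not do. The helper `candidate_pipelines` is hypothetical.

```python
from itertools import permutations

# Step names taken from the Quick Start example.
steps = ['msc', 'savgol', 'detrend', 'scaler', 'snv',
         'robust_scaler', 'emsc', 'meancn']


def candidate_pipelines(steps, allowed_lengths):
    """Yield every ordered sequence of distinct steps for each allowed length.

    Assumption: order matters and no step is repeated within one pipeline.
    """
    for length in allowed_lengths:
        yield from permutations(steps, length)


candidates = list(candidate_pipelines(steps, allowed_lengths=[1, 2]))
print(len(candidates))  # 8 single-step + 8 * 7 two-step sequences = 64
```

Each candidate also carries its own continuous parameters (for example a Savitzky-Golay window length), which is where a model-based search becomes more attractive than brute-force enumeration.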
Features
Pipeline Optimization: Automate the discovery of optimal preprocessing pipelines using Bayesian optimization
Flexible Preprocessing: Choose from multiple preprocessing techniques (MSC, SNV, Savitzky-Golay, etc.)
Cross-Validation Support: Group-based cross-validation methods for robust evaluation
Configurable Pipeline Length: Control maximum preprocessing steps and allowed combinations
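Group-based cross-validation matters when several spectra come from the same physical sample or batch: all of them should land on the same side of each split. The sketch below uses scikit-learn's `GroupShuffleSplit` on synthetic data to show the idea. That SpectoPrep's `"group_shuffle_split"` option maps onto this class is an assumption based on the name, not something confirmed here.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(21)
X = rng.normal(size=(30, 50))          # 30 synthetic spectra, 50 wavelengths
y = rng.normal(size=30)                # synthetic target values
groups = np.repeat(np.arange(10), 3)   # 10 samples, 3 replicate scans each

splitter = GroupShuffleSplit(n_splits=3, test_size=0.3, random_state=21)

overlaps = []
for train_idx, test_idx in splitter.split(X, y, groups=groups):
    # Count groups that appear on both sides of the split (should be none).
    shared = set(groups[train_idx]) & set(groups[test_idx])
    overlaps.append(len(shared))

print(overlaps)
```

With replicate scans kept together, the validation score is not inflated by near-duplicate spectra leaking from the training fold.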
Installation
pip install spectoprep
Quick Start
from spectoprep.pipeline.optimizer import PipelineOptimizer
import numpy as np
# Prepare your data
X_train = np.array(...) # Your spectral data matrix
y_train = np.array(...) # Your target values
groups = np.array(...) # Optional group labels for cross-validation
# Initialize the optimizer
optimizer = PipelineOptimizer(
    X_train=X_train,
    y_train=y_train,
    X_test=None,
    y_test=None,
    preprocessing_steps=['msc', 'savgol', 'detrend', 'scaler', 'snv',
                         'robust_scaler', 'emsc', 'meancn'],
    cv_method="group_shuffle_split",
    n_splits=3,
    random_state=21,
    groups=groups,
    max_pipeline_length=2,
    allowed_preprocess_combinations=[1, 2]
)
# Run Bayesian optimization to find the best pipeline
best_params, best_pipeline = optimizer.bayesian_optimize(
    init_points=50,
    n_iter=1000
)
# Extract preprocessing steps without the final model
custom_preprocessing = []
for name, step in best_pipeline.steps[:-1]:
    custom_preprocessing.append((name, step))
# Print optimization summary
summary = optimizer.summarize_optimization()
print(f"Best pipeline configuration: {summary['best_pipeline']}")
print(f"Best RMSE: {summary['best_rmse']:.4f}")
# Make predictions with the optimized pipeline
predictions, rmse, r2 = optimizer.get_best_pipeline_predictions(best_pipeline)
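The loop over `best_pipeline.steps[:-1]` in the Quick Start suggests that the returned object behaves like a scikit-learn `Pipeline`. Assuming that is the case, the preprocessing part can also be taken as a pipeline of its own by slicing. The sketch below uses a hypothetical stand-in pipeline built from scikit-learn scalers and synthetic data, since the real optimizer output is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler, StandardScaler

rng = np.random.default_rng(21)
X = rng.normal(size=(20, 30))   # synthetic spectra
y = rng.normal(size=20)         # synthetic target values

# Hypothetical stand-in for the pipeline returned by the optimizer.
stand_in_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("robust_scaler", RobustScaler()),
    ("model", Ridge()),
]).fit(X, y)

# Slicing a Pipeline returns a new Pipeline that shares the fitted steps.
preprocessing_only = stand_in_pipeline[:-1]
step_names = [name for name, _ in preprocessing_only.steps]

# Apply only the preprocessing, e.g. to feed a different downstream model.
X_preprocessed = preprocessing_only.transform(X)
print(step_names, X_preprocessed.shape)
```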
Available Preprocessing Methods
msc: Multiplicative Scatter Correction
savgol: Savitzky-Golay filtering
detrend: Linear detrending
scaler: Standard scaling
snv: Standard Normal Variate
robust_scaler: Robust scaling
emsc: Extended Multiplicative Signal Correction
meancn: Mean centering
pca: Principal Component Analysis
select_k_best: Feature selection
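For readers new to these names, here is a minimal NumPy sketch of what two of the methods, SNV and MSC, conventionally compute. These are textbook formulations written for illustration; the functions `snv` and `msc` below are hypothetical and are not SpectoPrep's own implementations, which may differ in details such as the reference spectrum or the standard-deviation convention.

```python
import numpy as np


def snv(X):
    """Standard Normal Variate: center and scale each spectrum (row) by itself."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


def msc(X, reference=None):
    """Multiplicative Scatter Correction against a reference spectrum.

    Each spectrum is fitted as intercept + slope * reference, then the
    fitted offset and gain are removed. The mean spectrum is the default
    reference.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, spectrum in enumerate(X):
        slope, intercept = np.polyfit(ref, spectrum, deg=1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected


# Synthetic spectra: one underlying shape with per-spectrum offset and gain.
base = np.sin(np.linspace(0, 3, 50))
offsets = np.array([[0.1], [0.5], [-0.2]])
gains = np.array([[1.0], [1.5], [0.8]])
X = offsets + gains * base

X_snv = snv(X)
X_msc = msc(X)
```

On this synthetic input the rows differ only by offset and gain, so both corrections collapse them onto a single common curve, which is the scatter-removal effect these methods are meant to provide.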
Documentation
For detailed documentation, visit spectoprep.readthedocs.io.
Contributing
We welcome contributions! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
Download files
File details
Details for the file spectoprep-1.0.1a0.tar.gz.
File metadata
- Download URL: spectoprep-1.0.1a0.tar.gz
- Upload date:
- Size: 32.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b0ecc99ea1d95349b31f41ca0a3b731c78f43fcadef7681fa006699235de676b |
| MD5 | 7df08218137accb9af1e64571cdfa408 |
| BLAKE2b-256 | a7b452b4ee749771cdb56c5548a99bd9ebed747b5db8f2239b69e6a2cfaf9768 |
File details
Details for the file spectoprep-1.0.1a0-py3-none-any.whl.
File metadata
- Download URL: spectoprep-1.0.1a0-py3-none-any.whl
- Upload date:
- Size: 29.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 44713dcf372c3556e6d92039d4bc7860827f85d5718c63f1108e03928173fb14 |
| MD5 | c96da8b68ca73c5eb86975a27f8e5581 |
| BLAKE2b-256 | bd689d129092653b54dfcf75422b7421d6fd226dc578cce0fa010d8a9a2549c7 |
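A downloaded file can be checked against the SHA256 digests listed for each distribution. The sketch below uses only the Python standard library; the helper `sha256_of_file` is hypothetical, and the demonstration hashes a small temporary file rather than the real archive.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a temporary file with known content.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    tmp_path = tmp.name
try:
    computed = sha256_of_file(tmp_path)
finally:
    os.remove(tmp_path)

expected = hashlib.sha256(b"hello").hexdigest()
matches = computed == expected
print(matches)
```

For the source distribution, the same check would compare `sha256_of_file("spectoprep-1.0.1a0.tar.gz")` with the SHA256 value listed for that file.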