Python toolkit for spectral data processing: format parsers, baseline correction, normalization, and similarity matching.
SpectraKit
Python toolkit for spectral data processing: smoothing, baseline correction, normalization, scatter correction, derivatives, peak analysis, and more.
SpectraKit is a lightweight, pip-installable library for preprocessing and analyzing spectral data from IR, Raman, and NIR spectroscopy. It follows a functional design with NumPy arrays as the primary data type and requires only NumPy + SciPy as core dependencies.
Installation
pip install pyspectrakit
Note: The PyPI distribution name is pyspectrakit (due to a naming conflict). The import statement is simply import spectrakit.
Optional extras for additional functionality:
pip install "pyspectrakit[io]"         # HDF5 file support
pip install "pyspectrakit[cli]"        # Command-line interface
pip install "pyspectrakit[baselines]"  # pybaselines backend (200+ methods)
pip install "pyspectrakit[fitting]"    # lmfit peak fitting
pip install "pyspectrakit[sklearn]"    # scikit-learn integration
pip install "pyspectrakit[plot]"       # Plotting utilities
pip install "pyspectrakit[all]"        # Everything above
Quick Start
import numpy as np
from spectrakit import smooth_savgol, baseline_als, normalize_snv
# Load your spectral data (N spectra, W wavelengths)
spectra = np.loadtxt("data.csv", delimiter=",")
# Process with individual functions
smoothed = smooth_savgol(spectra, window_length=11)
corrected = baseline_als(smoothed, lam=1e6, p=0.01)
normalized = normalize_snv(corrected)
All functions accept both single spectra (W,) and batches (N, W).
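The single-versus-batch convention can be illustrated without SpectraKit itself. The sketch below is a NumPy-only stand-in (`snv_sketch` is a hypothetical name, not part of the library) that applies standard normal variate scaling along the last axis, so the same code handles `(W,)` and `(N, W)` inputs.

```python
import numpy as np

def snv_sketch(y: np.ndarray) -> np.ndarray:
    """Hypothetical SNV: centre and scale each spectrum along the last axis."""
    y = np.asarray(y, dtype=float)
    mean = y.mean(axis=-1, keepdims=True)
    std = y.std(axis=-1, keepdims=True)
    return (y - mean) / std

single = np.array([1.0, 2.0, 3.0, 4.0])          # shape (W,)
batch = np.stack([single, 10.0 * single + 5.0])  # shape (N, W)

single_out = snv_sketch(single)  # still (W,)
batch_out = snv_sketch(batch)    # still (N, W), one row per spectrum
```

Because the reduction runs over `axis=-1` with `keepdims=True`, no reshaping is needed for either input shape, and the output shape always matches the input.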
Pipeline
Chain steps for reproducibility:
from spectrakit import smooth_savgol, baseline_als, normalize_snv
from spectrakit.pipeline import Pipeline
pipe = Pipeline()
pipe.add("smooth", smooth_savgol, window_length=11)
pipe.add("baseline", baseline_als, lam=1e6)
pipe.add("normalize", normalize_snv)
processed = pipe.transform(spectra)
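Conceptually, a pipeline of this kind is an ordered list of named functions with bound keyword arguments. The following is a minimal sketch of that idea, not SpectraKit's implementation; `MiniPipeline` and the two toy steps are hypothetical names used only for illustration.

```python
from typing import Any, Callable

import numpy as np

class MiniPipeline:
    """Hypothetical stand-in: applies named steps in insertion order."""

    def __init__(self) -> None:
        self.steps: list[tuple[str, Callable[..., np.ndarray], dict[str, Any]]] = []

    def add(self, name: str, func: Callable[..., np.ndarray], **kwargs: Any) -> "MiniPipeline":
        # Store the function together with its keyword arguments
        self.steps.append((name, func, kwargs))
        return self

    def transform(self, y: np.ndarray) -> np.ndarray:
        # Feed the output of each step into the next one
        for _name, func, kwargs in self.steps:
            y = func(y, **kwargs)
        return y

def offset(y, value=0.0):
    return y - value

def scale(y, factor=1.0):
    return y * factor

mini = MiniPipeline()
mini.add("offset", offset, value=1.0)
mini.add("scale", scale, factor=2.0)
result = mini.transform(np.array([1.0, 2.0, 3.0]))
```

Keeping the steps as plain data (name, function, keyword arguments) is what makes a processing recipe easy to log and replay.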
scikit-learn Integration
Use any SpectraKit function in an sklearn pipeline:
from sklearn.pipeline import Pipeline as SkPipeline
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from spectrakit import smooth_savgol, baseline_als, normalize_snv
from spectrakit.sklearn import SpectralTransformer
pipe = SkPipeline([
    ("smooth", SpectralTransformer(smooth_savgol, window_length=11)),
    ("baseline", SpectralTransformer(baseline_als, lam=1e6)),
    ("normalize", SpectralTransformer(normalize_snv)),
    ("pca", PCA(n_components=10)),
    ("svm", SVC()),
])
# X_train, X_test: your own (N, W) arrays of spectra; y_train: class labels
pipe.fit(X_train, y_train)
predictions = pipe.predict(X_test)
Features
Smoothing
| Method | Function | Description |
|---|---|---|
| Savitzky-Golay | smooth_savgol(y) | Polynomial least-squares smoothing |
| Whittaker | smooth_whittaker(y) | Penalized least-squares smoother |
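As a point of reference for the Savitzky-Golay row, SciPy exposes the same filter family as `scipy.signal.savgol_filter`. The sketch below calls SciPy directly on a synthetic noisy band. Whether `smooth_savgol` delegates to this function, and what its default polynomial order is, are not stated in this README, so `polyorder=3` and the test signal are illustrative assumptions.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 201)
clean = np.exp(-x**2)                           # synthetic Gaussian band
noisy = clean + 0.05 * rng.normal(size=x.size)  # add white noise

# Fit a cubic polynomial in each 11-point window and keep the centre value
smoothed = savgol_filter(noisy, window_length=11, polyorder=3, axis=-1)

# Root-mean-square error against the noise-free signal, before and after
rms_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
rms_smoothed = np.sqrt(np.mean((smoothed - clean) ** 2))
```

A wider window removes more noise but starts to flatten narrow bands, so the window is usually chosen relative to the narrowest peak of interest.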
Baseline Correction
| Method | Function | Description |
|---|---|---|
| ALS | baseline_als(y) | Asymmetric least squares |
| SNIP | baseline_snip(y) | Statistics-sensitive peak clipping |
| Polynomial | baseline_polynomial(y) | Iterative polynomial fit |
| Rubberband | baseline_rubberband(y) | Convex hull envelope |
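The ALS row refers to the asymmetric least squares approach usually attributed to Eilers and Boelens. A compact sketch of that algorithm, written against SciPy's sparse solvers, is shown below. `als_sketch` is a hypothetical name, and SpectraKit's `baseline_als` may differ in defaults, return value, and implementation details.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def als_sketch(y, lam=1e6, p=0.01, n_iter=10):
    """Estimate a smooth baseline under a 1-D spectrum y (illustrative only)."""
    n = y.size
    # Second-difference operator, shape (n - 2, n)
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        W = sparse.diags(w)
        # Weighted, smoothness-penalized least squares fit
        z = spsolve((W + penalty).tocsc(), w * y)
        # Points above the current baseline (peaks) get the small weight p
        w = np.where(y > z, p, 1.0 - p)
    return z

x = np.linspace(0.0, 1.0, 500)
true_baseline = 2.0 + 1.5 * x
peak = 5.0 * np.exp(-((x - 0.5) ** 2) / (2 * 0.01**2))
spectrum = true_baseline + peak

baseline = als_sketch(spectrum, lam=1e6, p=0.01)
corrected = spectrum - baseline
```

Here `lam` controls how stiff the baseline is and `p` controls how strongly points above the baseline are ignored; these are the same parameter names used in the Quick Start call.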
Normalization
| Method | Function | Description |
|---|---|---|
| SNV | normalize_snv(y) | Zero mean, unit variance |
| Min-Max | normalize_minmax(y) | Scale to [0, 1] |
| Area | normalize_area(y) | Unit area under curve |
| Vector | normalize_vector(y) | L2 norm = 1 |
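The Min-Max, Area, and Vector descriptions each correspond to a one-line formula. The NumPy sketch below spells them out per spectrum; the function names are hypothetical, and the use of a trapezoidal integral with unit spacing for the area is an assumption rather than a statement about how `normalize_area` computes it.

```python
import numpy as np
from scipy.integrate import trapezoid

def minmax_sketch(y):
    lo = y.min(axis=-1, keepdims=True)
    hi = y.max(axis=-1, keepdims=True)
    return (y - lo) / (hi - lo)            # each spectrum spans [0, 1]

def area_sketch(y):
    total = np.expand_dims(trapezoid(y, axis=-1), -1)
    return y / total                        # trapezoidal area becomes 1

def vector_sketch(y):
    return y / np.linalg.norm(y, axis=-1, keepdims=True)  # L2 norm becomes 1

y = np.array([[1.0, 3.0, 2.0, 5.0],
              [2.0, 2.5, 4.0, 3.0]])

y_minmax = minmax_sketch(y)
y_area = area_sketch(y)
y_vector = vector_sketch(y)
```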
Derivatives
| Method | Function | Description |
|---|---|---|
| Savitzky-Golay | derivative_savgol(y) | SG polynomial derivative |
| Gap-Segment | derivative_gap_segment(y) | Norris-Williams derivative |
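A Savitzky-Golay derivative differentiates the locally fitted polynomial instead of the raw data, which keeps noise amplification in check. The sketch below uses `scipy.signal.savgol_filter` with `deriv=1` on a sine wave, where the exact derivative is known. The window and polynomial order are illustrative choices, not documented defaults of `derivative_savgol`.

```python
import numpy as np
from scipy.signal import savgol_filter

x = np.linspace(0.0, 2.0 * np.pi, 400)
y = np.sin(x)
dx = x[1] - x[0]

# deriv=1 returns the first derivative; delta sets the sample spacing
dy = savgol_filter(y, window_length=11, polyorder=2, deriv=1, delta=dx)

# Compare against the analytical derivative cos(x)
max_error = np.max(np.abs(dy - np.cos(x)))
```

Passing `delta` matters: without it the result is a derivative per sample rather than per axis unit.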
Scatter Correction
| Method | Function | Description |
|---|---|---|
| MSC | scatter_msc(y) | Multiplicative scatter correction |
| EMSC | scatter_emsc(y) | Extended MSC with polynomial terms |
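In its textbook form, multiplicative scatter correction regresses each spectrum on a reference (commonly the batch mean) and then removes the fitted offset and slope. The sketch below follows that textbook recipe. `msc_sketch` is a hypothetical name, and the choice of the mean spectrum as the default reference is an assumption, not a description of `scatter_msc`.

```python
import numpy as np

def msc_sketch(X, reference=None):
    """Textbook MSC: fit spectrum = intercept + slope * reference, then invert."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, spectrum in enumerate(X):
        slope, intercept = np.polyfit(ref, spectrum, deg=1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected

# Three copies of one band shape with different gain and offset
base = np.array([0.1, 0.4, 0.9, 0.3, 0.2])
X = np.stack([1.0 * base + 0.0, 2.0 * base + 0.5, 0.5 * base - 0.1])

X_msc = msc_sketch(X)
```

In this synthetic case every spectrum is an exact scaled and shifted copy of the same shape, so all corrected rows collapse onto the mean spectrum.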
Spectral Transforms
| Method | Function | Description |
|---|---|---|
| Kubelka-Munk | transform_kubelka_munk(y) | Reflectance to K-M units |
| ATR Correction | transform_atr_correction(y, wn) | ATR depth-of-penetration |
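The Kubelka-Munk function maps diffuse reflectance R (as a fraction between 0 and 1) to F(R) = (1 - R)^2 / (2R). The sketch below evaluates that formula with NumPy. `kubelka_munk_sketch` is a hypothetical name, and any input validation or percent-reflectance handling that `transform_kubelka_munk` may perform is not covered.

```python
import numpy as np

def kubelka_munk_sketch(reflectance):
    """F(R) = (1 - R)^2 / (2 R) for fractional reflectance R in (0, 1]."""
    r = np.asarray(reflectance, dtype=float)
    return (1.0 - r) ** 2 / (2.0 * r)

R = np.array([1.0, 0.5, 0.25])
km = kubelka_munk_sketch(R)  # array([0.0, 0.25, 1.125])
```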
Operations
| Function | Description |
|---|---|
| spectral_subtract(a, b) | Spectral subtraction |
| spectral_average(y) | Mean spectrum from batch |
| spectral_interpolate(y, wn, new_wn) | Resample to new axis |
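Resampling onto a new axis is ordinary 1-D interpolation, with one practical wrinkle: IR wavenumber axes are often stored in descending order, while `np.interp` expects ascending sample points. The sketch below handles that by sorting first. `interpolate_sketch` is a hypothetical helper using linear interpolation, which may not match the method used by `spectral_interpolate`.

```python
import numpy as np

def interpolate_sketch(y, wn, new_wn):
    """Linear resampling that tolerates a descending wavenumber axis."""
    order = np.argsort(wn)  # np.interp needs ascending x values
    return np.interp(new_wn, wn[order], y[order])

wn = np.array([4000.0, 3000.0, 2000.0, 1000.0])  # descending axis
y = np.array([0.4, 0.3, 0.2, 0.1])
new_wn = np.array([3500.0, 2500.0, 1500.0])

resampled = interpolate_sketch(y, wn, new_wn)  # array([0.35, 0.25, 0.15])
```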
Peak Analysis
| Function | Description |
|---|---|
| peaks_find(y) | Find peaks with scipy.signal |
| peaks_integrate(y) | Integrate peak regions |
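Since peak finding is described as building on scipy.signal, the underlying calls can be sketched directly. The example below runs `scipy.signal.find_peaks` on two synthetic Gaussian bands and integrates one region with the trapezoidal rule. The height threshold and the integration limits are illustrative assumptions, and the signatures of `peaks_find` and `peaks_integrate` are not reproduced here.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.signal import find_peaks

x = np.linspace(0.0, 10.0, 1001)
y = (np.exp(-((x - 3.0) ** 2) / (2 * 0.2**2))
     + 0.5 * np.exp(-((x - 7.0) ** 2) / (2 * 0.3**2)))

# Keep only local maxima higher than 0.1
indices, properties = find_peaks(y, height=0.1)
positions = x[indices]

# Integrate the first band over a fixed window around it
mask = (x >= 2.0) & (x <= 4.0)
area_first = trapezoid(y[mask], x[mask])
```

For a unit-height Gaussian the area is sigma * sqrt(2 * pi), which gives a quick sanity check on the integration window.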
Similarity Metrics
| Metric | Function | Range |
|---|---|---|
| Cosine | similarity_cosine(a, b) | [-1, 1] |
| Pearson | similarity_pearson(a, b) | [-1, 1] |
| Spectral Angle | similarity_spectral_angle(a, b) | [0, pi] |
| Euclidean | similarity_euclidean(a, b) | [0, inf) |
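The four metrics are closely related: Pearson correlation is cosine similarity of mean-centred spectra, and the spectral angle is the arccosine of the cosine similarity. The NumPy sketch below uses the standard definitions with hypothetical function names; it is meant to make the listed ranges concrete, not to mirror the library's code.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson(a, b):
    # Cosine similarity after removing each spectrum's mean
    return cosine(a - a.mean(), b - b.mean())

def spectral_angle(a, b):
    # Clip guards against rounding pushing the cosine slightly past +/-1
    return float(np.arccos(np.clip(cosine(a, b), -1.0, 1.0)))

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same shape as a, twice the intensity
c = np.array([3.0, 2.0, 1.0])  # a reversed
```

Note that cosine, Pearson, and spectral angle ignore overall intensity (`a` and `b` count as identical), whereas the Euclidean value is a distance that grows with any intensity difference.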
I/O Formats
| Format | Function | Dependencies |
|---|---|---|
| JCAMP-DX | read_jcamp(path) | None |
| SPC | read_spc(path) | spc-spectra |
| CSV/TSV | read_csv(path) | None |
| HDF5 | read_hdf5(path) / write_hdf5(spec, path) | h5py |
| Bruker OPUS | read_opus(path) | None |
Optional Backends
| Backend | Extra | Description |
|---|---|---|
| pybaselines | [baselines] | 200+ baseline methods via pybaselines_method() |
| lmfit | [fitting] | Peak fitting with Gaussian, Lorentzian, Voigt models |
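The `[fitting]` extra relies on lmfit, whose API is not reproduced here. To show what fitting a single Gaussian band involves, the sketch below swaps in `scipy.optimize.curve_fit` on synthetic data. The model function, noise level, and starting guess are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amplitude, center, sigma):
    return amplitude * np.exp(-((x - center) ** 2) / (2.0 * sigma**2))

rng = np.random.default_rng(1)
x = np.linspace(900.0, 1100.0, 401)
y = gaussian(x, 2.0, 1000.0, 12.0) + 0.01 * rng.normal(size=x.size)

# Start from the tallest point; a rough width guess is enough here
initial_guess = (y.max(), x[np.argmax(y)], 10.0)
params, covariance = curve_fit(gaussian, x, y, p0=initial_guess)
amplitude, center, sigma = params
```

Lorentzian and Voigt fits follow the same pattern with a different model function, and the diagonal of `covariance` gives variance estimates for the fitted parameters.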
Visualization
from spectrakit.plot import plot_spectrum, plot_comparison, plot_baseline
Requires pip install pyspectrakit[plot].
Spectrum Container
import numpy as np
from spectrakit import Spectrum
spec = Spectrum(
    intensities=np.array([...]),   # (W,) or (N, W)
    wavenumbers=np.array([...]),   # (W,), optional
    metadata={"instrument": "Bruker"},
    source_format="jcamp",
    label="ethanol_ir",
)
CLI
pip install "pyspectrakit[cli]"
spectrakit info ethanol.dx
spectrakit convert ethanol.dx ethanol.h5
Examples
See the examples/ directory for Jupyter notebooks:
- Quick Start — basic preprocessing workflow
- Baseline Methods — comparing correction algorithms
- Derivatives & Peaks — derivative analysis and peak finding
- Scatter Correction — MSC vs EMSC vs SNV
- sklearn Pipeline — classification with preprocessing
Development
git clone https://github.com/ktubhyam/spectrakit.git
cd spectrakit
pip install -e ".[all,dev]"
pytest
See CONTRIBUTING.md for guidelines.
Citation
If you use SpectraKit in your research, please cite:
@software{spectrakit,
author = {Karthikeyan, Tubhyam},
title = {SpectraKit: Python toolkit for spectral data processing},
url = {https://github.com/ktubhyam/spectrakit},
license = {MIT}
}
License
MIT