Skip to main content

Python toolkit for spectral data processing: format parsers, baseline correction, normalization, and similarity matching.

Project description

SpectraKit

Python toolkit for spectral data processing: smoothing, baseline correction, normalization, scatter correction, derivatives, peak analysis, and more.

CI codecov PyPI Python 3.10+ License: MIT Typed

SpectraKit is a lightweight, pip-installable library for preprocessing and analyzing spectral data from IR, Raman, and NIR spectroscopy. It follows a functional design with NumPy arrays as the primary data type and requires only NumPy + SciPy as core dependencies.

Documentation | API Reference | Examples

Installation

pip install pyspectrakit

Note: The PyPI distribution name is pyspectrakit (due to a naming conflict). The import name is simply import spectrakit.

Optional extras for additional functionality:

pip install pyspectrakit[io]         # HDF5 file support
pip install pyspectrakit[cli]        # Command-line interface
pip install pyspectrakit[baselines]  # pybaselines backend (200+ methods)
pip install pyspectrakit[fitting]    # lmfit peak fitting
pip install pyspectrakit[sklearn]    # scikit-learn integration
pip install pyspectrakit[plot]       # Plotting utilities
pip install pyspectrakit[all]        # Everything above

Quick Start

import numpy as np
from spectrakit import smooth_savgol, baseline_als, normalize_snv

# Load your spectral data (N spectra, W wavelengths)
spectra = np.loadtxt("data.csv", delimiter=",")

# Process with individual functions
smoothed = smooth_savgol(spectra, window_length=11)
corrected = baseline_als(smoothed, lam=1e6, p=0.01)
normalized = normalize_snv(corrected)

All functions accept both single spectra (W,) and batches (N, W).

Pipeline

Chain steps for reproducibility:

from spectrakit.pipeline import Pipeline

pipe = Pipeline()
pipe.add("smooth", smooth_savgol, window_length=11)
pipe.add("baseline", baseline_als, lam=1e6)
pipe.add("normalize", normalize_snv)

processed = pipe.transform(spectra)

scikit-learn Integration

Use any SpectraKit function in an sklearn pipeline:

from sklearn.pipeline import Pipeline as SkPipeline
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from spectrakit.sklearn import SpectralTransformer

pipe = SkPipeline([
    ("smooth", SpectralTransformer(smooth_savgol, window_length=11)),
    ("baseline", SpectralTransformer(baseline_als, lam=1e6)),
    ("normalize", SpectralTransformer(normalize_snv)),
    ("pca", PCA(n_components=10)),
    ("svm", SVC()),
])

pipe.fit(X_train, y_train)
predictions = pipe.predict(X_test)

Features

Smoothing

Method Function Description
Savitzky-Golay smooth_savgol(y) Polynomial least-squares smoothing
Whittaker smooth_whittaker(y) Penalized least-squares smoother

Baseline Correction

Method Function Description
ALS baseline_als(y) Asymmetric least squares
SNIP baseline_snip(y) Statistics-sensitive peak clipping
Polynomial baseline_polynomial(y) Iterative polynomial fit
Rubberband baseline_rubberband(y) Convex hull envelope

Normalization

Method Function Description
SNV normalize_snv(y) Zero mean, unit variance
Min-Max normalize_minmax(y) Scale to [0, 1]
Area normalize_area(y) Unit area under curve
Vector normalize_vector(y) L2 norm = 1

Derivatives

Method Function Description
Savitzky-Golay derivative_savgol(y) SG polynomial derivative
Gap-Segment derivative_gap_segment(y) Norris-Williams derivative

Scatter Correction

Method Function Description
MSC scatter_msc(y) Multiplicative scatter correction
EMSC scatter_emsc(y) Extended MSC with polynomial terms

Spectral Transforms

Method Function Description
Kubelka-Munk transform_kubelka_munk(y) Reflectance to K-M units
ATR Correction transform_atr_correction(y, wn) ATR depth-of-penetration

Operations

Function Description
spectral_subtract(a, b) Spectral subtraction
spectral_average(y) Mean spectrum from batch
spectral_interpolate(y, wn, new_wn) Resample to new axis

Peak Analysis

Function Description
peaks_find(y) Find peaks with scipy.signal
peaks_integrate(y) Integrate peak regions

Similarity Metrics

Metric Function Range
Cosine similarity_cosine(a, b) [-1, 1]
Pearson similarity_pearson(a, b) [-1, 1]
Spectral Angle similarity_spectral_angle(a, b) [0, pi]
Euclidean similarity_euclidean(a, b) [0, inf)

I/O Formats

Format Function Dependencies
JCAMP-DX read_jcamp(path) None
SPC read_spc(path) spc-spectra
CSV/TSV read_csv(path) None
HDF5 read_hdf5(path) / write_hdf5(spec, path) h5py
Bruker OPUS read_opus(path) None

Optional Backends

Backend Extra Description
pybaselines [baselines] 200+ baseline methods via pybaselines_method()
lmfit [fitting] Peak fitting with Gaussian, Lorentzian, Voigt models

Visualization

from spectrakit.plot import plot_spectrum, plot_comparison, plot_baseline

Requires pip install pyspectrakit[plot].

Spectrum Container

from spectrakit import Spectrum

spec = Spectrum(
    intensities=np.array([...]),       # (W,) or (N, W)
    wavenumbers=np.array([...]),       # (W,), optional
    metadata={"instrument": "Bruker"},
    source_format="jcamp",
    label="ethanol_ir",
)

CLI

pip install pyspectrakit[cli]

spectrakit info ethanol.dx
spectrakit convert ethanol.dx ethanol.h5

Examples

See the examples/ directory for Jupyter notebooks:

  1. Quick Start — basic preprocessing workflow
  2. Baseline Methods — comparing correction algorithms
  3. Derivatives & Peaks — derivative analysis and peak finding
  4. Scatter Correction — MSC vs EMSC vs SNV
  5. sklearn Pipeline — classification with preprocessing

Development

git clone https://github.com/ktubhyam/spectrakit.git
cd spectrakit
pip install -e ".[all,dev]"
pytest

See CONTRIBUTING.md for guidelines.

Citation

If you use SpectraKit in your research, please cite:

@software{spectrakit,
  author = {Karthikeyan, Tubhyam},
  title = {SpectraKit: Python toolkit for spectral data processing},
  url = {https://github.com/ktubhyam/spectrakit},
  license = {MIT}
}

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyspectrakit-1.7.2.tar.gz (61.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyspectrakit-1.7.2-py3-none-any.whl (70.9 kB view details)

Uploaded Python 3

File details

Details for the file pyspectrakit-1.7.2.tar.gz.

File metadata

  • Download URL: pyspectrakit-1.7.2.tar.gz
  • Upload date:
  • Size: 61.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyspectrakit-1.7.2.tar.gz
Algorithm Hash digest
SHA256 57b8aa88119c94976a6d550513c7b209d85cf3ef9383c64c9a3c189587de7c5e
MD5 be414b32611be9f91c6b2cc283671be8
BLAKE2b-256 7c3abe01fec7204533c67b9205e46bbeba5bdf47f328d81db2ada20d225db1c4

See more details on using hashes here.

Provenance

The following attestation bundles were made for pyspectrakit-1.7.2.tar.gz:

Publisher: publish.yml on ktubhyam/spectrakit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pyspectrakit-1.7.2-py3-none-any.whl.

File metadata

  • Download URL: pyspectrakit-1.7.2-py3-none-any.whl
  • Upload date:
  • Size: 70.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyspectrakit-1.7.2-py3-none-any.whl
Algorithm Hash digest
SHA256 01442c1c2e61a440f8381e9facdbef68c995747597311e72002f65f1c58ea15e
MD5 3fa77a06d818be89fc18f7b91603fbf1
BLAKE2b-256 cd44fdb5f974b35e22f2280f4d9eaf6c8d6e1a54ee93a76615c472e7bb2eb166

See more details on using hashes here.

Provenance

The following attestation bundles were made for pyspectrakit-1.7.2-py3-none-any.whl:

Publisher: publish.yml on ktubhyam/spectrakit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page