Gold-standard Raman and FTIR spectroscopy toolkit for food science.
foodspec
Headless, research-grade Python toolkit for Raman and FTIR spectroscopy in food science.
Built around clear data models, reproducible preprocessing, feature extraction, chemometrics, and domain workflows.
Why foodspec?
Modern food science uses Raman and FTIR spectra everywhere: oil authentication, adulteration detection, heating/oxidation studies, quality control, and even hyperspectral imaging.
But most analyses are locked in ad hoc scripts, inconsistent preprocessing, and irreproducible notebooks.
foodspec aims to fix that.
- A unified data model for spectra and hyperspectral cubes.
- Pipeline-style preprocessing (baseline, smoothing, scatter correction, normalization, FTIR/Raman helpers).
- Chemometrics + ML workflows for classification, regression, mixture analysis, and QC.
- Domain templates for edible oils, heating degradation, and more.
- CLI + config + logging + reporting for reproducible runs.
- Aligns naturally with a MethodsX-style protocol and FAIR principles.
Key Features
Core Data Structures
- `FoodSpectrumSet`
  - Batches of 1D spectra (`x`: `(n_samples, n_wavenumbers)`).
  - Shared `wavenumbers`, `metadata` as a `pandas.DataFrame` (oil type, mixture fraction, etc.), and a `modality` tag (`"raman"`, `"ftir"`, or others).
- `HyperSpectralCube`
  - 3D cubes: `(height, width, n_wavenumbers)`.
  - Conversions to/from `FoodSpectrumSet` with pixel-wise metadata.
  - Helpers for ratio maps and clustering visualizations.
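To make the two shapes concrete, here is a minimal NumPy sketch of how a batch of spectra and a hyperspectral cube relate. `SpectraSketch`, `cube_to_spectra`, and `spectra_to_cube` are hypothetical names for illustration only, not the foodspec classes; the sketch assumes row-major (C-order) pixel ordering and keeps pixel metadata as plain `(row, col)` index arrays rather than a `pandas.DataFrame`.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SpectraSketch:
    """Hypothetical stand-in for a FoodSpectrumSet-like container (not the foodspec class)."""

    x: np.ndarray            # (n_samples, n_wavenumbers)
    wavenumbers: np.ndarray  # (n_wavenumbers,)
    modality: str = "raman"

    def __post_init__(self):
        self.x = np.asarray(self.x, dtype=float)
        self.wavenumbers = np.asarray(self.wavenumbers, dtype=float)
        if self.x.ndim != 2 or self.x.shape[1] != self.wavenumbers.shape[0]:
            raise ValueError("x must have shape (n_samples, len(wavenumbers))")


def cube_to_spectra(cube, wavenumbers, modality="raman"):
    """Flatten an (height, width, n_wavenumbers) cube into pixel spectra plus (row, col) indices."""
    h, w, n = cube.shape
    # Pixel k in the flattened array corresponds to (row, col) = (k // w, k % w)
    rows, cols = np.unravel_index(np.arange(h * w), (h, w))
    return SpectraSketch(cube.reshape(h * w, n), wavenumbers, modality), rows, cols


def spectra_to_cube(spectra, height, width):
    """Inverse of cube_to_spectra for row-major pixel ordering."""
    return spectra.x.reshape(height, width, -1)
```

Keeping the pixel indices alongside the flattened spectra is what lets per-pixel results (class labels, ratios) be mapped back onto the image grid.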
IO & Reproducibility
- HDF5 library helpers: `create_library`, `load_library` for large spectral libraries.
- Example and public dataset loaders:
  - `load_example_oils()`: synthetic, self-contained example dataset.
  - `load_public_mendeley_oils(...)`: hook for a public edible-oil dataset.
  - `load_public_evoo_sunflower_raman(...)`: EVOO–sunflower mixture dataset.
  - `load_public_ftir_oils(...)`: FTIR edible-oil dataset hook.
- Tidy CSV/HDF5 exporters, metadata round-tripping, and validation utilities.
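"Tidy" here means long format: one row per (sample, wavenumber) data point. The following standard-library sketch illustrates the idea with a CSV round trip; the functions and the `sample_id`/`wavenumber`/`intensity` column names are hypothetical and not necessarily what foodspec's exporters write. It assumes every sample shares the same wavenumber axis.

```python
import csv
import io


def to_tidy_rows(sample_ids, wavenumbers, x):
    """Yield one (sample_id, wavenumber, intensity) row per data point."""
    for sid, spectrum in zip(sample_ids, x):
        for wn, value in zip(wavenumbers, spectrum):
            yield sid, wn, value


def write_tidy_csv(fh, sample_ids, wavenumbers, x):
    writer = csv.writer(fh)
    writer.writerow(["sample_id", "wavenumber", "intensity"])
    writer.writerows(to_tidy_rows(sample_ids, wavenumbers, x))


def read_tidy_csv(fh):
    """Rebuild (sample_ids, wavenumbers, x) from a tidy CSV, keeping first-seen order."""
    data = {}
    wavenumbers = []
    for row in csv.DictReader(fh):
        wn = float(row["wavenumber"])
        if wn not in wavenumbers:
            wavenumbers.append(wn)
        data.setdefault(row["sample_id"], []).append(float(row["intensity"]))
    return list(data), wavenumbers, list(data.values())


# Usage: write two short spectra to an in-memory buffer
buffer = io.StringIO()
write_tidy_csv(buffer, ["oil_01", "oil_02"], [1652.0, 1742.0], [[0.8, 1.2], [0.5, 1.1]])
```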
Preprocessing (sklearn-style, pipeline-friendly)
- Baseline:
  - ALS (Eilers), rubberband, polynomial baseline.
- Smoothing:
  - Savitzky–Golay, moving average.
- Normalization / scatter correction:
  - Vector (L1/L2/max), area, internal-peak normalization.
  - `SNVNormalizer` (Standard Normal Variate).
  - `MSCNormalizer` (Multiplicative Scatter Correction).
- Derivatives:
  - Savitzky–Golay derivatives (1st/2nd).
- Cropping:
  - `RangeCropper` and FTIR-specific helpers.
- Raman/FTIR specifics:
  - `CosmicRayRemover`, `AtmosphericCorrector`, `SimpleATRCorrector`.
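For readers new to these corrections, here is a minimal NumPy sketch of three of them: an asymmetric least squares (ALS) baseline in the Eilers & Boelens formulation, SNV, and MSC. These are illustrative re-implementations with hypothetical function names, not foodspec's `ALSBaseline`, `SNVNormalizer`, or `MSCNormalizer`; the ALS defaults simply mirror the `lambda_=1e5, p=0.001` values used in the Quickstart. The ALS version uses a dense solve, which is only practical for short spectra (real implementations use sparse matrices).

```python
import numpy as np


def als_baseline(y, lam=1e5, p=0.001, n_iter=10):
    """Asymmetric least squares baseline (dense version, for short spectra)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    d = np.diff(np.eye(n), 2, axis=0)  # (n - 2, n) second-difference operator
    penalty = lam * d.T @ d            # smoothness penalty lam * D^T D
    w = np.ones(n)
    z = y
    for _ in range(n_iter):
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the current baseline (peaks) get weight p, points below get 1 - p
        w = np.where(y > z, p, 1.0 - p)
    return z


def snv(x):
    """Standard Normal Variate: center and scale each spectrum (row) individually."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)


def msc(x, reference=None):
    """Multiplicative Scatter Correction against a reference (default: the mean spectrum)."""
    x = np.asarray(x, dtype=float)
    ref = x.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    out = np.empty_like(x)
    for i, row in enumerate(x):
        slope, intercept = np.polyfit(ref, row, 1)  # row ≈ intercept + slope * ref
        out[i] = (row - intercept) / slope
    return out
```

The asymmetric weights are what make ALS track the lower envelope of the signal: after the first pass, peaks sit above the fitted curve and are almost ignored, so the baseline follows the broad background underneath them.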
Feature Extraction
- Peaks:
  - `PeakFeatureExtractor` (heights, areas with tolerance windows).
- Bands:
  - `integrate_bands` for integrated regions.
- Ratios:
  - `RatioFeatureGenerator` / `compute_ratios` for chemically meaningful ratios (e.g. 1652/1742 cm⁻¹, 3010/2850 cm⁻¹).
- Fingerprinting:
  - Cosine/correlation similarity helpers.
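The features above reduce a spectrum to a few numbers. This NumPy sketch shows the arithmetic behind a peak height with a tolerance window, a trapezoidal band integral, a band ratio such as 1652/1742, and cosine similarity. The function names and the ±5 cm⁻¹ default window are hypothetical choices for illustration; they are not foodspec's `PeakFeatureExtractor`, `integrate_bands`, or `compute_ratios`.

```python
import numpy as np


def peak_height(wavenumbers, y, center, tol=5.0):
    """Maximum intensity within +/- tol cm^-1 of a nominal band position."""
    mask = np.abs(wavenumbers - center) <= tol
    return float(y[mask].max())


def band_area(wavenumbers, y, lo, hi):
    """Trapezoidal integral of y over [lo, hi] (wavenumbers assumed ascending)."""
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)
    xs, ys = wavenumbers[mask], y[mask]
    return float(np.sum((ys[1:] + ys[:-1]) / 2.0 * np.diff(xs)))


def cosine_similarity(a, b):
    """Cosine of the angle between two spectra (1.0 means identical shape)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Usage: synthetic spectrum with bands at 1652 and 1742 cm^-1
wn = np.arange(1500.0, 1800.0, 1.0)
spectrum = 2.0 * np.exp(-((wn - 1652) / 5) ** 2) + 4.0 * np.exp(-((wn - 1742) / 5) ** 2)
ratio_1652_1742 = peak_height(wn, spectrum, 1652) / peak_height(wn, spectrum, 1742)
```

Ratios are popular because they cancel any overall intensity scaling, and cosine similarity is likewise insensitive to scaling (a spectrum and three times that spectrum score 1.0).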
Chemometrics & Machine Learning
- PCA:
  - `run_pca` giving scores, loadings, explained variance.
- Models:
  - Classifier/regressor factory: Random Forest, SVM variants, Logistic Regression, PLS / PLS-DA, kNN, etc.
- Validation:
  - Cross-validation helpers, classification/regression metrics, permutation tests.
- Mixture analysis:
  - `nnls_mixture` (non-negative least squares).
  - `mcr_als` for mixture decomposition.
- Optional deep learning:
  - `Conv1DSpectrumClassifier` (1D CNN) with sklearn-like API.
  - Guarded imports: raises a clear error if TensorFlow is not installed.
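PCA returns the same three quantities listed for `run_pca`: scores, loadings, and explained variance. Here is a minimal NumPy sketch computed via the singular value decomposition of the mean-centered data; `pca_svd` is a hypothetical name and this is not foodspec's implementation.

```python
import numpy as np


def pca_svd(x, n_components=None):
    """PCA via SVD of mean-centered data.

    Returns (scores, loadings, explained_variance_ratio, mean) where
    x ≈ scores @ loadings.T + mean when all components are kept.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    xc = x - mean
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    explained_variance = s**2 / (x.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    k = s.size if n_components is None else n_components
    scores = u[:, :k] * s[:k]  # sample coordinates in PC space
    loadings = vt[:k].T        # (n_wavenumbers, k): spectral shape of each PC
    return scores, loadings, ratio[:k], mean
```

For spectra, the loadings are themselves spectrum-shaped vectors, which is why loading plots can be read like spectra: bands with large loadings drive the separation seen in the score plot.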
Applications (Turnkey Workflows)
- Oil authentication: `run_oil_authentication_workflow`
  - preprocess → peaks/ratios → classifier → metrics, confusion matrix, feature importances.
- Heating degradation: `run_heating_degradation_analysis`
  - ratios vs time → trend models → ANOVA (optional).
- Quality control / novelty detection: `train_qc_model`, `apply_qc_model`
  - one-class SVM / IsolationForest on reference spectra.
- Domain templates:
  - Dairy, meat, microbial workflows that re-use the oil-style template.
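The "ratios vs time → trend models" step of the heating workflow can be illustrated with an ordinary least-squares polynomial fit of a band ratio against heating time. This is a minimal sketch with a hypothetical function name; it is not `run_heating_degradation_analysis`, and the optional ANOVA step is omitted.

```python
import numpy as np


def fit_ratio_trend(times, ratios, degree=1):
    """Least-squares polynomial trend of a band ratio vs heating time.

    Returns (coefficients with highest power first, R^2 of the fit).
    """
    times = np.asarray(times, dtype=float)
    ratios = np.asarray(ratios, dtype=float)
    coeffs = np.polyfit(times, ratios, degree)
    fitted = np.polyval(coeffs, times)
    ss_res = np.sum((ratios - fitted) ** 2)
    ss_tot = np.sum((ratios - ratios.mean()) ** 2)
    return coeffs, 1.0 - ss_res / ss_tot
```

With `degree=1`, the first coefficient is the rate of change of the ratio per unit heating time, which is often the single number compared across oils or temperatures.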
Visualization
- Spectra:
  - Overlays, mean spectra, before/after preprocessing plots.
- PCA:
  - Score and loading plots.
- Classification:
  - Confusion matrices.
- Heating:
  - Ratio/time trends with grouped comparisons.
- Hyperspectral:
  - Ratio maps and cluster maps from `HyperSpectralCube`.
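A hyperspectral ratio map is a band ratio evaluated at every pixel of a `(height, width, n_wavenumbers)` cube. This NumPy sketch computes one by taking the grid point nearest each requested band; `ratio_map` is a hypothetical helper (not foodspec's API), and pixels with a zero denominator are set to NaN rather than infinity.

```python
import numpy as np


def ratio_map(cube, wavenumbers, num_center, den_center):
    """Per-pixel intensity ratio at the grid points nearest to two band positions."""
    cube = np.asarray(cube, dtype=float)
    wn = np.asarray(wavenumbers, dtype=float)
    i = int(np.argmin(np.abs(wn - num_center)))
    j = int(np.argmin(np.abs(wn - den_center)))
    num, den = cube[:, :, i], cube[:, :, j]
    # Leave NaN where the denominator band is zero
    return np.divide(num, den, out=np.full_like(num, np.nan), where=den != 0)
```

The result is a plain `(height, width)` array, so it can be passed straight to an image plotting function.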
Validation, Logging, Reporting
- `validation` module: `validate_spectrum_set`, `validate_public_evoo_sunflower` (shape, monotonic axes, NaNs, mixture ranges).
- Logging:
  - Lightweight logger + run metadata (versions, platform, timestamp).
- Reporting:
  - Standardized run directories with:
    - `metrics.json`
    - `run_metadata.json`
    - plots (PNG)
    - `report.md`
- Config:
  - YAML/JSON configs for CLI runs via `--config`.
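The validation checks listed above (shape, monotonic axes, NaNs, mixture ranges) are simple to state in code. This is a minimal NumPy sketch with a hypothetical `validate_spectra` function that returns a list of problems instead of raising; it is not `validate_spectrum_set`, whose exact behavior may differ.

```python
import numpy as np


def validate_spectra(x, wavenumbers, mixture_fraction=None):
    """Return a list of problems found; an empty list means all checks passed."""
    x = np.asarray(x, dtype=float)
    wn = np.asarray(wavenumbers, dtype=float)
    problems = []
    if x.ndim != 2:
        problems.append(f"x must be 2D, got shape {x.shape}")
    elif x.shape[1] != wn.size:
        problems.append(f"x has {x.shape[1]} columns but there are {wn.size} wavenumbers")
    d = np.diff(wn)
    if not (np.all(d > 0) or np.all(d < 0)):
        problems.append("wavenumber axis is not strictly monotonic")
    if np.isnan(x).any():
        problems.append("x contains NaN values")
    if mixture_fraction is not None:
        f = np.asarray(mixture_fraction, dtype=float)
        if np.any((f < 0) | (f > 1)):
            problems.append("mixture fractions must lie in [0, 1]")
    return problems
```

Collecting every problem before reporting, rather than stopping at the first one, makes it easier to fix a badly formatted dataset in a single pass.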
Installation
Requires Python 3.10+.
```bash
# core installation
pip install foodspec

# with "deep" extras (1D CNN requires TensorFlow)
pip install 'foodspec[deep]'
```

Note: TensorFlow is not installed by default. Calling `Conv1DSpectrumClassifier` without it will raise a clear `ImportError` suggesting `pip install 'foodspec[deep]'`.
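A guarded import like the one described in the note can be written with the standard library alone. This is a minimal sketch of the pattern, using a hypothetical `require_optional` helper; it is not necessarily how foodspec implements its check.

```python
import importlib
import importlib.util


def require_optional(module_name, extra="deep"):
    """Import an optional dependency, or raise ImportError with an install hint."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f"install it with: pip install 'foodspec[{extra}]'"
        )
    return importlib.import_module(module_name)
```

Calling `require_optional("tensorflow")` inside the model constructor, rather than at module import time, keeps `import foodspec` working on machines without TensorFlow.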
Quickstart (Python API)
```python
from foodspec.core.dataset import FoodSpectrumSet
from foodspec.preprocess.baseline import ALSBaseline
from foodspec.preprocess.smoothing import SavitzkyGolaySmoother
from foodspec.preprocess.normalization import VectorNormalizer
from foodspec.features.ratios import RatioFeatureGenerator
from foodspec.chemometrics.models import make_classifier
from foodspec.validation import validate_spectrum_set

# 1. Load spectra (example or public loader); replace the placeholders
fs = FoodSpectrumSet(
    x=...,            # array of shape (n_samples, n_wavenumbers)
    wavenumbers=...,  # 1D axis of length n_wavenumbers
    metadata=...,     # pandas.DataFrame, one row per sample
    modality="raman",
)
validate_spectrum_set(fs)

# 2. Preprocess: baseline -> smoothing -> normalization (illustrative only)
# (In practice, use sklearn.pipeline.Pipeline)
als = ALSBaseline(lambda_=1e5, p=0.001)
savgol = SavitzkyGolaySmoother(window_length=11, polyorder=3)
norm = VectorNormalizer(norm="l2")
X_proc = norm.transform(savgol.transform(als.transform(fs.x)))

# 3. Extract peak ratios (example): 1652 cm^-1 height / 1742 cm^-1 height
ratio_gen = RatioFeatureGenerator(
    ratio_def={"ratio_1652_1742": ("peak_1652_height", "peak_1742_height")}
)
ratio_table = ratio_gen.transform(...)  # a table with the peak_*_height columns

# 4. Prepare X, y and train a classifier
fs.metadata["oil_type"] = ...  # ensure the target column exists
# Note: X_proc and ratio_table from steps 2-3 are not passed to the split below
train_set, test_set = fs.train_test_split(target_col="oil_type", test_size=0.3)
X_train, y_train = train_set.to_X_y("oil_type")
X_test, y_test = test_set.to_X_y("oil_type")

clf = make_classifier("rf", random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```
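The Quickstart suggests chaining preprocessing with `sklearn.pipeline.Pipeline`. This sketch shows that pattern end to end on synthetic data so it runs without foodspec: an SNV step wrapped in scikit-learn's `FunctionTransformer` stands in for the foodspec transformers, and `LogisticRegression` stands in for `make_classifier`. The data, peak positions, and model choice are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def snv(x):
    """Standard Normal Variate per spectrum (row)."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)


# Synthetic two-class data: class 0 peaks near index 30, class 1 near index 60,
# with random intensity scaling, a constant offset, and a little noise
rng = np.random.default_rng(42)
axis = np.arange(100)
labels = np.repeat([0, 1], 40)
centers = np.where(labels == 0, 30, 60)
X = np.exp(-((axis[None, :] - centers[:, None]) / 4.0) ** 2)
X = X * rng.uniform(0.5, 2.0, size=(X.shape[0], 1)) + 0.1 + rng.normal(0, 0.02, X.shape)

pipe = Pipeline([
    ("snv", FunctionTransformer(snv)),               # preprocessing step
    ("clf", LogisticRegression(max_iter=1000)),      # classifier step
])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, stratify=labels, random_state=42
)
pipe.fit(X_train, y_train)
test_accuracy = pipe.score(X_test, y_test)
```

Putting preprocessing inside the pipeline means any step with fitted state is fitted on the training split only, which avoids leaking test-set information during cross-validation.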
CLI Usage
After installation, you get a `foodspec` command with multiple subcommands.
Check installation
```bash
foodspec about
```
This prints:
- foodspec version
- Python version
- optional extras status (deep learning)
- documentation URL
Oil authentication workflow
Using a config file:
```bash
foodspec oil-auth --config examples/configs/oil_auth_public.yml
```
This command:
- Loads your spectral library and/or public datasets.
- Runs preprocessing → features → classifier → validation.
- Creates a timestamped run directory with metrics, plots, and a Markdown report.
Protocol benchmarks (public datasets)
```bash
foodspec protocol-benchmarks --output-dir runs/protocol_benchmarks
```
This command:
- Uses public dataset loaders (if available).
- Runs a classification benchmark and a mixture analysis.
- Saves metrics + run metadata + a summary report.
MethodsX protocol reproduction
```bash
foodspec reproduce-methodsx --output-dir runs/methodsx_protocol
```
This command:
- Reproduces the core analyses described in the MethodsX protocol article (classification + mixture analysis + PCA).
- Uses public datasets (or synthetic stand-ins where documented).
- Produces a complete run directory with metrics, run metadata, plots, and a Markdown report.
For full details and all commands (heating, QC, domains, hyperspectral, mixture, model registry), see the documentation.
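The run directories produced by these commands share one layout: `metrics.json`, `run_metadata.json`, plots, and `report.md`. This standard-library sketch shows one way to create such a timestamped directory; `write_run_dir`, the directory naming scheme, and the metadata keys are hypothetical and may not match foodspec's output exactly. Plots are omitted.

```python
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def write_run_dir(base_dir, metrics, name="run"):
    """Create <base_dir>/<name>_<UTC timestamp>/ with metrics, metadata, and a Markdown report."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    run_dir = Path(base_dir) / f"{name}_{stamp}"
    run_dir.mkdir(parents=True, exist_ok=False)
    metadata = {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "timestamp_utc": stamp,
    }
    (run_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
    (run_dir / "run_metadata.json").write_text(json.dumps(metadata, indent=2))
    # Minimal Markdown report: one table row per metric
    lines = [f"# {name} report", "", "| metric | value |", "|---|---|"]
    lines += [f"| {key} | {value} |" for key, value in metrics.items()]
    (run_dir / "report.md").write_text("\n".join(lines) + "\n")
    return run_dir
```

Using `exist_ok=False` with a microsecond timestamp means a rerun never silently overwrites an earlier run's results.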
Public Datasets
The package includes loaders for several public edible-oil datasets, but it does not bundle the data.
Typical workflow:
Follow instructions in the docs (docs/libraries.md) to:
- Download the Mendeley edible-oil dataset (Raman/FTIR).
- Download the EVOO–sunflower mixture dataset.
- Download an FTIR edible-oil dataset.
Place them in the documented folder structure. Then:
```python
from foodspec.data import load_public_mendeley_oils, load_public_evoo_sunflower_raman

fs_mend = load_public_mendeley_oils(root="path/to/mendeley")
fs_mix = load_public_evoo_sunflower_raman(root="path/to/evoo_sunflower")
```
Each loader:
- Returns a validated `FoodSpectrumSet`.
- Adds metadata such as `oil_type`, `mixture_fraction_evoo`, `dataset_name`, `doi`, etc.
Documentation
Full documentation (Getting started, API, CLI, workflows, MethodsX protocol, deep learning, citing) is provided via MkDocs.
Docs URL:
https://chandrasekarnarayana.github.io/foodspec/
Key pages:
- Getting started – installation, basic examples, data loading.
- Libraries – building and loading spectral libraries (HDF5, CSV, public datasets).
- Validation & Chemometrics – PCA, ML, oil-authentication examples.
- MethodsX protocol – mapping between foodspec workflows and the MethodsX protocol article.
- Advanced: Deep Learning – optional 1D CNN model usage.
- Citing foodspec – how to cite the package and the protocol paper.
Citing foodspec
If you use foodspec in your research, please cite:
- The software (this package), via the `CITATION.cff` file included in the repository.
- The MethodsX protocol article (once published), which formally describes the workflow.
Until the article is out, you can use a provisional citation:
Chandrasekar Subramani Narayan, foodspec: A Python toolkit for Raman and FTIR spectroscopy in food science, software version 0.2.0, 2025.
See CITATION.cff for machine-readable citation metadata.
Contributing
Contributions (bug reports, feature requests, docs improvements) are welcome.
Please open an issue on GitHub to discuss major changes.
When sending PRs:
- Run `pytest` (all tests should pass).
- Keep coverage ≥ 80% where possible.
- Update docs and type hints for new public APIs.
License
foodspec is released under the MIT License.
See the LICENSE file for full details.