Gold-standard Raman and FTIR spectroscopy toolkit for food science.

foodspec

Headless, research-grade Python toolkit for Raman and FTIR spectroscopy in food science.
Built around clear data models, reproducible preprocessing, feature extraction, chemometrics, and domain workflows.

Why foodspec?

Modern food science uses Raman and FTIR spectra everywhere: oil authentication, adulteration detection, heating/oxidation studies, quality control, and even hyperspectral imaging.
But most analyses are locked in ad hoc scripts, inconsistent preprocessing, and irreproducible notebooks.

foodspec aims to fix that by providing:

- A unified data model for spectra and hyperspectral cubes.
- Pipeline-style preprocessing (baseline, smoothing, scatter correction, normalization, FTIR/Raman helpers).
- Chemometrics and ML workflows for classification, regression, mixture analysis, and QC.
- Domain templates for edible oils, heating degradation, and more.
- CLI, config, logging, and reporting for reproducible runs.
- A design that aligns with a MethodsX-style protocol and FAIR principles.

Key Features

Core Data Structures

FoodSpectrumSet
- Batches of 1D spectra: x has shape (n_samples, n_wavenumbers).
- Shared wavenumbers axis.
- Metadata as a pandas.DataFrame (oil type, mixture fraction, etc.).
- Modality tag ("raman" / "ftir" / others).

HyperSpectralCube
- 3D cubes of shape (height, width, n_wavenumbers).
- Conversions to/from FoodSpectrumSet with pixel-wise metadata.
- Helpers for ratio maps and clustering visualizations.
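
To make the relationship between the two shapes concrete, here is a minimal NumPy sketch of flattening a cube into per-pixel spectra and back. It does not use the foodspec classes; the array sizes, the synthetic values, and the channel indices chosen for the ratio map are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical cube: 4 x 5 pixels, 6 wavenumber channels (synthetic values).
height, width, n_wavenumbers = 4, 5, 6
cube = np.arange(height * width * n_wavenumbers, dtype=float).reshape(
    height, width, n_wavenumbers
)

# Flatten the two spatial axes so each pixel becomes one row (one spectrum).
spectra = cube.reshape(height * width, n_wavenumbers)

# Pixel-wise metadata: the (row, col) position each spectrum came from.
rows, cols = np.divmod(np.arange(height * width), width)

# A simple ratio map: channel 1 divided by channel 4, reshaped to the image grid.
ratio_map = (spectra[:, 1] / spectra[:, 4]).reshape(height, width)

# Round trip back to the cube layout.
restored = spectra.reshape(height, width, n_wavenumbers)
```

Because the reshape keeps row-major order, the row and column indices are enough to map any per-spectrum result (a ratio, a cluster label) back onto the image.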

IO & Reproducibility

HDF5 library helpers: create_library, load_library for large spectral libraries.
Example and public dataset loaders:
- load_example_oils() – synthetic, self-contained example dataset.
- load_public_mendeley_oils(...) – hook for a public edible-oil dataset.
- load_public_evoo_sunflower_raman(...) – EVOO–sunflower mixture dataset.
- load_public_ftir_oils(...) – FTIR edible-oil dataset hook.
Tidy CSV/HDF5 exporters, metadata round-tripping, and validation utilities.

Preprocessing (sklearn-style, pipeline-friendly)

Baseline:
- ALS (Eilers), rubberband, polynomial baseline.
Smoothing:
- Savitzky–Golay, moving average.
Normalization / scatter correction:
- Vector (L1/L2/max), area, internal-peak normalization.
- SNVNormalizer (Standard Normal Variate).
- MSCNormalizer (Multiplicative Scatter Correction).
Derivatives:
- Savitzky–Golay derivatives (1st/2nd).
Cropping:
- RangeCropper and FTIR-specific helpers.
Raman/FTIR specifics:
- CosmicRayRemover, AtmosphericCorrector, SimpleATRCorrector.
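
As a rough illustration of what the two scatter corrections compute, the following NumPy sketch implements textbook SNV and MSC as plain functions. These are not the foodspec SNVNormalizer or MSCNormalizer classes; the function names, the use of the sample standard deviation (ddof=1), and the synthetic data are assumptions.

```python
import numpy as np


def snv(X):
    """Standard Normal Variate: center and scale each spectrum (row) on its own."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    # Assumption: sample standard deviation; some implementations use ddof=0.
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


def msc(X, reference=None):
    """Multiplicative Scatter Correction against a reference spectrum."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, spectrum in enumerate(X):
        # Fit spectrum ~ intercept + slope * ref, then undo both terms.
        slope, intercept = np.polyfit(ref, spectrum, deg=1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected


# Synthetic example: one band shape seen with different gain and offset.
base = np.array([1.0, 2.0, 4.0, 3.0, 1.5])
X = np.vstack([2.0 * base + 0.5, 0.5 * base - 1.0, base])

X_snv = snv(X)
X_msc = msc(X, reference=base)
```

In this synthetic case both corrections collapse the three rows onto a single shape, since the rows differ only by a positive scale factor and an offset.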

Feature Extraction

Peaks:
- PeakFeatureExtractor (heights, areas with tolerance windows).
Bands:
- integrate_bands for integrated regions.
Ratios:
- RatioFeatureGenerator / compute_ratios for chemically meaningful ratios (e.g. 1652/1742, 3010/2850).
Fingerprinting:
- Cosine/correlation similarity helpers.
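
The idea behind peak heights with tolerance windows and band ratios can be sketched in a few lines of NumPy. This is not the PeakFeatureExtractor or RatioFeatureGenerator API; the helper name `peak_height`, the 5-unit tolerance, and the synthetic two-band spectrum are assumptions for illustration.

```python
import numpy as np


def peak_height(wavenumbers, spectrum, center, tolerance=5.0):
    """Maximum intensity within center +/- tolerance (same units as wavenumbers)."""
    mask = np.abs(wavenumbers - center) <= tolerance
    if not mask.any():
        raise ValueError(f"no points within {tolerance} of {center}")
    return float(spectrum[mask].max())


# Synthetic spectrum: two Gaussian bands at 1652 and 1742 with heights 2 and 4.
wavenumbers = np.arange(1600.0, 1800.0, 2.0)
spectrum = (
    2.0 * np.exp(-0.5 * ((wavenumbers - 1652.0) / 6.0) ** 2)
    + 4.0 * np.exp(-0.5 * ((wavenumbers - 1742.0) / 6.0) ** 2)
)

h_1652 = peak_height(wavenumbers, spectrum, center=1652.0)
h_1742 = peak_height(wavenumbers, spectrum, center=1742.0)
ratio_1652_1742 = h_1652 / h_1742  # close to 0.5 for this synthetic spectrum
```

The tolerance window makes the feature robust to small shifts in peak position between instruments or samples.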

Chemometrics & Machine Learning

PCA:
- run_pca giving scores, loadings, explained variance.
Models:
- Classifier/regressor factory: Random Forest, SVM variants, Logistic Regression, PLS / PLS-DA, kNN, etc.
Validation:
- Cross-validation helpers, classification/regression metrics, permutation tests.
Mixture analysis:
- nnls_mixture (non-negative least squares).
- mcr_als for mixture decomposition.
Optional deep learning:
- Conv1DSpectrumClassifier (1D CNN) with sklearn-like API.
- Guarded imports: raises a clear error if TensorFlow is not installed.
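
For readers unfamiliar with the three PCA outputs named above (scores, loadings, explained variance), here is a minimal sketch using a singular value decomposition in NumPy. It is not the implementation of run_pca, whose signature is not shown in this README; the function name `pca_svd` and the random input data are assumptions.

```python
import numpy as np


def pca_svd(X, n_components=2):
    """PCA via SVD of the mean-centered data matrix (samples in rows)."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = Vt[:n_components]                     # one row per component
    explained_variance_ratio = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained_variance_ratio[:n_components]


# Random stand-in for a (n_samples, n_wavenumbers) spectral matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))

scores, loadings, evr = pca_svd(X, n_components=3)
```

Scores have one row per spectrum, loadings have one row per component across the wavenumber axis, and the explained variance ratios are sorted from largest to smallest.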

Applications (Turnkey Workflows)

Oil authentication
run_oil_authentication_workflow
→ preprocess → peaks/ratios → classifier → metrics, confusion matrix, feature importances.
Heating degradation
run_heating_degradation_analysis
→ ratios vs time → trend models → ANOVA (optional).
Quality control / novelty detection
train_qc_model, apply_qc_model
→ one-class SVM / IsolationForest on reference spectra.
Domain templates
- Dairy, meat, microbial workflows that re-use the oil-style template.

Visualization

Spectra:
- Overlays, mean spectra, before/after preprocessing plots.
PCA:
- Score and loading plots.
Classification:
- Confusion matrices.
Heating:
- Ratio/time trends with grouped comparisons.
Hyperspectral:
- Ratio maps and cluster maps from HyperSpectralCube.

Validation, Logging, Reporting

validation module:
- validate_spectrum_set, validate_public_evoo_sunflower (shape, monotonic axes, NaNs, mixture ranges).
Logging:
- Lightweight logger + run metadata (versions, platform, timestamp).
Reporting:
- Standardized run directories with:
  - metrics.json
  - run_metadata.json
  - plots (PNG)
  - report.md
Config:
- YAML/JSON configs for CLI runs via --config.
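
The run-directory layout listed above can be approximated with the standard library alone. The sketch below is not foodspec's reporting code; the function name `write_run_directory`, the `run_<timestamp>` folder naming, the metadata keys, and the placeholder metric value are all assumptions.

```python
import json
import platform
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_run_directory(base_dir, metrics, package_versions=None):
    """Create a timestamped run directory with metrics, metadata, and a report."""
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = Path(base_dir) / f"run_{timestamp}"
    run_dir.mkdir(parents=True, exist_ok=True)

    run_metadata = {
        "timestamp_utc": timestamp,
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "package_versions": package_versions or {},
    }
    (run_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
    (run_dir / "run_metadata.json").write_text(json.dumps(run_metadata, indent=2))

    # Minimal Markdown report: one bullet per metric.
    lines = ["# Run report", ""] + [f"- {name}: {value}" for name, value in metrics.items()]
    (run_dir / "report.md").write_text("\n".join(lines) + "\n")
    return run_dir


# Usage with a throwaway directory; 0.0 is a placeholder, not a real result.
with tempfile.TemporaryDirectory() as tmp:
    example_dir = write_run_directory(tmp, {"accuracy": 0.0})
    written_files = sorted(p.name for p in example_dir.iterdir())
```

Plots would be saved into the same directory as PNG files by whatever plotting code produced them.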

Installation

Requires Python 3.10+.

# core installation
pip install foodspec

# with "deep" extras (1D CNN requires TensorFlow)
pip install 'foodspec[deep]'

Note: TensorFlow is not installed by default. Calling Conv1DSpectrumClassifier without it will raise a clear ImportError suggesting pip install 'foodspec[deep]'.
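
The guarded-import behaviour described in the note is a common pattern for optional dependencies. A minimal sketch of that pattern follows; the helper name `require_optional` is hypothetical and is not claimed to be how foodspec implements it.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature. "
            f"Install it with: pip install 'foodspec[{extra}]'"
        ) from exc


# Demonstration with a module name that should not exist on any system.
try:
    require_optional("surely_missing_module_for_demo", extra="deep")
except ImportError as exc:
    message = str(exc)
```

Deferring the import until the feature is actually used keeps the core installation light while still giving users a clear next step.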

Quickstart (Python API)

from pathlib import Path

from foodspec.core.dataset import FoodSpectrumSet
from foodspec.preprocess.baseline import ALSBaseline
from foodspec.preprocess.smoothing import SavitzkyGolaySmoother
from foodspec.preprocess.normalization import VectorNormalizer
from foodspec.features.ratios import RatioFeatureGenerator
from foodspec.chemometrics.models import make_classifier
from foodspec.validation import validate_spectrum_set

# 1. Load spectra (example or public loader)
fs = FoodSpectrumSet(
    x=...,  # shape (n_samples, n_wavenumbers)
    wavenumbers=...,
    metadata=...,
    modality="raman",
)
validate_spectrum_set(fs)

# 2. Build a preprocessing pipeline (illustrative only)
# (In practice, use sklearn.pipeline.Pipeline)
als = ALSBaseline(lambda_=1e5, p=0.001)
savgol = SavitzkyGolaySmoother(window_length=11, polyorder=3)
norm = VectorNormalizer(norm="l2")
X_proc = norm.transform(savgol.transform(als.transform(fs.x)))

# 3. Extract peak ratios (example)
ratio_gen = RatioFeatureGenerator(
    ratio_def={"ratio_1652_1742": ("peak_1652_height", "peak_1742_height")}
)
ratio_table = ratio_gen.transform(...)

# 4. Prepare X, y and train a classifier
fs.metadata["oil_type"] = ...  # ensure target column exists
train_set, test_set = fs.train_test_split(target_col="oil_type", test_size=0.3)
X_train, y_train = train_set.to_X_y("oil_type")
X_test, y_test = test_set.to_X_y("oil_type")

clf = make_classifier("rf", random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))

CLI Usage

After installation, you get a foodspec command with multiple subcommands.

Check installation

foodspec about

This prints:

- the foodspec version
- the Python version
- the status of optional extras (deep learning)
- the documentation URL

Oil authentication workflow

Using a config file:

foodspec oil-auth --config examples/configs/oil_auth_public.yml

Loads your spectral library and/or public datasets.
Runs preprocessing → features → classifier → validation.
Creates a timestamped run directory with metrics, plots, and a Markdown report.

Protocol benchmarks (public datasets)

foodspec protocol-benchmarks --output-dir runs/protocol_benchmarks

Uses public dataset loaders (if available).
Runs a classification benchmark and a mixture analysis.
Saves metrics + run metadata + a summary report.

MethodsX protocol reproduction

foodspec reproduce-methodsx --output-dir runs/methodsx_protocol

Reproduces the core analyses described in the MethodsX protocol article (classification + mixture analysis + PCA).
Uses public datasets (or synthetic stand-ins where documented).
Produces a complete run directory with metrics, run metadata, plots, and a Markdown report.

For full details and all commands (heating, QC, domains, hyperspectral, mixture, model registry), see the documentation.

Public Datasets

The package includes loaders for several public edible-oil datasets, but it does not bundle the data.

Typical workflow:

- Follow the instructions in the docs (docs/libraries.md) to download:
  - the Mendeley edible-oil dataset (Raman/FTIR),
  - the EVOO–sunflower mixture dataset,
  - an FTIR edible-oil dataset.
- Place them in the documented folder structure.
- Load them with the matching loaders:

from foodspec.data import load_public_mendeley_oils, load_public_evoo_sunflower_raman

fs_mend = load_public_mendeley_oils(root="path/to/mendeley")
fs_mix = load_public_evoo_sunflower_raman(root="path/to/evoo_sunflower")

Each loader:

Returns a validated FoodSpectrumSet.
Adds metadata such as oil_type, mixture_fraction_evoo, dataset_name, doi, etc.
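
The kinds of checks a validated set implies (shape, monotonic axes, NaNs, mixture ranges) can be sketched with plain Python. This is not the code of validate_spectrum_set; the function name `check_spectra`, the messages, and the assumption that mixture fractions are expressed on a 0 to 1 scale are all hypothetical.

```python
import math


def check_spectra(x, wavenumbers, mixture_fractions=None):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    # Shape: every spectrum needs one value per wavenumber.
    n_points = len(wavenumbers)
    if any(len(row) != n_points for row in x):
        problems.append("shape: every spectrum must have len(wavenumbers) points")

    # Axis: strictly increasing or strictly decreasing.
    pairs = list(zip(wavenumbers, wavenumbers[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    if not (increasing or decreasing):
        problems.append("axis: wavenumbers must be strictly monotonic")

    # Values: no NaNs anywhere.
    if any(math.isnan(value) for row in x for value in row):
        problems.append("values: spectra contain NaN")

    # Metadata: assumed 0-1 scale for mixture fractions.
    if mixture_fractions is not None and any(
        not 0.0 <= fraction <= 1.0 for fraction in mixture_fractions
    ):
        problems.append("metadata: mixture fractions must lie in [0, 1]")

    return problems


good = check_spectra([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]], [1000.0, 1002.0, 1004.0], [0.0, 0.5])
bad = check_spectra([[1.0, float("nan")], [2.0, 3.0, 4.0]], [1000.0, 1000.0, 998.0], [1.5])
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a dataset in a single pass.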

Documentation

Full documentation (Getting started, API, CLI, workflows, MethodsX protocol, deep learning, citing) is provided via MkDocs.

Docs URL:
https://chandrasekarnarayana.github.io/foodspec/

Key pages:

Getting started – installation, basic examples, data loading.
Libraries – building and loading spectral libraries (HDF5, CSV, public datasets).
Validation & Chemometrics – PCA, ML, oil-authentication examples.
MethodsX protocol – mapping between foodspec workflows and the MethodsX protocol article.
Advanced: Deep Learning – optional 1D CNN model usage.
Citing foodspec – how to cite the package and the protocol paper.

Citing foodspec

If you use foodspec in your research, please cite:

The software (this package), via the CITATION.cff file included in the repository.
The MethodsX protocol article (once published), which formally describes the workflow.

Until the article is out, you can use a provisional citation:

Chandrasekar Subramani Narayan, foodspec: A Python toolkit for Raman and FTIR spectroscopy in food science, software version 0.2.0, 2025.

See CITATION.cff for machine-readable citation metadata.

Contributing

Contributions (bug reports, feature requests, docs improvements) are welcome.
Please open an issue on GitHub to discuss major changes.

When sending PRs:

Run pytest (all tests should pass).
Keep coverage ≥ 80% where possible.
Update docs and type hints for new public APIs.

License

foodspec is released under the MIT License.
See the LICENSE file for full details.

Release history

- 1.0.0 (Dec 25, 2025)
- 0.2.0 (Dec 1, 2025), this version

Download files

Source Distribution

foodspec-0.2.0.tar.gz (106.4 kB), uploaded Dec 1, 2025

Built Distribution

foodspec-0.2.0-py3-none-any.whl (69.8 kB), uploaded Dec 1, 2025, Python 3

File details

Details for the file foodspec-0.2.0.tar.gz.

File metadata

Download URL: foodspec-0.2.0.tar.gz
Upload date: Dec 1, 2025
Size: 106.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for foodspec-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`26600668c98cbafa621a341da697967bd1d65df14bd0e0b8d9ce77d6f10c2a8f`
MD5	`4de97cf906d72aa7ef4475013cefce28`
BLAKE2b-256	`390823268d49894d5df62f5d47df4195fa1f522af037ebcb8ada2a1041f7974c`

File details

Details for the file foodspec-0.2.0-py3-none-any.whl.

File metadata

Download URL: foodspec-0.2.0-py3-none-any.whl
Upload date: Dec 1, 2025
Size: 69.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for foodspec-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1f954b73cef7f6b7bef45482eae4b7c711f2f15958c4b66d19f41d434b64c914`
MD5	`448cd64ed52da0c19efe7a1ad230b24c`
BLAKE2b-256	`5040a81ec7ed83b211e667dc419f61741782b4daebe781d844de77cfc68f911a`
