
RamanBench

PyPI · Python 3.11–3.13 · CI · License: MIT · arXiv · Leaderboard

A large-scale benchmark for machine learning on Raman spectroscopy data.

74 datasets · 163 prediction targets · 28 baseline models · 4 application domains

RamanBench provides a reproducible evaluation protocol and a curated collection of public Raman spectroscopy datasets spanning Material Science, Biological, Medical, and Chemical applications. Researchers can rank new models against 28 pre-evaluated baselines — from classical PLS to tabular foundation models and Raman-specific deep learning architectures — without re-running all experiments.


Ecosystem

raman-data   ──▶  raman-bench  ──▶  Live Leaderboard
(datasets)        (this package)     HuggingFace Space
PyPI / GitHub     PyPI / GitHub
| Resource | Link |
| --- | --- |
| raman-data (dataset loader) | GitHub · PyPI |
| raman-bench (this package) | GitHub · PyPI |
| Live Leaderboard | huggingface.co/spaces/ml-lab-htw/RamanBench |
| Paper | arXiv TBD |

Quick Start

Installation

# Core package (leaderboard + dataset loading, no heavy dependencies)
pip install raman-bench

For running the full benchmark (AutoGluon + deep learning models), RamanBench requires a patched AutoGluon fork. The official AutoGluon release caps tabular foundation models (TabPFN v2, TabICL, TabDPT, MITRA, …) at 500 features and silently skips them on larger datasets; Raman spectra typically have 500–4000 wavenumber points. The fork removes this cap. Install it first:

git clone https://github.com/ml-lab-htw/RamanBench.git
cd RamanBench
pip install -r requirements-autogluon-fork.txt
pip install "raman-bench[deep]"

Explore the precomputed leaderboard

from raman_bench import Leaderboard

# Load v0.1 results: 28 models × 74 datasets
lb = Leaderboard.from_precomputed()
print(lb.rank())          # ranked DataFrame
lb.plot()                 # horizontal bar chart

Evaluate a new model

from raman_bench import Leaderboard
from sklearn.cross_decomposition import PLSRegression

lb = Leaderboard.from_precomputed()

# Evaluates your model on all 74 datasets (3 seeds) and adds it to the ranking
results = lb.evaluate_and_add(
    model_name="My-PLS-10",
    model=PLSRegression(n_components=10),
)
print(lb.rank())
lb.plot()

Run the full benchmark pipeline

# 1. Clone, install the AutoGluon fork, then install in development mode
git clone https://github.com/ml-lab-htw/RamanBench.git
cd RamanBench
pip install -r requirements-autogluon-fork.txt
pip install -e ".[deep]"

# 2. Pre-cache all dataset splits (optional, speeds up the run)
python scripts/prepare_datasets.py --config configs/benchmark_v0.1.json

# 3. Run predictions → metrics
raman-bench run --config configs/benchmark_v0.1.json

# 4. Run a single step
raman-bench run --config configs/benchmark_v0.1.json --step predictions
raman-bench run --config configs/benchmark_v0.1.json --step metrics

Notebooks

| Notebook | Description |
| --- | --- |
| 01_quick_start.ipynb | Load a dataset, explore the precomputed leaderboard, plot rankings |
| 02_benchmark_new_model.ipynb | Evaluate your own model and add it to the leaderboard |
| 03_explore_results.ipynb | Deep dive into per-dataset and per-domain results |
| 04_contribute_dataset.ipynb | Step-by-step guide to contributing a new dataset |

Benchmark Composition

Datasets

74 public Raman spectroscopy datasets from four application domains:

| Domain | Datasets | Task | Sources |
| --- | --- | --- | --- |
| Chemical | 37 | Regression | Zenodo, HuggingFace |
| Medical | 11 | Classification | Kaggle, Zenodo |
| Biological | 8 | Regression | HuggingFace, Zenodo |
| Material Science | 4 | Classification | RRUFF, Zenodo |

All datasets are accessible through the raman-data package (pip install raman-data):

from raman_data import raman_data

dataset = raman_data("amino_acids_glycine")
X = dataset.spectra          # (n_samples, n_wavenumbers)
y = dataset.targets          # regression targets or class labels
w = dataset.raman_shifts     # wavenumber axis in cm⁻¹

Dataset catalog: raman-data on GitHub

Models (v0.1 — 28 baselines)

Classical ML / Spectroscopy

  • PLS (partial least squares)
  • KNN, LR, RF, XT, GBM (LightGBM), XGB (XGBoost), CatBoost

Tabular Deep Learning

  • NN_TORCH, FastAI, RealMLP

Tabular Foundation Models

  • TabPFN v2, TabPFN v2.5, MITRA, TabM, TabDPT, TabICL

Time-Series / Spectral Classifiers

  • ROCKET, ARSENAL

Raman-Specific Neural Networks

  • DeepCNN (Liu et al., 2017)
  • RamanNet (Ibtehaz et al., 2023)
  • SANet (Deng et al., 2021)
  • RamanFormer (Koyun et al., 2024)
  • RamanTransformer (Liu et al., 2023)
  • ReZeroNet, FC-ResNeXt, CoAtNet (Lange et al., 2025)

AutoGluon Ensemble

  • AUTOGLUON


Ranking Protocol

Models are evaluated under four complementary metrics:

| Metric | Description |
| --- | --- |
| Elo | Pairwise win-rate Elo calibrated to RF = 1000 (200-round bootstrap) |
| Score | Normalised per-dataset score: best model = 1, median model = 0 |
| Avg Rank | Average rank across all datasets and targets |
| Improvability | % gap to the best model, averaged across datasets |

See the live leaderboard for interactive filtering by model category, task type, and dataset domain.
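For intuition about the Elo column, here is a toy version assuming standard Elo updates over pairwise wins replayed in random order, then shifted so RF anchors at 1000; the benchmark's actual bootstrap protocol may differ in detail:

```python
import numpy as np

def pairwise_elo(wins, models, anchor="RF", k=32, rounds=200, seed=0):
    """wins[(a, b)] = number of dataset/target pairs where model a beats b."""
    rng = np.random.default_rng(seed)
    games = [pair for pair, n in wins.items() for _ in range(n)]
    rating = {m: 1000.0 for m in models}
    for _ in range(rounds):
        for i in rng.permutation(len(games)):     # random replay order
            winner, loser = games[i]
            expected = 1.0 / (1.0 + 10 ** ((rating[loser] - rating[winner]) / 400))
            rating[winner] += k * (1.0 - expected)
            rating[loser] -= k * (1.0 - expected)
    shift = 1000.0 - rating[anchor]               # calibrate the anchor to 1000
    return {m: r + shift for m, r in rating.items()}

# Hypothetical win counts, for illustration only
ratings = pairwise_elo({("TabPFN", "RF"): 8, ("RF", "PLS"): 6},
                       ["TabPFN", "RF", "PLS"])
```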


Repository Structure

RamanBench/
├── src/raman_bench/       # Python package (install via pip)
│   ├── benchmark.py       # Dataset loading and caching
│   ├── model.py           # AutoGluon wrapper
│   ├── evaluation.py      # Metric computation (Step 2)
│   ├── predictions.py     # Prediction generation (Step 1)
│   ├── leaderboard.py     # Leaderboard + model evaluation
│   ├── config.py          # JSON config loader
│   ├── preprocessing/     # Raman preprocessing pipeline
│   ├── metrics/           # Classification + regression metrics
│   └── models/custom/     # 9 Raman-specific architectures
├── configs/               # Benchmark configuration files
│   ├── benchmark_v0.1.json
│   ├── models/            # Model lists (all, raman, traditional, foundation)
│   └── datasets/          # Dataset lists (regression_all, classification_all)
├── data/precomputed/      # Bundled v0.1 results (CSVs + dataset_stats.json)
├── notebooks/             # Example Jupyter notebooks
├── scripts/               # CLI scripts (run_benchmark.py, prepare_datasets.py)
├── tests/                 # pytest test suite
└── docs/                  # Sphinx documentation

Contributing

We welcome contributions of new models and datasets!

Adding a New Model

See CONTRIBUTING.md.

Quick summary:

  1. Implement your model as an AutoGluon AbstractModel subclass (or use the BaseCustomModel shared training loop).
  2. Register it in configs/models/.
  3. Add tests in tests/models/.

Adding a New Dataset

See CONTRIBUTING.md and NEW_DATASETS.md for detailed instructions and examples.

Quick summary:

  1. Upload your dataset to HuggingFace Datasets or Zenodo under CC BY 4.0.
  2. Add a loader to the raman-data package (open a PR there).
  3. Open an issue here linking to the raman-data PR.

The live leaderboard also has a "How to Contribute" section with step-by-step instructions.


Citation

If you use RamanBench in your research, please cite:

@inproceedings{koddenbrock2026ramanbench,
  title     = {RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy Data},
  author    = {Koddenbrock, Mario and Lange, Christoph and others},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2026},
  url       = {https://arxiv.org/abs/TBD}
}

License

MIT — see LICENSE.

Dataset licenses vary; see the dataset catalog or raman-data for per-dataset license information. Most datasets are released under CC BY 4.0.

Project details


Download files

Download the file for your platform.

Source Distribution

raman_bench-0.1.0a1.tar.gz (88.6 kB)

Uploaded Source

Built Distribution


raman_bench-0.1.0a1-py3-none-any.whl (104.3 kB)

Uploaded Python 3

File details

Details for the file raman_bench-0.1.0a1.tar.gz.

File metadata

  • Download URL: raman_bench-0.1.0a1.tar.gz
  • Size: 88.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for raman_bench-0.1.0a1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 9bcb5d7a46cfe1117a2fd1f28e11f23298a868f024b5e346c99170bbf2f353ca |
| MD5 | eb2095621ced1e8f61137a4547be57c0 |
| BLAKE2b-256 | 1384c4ceb2c346dc4cd6b90bb631e609201dd9d907a094cba8c953d8d2bda476 |


Provenance

The following attestation bundles were made for raman_bench-0.1.0a1.tar.gz:

Publisher: ci.yml on ml-lab-htw/RamanBench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file raman_bench-0.1.0a1-py3-none-any.whl.

File metadata

  • Download URL: raman_bench-0.1.0a1-py3-none-any.whl
  • Size: 104.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for raman_bench-0.1.0a1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 23555e3f3012227c770a8bfb89306e8f8949a260fe10f2e512440e8081eb8aec |
| MD5 | fa6226cb6f79d410c1b7bc77d8449c1f |
| BLAKE2b-256 | ae3492183c2bcb1fdd81fcae40b68d756e3922a3542dd29a5cfa7d684a809d57 |


Provenance

The following attestation bundles were made for raman_bench-0.1.0a1-py3-none-any.whl:

Publisher: ci.yml on ml-lab-htw/RamanBench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
