RamanBench
A large-scale benchmark for machine learning on Raman spectroscopy data.
74 datasets · 163 prediction targets · 28 baseline models · 4 application domains
RamanBench provides a reproducible evaluation protocol and a curated collection of public Raman spectroscopy datasets spanning Material Science, Biological, Medical, and Chemical applications. Researchers can rank new models against 28 pre-evaluated baselines — from classical PLS to tabular foundation models and Raman-specific deep learning architectures — without re-running all experiments.
Ecosystem
raman-data     ──▶  raman-bench      ──▶  Live Leaderboard
(datasets)          (this package)        HuggingFace Space
PyPI / GitHub       PyPI / GitHub
| Resource | Link |
|---|---|
| raman-data (dataset loader) | GitHub · PyPI |
| raman-bench (this package) | GitHub · PyPI |
| Live Leaderboard | huggingface.co/spaces/HTW-KI-Werkstatt/RamanBench |
| Paper | arXiv:2605.02003 |
Installation
Option 1 — Datasets + leaderboard (recommended starting point)
pip install raman-bench
This gives you:
- All 74 datasets with standardised train/test splits via raman-data
- Precomputed results for 28 baseline models (bundled CSVs, no internet needed)
- Leaderboard API — rank, plot, and compare against baselines
- Evaluation API — lb.evaluate_and_add(model) works with any sklearn-compatible model
You can use any ML library you already have installed — scikit-learn, LightGBM, XGBoost, PyTorch, JAX, or anything else — against a large-scale, curated data foundation without installing a single additional dependency.
Option 2 — With all built-in models
Adds all Raman-specific architectures and standalone tabular foundation models,
all with a standard fit(X, y) / predict(X) interface:
pip install "raman-bench[models]"
This installs torch, tabpfn, pytabkit, tabdpt, sktime, and
ramanspy on top of the core package. No AutoGluon required.
Option 3 — Full benchmark reproducibility (AutoGluon fork)
The paper's benchmark runs all models through AutoGluon's automated preprocessing and HPO pipeline. The fork addresses two limitations of standard AutoGluon 1.5:
- Feature cap — AutoGluon caps tabular foundation models (TabPFN v2, TabICL, TabDPT, MITRA) at 500 features; Raman spectra typically have 500–4000 wavenumber points. The fork removes this cap.
- TabICL v2 regression — AutoGluon 1.5 ships TabICL v1, which supports classification only. The fork upgrades to TabICL v2, adding regression support. This limitation is expected to be resolved in AutoGluon 1.6.
A patched fork incorporates both fixes.
git clone https://github.com/ml-lab-htw/RamanBench.git
cd RamanBench
pip install -r requirements-autogluon-fork.txt
pip install -e ".[models]"
The fork is only needed to reproduce the exact paper benchmark. Options 1 and 2 work with a standard pip install and give full access to all datasets, splits, and built-in models.
Quick Start
Load a dataset (Option 1 — core install only)
from raman_data import raman_data
ds = raman_data("amino_acids_glycine")
print(ds.spectra.shape) # (n_samples, n_wavenumbers)
print(ds.targets.shape) # (n_samples,)
print(ds.raman_shifts[:5]) # wavenumber axis in cm⁻¹
All 74 datasets are available this way. Each comes with a fixed train/test split so results are directly comparable to the precomputed baselines.
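The value of a fixed split is that every model is scored on the same held-out samples. The idea can be illustrated generically with scikit-learn (this is not the raman-data API, just a demonstration of why a pinned split keeps results comparable):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a dataset: 100 "spectra" with 64 wavenumber points.
X = np.arange(100 * 64, dtype="float32").reshape(100, 64)
y = np.arange(100)

# A fixed random_state yields the same partition on every run,
# so scores stay comparable across models, runs, and machines.
X_tr1, X_te1, y_tr1, y_te1 = train_test_split(X, y, test_size=0.2, random_state=42)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.2, random_state=42)
assert np.array_equal(X_te1, X_te2)
```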
Evaluate your model against 28 baselines (Option 1)
Any scikit-learn–compatible estimator works:
from raman_bench import Leaderboard
from sklearn.cross_decomposition import PLSRegression
lb = Leaderboard.from_precomputed() # loads bundled v0.1 results
# Evaluates on all 74 datasets (3 seeds) and inserts into the ranking
results = lb.evaluate_and_add(
    model_name="My-PLS-10",
    model=PLSRegression(n_components=10),
)
print(lb.rank())
lb.plot()
Bring any library — LightGBM, XGBoost, a PyTorch model, a JAX model — and it will be scored on the same protocol as the 28 precomputed baselines.
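Concretely, "sklearn-compatible" only requires a `fit` that returns `self` and a `predict` that returns an array. The smallest estimator that satisfies the contract is a constant baseline (a hypothetical example; `MeanBaseline` is not part of raman-bench):

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin

class MeanBaseline(BaseEstimator, RegressorMixin):
    """Predicts the training-set mean — the minimum needed to
    satisfy the fit/predict contract evaluate_and_add expects."""
    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

model = MeanBaseline().fit(np.zeros((4, 3)), np.array([1.0, 2.0, 3.0, 4.0]))
print(model.predict(np.zeros((2, 3))))  # [2.5 2.5]
```

Any class with this shape — whether it wraps LightGBM, a PyTorch network, or a JAX function — can be passed as the `model` argument.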
Explore the precomputed leaderboard (Option 1)
from raman_bench import Leaderboard
lb = Leaderboard.from_precomputed()
print(lb.rank()) # ranked DataFrame
lb.plot() # horizontal bar chart
Use a built-in Raman model directly
All built-in models expose a standard sklearn fit / predict API:
import numpy as np
from raman_bench.models.custom import DeepCNNModel, TabPFNModel, RocketModel
X = np.random.randn(200, 512).astype("float32") # 200 spectra, 512 wavenumbers
y = np.random.randn(200) # regression targets
# Raman-specific deep learning model
model = DeepCNNModel(n_epochs=50)
model.fit(X, y)
predictions = model.predict(X)
# Tabular foundation model (no feature-count limit)
tfm = TabPFNModel()
tfm.fit(X, y)
predictions = tfm.predict(X)
Run the full benchmark pipeline (fork required)
# Pre-cache all dataset splits (optional, speeds up the run)
python scripts/prepare_datasets.py --config configs/benchmark_v0.1.json
# Run predictions → metrics
raman-bench run --config configs/benchmark_v0.1.json
# Run individual steps
raman-bench run --config configs/benchmark_v0.1.json --step predictions
raman-bench run --config configs/benchmark_v0.1.json --step metrics
Notebooks
| Notebook | Description |
|---|---|
| 01_quick_start.ipynb | Load a dataset, explore the precomputed leaderboard, plot rankings |
| 02_benchmark_new_model.ipynb | Evaluate your own model and add it to the leaderboard |
| 03_explore_results.ipynb | Deep dive into per-dataset and per-domain results |
| 04_contribute_dataset.ipynb | Step-by-step guide to contributing a new dataset |
Models
Paper baselines (28 models)
All results in the paper were produced through the AutoGluon pipeline (Option 3 install).
| Category | Models |
|---|---|
| Classical spectroscopy | PLS, KNN, LR |
| Tree ensembles | GBM (LightGBM), XGB, CatBoost, RF, XT |
| Tabular deep learning | NN_TORCH, FastAI, RealMLP |
| Tabular foundation models | TabPFN v2, TabPFN v2.5, TabM, TabDPT, TabICL, MITRA |
| Time-series classifiers | ROCKET, Arsenal |
| Raman-specific DL | DeepCNN, RamanNet, SANet, RamanFormer, RamanTransformer, ReZeroNet, FC-ResNeXt, CoAtNet |
| AutoGluon ensemble | AUTOGLUON |
Standalone sklearn wrappers (raman-bench[models])
raman-bench[models] provides sklearn-compatible (fit / predict) wrappers
for many of the same algorithm families, usable directly without AutoGluon or
the fork. These are not the exact pipeline configurations from the paper
(no AutoGluon preprocessing or HPO), but they use the same underlying
algorithms and are well-suited for building and evaluating new models.
| Class | Algorithm | Requires |
|---|---|---|
| PLSModel | Partial Least Squares | — |
| DeepCNNModel | Raman-specific CNN | torch |
| RamanNetModel | Raman-specific CNN | torch |
| SANetModel | Spectral attention net | torch |
| RamanFormerModel | Raman transformer | torch |
| RamanTransformerModel | Raman transformer | torch |
| ReZeroNetModel | ReZero CNN | torch |
| FCResNeXtModel | FC-ResNeXt | torch |
| CoAtNetModel | Conv + attention | torch |
| RocketModel | ROCKET classifier | sktime |
| ArsenalModel | Arsenal classifier | sktime |
| TabPFNModel | TabPFN v2 | tabpfn |
| RealMLPModel | RealMLP-TD | pytabkit |
| TabMModel | TabM-D | pytabkit |
| TabDPTModel | TabDPT | tabdpt |
All classes support classification and regression and auto-detect the task from
y. All package dependencies are included in raman-bench[models].
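Task auto-detection can be sketched as a dtype heuristic — floating-point targets imply regression, integer or string targets imply classification. This is a plausible illustration, not necessarily raman-bench's exact rule:

```python
import numpy as np

def detect_task(y):
    """Guess the task from the target dtype: floats → regression,
    ints/strings → classification (illustrative heuristic only)."""
    y = np.asarray(y)
    if np.issubdtype(y.dtype, np.floating):
        return "regression"
    return "classification"

print(detect_task(np.array([0.1, 2.3])))    # regression
print(detect_task(np.array([0, 1, 1, 0])))  # classification
print(detect_task(np.array(["a", "b"])))    # classification
```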
Benchmark Composition
Datasets
74 public Raman spectroscopy datasets from four application domains:
| Domain | Datasets | Task | Sources |
|---|---|---|---|
| Chemical | 37 | Regression | Zenodo, HuggingFace |
| Medical | 11 | Classification | Kaggle, Zenodo |
| Biological | 8 | Regression | HuggingFace, Zenodo |
| Material Science | 4 | Classification | RRUFF, Zenodo |
All datasets are accessible via pip install raman-data:
from raman_data import raman_data
dataset = raman_data("amino_acids_glycine")
X = dataset.spectra # (n_samples, n_wavenumbers)
y = dataset.targets # regression targets or class labels
w = dataset.raman_shifts # wavenumber axis in cm⁻¹
Dataset catalog: raman-data on GitHub
Ranking Protocol
Models are evaluated under three complementary metrics:
| Metric | Description |
|---|---|
| Elo | Pairwise win-rate Elo calibrated to RF = 1000 (200-round bootstrap) |
| Score | Normalised per-dataset score: best model = 1, median model = 0 |
| Avg Rank | Average rank across all datasets and targets |
| Improvability | % gap to the best model, averaged across datasets |
See the live leaderboard for interactive filtering by model category, task type, and dataset domain.
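The Score metric's anchoring (best model = 1, median model = 0) amounts to a per-dataset affine rescaling. A minimal sketch — exact tie and sign handling in raman-bench may differ:

```python
import numpy as np

def normalised_score(values, higher_is_better=True):
    """Rescale per-dataset results so the best model maps to 1
    and the median model to 0 (sketch of the Score metric)."""
    v = np.asarray(values, dtype=float)
    if not higher_is_better:
        v = -v  # flip so "larger is better" holds for error metrics too
    best, median = v.max(), np.median(v)
    if best == median:  # degenerate case: top models all tie
        return np.zeros_like(v)
    return (v - median) / (best - median)

acc = [0.90, 0.80, 0.70]      # three models on one dataset
print(normalised_score(acc))  # [ 1.  0. -1.]
```

Because each dataset is rescaled independently, no single dataset with a wide raw-score range can dominate the aggregate.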
Repository Structure
RamanBench/
├── src/raman_bench/
│ ├── leaderboard.py # Leaderboard + model evaluation API
│ ├── benchmark.py # Dataset loading and cross-validation
│ ├── predictions.py # Prediction generation (benchmark step 1)
│ ├── evaluation.py # Metric computation (benchmark step 2)
│ ├── model.py # AutoGluon pipeline wrapper (fork required)
│ ├── config.py # JSON config loader
│ ├── models/custom/ # All built-in Raman models (sklearn API)
│ │ ├── base.py # BaseRamanEstimator (shared training loop)
│ │ ├── deepcnn.py # DeepCNNModel
│ │ ├── ramannet.py # RamanNetModel
│ │ ├── sanet.py # SANetModel
│ │ ├── ramanformer.py # RamanFormerModel
│ │ ├── ramantransformer.py # RamanTransformerModel
│ │ ├── rezeronet.py # ReZeroNetModel
│ │ ├── fcresnext.py # FCResNeXtModel
│ │ ├── coatnet.py # CoAtNetModel
│ │ ├── pls.py # PLSModel
│ │ ├── sktime_models.py # RocketModel, ArsenalModel
│ │ └── tabular_foundation.py # TabPFNModel, RealMLPModel, TabMModel, TabDPTModel
│ └── preprocessing/
│ ├── mixin.py # RamanPreprocessingMixin (AutoGluon HPO)
│ └── wrapped_models.py # Prep_* classes + SklearnAutoGluonBridge
├── configs/ # Benchmark configuration files
├── data/precomputed/ # Bundled v0.1 results
├── notebooks/ # Example Jupyter notebooks
├── scripts/ # CLI scripts
└── tests/ # pytest test suite
Architecture: two paths, one set of model classes
Custom models are implemented once as plain scikit-learn BaseEstimator
subclasses. The same classes are used in both usage modes:
Custom model (e.g. DeepCNNModel)
  BaseEstimator — no AutoGluon dependency
  fit(X, y) / predict(X)
   │
   ├── Standalone path (pip install "raman-bench[models]")
   │     CUSTOM_MODELS["DEEPCNN"] → DeepCNNModel().fit(X, y)
   │
   └── AutoGluon pipeline path (fork required)
         SklearnAutoGluonBridge._fit() → DeepCNNModel(**params).fit(X_np, y_np)
         Prep_DEEPCNN(_RamanDLBase, _DeepCNNBridge)
SklearnAutoGluonBridge (in preprocessing/wrapped_models.py) is the only
file that imports AutoGluon. All model source files are AutoGluon-free.
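The bridge idea can be sketched without AutoGluon at all: an adapter base class translates a framework's internal `_fit`/`_predict` hooks into plain `fit`/`predict` calls on any estimator. The names below are hypothetical and heavily simplified relative to the real SklearnAutoGluonBridge:

```python
import numpy as np
from sklearn.dummy import DummyRegressor

class FrameworkModel:
    """Stand-in for a framework base class that expects subclasses
    to implement _fit and _predict hooks."""
    def train(self, X, y):
        self._fit(X, y)
    def infer(self, X):
        return self._predict(X)

class SklearnBridge(FrameworkModel):
    """Adapter: delegates the framework hooks to any fit/predict
    estimator, so the estimator never imports the framework."""
    estimator_cls = None
    def _fit(self, X, y):
        self.est_ = self.estimator_cls().fit(np.asarray(X), np.asarray(y))
    def _predict(self, X):
        return self.est_.predict(np.asarray(X))

class BridgedDummy(SklearnBridge):
    estimator_cls = DummyRegressor  # any fit/predict class works here

m = BridgedDummy()
m.train(np.zeros((4, 2)), np.array([1.0, 1.0, 3.0, 3.0]))
print(m.infer(np.zeros((2, 2))))  # [2. 2.]
```

Keeping the adapter in one file is what lets every model source file stay framework-free.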
Contributing
We welcome contributions of new models and datasets!
Adding a New Model
The simplest way to add a model is to implement it as a scikit-learn–compatible estimator and submit a pull request. No AutoGluon knowledge is required.
- Create src/raman_bench/models/custom/my_model.py:

  import numpy as np
  from sklearn.base import BaseEstimator

  class MyModel(BaseEstimator):
      def __init__(self, n_components=10, lr=1e-3):
          self.n_components = n_components
          self.lr = lr

      def fit(self, X, y):
          # X: np.ndarray (n_samples, n_features)
          # y: np.ndarray — float → regression, int/str → classification
          ...
          return self

      def predict(self, X):
          ...  # return np.ndarray (n_samples,)

      def predict_proba(self, X):
          ...  # classification only, return (n_samples, n_classes)
For PyTorch-based models, inherit from BaseRamanEstimator in models/custom/base.py, which provides a complete training loop with early stopping, a cosine LR schedule, mixed-class augmentation, and batched inference.
- Register it in src/raman_bench/models/custom/__init__.py:

  from raman_bench.models.custom.my_model import MyModel
  CUSTOM_MODELS["MYMODEL"] = MyModel
- Add tests in tests/models/test_my_model.py following the patterns in tests/models/test_sanet.py.
- Open a pull request — CI will run the full test suite automatically.
See CONTRIBUTING.md for the full guide, including how to optionally wire your model into the AutoGluon benchmark pipeline for full reproducibility.
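Of the features BaseRamanEstimator provides, the cosine LR schedule is the easiest to show in isolation. This is the standard cosine-annealing formula, not necessarily the package's exact implementation:

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-3, lr_min=0.0):
    """Cosine annealing: starts at lr_max and decays smoothly
    to lr_min over total_steps."""
    t = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

print(cosine_lr(0, 100))    # 0.001   (start of training)
print(cosine_lr(50, 100))   # ≈0.0005 (halfway)
print(cosine_lr(100, 100))  # ≈0.0    (end)
```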
Adding a New Dataset
See CONTRIBUTING.md and NEW_DATASETS.md for detailed instructions and examples.
Quick summary:
- Upload your dataset to HuggingFace Datasets or Zenodo under CC BY 4.0.
- Add a loader to the raman-data package (open a PR there).
- Open an issue here linking to the raman-data PR.
The live leaderboard also has a "How to Contribute" section with step-by-step instructions.
Citation
If you use RamanBench in your research, please cite:
@misc{koddenbrock2026ramanbench,
title = {RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy},
author = {Koddenbrock, Mario and Lange, Christoph and Legner, Robin and Jaeger, Martin
and K{\"o}gler, Martin and Cruz Bournazou, Mariano N. and Neubauer, Peter
and Bie{\ss}mann, Felix and Rodner, Erik},
year = {2026},
eprint = {2605.02003},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2605.02003}
}
License
MIT — see LICENSE.
Dataset licenses vary; see the dataset catalog or raman-data for per-dataset license information. Most datasets are released under CC BY 4.0.