CatBench Framework for Benchmarking Machine Learning Interatomic Potentials in Adsorption Energy Predictions for Heterogeneous Catalysis
CatBench
A benchmarking framework for Machine Learning Interatomic Potentials in heterogeneous catalysis.
CatBench evaluates MLIPs against DFT references across four task types — adsorption energy, surface energy, bulk formation energy, and equation of state — with automated data processing, reproducible calculation workflows, and statistical anomaly detection.
Table of Contents
- Installation
- Quick Start
- Tutorials
- Adsorption Energy Benchmarking
- Relative Energy Benchmarking
- Equation of State (EOS) Benchmarking
- Configuration Reference
- Citation
Installation
# Basic
pip install catbench
# With D3 dispersion correction (GPU required; CPU-only not currently supported)
pip install catbench[d3]
# Development install from source
git clone https://github.com/JinukMoon/CatBench.git
cd CatBench
pip install -e .
Quick Start
Minimum viable benchmark in a few lines:
from catbench.adsorption import zenodo_download, AdsorptionCalculation, AdsorptionAnalysis
from your_mlip import YourCalculator
zenodo_download("BM_dataset") # 445 KB
calc = YourCalculator(...)
AdsorptionCalculation([calc] * 3, mlip_name="MyMLIP", benchmark="BM_dataset").run()
AdsorptionAnalysis().analysis() # parity plots + Excel
For an end-to-end walkthrough on the publication's main benchmark (MamunHighT2019) with MACE-MP-0, see the tutorial notebook.
Tutorials
tutorials/catbench_tutorial.ipynb — complete walkthrough on a 20-reaction subset of MamunHighT2019 using MACE-MP-0 as the representative MLIP. Covers Zenodo download, calculator setup, calculation, analysis, and the multi-MLIP comparison workflow. Runs in about 10 minutes on a Colab T4 GPU.
Adsorption Energy Benchmarking
Data Preparation
CatBench supports three data sources, ordered by convenience.
Option A: Pre-formatted Zenodo Download (Fastest, Recommended)
Five main benchmark datasets from the CatBench publication are hosted as pre-processed JSON files on Zenodo (DOI: 10.5281/zenodo.17157086):
from catbench.adsorption import zenodo_download, list_zenodo_benchmarks
list_zenodo_benchmarks()
# → ['BM_dataset', 'ComerGeneralized2024', 'FG_dataset', 'KHLOHC_origin', 'MamunHighT2019']
zenodo_download("MamunHighT2019") # writes raw_data/MamunHighT2019_adsorption.json
| Benchmark | Size | Description |
|---|---|---|
| MamunHighT2019 | 195 MB | 45,130 small-molecule adsorptions on 2,035 bimetallic alloys |
| FG_dataset | 15 MB | 2,651 C1–C10 organic molecules on transition metals |
| KHLOHC_origin | 11 MB | Liquid organic hydrogen carrier adsorption (fine-tuning) |
| ComerGeneralized2024 | 2 MB | 325 adsorptions on metal oxide surfaces |
| BM_dataset | 0.4 MB | 32 industrial large molecules (biomass, polyurethane, plastics) |
Option B: CatHub Database
For benchmarks not on Zenodo, download and preprocess directly from CatHub:
from catbench.adsorption import cathub_preprocessing
cathub_preprocessing("MamunHighT2019")
# Multiple datasets with adsorbate name unification:
cathub_preprocessing(
["MamunHighT2019", "AraComputational2022"],
adsorbate_integration={"HO": "OH", "O2H": "OOH"},
)
Option C: User VASP Data
Critical: Use rate=None when calling AdsorptionCalculation on your own VASP data.
The default rate=0.5 fixes the bottom 50% of atoms by z-coordinate and ignores your VASP Selective-Dynamics (T/F) flags. For CatHub/Zenodo data this is what the reference calculations did, but for your own data it silently overrides your physics and produces energies inconsistent with your DFT references. This is the single most common "why don't my MLIP and DFT match?" pitfall.
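To make the effect of rate concrete, here is a minimal sketch of one plausible reading of "fix the bottom fraction of atoms by z-coordinate". The helper bottom_fraction_indices is hypothetical and not part of CatBench; the exact selection rule CatBench applies (by atom count, as assumed here, or by height) is not stated above.

```python
from typing import List, Optional, Sequence


def bottom_fraction_indices(z_coords: Sequence[float], rate: Optional[float]) -> List[int]:
    """Return indices of atoms that would be frozen (hypothetical helper).

    Assumption: atoms are sorted by z and the lowest `rate` fraction
    (by count) is frozen. rate=None freezes nothing in this sketch,
    leaving constraints to whatever the input structure already carries.
    """
    if rate is None:
        return []
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    n_fixed = int(len(z_coords) * rate)
    order = sorted(range(len(z_coords)), key=lambda i: z_coords[i])
    return sorted(order[:n_fixed])


# Toy slab: four metal layers plus one adsorbate atom on top (z in Å).
z = [0.0, 2.1, 4.2, 6.3, 8.0]
print(bottom_fraction_indices(z, 0.5))   # the two lowest atoms
print(bottom_fraction_indices(z, None))  # nothing frozen by this rule
```

If your DFT run froze a different set of atoms than this rule would pick, the MLIP relaxation starts from different constraints than the reference, which is why rate=None matters for user data.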
Warning: vasp_preprocessing deletes every file except CONTCAR and OSZICAR to save disk space. Always run it on a copy of your original VASP output.
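One way to follow the warning above is to make a full backup before preprocessing. The sketch below uses only the standard library; backup_dataset and the "_backup" suffix are hypothetical names chosen for illustration, not CatBench functions.

```python
import shutil
import tempfile
from pathlib import Path


def backup_dataset(dataset_dir, suffix="_backup"):
    """Copy dataset_dir to a sibling folder and return the backup path.

    Hypothetical helper, not part of CatBench.
    """
    src = Path(dataset_dir).resolve()
    dst = src.with_name(src.name + suffix)
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; refusing to overwrite it")
    shutil.copytree(src, dst)
    return dst


# Demo on a throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    calc_dir = Path(tmp) / "your_dataset_name" / "gas" / "H2gas"
    calc_dir.mkdir(parents=True)
    for name in ("CONTCAR", "OSZICAR", "OUTCAR"):
        (calc_dir / name).write_text("placeholder")
    backup = backup_dataset(Path(tmp) / "your_dataset_name")
    print(sorted(p.name for p in (backup / "gas" / "H2gas").iterdir()))
```

With the backup in place, preprocessing can prune the original folder while the backup keeps every VASP file.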
Organize the data as follows. gas, slab, and the <name>gas pattern are reserved; everything else is arbitrary:
your_dataset_name/
├── gas/
│ ├── H2gas/ { CONTCAR, OSZICAR } # gas-reference folder, must end with "gas"
│ └── H2Ogas/ { CONTCAR, OSZICAR }
├── system_A/
│ ├── slab/ { CONTCAR, OSZICAR } # reserved name (clean surface)
│ ├── H/
│ │ ├── site_0/ { CONTCAR, OSZICAR } # any name for site variants
│ │ └── site_1/ { CONTCAR, OSZICAR }
│ └── OH/
│ └── ...
└── system_B/
└── ...
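Before preprocessing, it can help to confirm that every calculation folder in the layout above actually holds both files. The following is a minimal sketch; incomplete_calc_folders is a hypothetical helper, and it assumes that any folder without subfolders is a calculation folder.

```python
import tempfile
from pathlib import Path

REQUIRED_FILES = ("CONTCAR", "OSZICAR")


def incomplete_calc_folders(root):
    """Return leaf folders under root that lack CONTCAR or OSZICAR.

    Hypothetical helper, not part of CatBench. A "leaf" is any folder
    that contains no subfolders.
    """
    missing = []
    for folder in sorted(p for p in Path(root).rglob("*") if p.is_dir()):
        if any(child.is_dir() for child in folder.iterdir()):
            continue  # intermediate folder such as gas/ or system_A/
        if not all((folder / name).is_file() for name in REQUIRED_FILES):
            missing.append(folder.relative_to(root).as_posix())
    return missing


# Demo: build a tiny layout where one site folder lacks OSZICAR.
with tempfile.TemporaryDirectory() as tmp:
    complete = ["gas/H2gas", "system_A/slab"]
    for rel in complete + ["system_A/H/site_0"]:
        folder = Path(tmp) / rel
        folder.mkdir(parents=True)
        (folder / "CONTCAR").write_text("placeholder")
        if rel in complete:
            (folder / "OSZICAR").write_text("placeholder")
    print(incomplete_calc_folders(tmp))
```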
Declare the reaction stoichiometry and preprocess:
from catbench.adsorption import vasp_preprocessing
coeff_setting = {
"H": {"slab": -1, "adslab": 1, "H2gas": -1/2},
"OH": {"slab": -1, "adslab": 1, "H2gas": +1/2, "H2Ogas": -1},
}
vasp_preprocessing("your_dataset_name", coeff_setting)
# → raw_data/your_dataset_name_adsorption.json
The keys "slab" and "adslab" are required literals on every entry; all other keys are gas-phase references and must end with "gas". vasp_preprocessing validates these rules before deleting anything.
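The naming rules and the role of the coefficients can be sketched in a few lines. Both helpers below are hypothetical illustrations, not CatBench's own validation code; the energies in the demo are made-up numbers, and the weighted-sum reading of the coefficients is an assumption based on the stoichiometry shown above.

```python
def check_coeff_setting(coeff_setting):
    """Return a list of rule violations (empty if the setting looks valid).

    Hypothetical helper mirroring the rules stated above: "slab" and
    "adslab" are required, every other key must end with "gas".
    """
    problems = []
    for adsorbate, coeffs in coeff_setting.items():
        for required in ("slab", "adslab"):
            if required not in coeffs:
                problems.append(f"{adsorbate}: missing required key '{required}'")
        for key in coeffs:
            if key not in ("slab", "adslab") and not key.endswith("gas"):
                problems.append(f"{adsorbate}: reference key '{key}' must end with 'gas'")
    return problems


def reaction_energy(coeffs, energies):
    """Weighted sum of total energies (eV) using the stoichiometric coefficients."""
    return sum(c * energies[name] for name, c in coeffs.items())


coeff_setting = {
    "H": {"slab": -1, "adslab": 1, "H2gas": -1 / 2},
    "OH": {"slab": -1, "adslab": 1, "H2gas": +1 / 2, "H2Ogas": -1},
}
print(check_coeff_setting(coeff_setting))

# Made-up total energies in eV, for illustration only.
energies = {"slab": -100.0, "adslab": -103.5, "H2gas": -6.8}
print(round(reaction_energy(coeff_setting["H"], energies), 3))
```

For H this evaluates E(adslab) - E(slab) - 1/2 E(H2), the usual adsorption energy referenced to gas-phase H2.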
Calculation
AdsorptionCalculation takes a list of calculators. Running the same calculator multiple times provides reproducibility statistics that the analysis step uses for anomaly detection.
from catbench.adsorption import AdsorptionCalculation
from your_mlip import YourCalculator
calc = YourCalculator(...)
AdsorptionCalculation(
[calc] * 3, # 3 reproducibility seeds
mlip_name="YourMLIP", # free-form label — folder name under result/, display name in plots
benchmark="dataset_name",
# rate=None, # REQUIRED for user VASP data (Option C)
# save_files=False, # skip trajectory + log files to save disk space
).run()
D3 dispersion correction (GPU required):
from catbench.dispersion import DispersionCorrection
d3 = DispersionCorrection() # Becke-Johnson damping + PBE by default
calc_d3 = d3.apply(YourCalculator(...))
AdsorptionCalculation([calc_d3] * 3, mlip_name="YourMLIP_D3", benchmark="dataset_name").run()
OC20 mode for MLIPs that predict adsorption energy directly:
AdsorptionCalculation(
[oc20_calc] * 3, mode="oc20", mlip_name="OC20_MLIP", benchmark="dataset_name",
).run()
See the Configuration Reference for all options, and the tutorial notebook for an end-to-end runnable example.
Analysis
from catbench.adsorption import AdsorptionAnalysis
AdsorptionAnalysis().analysis() # auto-detects every MLIP under ./result/
This produces:
- Parity plots under ./plot/<mlip_name>/{mono,multi}/: mono/total.png aggregates all reactions; multi/total.png colors by adsorbate.
- Excel report ./{cwd_name}_Benchmarking_Analysis.xlsx with MAE, RMSE, anomaly breakdown, ADwT, AMDwT, and timings across every MLIP in ./result/.
Every data point is classified into Normal, Migration, Energy Anom., Unphys. Relax, or Reprod. Fail. Thresholds are configurable — see the Configuration Reference.
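The report lists error metrics over all points and over Normal points only. A minimal sketch of how such a summary could be computed from labeled predictions follows. The summarize helper and the record layout are hypothetical, the numbers are invented, and reading MAE_total as "all points" and MAE_normal as "Normal points only" is an assumption based on the column names.

```python
import math


def summarize(records):
    """Compute summary metrics from records with dft, mlip (eV) and label.

    Hypothetical helper, not CatBench's implementation.
    """
    errors = [r["mlip"] - r["dft"] for r in records]
    normal = [e for e, r in zip(errors, records) if r["label"] == "Normal"]

    def mae(values):
        return sum(abs(v) for v in values) / len(values) if values else None

    return {
        "normal_pct": 100.0 * len(normal) / len(records),
        "mae_total": mae(errors),
        "mae_normal": mae(normal),
        "rmse_total": math.sqrt(sum(e * e for e in errors) / len(errors)),
    }


# Invented data points for illustration.
records = [
    {"dft": -0.5, "mlip": -0.4, "label": "Normal"},
    {"dft": -1.0, "mlip": -1.2, "label": "Normal"},
    {"dft": 0.3, "mlip": 2.3, "label": "Energy Anom."},
    {"dft": -0.2, "mlip": -0.5, "label": "Migration"},
]
for key, value in summarize(records).items():
    print(f"{key}: {value:.3f}")
```

A large gap between the two MAE values, as in this toy data, indicates that a few anomalous points dominate the total error.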
Threshold sensitivity:
AdsorptionAnalysis().threshold_sensitivity_analysis() # displacement + bond-length by default
This generates stacked-area charts showing how anomaly-classification rates change with threshold values.
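The idea behind a threshold sweep can be sketched as follows: for each candidate threshold, count the fraction of structures whose measured quantity exceeds it. The flagged_fraction helper is hypothetical, the displacement values are invented, and whether CatBench uses a strict or non-strict comparison is not stated above; strict is assumed here.

```python
def flagged_fraction(values, thresholds):
    """Fraction of values strictly above each threshold (hypothetical helper)."""
    return [sum(v > t for v in values) / len(values) for t in thresholds]


# Invented maximum atomic displacements (Å) for four relaxed structures.
max_displacements = [0.1, 0.3, 0.6, 1.2]
thresholds = [0.25, 0.5, 1.0]
for t, frac in zip(thresholds, flagged_fraction(max_displacements, thresholds)):
    print(f"threshold {t:.2f} Å -> {100 * frac:.0f}% flagged")
```

Plotting these fractions against the threshold gives the kind of curve the sensitivity charts summarize per anomaly category.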
Output Files
Parity plots
Panels: Mono (all reactions combined) and Multi (colored by adsorbate).
Excel report
The Excel workbook has three sheet types. Example numbers from the paper:
Main comparison sheet — one row per MLIP:
| MLIP | Normal (%) | Anomaly (%) | MAE_total (eV) | MAE_normal (eV) | ADwT (%) | AMDwT (%) | Time/step (ms) |
|---|---|---|---|---|---|---|---|
| MLIP_A | 77.25 | 14.39 | 1.118 | 0.316 | 77.98 | 84.71 | 125.3 |
| MLIP_B | 74.22 | 16.84 | 0.667 | 0.512 | 69.66 | 80.80 | 89.7 |
| MLIP_C | 80.18 | 13.51 | 0.917 | 0.241 | 78.97 | 86.79 | 156.8 |
| ... | ... | ... | ... | ... | ... | ... | ... |
Additional sheets
Anomaly breakdown — counts per anomaly category per MLIP:
| MLIP | Normal | Migration | Energy Anom. | Unphys. Relax | Reprod. Fail |
|---|---|---|---|---|---|
| MLIP_A | 34,869 | 3,774 | 590 | 3,845 | 2,052 |
| MLIP_B | 33,503 | 4,035 | 834 | 5,221 | 1,537 |
| ... | ... | ... | ... | ... | ... |
Per-MLIP adsorbate sheets — one sheet per MLIP, one row per adsorbate:
| Adsorbate | Normal | Anomaly | MAE_total (eV) | MAE_normal (eV) | ADwT (%) | AMDwT (%) |
|---|---|---|---|---|---|---|
| H | 1,247 | 89 | 0.891 | 0.234 | 89.3 | 93.4 |
| OH | 1,156 | 124 | 1.045 | 0.298 | 82.7 | 87.1 |
| ... | ... | ... | ... | ... | ... | ... |
Threshold sensitivity charts
Panels: displacement threshold and bond-length change threshold.
Relative Energy Benchmarking
Same data → calculation → analysis shape as Adsorption, but with a single calculator (no reproducibility seeds) and a task_type dispatch.
Surface Energy
Warning: Preprocessing deletes all files except CONTCAR and OSZICAR. Always work on a copy.
Layout — per material, one bulk/ and one slab/:
your_surface_data/
├── Material_1/
│ ├── bulk/ { CONTCAR, OSZICAR }
│ └── slab/ { CONTCAR, OSZICAR }
├── Material_2/
│ └── ...
from catbench.relative.surface_energy.data import surface_energy_vasp_preprocessing
from catbench.relative import SurfaceEnergyCalculation, RelativeEnergyAnalysis
surface_energy_vasp_preprocessing("your_surface_data")
SurfaceEnergyCalculation(calculator=calc, benchmark="your_surface_data", mlip_name="MyMLIP").run()
RelativeEnergyAnalysis(task_type="surface").analysis()
The Excel report provides MAE, RMSE, and max error (J/m²) across all surfaces per MLIP.
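For orientation, the textbook surface-energy expression for a symmetric slab is γ = (E_slab - N_slab · E_bulk / N_bulk) / (2A). The sketch below evaluates it and converts to J/m². It is an assumption that CatBench uses this form; the helper names and the input numbers are hypothetical.

```python
EV_PER_A2_TO_J_PER_M2 = 16.02176634  # 1 eV/Å² expressed in J/m²


def cell_area(a, b):
    """Area (Å²) spanned by the two in-plane cell vectors a and b."""
    cx = a[1] * b[2] - a[2] * b[1]
    cy = a[2] * b[0] - a[0] * b[2]
    cz = a[0] * b[1] - a[1] * b[0]
    return (cx * cx + cy * cy + cz * cz) ** 0.5


def surface_energy(e_slab, n_slab, e_bulk, n_bulk, area):
    """Surface energy in J/m² for a slab with two equivalent surfaces.

    Energies in eV, area in Å². Hypothetical helper using the textbook
    formula, not necessarily CatBench's implementation.
    """
    excess = e_slab - n_slab * (e_bulk / n_bulk)
    return excess / (2.0 * area) * EV_PER_A2_TO_J_PER_M2


# Invented numbers: 10-atom slab, bulk energy of -4.0 eV/atom.
area = cell_area((2.5, 0.0, 0.0), (0.0, 4.0, 0.0))
print(round(surface_energy(-38.0, 10, -4.0, 1, area), 3))
```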
Bulk Formation Energy
Warning: Preprocessing deletes all files except CONTCAR and OSZICAR.
Layout — bulk_compounds/ and elements/ side-by-side:
your_formation_data/
├── bulk_compounds/
│ ├── Compound_1/ { CONTCAR, OSZICAR }
│ └── Compound_2/ { CONTCAR, OSZICAR }
└── elements/
├── Element_A/ { CONTCAR, OSZICAR }
├── Element_B/ { CONTCAR, OSZICAR }
└── Element_C/ { CONTCAR, OSZICAR }
from catbench.relative.bulk_formation.data import bulk_formation_vasp_preprocessing
from catbench.relative import BulkFormationCalculation, RelativeEnergyAnalysis
coeff_setting = {
"Compound_1": {"bulk": 1, "Element_A": -1, "Element_C": -1/2},
"Compound_2": {"bulk": 1, "Element_B": -2, "Element_C": -3/2},
}
bulk_formation_vasp_preprocessing("your_formation_data", coeff_setting)
BulkFormationCalculation(calculator=calc, benchmark="your_formation_data", mlip_name="MyMLIP").run()
RelativeEnergyAnalysis(task_type="bulk_formation").analysis()
Equation of State (EOS) Benchmarking
Each material has one subfolder per volume point, numbered consecutively from 0:
your_eos_data/
├── Material_1/
│ ├── 0/ { CONTCAR, OSZICAR } # smallest volume
│ ├── 1/ { CONTCAR, OSZICAR }
│ └── ... (up to 10, typically)
├── Material_2/
│ └── ...
from catbench.eos import eos_vasp_preprocessing, EOSCalculation, EOSAnalysis
eos_vasp_preprocessing("your_eos_data")
EOSCalculation(calculator=calc, mlip_name="MyMLIP", benchmark="your_eos_data").run()
EOSAnalysis().analysis()
The Excel report includes Birch-Murnaghan fits with bulk modulus (B0), equilibrium volume (V0), and derivative (B0'):
| MLIP | RMSE (eV) | MAE (eV) | VASP B0 (GPa) | MLIP B0 (GPa) | B0 Error (GPa) | VASP V0 (Å³) | MLIP V0 (Å³) | V0 Error (Å³) |
|---|---|---|---|---|---|---|---|---|
| MLIP_A | 0.634 | 0.462 | 80.53 | 102.59 | 22.06 | 475.37 | 469.42 | 5.95 |
| MLIP_B | 0.411 | 0.318 | 80.53 | 72.29 | 8.24 | 475.37 | 478.51 | 3.13 |
| MLIP_C | 0.444 | 0.350 | 80.53 | 88.02 | 7.49 | 475.37 | 470.70 | 4.67 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
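For reference, the third-order Birch-Murnaghan energy expression behind the B0, V0, and B0' columns can be evaluated as in the sketch below. This shows the standard equation only, not CatBench's fitting routine; the function name and the parameter values are hypothetical.

```python
EV_PER_A3_TO_GPA = 160.2176634  # 1 eV/Å³ expressed in GPa


def birch_murnaghan_energy(volume, e0, v0, b0, b0_prime):
    """Third-order Birch-Murnaghan E(V).

    volume and v0 in Å³, e0 in eV, b0 in eV/Å³, b0_prime dimensionless.
    """
    eta = (v0 / volume) ** (2.0 / 3.0)
    return e0 + 9.0 * v0 * b0 / 16.0 * (
        (eta - 1.0) ** 3 * b0_prime + (eta - 1.0) ** 2 * (6.0 - 4.0 * eta)
    )


# Invented parameters for illustration.
e0, v0, b0, b0_prime = -50.0, 475.0, 0.5, 4.0
for scale in (0.95, 1.00, 1.05):
    energy = birch_murnaghan_energy(scale * v0, e0, v0, b0, b0_prime)
    print(f"V = {scale * v0:7.2f} Å³  E = {energy:9.4f} eV")
print(f"B0 = {b0 * EV_PER_A3_TO_GPA:.2f} GPa")
```

The energy is lowest at V = V0 and rises on either side, and the bulk modulus obtained in eV/Å³ is converted to GPa for reporting.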
Configuration Reference
Options are grouped into Required, Commonly tuned, and Advanced. Required parameters must be passed at construction; the rest have sensible defaults and can be overridden as needed.
AdsorptionCalculation
Required
| Parameter | Description | Type |
|---|---|---|
| mlip_name | Free-form label. Used as the folder name under result/ and as the display name in plots and Excel sheets. | str |
| benchmark | Dataset name; matches raw_data/{benchmark}_adsorption.json. | str |
Commonly tuned
| Parameter | Description | Default |
|---|---|---|
| rate | Fraction of atoms to fix by z-coordinate. Must be None for user VASP data; see Option C. | 0.5 |
| save_files | If False, skips trajectory + log files to save disk space. | True |
| f_crit_relax | Force convergence criterion (eV/Å). | 0.05 |
| n_crit_relax | Max optimization steps per structure. | 999 |
| mode | "basic" (relaxation + references) or "oc20" (direct E_ads prediction). | "basic" |
Advanced
| Parameter | Description | Default |
|---|---|---|
| damping | Optimization damping factor. | 1.0 |
| optimizer | ASE optimizer: LBFGS / LBFGSLineSearch / BFGS / BFGSLineSearch / GPMin / MDMin / FIRE. | "LBFGS" |
| save_step | Save interval for result.json during long runs. | 50 |
| chemical_bond_cutoff | Cutoff distance for bond-change detection (Å). | 6.0 |
AdsorptionAnalysis
Commonly tuned
| Parameter | Description | Default |
|---|---|---|
| mlip_list | Limit analysis to specific MLIPs. | Auto-detect all under ./result/ |
| target_adsorbates | Analyze only these adsorbates. | All |
| exclude_adsorbates | Skip these adsorbates. | None |
| disp_thrs | Displacement anomaly threshold (Å). | 0.5 |
| energy_thrs | Energy anomaly threshold (eV). | 2.0 |
| reproduction_thrs | Cross-seed reproducibility threshold (eV). | 0.2 |
| bond_length_change_threshold | Bond-length-change anomaly threshold (fraction). | 0.2 |
| energy_cutoff | Exclude reference energies above this value (eV). | None |
| mlip_name_map | Display-name overrides, e.g. {"MACE-MP-0": "MACE"}. | {} |
| font_setting | [path_to_ttf, family_name] for custom plot font. | False |
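When tightening or loosening the anomaly thresholds, it can be useful to start from the documented defaults and reject misspelled option names early. The sketch below is a hypothetical helper built from the defaults in the table above. Passing the resulting dictionary to AdsorptionAnalysis as keyword arguments is an assumption about the constructor, so that call is left commented out.

```python
# Threshold defaults copied from the table above.
ANALYSIS_DEFAULTS = {
    "disp_thrs": 0.5,                     # Å
    "energy_thrs": 2.0,                   # eV
    "reproduction_thrs": 0.2,             # eV
    "bond_length_change_threshold": 0.2,  # fraction
}


def analysis_kwargs(**overrides):
    """Merge overrides onto the defaults, rejecting unknown option names.

    Hypothetical helper, not part of CatBench.
    """
    unknown = sorted(set(overrides) - set(ANALYSIS_DEFAULTS))
    if unknown:
        raise KeyError(f"unknown threshold option(s): {', '.join(unknown)}")
    return {**ANALYSIS_DEFAULTS, **overrides}


strict = analysis_kwargs(disp_thrs=0.3, energy_thrs=1.0)
print(strict)

# Assumed usage (requires catbench and a populated ./result/ folder):
# from catbench.adsorption import AdsorptionAnalysis
# AdsorptionAnalysis(**strict).analysis()
```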
Advanced — paths, plot styling, font sizes
| Parameter | Description | Default |
|---|---|---|
| calculating_path | Path to results directory. | ./result |
| benchmarking_name | Output file prefix. | CWD name |
| time_unit | "s", "ms", or "us". | "ms" |
| plot_enabled | Generate plots. | True |
| figsize | Figure size (width, height) in inches. | (9, 8) |
| dpi | Plot DPI. | 300 |
| mark_size | Marker size. | 100 |
| linewidths | Line width. | 1.5 |
| min, max | Plot axis limits. | None |
| tick_bins, tick_decimal_places | Tick control. | 6, 1 |
| tick_labelsize | Tick-label font size. | 25 |
| xlabel_fontsize, ylabel_fontsize | Axis-label font sizes. | 40, 40 |
| mae_text_fontsize | MAE-text font size. | 30 |
| legend_fontsize, comparison_legend_fontsize | Legend font sizes. | 25, 15 |
| threshold_xlabel_fontsize, threshold_ylabel_fontsize | Threshold-plot label sizes. | 40, 40 |
| legend_off, mae_text_off, error_bar_display | Display toggles. | False |
| xlabel_off, ylabel_off, grid | Display toggles. | False |
| specific_color | Single-MLIP plot color. | "#2077B5" |
DispersionCorrection
| Parameter | Description | Default |
|---|---|---|
| damping_type | "damp_bj" (Becke-Johnson, recommended) or "damp_zero". | "damp_bj" |
| functional_name | DFT functional for D3 parameters (pbe, scan, b3lyp, hse06, ...). | "pbe" |
| vdw_cutoff | van der Waals cutoff (au^2). | 9000 |
| cn_cutoff | Coordination number cutoff (au^2). | 1600 |
Relative energy and EOS classes
SurfaceEnergyCalculation, BulkFormationCalculation, and EOSCalculation all require calculator, benchmark, and mlip_name, and accept f_crit_relax and n_crit_relax for optimization control.
RelativeEnergyAnalysis requires task_type ("surface", "bulk_formation", or "custom") and accepts the same plotting options as AdsorptionAnalysis.
EOSAnalysis advanced options
| Parameter | Description | Default |
|---|---|---|
| calculating_path | Results directory. | ./result |
| plot_path | Plot output directory. | ./plot |
| benchmark | Dataset name. | CWD name |
| mlip_list | MLIPs to analyze. | Auto-detect |
| figsize | Plot dimensions. | (9, 8) |
| dpi | Plot DPI. | 300 |
| mark_size | Marker size. | 100 |
| x_tick_bins, y_tick_bins | Tick bins. | 5, 5 |
| tick_decimal_places, tick_labelsize | Tick control. | 1, 25 |
| xlabel_fontsize, ylabel_fontsize | Axis-label font sizes. | 40, 40 |
| legend_fontsize, comparison_legend_fontsize | Legend font sizes. | 25, 15 |
| grid | Show grid. | False |
| font_setting | Custom font [path, family]. | False |
Citation
@article{catbench2025,
title={CatBench Framework for Benchmarking Machine Learning Interatomic Potentials in Adsorption Energy Predictions for Heterogeneous Catalysis},
author={Moon, Jinuk and Jeon, Uchan and Choung, Seokhyun and Han, Jeong Woo},
journal={Cell Reports Physical Science},
volume={6},
pages={102968},
year={2025},
doi={10.1016/j.xcrp.2025.102968}
}
License
MIT — see LICENSE.
Contact
Jinuk Moon · jumoon@snu.ac.kr
Jeong Woo Han · jwhan98@snu.ac.kr
Seoul National University
For bug reports, feature requests, and contributions: GitHub repository.