CatBench: Benchmark Framework of Machine Learning Interatomic Potentials for Adsorption Energy Predictions in Heterogeneous Catalysis

Maintainers

jumoon

Project description

CatBench

CatBench: Benchmark Framework of Machine Learning Interatomic Potentials in Adsorption Energy Predictions

Installation

pip install catbench

Overview

CatBench is a comprehensive benchmark framework designed to evaluate Machine Learning Interatomic Potentials (MLIPs) for adsorption energy or other reaction energy predictions. It provides tools for data processing, model evaluation, and result analysis.

Usage Workflow

1. Data Processing

CatBench supports two types of data sources:

A. Direct from Catalysis-Hub
B. User-calculated VASP Dataset

A. Direct from Catalysis-Hub

# Import the catbench package
import catbench

# Process data from Catalysis-Hub
# Single tag
catbench.cathub_preprocess("Catalysis-Hub_Dataset_tag")

# Multiple tags
catbench.cathub_preprocess(["Catalysis-Hub_Dataset_tag1", "Catalysis-Hub_Dataset_tag2"])

Example:

# Single tag example
catbench.cathub_preprocess("AraComputational2022")

# Multiple tags example
catbench.cathub_preprocess(["AraComputational2022", "AlonsoStrain2023"])

When combining multiple benchmarks, the same adsorbate species might be recognized differently due to variations in naming conventions across different datasets (e.g., *HO vs *OH for hydroxyl group). To address this issue, you can use the adsorbate_integration parameter to unify these different naming conventions. If no integration is needed, you can simply use the benchmark_name parameter alone:

# When no integration is needed, just use benchmark_name
catbench.cathub_preprocess(["Catalysis-Hub_Dataset_tag1", "Catalysis-Hub_Dataset_tag2"])

# When integration is needed
catbench.cathub_preprocess(
    ["Catalysis-Hub_Dataset_tag1", "Catalysis-Hub_Dataset_tag2"],
    adsorbate_integration={'HO': 'OH'}
)

# You can add multiple integration pairs
catbench.cathub_preprocess(
    ["Catalysis-Hub_Dataset_tag1", "Catalysis-Hub_Dataset_tag2"],
    adsorbate_integration={
        'HO': 'OH',
        'O2H': 'OOH',
        'CO2H': 'COOH'
    }
)
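To illustrate what an adsorbate_integration mapping expresses, the sketch below applies the same kind of dictionary to a list of adsorbate labels. The helper unify_adsorbate_names is hypothetical and not part of catbench; how the package applies the mapping internally is not shown in this document.

```python
def unify_adsorbate_names(names, integration):
    """Return names with every key of `integration` replaced by its value."""
    return [integration.get(name, name) for name in names]


# Same mapping style as the adsorbate_integration argument above
integration = {"HO": "OH", "O2H": "OOH", "CO2H": "COOH"}

# Illustrative labels as they might appear across two datasets
labels = ["HO", "OH", "O2H", "H", "CO2H"]

unified = unify_adsorbate_names(labels, integration)
print(unified)               # labels after unification
print(sorted(set(unified)))  # distinct adsorbate species that remain
```

After unification, HO and OH count as one species, so they are no longer split into separate groups.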

B. User-calculated VASP Dataset

For CatBench simulation on your VASP datasets, prepare your data hierarchy as follows:

The data structure should include:

- Gas references (gas/) containing VASP output files for gas phase molecules
  - Note: Gas molecule folders must end with 'gas' (e.g., H2gas/, H2Ogas/)
- Surface systems (system1/, system2/, etc.) containing:
  - Each system represents a collection of reaction energies based on the same slab (e.g., system1/ for Pt111, system2/ for Ni111)
  - Clean slab calculations (slab/)
  - Adsorbate-surface systems organized by adsorbate type (H/, OH/, etc.)
    - Under each adsorbate directory, you can create subdirectories with any names to represent different configurations
    - Each configuration directory should contain the VASP output files

Important Notes:

- Each calculation directory must contain CONTCAR and OSZICAR files.
- The process_output function deletes every file except CONTCAR and OSZICAR. It is therefore strongly recommended to:
  - Keep your original data folder untouched
  - Create a copy of your data folder
  - Run process_output on the copied folder
- When benchmarking on a user dataset, you must set rate=0 in the execute_benchmark function to preserve the original atomic constraints from your calculations.

data/
├── gas/
│   ├── H2gas/
│   │   ├── CONTCAR
│   │   ├── OSZICAR
│   │   └── ...
│   └── H2Ogas/
│       ├── CONTCAR
│       ├── OSZICAR
│       └── ...
├── system1/ (e.g., Pt111/)
│   ├── slab/
│   │   ├── CONTCAR
│   │   ├── OSZICAR
│   │   └── ...
│   ├── H/
│   │   ├── 1/
│   │   │   ├── CONTCAR
│   │   │   ├── OSZICAR
│   │   │   └── ...
│   │   └── 2/
│   │       ├── CONTCAR
│   │       ├── OSZICAR
│   │       └── ...
│   └── OH/
│       ├── 1/
│       │   ├── CONTCAR
│       │   ├── OSZICAR
│       │   └── ...
│       └── 2/
│           ├── CONTCAR
│           ├── OSZICAR
│           └── ...
└── system2/ (e.g., Ni111/)
    ├── slab/
    │   ├── CONTCAR
    │   ├── OSZICAR
    │   └── ...
    ├── H/
    │   ├── 1/
    │   │   ├── CONTCAR
    │   │   ├── OSZICAR
    │   │   └── ...
    │   └── 2/
    │       ├── CONTCAR
    │       ├── OSZICAR
    │       └── ...
    └── OH/
        ├── 1/
        │   ├── CONTCAR
        │   ├── OSZICAR
        │   └── ...
        └── 2/
            ├── CONTCAR
            ├── OSZICAR
            └── ...
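One way to follow the advice above (work on a copy, and make sure every calculation folder holds CONTCAR and OSZICAR) is sketched below using only the standard library. The helpers copy_for_processing and find_incomplete_dirs are hypothetical and not part of catbench; the check assumes that every folder without subfolders is a calculation folder.

```python
import shutil
import tempfile
from pathlib import Path

REQUIRED = ("CONTCAR", "OSZICAR")


def copy_for_processing(src, dst):
    """Copy the original data folder so the originals stay untouched."""
    return Path(shutil.copytree(src, dst))


def find_incomplete_dirs(root):
    """Return leaf folders under `root` that lack CONTCAR or OSZICAR."""
    incomplete = []
    for d in sorted(p for p in Path(root).rglob("*") if p.is_dir()):
        children = list(d.iterdir())
        if any(c.is_dir() for c in children):
            continue  # intermediate folder such as gas/ or H/
        names = {c.name for c in children}
        if not all(req in names for req in REQUIRED):
            incomplete.append(d)
    return incomplete


# Small self-contained demonstration on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data"
    for rel in ("gas/H2gas", "system1/slab", "system1/H/1"):
        (src / rel).mkdir(parents=True)
        (src / rel / "CONTCAR").write_text("")
        (src / rel / "OSZICAR").write_text("")
    (src / "system1" / "H" / "2").mkdir()  # deliberately left incomplete

    work = copy_for_processing(src, Path(tmp) / "data_copy")
    missing = [p.relative_to(work).as_posix() for p in find_incomplete_dirs(work)]
    print(missing)  # folders that would need attention before processing
```

With real data, the same two calls would be made on your own folder, and process_output would then be run on the copy only.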

Then process using:

import catbench

# Define coefficients for calculating adsorption energies
# For each adsorbate, specify coefficients based on the reaction equation:
# Example for H*: 
#   E_ads(H*) = E(H*) - E(slab) - 1/2 E(H2_gas)
# Example for OH*:
#   E_ads(OH*) = E(OH*) - E(slab) + 1/2 E(H2_gas) - E(H2O_gas)

coeff_setting = {
    "H": {
        "slab": -1,      # Coefficient for clean surface
        "adslab": 1,     # Coefficient for adsorbate-surface system
        "H2gas": -1/2,   # Coefficient for H2 gas reference
    },
    "OH": {
        "slab": -1,      # Coefficient for clean surface
        "adslab": 1,     # Coefficient for adsorbate-surface system
        "H2gas": +1/2,   # Coefficient for H2 gas reference
        "H2Ogas": -1,    # Coefficient for H2O gas reference
    },
}

# This will clean up directories and keep only CONTCAR and OSZICAR files
catbench.process_output("data", coeff_setting)
catbench.userdata_preprocess("data")

The coefficient setting allows flexible definition of reaction energies, enabling benchmarking of various types of reactions beyond adsorption.

For example, you can benchmark the prediction performance for oxygen vacancy formation energy as follows:

# Example: Oxygen vacancy formation energy calculation
coeff_setting = {
    "Ov": {
        "slab": -1,        # Coefficient for clean surface
        "adslab": 1,       # Coefficient for slab with oxygen vacancy
        "O2gas": 1/2,      # Coefficient for O2 gas reference (vacancy formation)
    }
}

# This will clean up directories and keep only CONTCAR and OSZICAR files
catbench.process_output("data", coeff_setting)
catbench.userdata_preprocess("data")
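To make the role of the coefficients concrete, the sketch below evaluates a reaction energy as the sum of coefficient times energy over all species, which is how the reaction equations in the comments above read. The helper reaction_energy is hypothetical, the energies are made-up numbers rather than results of any calculation, and the code is not taken from catbench.

```python
def reaction_energy(coeffs, energies):
    """Sum coefficient * energy over every species in the coefficient setting."""
    return sum(c * energies[name] for name, c in coeffs.items())


# Coefficients for OH*, matching
#   E_ads(OH*) = E(OH*) - E(slab) + 1/2 E(H2_gas) - E(H2O_gas)
coeff_OH = {"slab": -1, "adslab": 1, "H2gas": +1/2, "H2Ogas": -1}

# Illustrative total energies in eV (invented for this example)
energies = {"slab": -100.0, "adslab": -110.5, "H2gas": -6.8, "H2Ogas": -14.2}

e_ads = reaction_energy(coeff_OH, energies)
print(f"E_ads(OH*) = {e_ads:.3f} eV")
```

Any reaction that can be written as a linear combination of slab, adslab, and gas reference energies fits the same pattern, which is why the vacancy formation example only needs a different set of coefficients.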

2. Execute Benchmark

A. General Benchmark

This is a general benchmark setup. The number of calculators in the list (calc_num, the argument of range() in the code below) determines the number of repetitions for reproducibility testing. If reproducibility testing is not needed, it can be set to 1.

Note: This benchmark is only compatible with MLIP models that output the total system energy. OC20 MLIP models that are trained to predict adsorption energies directly cannot be used with execute_benchmark; use execute_benchmark_OC20 (section B) instead.

import catbench
from your_calculator import your_MLIP_Calculator

# Prepare calculator list

calc_num = 5 # Number of calculations for reproducibility testing. Can be adjusted based on available computational resources.

calculators = []
print("Calculators Initializing...")
for i in range(calc_num):
    print(f"{i}th calculator")
    calc = your_MLIP_Calculator(...)
    calculators.append(calc)

config = {
    "MLIP_name": "your MLIP name", # Required: Name of your MLIP model (e.g., "MACE", "CHGNet", "UMA", "yourmodel_w_dataset1", "yourmodel_tuned_1"). You can use any abbreviation that identifies your model.
    "benchmark": "your benchmark dataset pkl name", # Required: Name of the .pkl file in the raw_data directory
    "rate": 0.5, # Optional: For cathub data, can be any value. For user VASP data, must be set to 0
    # ... For detailed configuration options, see the Configuration Options section at the bottom of this document.
}

catbench.execute_benchmark(calculators, **config)

After execution, the following files and directories will be created:

A result directory is created to store all calculation outputs.
Inside the result directory, subdirectories are created for each MLIP.
Each MLIP's subdirectory contains:
- gases/: Gas reference molecules for adsorption energy calculations
- log/: Slab and adslab calculation logs
- traj/: Slab and adslab trajectory files
- {MLIP_name}_gases.json: Gas molecules energies
- {MLIP_name}_anomaly_detection.json: Anomaly detection status for each adsorption data
- {MLIP_name}_result.json: Raw data (energies, calculation times, anomaly detection, slab displacements, etc.)
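As a minimal sketch of how the raw results could be inspected afterwards, the helper below loads the result file with the standard json module. The function load_result is hypothetical, and the path result/{MLIP_name}/{MLIP_name}_result.json is an assumption based on the layout described above. The structure inside the file is not documented here, so the usage lines only list the top-level keys.

```python
import json
from pathlib import Path


def load_result(mlip_name, result_dir="result"):
    """Load {MLIP_name}_result.json from the MLIP's subdirectory (assumed layout)."""
    path = Path(result_dir) / mlip_name / f"{mlip_name}_result.json"
    with path.open() as fh:
        return json.load(fh)


# Usage after a finished benchmark run (the model name is a placeholder):
# data = load_result("MACE")
# print(list(data)[:5])  # peek at the first few top-level keys
```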

B. OC20 MLIP Benchmark

Since OC20 project MLIP models are trained to predict adsorption energies directly rather than total energies, they are handled with a separate function.

import catbench
from your_calculator import your_MLIP_Calculator

# Prepare calculator list

calc_num = 5 # Number of calculations for reproducibility testing. Can be adjusted based on available computational resources.

calculators = []
print("Calculators Initializing...")
for i in range(calc_num):
    print(f"{i}th calculator")
    calc = your_MLIP_Calculator(...)
    calculators.append(calc)

config = {
    "MLIP_name": "your MLIP name", # Required: Name of your MLIP model (e.g., "MACE", "CHGNet", "UMA", "yourmodel_w_dataset1", "yourmodel_tuned_1"). You can use any abbreviation that identifies your model.
    "benchmark": "your benchmark dataset pkl name", # Required: Name of the .pkl file in the raw_data directory
    "rate": 0.5, # Optional: For cathub data, can be any value. For user VASP data, must be set to 0
    # ... For detailed configuration options, see the Configuration Options section at the bottom of this document.
}

catbench.execute_benchmark_OC20(calculators, **config)

The overall usage is similar to the general benchmark, but each MLIP's subdirectory contains only the following:

- log/: Slab and adslab calculation logs
- traj/: Slab and adslab trajectory files
- {MLIP_name}_anomaly_detection.json: Anomaly detection status for each adsorption data
- {MLIP_name}_result.json: Raw data (energies, calculation times, anomaly detection, slab displacements, etc.)

C. Single-point Calculation Benchmark

import catbench
from your_calculator import your_MLIP_Calculator

# Prepare calculator

print("Calculators Initializing...")
calc = your_MLIP_Calculator(...)

config = {
    "MLIP_name": "your MLIP name", # Required: Name of your MLIP model (e.g., "MACE", "CHGNet", "UMA", "yourmodel_w_dataset1", "yourmodel_tuned_1"). You can use any abbreviation that identifies your model.
    "benchmark": "your benchmark dataset pkl name", # Required: Name of the .pkl file in the raw_data directory
    # ... For detailed configuration options, see the Configuration Options section at the bottom of this document.
}
catbench.execute_benchmark_single(calc, **config)

3. Analysis

import catbench

config = {
    # ... For detailed configuration options, see the Configuration Options section at the bottom of this document.
}
catbench.analysis_MLIPs(**config)

The analysis function processes the calculation data stored in the result directory and generates:

A plot/ directory:
- Parity plots for each MLIP model
- Combined parity plots for comparison
- Performance visualization plots
An Excel file {directory_name}_Benchmarking_Analysis.xlsx:
- Comprehensive performance metrics for all MLIP models
- Statistical analysis of predictions
- Model-specific details and parameters
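The exact metrics written to the Excel file are not listed in this document. As a rough illustration of what a parity comparison measures, the sketch below computes the mean absolute error and root-mean-square error between reference and predicted energies. The helper parity_metrics is hypothetical, the numbers are invented, and the choice of MAE and RMSE is an assumption, not a description of the catbench output.

```python
import math


def parity_metrics(reference, predicted):
    """Return (MAE, RMSE) between reference and predicted energies."""
    errors = [p - r for r, p in zip(reference, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Illustrative adsorption energies in eV (not real data)
dft = [-0.50, 0.20, 1.10, -1.30]
mlip = [-0.40, 0.10, 1.30, -1.30]

mae, rmse = parity_metrics(dft, mlip)
print(f"MAE = {mae:.3f} eV, RMSE = {rmse:.3f} eV")
```

On a parity plot, these numbers summarize how far the points scatter from the diagonal.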

Single-point Calculation Analysis

import catbench

config = {
    # ... For detailed configuration options, see the Configuration Options section at the bottom of this document.
}
catbench.analysis_MLIPs_single(**config)

Outputs

1. Adsorption Energy Parity Plot (mono_version & multi_version)

Adsorption energy parity plots can be generated for all MLIPs, either as a single combined plot or broken down by adsorbate.

2. Comprehensive Performance Table

View various metrics for all MLIPs.

3. Anomaly Analysis

See how anomalies are detected for all MLIPs.

4. Analysis by Adsorbate

Observe how each MLIP predicts for each adsorbate.

Configuration Options

execute_benchmark / execute_benchmark_OC20

Option	Description	Default
MLIP_name	Name of your MLIP	Required
benchmark	Name of benchmark dataset. Use "multiple_tag" for combined datasets, or specific tag name for single dataset	Required
F_CRIT_RELAX	Force convergence criterion	0.05
N_CRIT_RELAX	Maximum number of steps	999
rate	Fix ratio for surface atoms (0: use original constraints, >0: fix atoms from bottom up to specified ratio)	0.5
disp_thrs_slab	Displacement threshold for slab	1.0
disp_thrs_ads	Displacement threshold for adsorbate	1.5
again_seed	Seed variation threshold	0.2
damping	Damping factor for optimization	1.0
gas_distance	Cell size for gas molecules (if a number is provided, it sets the cell size as a cube with that length (Å))	False
optimizer	Optimization algorithm	"LBFGS"
restart	Set to True when resuming interrupted calculations.	False
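As an example of how the options in this table fit together, the sketch below builds a configuration for a user-calculated VASP dataset, where rate must be 0 so that the original constraints are kept. Every key comes from the table and the remaining values are the listed defaults; the model and dataset names are placeholders, and the final call is left as a comment because it needs catbench and a prepared calculator list.

```python
config = {
    "MLIP_name": "yourmodel_tuned_1",                # any label identifying the model
    "benchmark": "your benchmark dataset pkl name",  # placeholder, as in the examples above
    "rate": 0,               # user VASP data: keep the original constraints
    "F_CRIT_RELAX": 0.05,    # force convergence criterion (default)
    "N_CRIT_RELAX": 999,     # maximum number of steps (default)
    "optimizer": "LBFGS",    # default optimizer
    "restart": False,        # set to True to resume an interrupted run
}

# With catbench installed and `calculators` prepared as shown earlier:
# catbench.execute_benchmark(calculators, **config)
```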

execute_benchmark_single

Option	Description	Default
MLIP_name	Name of your MLIP	Required
benchmark	Name of benchmark dataset. Use "multiple_tag" for combined datasets, or specific tag name for single dataset	Required
gas_distance	Cell size for gas molecules (if a number is provided, it sets the cell size as a cube with that length (Å))	False
optimizer	Optimization algorithm for gas molecule relaxation	"LBFGS"
restart	Set to True when resuming interrupted calculations.	False

analysis_MLIPs

Option	Description	Default
Benchmarking_name	Name for output files	Current directory name
calculating_path	Path to result directory	"./result"
MLIP_list	List of MLIPs to analyze	All MLIPs in result directory
target_adsorbates	Target adsorbates to analyze	All adsorbates
specific_color	Color for plots	"black"
min	Axis minimum	Auto-calculated
max	Axis maximum	Auto-calculated
figsize	Figure size	(9, 8)
mark_size	Marker size	100
linewidths	Line width	1.5
dpi	Plot resolution	300
legend_off	Toggle legend	False
error_bar_display	Toggle error bars	False
font_setting	Font setting (e.g., `["/Users/user/Library/Fonts/Helvetica.ttf", "sans-serif"]`)	False
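A possible analysis configuration using options from this table is sketched below. The key names come from the table. The value types for MLIP_list and target_adsorbates (Python lists of strings) are an assumption, and the model names are only examples. The call itself is left as a comment because it needs catbench and an existing result directory.

```python
config = {
    "calculating_path": "./result",        # default result directory
    "MLIP_list": ["MACE", "CHGNet"],       # example model names; assumed to be a list
    "target_adsorbates": ["H", "OH"],      # restrict the analysis; assumed to be a list
    "figsize": (9, 8),                     # default figure size
    "dpi": 300,                            # default plot resolution
    "legend_off": False,                   # keep the legend
    "error_bar_display": True,             # show error bars
}

# With catbench installed and benchmark results available:
# catbench.analysis_MLIPs(**config)
```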

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

This work will be published soon.

Release history

Version	Release date
1.0.1	Apr 19, 2026
1.0.0	Sep 21, 2025
0.1.32 (this version)	Jun 11, 2025
0.1.31	May 20, 2025
0.1.30	May 20, 2025
0.1.29	May 19, 2025
0.1.28	Mar 19, 2025
0.1.27	Mar 19, 2025
0.1.26	Mar 18, 2025
0.1.25	Mar 18, 2025
0.1.24	Mar 6, 2025
0.1.23	Feb 21, 2025
0.1.22	Feb 21, 2025
0.1.21	Feb 21, 2025
0.1.20	Jan 15, 2025
0.1.19	Dec 19, 2024
0.1.18	Dec 5, 2024
0.1.17	Dec 2, 2024
0.1.15	Nov 11, 2024
0.1.14	Nov 10, 2024
0.1.13	Nov 7, 2024
0.1.12	Oct 8, 2024
0.1.11	Oct 7, 2024
0.1.10	Oct 6, 2024
0.1.9	Sep 10, 2024
0.1.8	Sep 10, 2024
0.1.7	Sep 10, 2024
0.1.6	Sep 4, 2024
0.1.5	Aug 16, 2024
0.1.4	Aug 7, 2024
0.1.3	Aug 7, 2024
0.1.2	Aug 5, 2024
0.1.1	Aug 5, 2024
0.1.0	Aug 5, 2024

Download files

Source Distribution

catbench-0.1.32.tar.gz (24.4 kB)

Uploaded Jun 11, 2025 Source

Built Distribution

catbench-0.1.32-py3-none-any.whl (20.3 kB)

Uploaded Jun 11, 2025 Python 3

File details

Details for the file catbench-0.1.32.tar.gz.

File metadata

Download URL: catbench-0.1.32.tar.gz
Upload date: Jun 11, 2025
Size: 24.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for catbench-0.1.32.tar.gz
Algorithm	Hash digest
SHA256	`8008faf6c32f2a61d881e374559fbe64a634ea282ab090749b624359452d2481`
MD5	`6f56be8d3de402336677dae1619d21b6`
BLAKE2b-256	`a0b533079955eeb4a841ef8b748254d85369b6f298533bf17b9c75cfe30751fd`
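A downloaded file can be checked against the SHA256 digest listed above with Python's standard hashlib module. The sketch below is one way to do this; the helper name sha256_of is hypothetical, and the file is assumed to sit in the current directory under the name shown above.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed above for catbench-0.1.32.tar.gz
EXPECTED_SHA256 = "8008faf6c32f2a61d881e374559fbe64a634ea282ab090749b624359452d2481"

# After downloading the source distribution:
# if sha256_of("catbench-0.1.32.tar.gz") != EXPECTED_SHA256:
#     raise SystemExit("hash mismatch: do not install this file")
```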

Provenance

The following attestation bundles were made for catbench-0.1.32.tar.gz:

Publisher: publish.yml on JinukMoon/CatBench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: catbench-0.1.32.tar.gz
- Subject digest: 8008faf6c32f2a61d881e374559fbe64a634ea282ab090749b624359452d2481
- Sigstore transparency entry: 235072891
- Sigstore integration time: Jun 11, 2025
Source repository:
- Permalink: JinukMoon/CatBench@a02c225f2c34aed68d59266925f18c1ff0731579
- Branch / Tag: refs/tags/v0.1.32
- Owner: https://github.com/JinukMoon
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a02c225f2c34aed68d59266925f18c1ff0731579
- Trigger Event: push

File details

Details for the file catbench-0.1.32-py3-none-any.whl.

File metadata

Download URL: catbench-0.1.32-py3-none-any.whl
Upload date: Jun 11, 2025
Size: 20.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for catbench-0.1.32-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2a1cde56089e81c456d3782605203c4855f41b1c2b25e65cd1b92ccbbd6391d6`
MD5	`02fec8576d0a7ed65f6491f26ebc6777`
BLAKE2b-256	`dbd1884c702fbd3c6e4bb22d17e6a5585fd43b9baa4cf2ca12a3b156d6d3e013`

Provenance

The following attestation bundles were made for catbench-0.1.32-py3-none-any.whl:

Publisher: publish.yml on JinukMoon/CatBench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: catbench-0.1.32-py3-none-any.whl
- Subject digest: 2a1cde56089e81c456d3782605203c4855f41b1c2b25e65cd1b92ccbbd6391d6
- Sigstore transparency entry: 235072894
- Sigstore integration time: Jun 11, 2025
Source repository:
- Permalink: JinukMoon/CatBench@a02c225f2c34aed68d59266925f18c1ff0731579
- Branch / Tag: refs/tags/v0.1.32
- Owner: https://github.com/JinukMoon
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a02c225f2c34aed68d59266925f18c1ff0731579
- Trigger Event: push
