CatBench: Benchmark of Machine Learning Potentials for Adsorption Energy Predictions in Heterogeneous Catalysis
CatBench
CatBench: Benchmark Framework for Machine Learning Potentials in Adsorption Energy Predictions
Installation
```bash
pip install catbench
```
Overview
CatBench is a comprehensive benchmarking framework designed to evaluate Machine Learning Potentials (MLPs) for adsorption energy predictions. It provides tools for data processing, model evaluation, and result analysis.
Usage Workflow
1. Data Processing
CatBench supports two types of data sources:
A. Direct from Catalysis-Hub
```python
# Import the catbench package
import catbench

# Process data from Catalysis-Hub
# Single tag
catbench.cathub_preprocess("Catalysis-Hub_Dataset_tag")

# Multiple tags
catbench.cathub_preprocess(["Catalysis-Hub_Dataset_tag1", "Catalysis-Hub_Dataset_tag2"])
```
Example:
```python
import catbench

# Single tag example
catbench.cathub_preprocess("AraComputational2022")

# Multiple tags example
catbench.cathub_preprocess(["AraComputational2022", "AlonsoStrain2023"])
```
B. User Dataset
For custom datasets, prepare your data structure as follows:
The data structure should include:
- Gas references (`gas/`) containing VASP output files for gas-phase molecules
- Surface structures (`surface1/`, `surface2/`, etc.) containing:
  - Clean slab calculations (`slab/`)
  - Adsorbate-surface systems (`H/`, `OH/`, etc.)
Note: Each calculation directory (every gas, `slab/`, and numbered adsorbate directory) must contain CONTCAR and OSZICAR files. Other VASP output files may also be present, but the `process_output` function deletes every file except CONTCAR and OSZICAR, so back up anything else you want to keep.
```
data/
├── gas/
│   ├── H2gas/
│   │   ├── CONTCAR
│   │   └── OSZICAR
│   └── H2Ogas/
│       ├── CONTCAR
│       └── OSZICAR
├── surface1/
│   ├── slab/
│   │   ├── CONTCAR
│   │   └── OSZICAR
│   ├── H/
│   │   ├── 1/
│   │   │   ├── CONTCAR
│   │   │   └── OSZICAR
│   │   └── 2/
│   │       ├── CONTCAR
│   │       └── OSZICAR
│   └── OH/
│       ├── 1/
│       │   ├── CONTCAR
│       │   └── OSZICAR
│       └── 2/
│           ├── CONTCAR
│           └── OSZICAR
└── surface2/
    ├── slab/
    │   ├── CONTCAR
    │   └── OSZICAR
    ├── H/
    │   ├── 1/
    │   │   ├── CONTCAR
    │   │   └── OSZICAR
    │   └── 2/
    │       ├── CONTCAR
    │       └── OSZICAR
    └── OH/
        ├── 1/
        │   ├── CONTCAR
        │   └── OSZICAR
        └── 2/
            ├── CONTCAR
            └── OSZICAR
```
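Before running `process_output`, it can help to confirm that every calculation directory in this layout holds both required files. The sketch below does that with `pathlib`. `find_incomplete_dirs` is a hypothetical helper, not part of CatBench, and it assumes that any directory without subdirectories (a gas, `slab/`, or numbered adsorbate directory) is a calculation directory.

```python
from pathlib import Path

REQUIRED_FILES = ("CONTCAR", "OSZICAR")


def find_incomplete_dirs(root):
    """Return (relative path, missing files) for leaf directories lacking CONTCAR or OSZICAR.

    Assumption: a "leaf" (a directory with no subdirectories) is a calculation
    directory, e.g. data/gas/H2gas or data/surface1/H/1 in the layout above.
    """
    root = Path(root)
    problems = []
    for path in sorted(p for p in root.rglob("*") if p.is_dir()):
        if any(child.is_dir() for child in path.iterdir()):
            continue  # not a leaf; its subdirectories are checked on their own
        missing = [name for name in REQUIRED_FILES if not (path / name).is_file()]
        if missing:
            problems.append((path.relative_to(root).as_posix(), missing))
    return problems


if __name__ == "__main__":
    # Prints nothing if every calculation directory is complete (or "data" does not exist)
    for rel, missing in find_incomplete_dirs("data"):
        print(f"{rel}: missing {', '.join(missing)}")
```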
Then process using:
```python
import catbench

# Define coefficients for calculating adsorption energies
# For each adsorbate, specify coefficients based on the reaction equation:
# Example for H*:
# E_ads(H*) = E(H*) - E(slab) - 1/2 E(H2_gas)
# Example for OH*:
# E_ads(OH*) = E(OH*) - E(slab) + 1/2 E(H2_gas) - E(H2O_gas)
coeff_setting = {
    "H": {
        "slab": -1,       # Coefficient for clean surface
        "adslab": 1,      # Coefficient for adsorbate-surface system
        "H2gas": -1/2,    # Coefficient for H2 gas reference
    },
    "OH": {
        "slab": -1,       # Coefficient for clean surface
        "adslab": 1,      # Coefficient for adsorbate-surface system
        "H2gas": +1/2,    # Coefficient for H2 gas reference
        "H2Ogas": -1,     # Coefficient for H2O gas reference
    },
}

# This will clean up directories and keep only CONTCAR and OSZICAR files
catbench.process_output("data", coeff_setting)
catbench.userdata_preprocess("data")
```
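Each `coeff_setting` entry encodes the reaction equation in its comments as a weighted sum of total energies. The sketch below shows only that arithmetic; `adsorption_energy` is a hypothetical helper, not a CatBench function, and the energies are made-up numbers used for illustration.

```python
def adsorption_energy(energies, coeffs):
    """Return sum(coefficient * total energy) over one coeff_setting entry.

    `energies` maps the same keys used in coeff_setting ("slab", "adslab",
    and gas directory names such as "H2gas") to total energies in eV.
    """
    missing = set(coeffs) - set(energies)
    if missing:
        raise KeyError(f"no energy for: {sorted(missing)}")
    return sum(c * energies[key] for key, c in coeffs.items())


# Same coefficients as the "H" entry above: E(H*) - E(slab) - 1/2 E(H2)
coeff_H = {"slab": -1, "adslab": 1, "H2gas": -1/2}

# Illustrative (not real) total energies in eV
energies = {"slab": -100.0, "adslab": -103.5, "H2gas": -6.8}

print(round(adsorption_energy(energies, coeff_H), 3))  # -0.1
```

Raising on a missing key catches the common mistake of naming a gas coefficient differently from its directory under `gas/`.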
2. Execute Benchmark
A. General Benchmark
This is the general benchmark setup. The value passed to range() sets how many times the benchmark is repeated (one calculator instance per run) for reproducibility testing; use range(1) if reproducibility testing is not needed.
Note: This benchmark is only compatible with MLP models that output the total energy of the system. OC20 MLP models, for example, are trained to predict adsorption energies directly and therefore cannot be used with `execute_benchmark`; they are handled by `execute_benchmark_OC20` instead.
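```python
import catbench
from your_calculator import Calculator  # placeholder: replace with your MLP's calculator class

# Prepare calculator list
# range(5): Run 5 times for reproducibility testing
# range(1): Single run when reproducibility testing is not needed
calculators = [Calculator() for _ in range(5)]

# MLP_name and benchmark are required (see Configuration Options)
config = {
    "MLP_name": "your_MLP_name",
    "benchmark": "AraComputational2022",  # a single tag name, or "multiple_tag" for combined datasets
}
catbench.execute_benchmark(calculators, **config)
```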
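Repeating a run mainly tells you how much an MLP's prediction varies between otherwise identical runs. How CatBench itself summarizes that spread is not described here; as an assumption, a per-adsorbate mean and sample standard deviation could be computed as below. `reproducibility_summary` is hypothetical and the energies are illustrative.

```python
from statistics import mean, stdev


def reproducibility_summary(run_energies):
    """Mean and sample standard deviation of one adsorption energy over repeated runs.

    `run_energies` holds one value per calculator instance (e.g. 5 values for range(5)).
    With a single run the spread is undefined, so it is reported as 0.0.
    """
    values = list(run_energies)
    if not values:
        raise ValueError("need at least one run")
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


# Illustrative adsorption energies (eV) from five repeated runs of one MLP
print(reproducibility_summary([-0.52, -0.50, -0.51, -0.53, -0.49]))
```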
After execution, the following files and directories will be created:
- A `result` directory is created to store all calculation outputs.
- Inside the `result` directory, a subdirectory is created for each MLP.
- Each MLP's subdirectory contains:
  - `gases/`: Gas reference molecules for adsorption energy calculations
  - `log/`: Slab and adslab calculation logs
  - `traj/`: Slab and adslab trajectory files
  - `{MLP_name}_gases.json`: Gas molecule energies
  - `{MLP_name}_anomaly_detection.json`: Anomaly detection status for each adsorption data point
  - `{MLP_name}_result.json`: Raw data (energies, calculation times, anomaly detection, slab displacements, etc.)
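To check that a run wrote everything listed above, a sketch like the following can be used. `missing_outputs` is a hypothetical helper, and it assumes each MLP's subdirectory is named after the `MLP_name` passed to the benchmark; for `execute_benchmark_OC20`, drop the gas entries from `expected_outputs`.

```python
from pathlib import Path


def expected_outputs(mlp_name):
    """Outputs the README lists for each MLP subdirectory (general benchmark)."""
    return [
        "gases",
        "log",
        "traj",
        f"{mlp_name}_gases.json",
        f"{mlp_name}_anomaly_detection.json",
        f"{mlp_name}_result.json",
    ]


def missing_outputs(result_dir, mlp_name):
    """Return the listed outputs that do not exist under result_dir/mlp_name (assumed layout)."""
    mlp_dir = Path(result_dir) / mlp_name
    return [name for name in expected_outputs(mlp_name) if not (mlp_dir / name).exists()]


if __name__ == "__main__":
    # "MyMLP" is a placeholder for the MLP_name you passed to execute_benchmark
    print(missing_outputs("result", "MyMLP"))
```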
B. OC20 MLP Benchmark
Since OC20 project MLP models are trained to predict adsorption energies directly rather than total energies, they are handled with a separate function.
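```python
import catbench
from your_calculator import Calculator  # placeholder: replace with your OC20 MLP's calculator class

# Prepare calculator list
# range(5): Run 5 times for reproducibility testing
# range(1): Single run when reproducibility testing is not needed
calculators = [Calculator() for _ in range(5)]

# MLP_name and benchmark are required (see Configuration Options)
config = {
    "MLP_name": "your_MLP_name",
    "benchmark": "AraComputational2022",  # a single tag name, or "multiple_tag" for combined datasets
}
catbench.execute_benchmark_OC20(calculators, **config)
```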
The overall usage is similar to the general benchmark, but each MLP's subdirectory contains only the following (no gas reference outputs):
- `log/`: Slab and adslab calculation logs
- `traj/`: Slab and adslab trajectory files
- `{MLP_name}_anomaly_detection.json`: Anomaly detection status for each adsorption data point
- `{MLP_name}_result.json`: Raw data (energies, calculation times, anomaly detection, slab displacements, etc.)
C. Single-point Calculation Benchmark
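```python
import catbench
from your_calculator import Calculator  # placeholder: replace with your MLP's calculator class

# A single calculator instance is passed here, not a list
calculator = Calculator()

# MLP_name and benchmark are required (see execute_benchmark_single under Configuration Options)
config = {
    "MLP_name": "your_MLP_name",
    "benchmark": "AraComputational2022",
}
catbench.execute_benchmark_single(calculator, **config)
```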
3. Analysis
```python
import catbench

# All analysis options have defaults (see analysis_MLPs under Configuration Options)
config = {}
catbench.analysis_MLPs(**config)
```
The analysis function processes the calculation data stored in the result directory and generates:
- A `plot/` directory containing:
  - Parity plots for each MLP model
  - Combined parity plots for comparison
  - Performance visualization plots
- An Excel file, `{dataset_name}_Benchmarking_Analysis.xlsx`, containing:
  - Comprehensive performance metrics for all MLP models
  - Statistical analysis of predictions
  - Model-specific details and parameters
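The exact metrics in the Excel file are not listed here. Mean absolute error (MAE) and root-mean-square error (RMSE) are the usual statistics behind a parity plot, and the sketch below computes them for DFT reference versus MLP-predicted adsorption energies. `parity_metrics` is a hypothetical helper and the values are illustrative only.

```python
import math


def parity_metrics(reference, predicted):
    """Return (MAE, RMSE) in the same units as the inputs (e.g. eV)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - r for r, p in zip(reference, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Illustrative values only: DFT vs. MLP adsorption energies in eV
dft = [-0.30, 0.15, -1.20]
mlp = [-0.20, 0.05, -1.50]
print(parity_metrics(dft, mlp))
```

RMSE weights large errors more heavily than MAE, so a gap between the two points to a few outliers rather than a uniform offset.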
Single-point Calculation Analysis
```python
import catbench

config = {}
catbench.analysis_MLPs_single(**config)
```
Outputs
1. Adsorption Energy Parity Plot (mono_version & multi_version)
Adsorption energy parity plots can be drawn for every MLP, either with all adsorbates together (mono_version) or broken down by adsorbate (multi_version).
2. Comprehensive Performance Table
View various metrics for all MLPs.
3. Anomaly Analysis
See how anomalies are detected for all MLPs.
4. Analysis by Adsorbate
Compare each MLP's predictions adsorbate by adsorbate.
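The configuration options include displacement thresholds (`disp_thrs_slab`, `disp_thrs_ads`), which suggests that anomalies are judged at least partly by how far atoms move during relaxation. CatBench's exact criteria are not described here, so the following is only an assumption-labeled sketch: `max_displacement` and `is_anomalous` are hypothetical, positions are assumed to be in Å, and periodic boundary conditions (minimum-image distances) are ignored for simplicity.

```python
import math


def max_displacement(initial, final):
    """Largest per-atom displacement between two lists of (x, y, z) positions.

    Simplification: plain Euclidean distance, no periodic boundary handling.
    """
    if len(initial) != len(final) or not initial:
        raise ValueError("structures must be non-empty and have the same number of atoms")
    return max(math.dist(a, b) for a, b in zip(initial, final))


def is_anomalous(initial, final, threshold):
    """Flag a relaxation whose largest atomic displacement exceeds `threshold`."""
    return max_displacement(initial, final) > threshold


# Illustrative two-atom slab: the second atom moves 1.2 along z during relaxation
slab_initial = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
slab_final = [(0.1, 0.0, 0.0), (2.5, 0.0, 1.2)]
print(is_anomalous(slab_initial, slab_final, threshold=1.0))  # True
```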
Configuration Options
execute_benchmark / execute_benchmark_OC20
| Option | Description | Default |
|---|---|---|
| MLP_name | Name of your MLP | Required |
| benchmark | Name of benchmark dataset. Use "multiple_tag" for combined datasets, or specific tag name for single dataset | Required |
| F_CRIT_RELAX | Force convergence criterion | 0.05 |
| N_CRIT_RELAX | Maximum number of steps | 999 |
| rate | Fix ratio for surface atoms (0: use original constraints, >0: fix atoms from bottom up to specified ratio) | 0.5 |
| disp_thrs_slab | Displacement threshold for slab | 1.0 |
| disp_thrs_ads | Displacement threshold for adsorbate | 1.5 |
| again_seed | Seed variation threshold | 0.2 |
| damping | Damping factor for optimization | 1.0 |
| gas_distance | Cell size for gas molecules | 10 |
| optimizer | Optimization algorithm | "LBFGS" |
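To show how the table fits together, the sketch below merges a user config with the documented defaults and checks the two required options. `resolve_config` is hypothetical: CatBench applies its own defaults internally, so you only need to pass `MLP_name`, `benchmark`, and whatever you want to override. The unknown-key check only covers options documented in this table and is meant to catch typos.

```python
# Defaults copied from the execute_benchmark / execute_benchmark_OC20 table
DEFAULTS = {
    "F_CRIT_RELAX": 0.05,
    "N_CRIT_RELAX": 999,
    "rate": 0.5,
    "disp_thrs_slab": 1.0,
    "disp_thrs_ads": 1.5,
    "again_seed": 0.2,
    "damping": 1.0,
    "gas_distance": 10,
    "optimizer": "LBFGS",
}
REQUIRED = ("MLP_name", "benchmark")


def resolve_config(user_config):
    """Return the documented defaults overridden by user_config (hypothetical helper)."""
    missing = [key for key in REQUIRED if key not in user_config]
    if missing:
        raise ValueError(f"missing required option(s): {missing}")
    unknown = set(user_config) - set(DEFAULTS) - set(REQUIRED)
    if unknown:
        raise ValueError(f"option(s) not in the documented table: {sorted(unknown)}")
    return {**DEFAULTS, **user_config}


config = resolve_config(
    {"MLP_name": "MyMLP", "benchmark": "AraComputational2022", "F_CRIT_RELAX": 0.03}
)
print(config["F_CRIT_RELAX"], config["optimizer"])  # 0.03 LBFGS
```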
execute_benchmark_single
| Option | Description | Default |
|---|---|---|
| MLP_name | Name of your MLP | Required |
| benchmark | Name of benchmark dataset. Use "multiple_tag" for combined datasets, or specific tag name for single dataset | Required |
| gas_distance | Cell size for gas molecules | 10 |
| optimizer | Optimization algorithm for gas molecule relaxation | "LBFGS" |
analysis_MLPs
| Option | Description | Default |
|---|---|---|
| Benchmarking_name | Name for output files | Current directory name |
| calculating_path | Path to result directory | "./result" |
| MLP_list | List of MLPs to analyze | All MLPs in result directory |
| target_adsorbates | Target adsorbates to analyze | All adsorbates |
| specific_color | Color for plots | "black" |
| min | Axis minimum | Auto-calculated |
| max | Axis maximum | Auto-calculated |
| figsize | Figure size | (9, 8) |
| mark_size | Marker size | 100 |
| linewidths | Line width | 1.5 |
| dpi | Plot resolution | 300 |
| legend_off | Toggle legend | False |
| error_bar_display | Toggle error bars | False |
| font_setting | Font setting (e.g., ["/Users/user/Library/Fonts/Helvetica.ttf", "sans-serif"]) | False |
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
This work will be published soon.
File details
Details for the file catbench-0.1.26.tar.gz.
File metadata
- Download URL: catbench-0.1.26.tar.gz
- Size: 22.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c58e9a00e1a102ba7c9e6b72714d5d54f88c587e8bd318b771d7e744a43f149e |
| MD5 | a70413a79ad18659cc1bc9d5bfcfaab8 |
| BLAKE2b-256 | aba5c61d84abd0fe309f61e643713c70fdfdf59d4480fe5ed50c2a555e0001f4 |
Provenance
The following attestation bundles were made for catbench-0.1.26.tar.gz:
- Publisher: publish.yml on JinukMoon/CatBench
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: catbench-0.1.26.tar.gz
- Subject digest: c58e9a00e1a102ba7c9e6b72714d5d54f88c587e8bd318b771d7e744a43f149e
- Sigstore transparency entry: 184033409
- Permalink: JinukMoon/CatBench@fd8f8affaef26f03e55c7f9a95c314a29b566e77
- Branch / Tag: refs/tags/v0.1.26
- Owner: https://github.com/JinukMoon
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fd8f8affaef26f03e55c7f9a95c314a29b566e77
- Trigger Event: push
File details
Details for the file catbench-0.1.26-py3-none-any.whl.
File metadata
- Download URL: catbench-0.1.26-py3-none-any.whl
- Size: 19.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d4ab09ceae656ad1ede92e199d30b30b8a0356e5751baf3a0707cb08a6330284 |
| MD5 | 0b9a95b3fe51a2c22390a02719ff7c54 |
| BLAKE2b-256 | 9d13016ba1d8c580e2c84c9f0c61cbd22aadbfcd5cb1a8e55712da57e3a43601 |
Provenance
The following attestation bundles were made for catbench-0.1.26-py3-none-any.whl:
- Publisher: publish.yml on JinukMoon/CatBench
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: catbench-0.1.26-py3-none-any.whl
- Subject digest: d4ab09ceae656ad1ede92e199d30b30b8a0356e5751baf3a0707cb08a6330284
- Sigstore transparency entry: 184033412
- Permalink: JinukMoon/CatBench@fd8f8affaef26f03e55c7f9a95c314a29b566e77
- Branch / Tag: refs/tags/v0.1.26
- Owner: https://github.com/JinukMoon
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fd8f8affaef26f03e55c7f9a95c314a29b566e77
- Trigger Event: push