
CatTSunami: Accelerating Transition State Energy Calculations with Pre-trained Graph Neural Networks

Summary

CatTSunami is a framework for high-throughput enumeration of nudged elastic band (NEB) frame sets. It was built for use with machine-learned (ML) models trained on OC20, which have been demonstrated to perform well on this auxiliary task. To train your own model or obtain pre-trained checkpoints, please see fairchem-core.

This repository contains the validation dataset, the enumeration framework, and accompanying code to run ML-accelerated NEBs and validate new models. For more information, please see the manuscript.

Getting started

Configured for use:

  1. Install fairchem-core and fairchem-data-oc (see instructions)
  2. pip install fairchem-applications-cattsunami
  3. Check out the tutorial notebook
pip install fairchem-applications-cattsunami

Configured for local development:

  1. Clone the fairchem repo
  2. Install fairchem-data-oc and fairchem-core (see instructions)
  3. Install this repository: pip install -e packages/fairchem-applications-cattsunami
  4. Check out the tutorial notebook

Validation Dataset

The validation dataset comprises 932 converged DFT NEB calculations, used to assess model performance on this important task. Three reaction classes are considered: desorptions, dissociations, and transfers. In total, 2827 DFT NEBs were performed, including those that failed to converge. Unconverged systems are included in ASE All Trajectories below. For more information about the converged dataset, see the dataset markdown file.

| Splits | Compressed size | Uncompressed size | MD5 checksum (download link) |
|---|---|---|---|
| ASE Converged Trajectories | 1.5 GB | 6.3 GB | 52af34a93758c82fae951e52af445089 |
| ASE All Trajectories | 6.7 GB | 30 GB | f5829eeaf7219c5cd3cfb499b8d951da |
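
To guard against corrupted downloads, you can check a tarball against the MD5 listed above. A minimal sketch (the local filename is hypothetical):

import hashlib

def md5sum(path, chunk_size=1 << 20):
    # Stream the file so large tarballs do not need to fit in memory.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# "converged_trajectories.tar" is a hypothetical local filename.
assert md5sum("converged_trajectories.tar") == "52af34a93758c82fae951e52af445089"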

Citing this work

If you use this codebase in your work, please consider citing:

@article{wander2024cattsunami,
  title={CatTSunami: Accelerating Transition State Energy Calculations with Pre-trained Graph Neural Networks},
  author={Wander, Brook and Shuaibi, Muhammed and Kitchin, John R and Ulissi, Zachary W and Zitnick, C Lawrence},
  journal={arXiv preprint arXiv:2405.02078},
  year={2024}
}

File Structure and Contents

The tar file contains 3 subdirectories: dissociations, desorptions, and transfers. As the names imply, these directories contain the converged DFT trajectories for each reaction class. Within these directories, the trajectories are named to identify the contents of each file. Here is an example and the anatomy of the name (a short parsing sketch follows the list):

desorption_id_83_2409_9_111-4_neb1.0.traj

  1. desorption indicates the reaction type (dissociation and transfer are the other possibilities)
  2. id identifies that the material belongs to the in-domain validation split (ood, out-of-domain, is the other possibility)
  3. 83 is the task id. This does not provide relevant information
  4. 2409 is the index of the bulk in the ocdata bulk pickle file
  5. 9 is the reaction index. For each reaction type there is a reaction pickle file in the repository; in this case, it is the 9th entry in that pickle file
  6. 111-4: the first 3 numbers are the Miller indices (i.e. the (1,1,1) surface), and the last number corresponds to the shift value. In this case, the 4th shift enumerated was the one used
  7. neb1.0: the number here indicates the k value (spring constant) used. For the full dataset, 1.0 was used, so this does not distinguish any of the trajectories from one another
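
As a quick illustration, these fields can be split out of a filename with plain string operations. This is a minimal sketch; the variable names are ours, not part of the dataset:

# Decompose a trajectory filename into the fields described above.
# Variable names are illustrative only; the format itself comes from the dataset.
name = "desorption_id_83_2409_9_111-4_neb1.0.traj"

stem = name.removesuffix(".traj")  # requires Python >= 3.9
reaction_type, split, task_id, bulk_index, reaction_index, surface, neb_tag = stem.split("_")
miller, shift_index = surface.split("-")
k_value = neb_tag.removeprefix("neb")

print(reaction_type, split, miller, shift_index, k_value)
# desorption id 111 4 1.0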

Each trajectory file contains repeating frame sets. Although the initial and final frames are not optimized during the NEB, they are saved for every iteration in the trajectory. For this dataset, 10 frames were used per iteration, 8 of which were optimized over the NEB, so the length of the trajectory is the number of iterations (N) * 10. If you wanted to look at the frame set prior to optimization and the optimized frame set, you could get them like this:

from ase.io import read

# Read every image in the trajectory file, then slice out 10-frame bands.
traj = read("desorption_id_83_2409_9_111-4_neb1.0.traj", ":")
unrelaxed_frames = traj[0:10]  # the band before any optimization
relaxed_frames = traj[-10:]    # the band after the final NEB iteration
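
Since the file is just N stacked 10-frame bands, the band at any intermediate iteration can be sliced out the same way. A small sketch (the helper name is ours):

# Hypothetical helper: slice out the 10-frame band at iteration i (0-indexed).
def band_at_iteration(traj, i, frames_per_band=10):
    return traj[i * frames_per_band : (i + 1) * frames_per_band]

n_iterations = len(traj) // 10  # N, the number of NEB iterations recorded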

Use

One more note: we have not prepared an lmdb for this dataset because NEB calculations are not supported directly in ocp. You must use the ASE-native OCPNEB class along with ASE infrastructure to run NEB calculations. Here is an example of its use:

from ase.io import read
from ase.optimize import BFGS
from ocpneb.core.ocpneb import OCPNEB

# Use the first 10-frame band (pre-optimization) as the starting point.
traj = read("desorption_id_83_2409_9_111-4_neb1.0.traj", ":")
neb_frames = traj[0:10]
neb = OCPNEB(
    neb_frames,
    checkpoint_path=YOUR_CHECKPOINT_PATH,  # path to an OC20-trained model checkpoint
    k=1.0,        # spring constant; 1.0 was used for the full dataset
    batch_size=8,
)
optimizer = BFGS(
    neb,
    trajectory="test_neb.traj",
)
# Loosely converge the band first, then turn on the climbing image to refine
# the transition state to a tighter force threshold.
conv = optimizer.run(fmax=0.45, steps=200)
if conv:
    neb.climb = True
    conv = optimizer.run(fmax=0.05, steps=300)
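
Once the climbing-image run converges, the barrier can be read off the relaxed band. A hedged sketch using ASE's standard NEBTools (in recent ASE releases it lives in ase.mep rather than ase.neb); it assumes energies were stored with the frames written to test_neb.traj:

from ase.io import read
from ase.neb import NEBTools  # ase.mep.NEBTools in newer ASE releases

# The last 10 frames of the output trajectory are the final relaxed band.
relaxed_band = read("test_neb.traj", ":")[-10:]
barrier, delta_e = NEBTools(relaxed_band).get_barrier()
print(f"Barrier: {barrier:.3f} eV, reaction energy: {delta_e:.3f} eV")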
