Code for generating adsorbate-catalyst input configurations

These details have been verified by PyPI

Project links

repository

GitHub Statistics

Maintainers

lbluque

Project description

Open-Catalyst-Dataset

This repository hosts the adsorbate-catalyst input generation workflow used in the Open Catalyst Project.

Install

To install, run the following in an environment with Python >= 3.9:

pip install fairchem-data-oc
python src/fairchem/core/scripts/download_large_files.py oc

Workflow

The codebase supports the following workflow to generate adsorbate-catalyst input configurations.

Initialize a bulk:
- By providing an atoms object, or
- By bulk_id (e.g. mp-30), or
- By its index in the database, or
- By selecting randomly.
Initialize an adsorbate:
- By providing an atoms object, or
- By its SMILES string (e.g. *H), or
- By its index in the database, or
- By selecting randomly.
Enumerate slabs from the Bulk class.
This internally uses pymatgen.core.surface.SlabGenerator and supports the following:
- All slabs up to a specified miller index, or
- A random slab among those enumerated by the previous method, or
- A specific miller index.
Place the adsorbate on the slab.
This broadly has two steps -- identifying a binding site on the surface of the slab, and orienting the adsorbate before placing it at that site. We use custom code inspired by pymatgen to do this. There are 3 modes: heuristic, random, and random_site_heuristic_placement.
- Identifying a binding site: First, a Delaunay meshgrid is constructed with surface atoms as nodes. For heuristic, the sites considered are on the node (atop), between 2 nodes (bridge) and in the center of the triangle (hollow). For random and random_site_heuristic_placement, positions of the sites are uniformly randomly sampled along the Delaunay triangles.
- Adsorbate orientation: For heuristic and random_site_heuristic_placement, the adsorbate is uniformly randomly rotated around the z direction, and provided a slight wobble around x and y, which amounts to randomized tilt within a certain cone around the north pole. For random, the adsorbate is uniformly randomly rotated about its center of mass along all directions.
- Binding atom: The adsorbate database includes information about which atoms are expected to bind. For heuristic and random_site_heuristic_placement, the binding atom of the adsorbate is placed at the site, whereas for random the center of mass of the adsorbate is placed at the site.
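
The site-identification step can be illustrated for a single triangle of surface atoms. The following is a minimal sketch, not the package's own code: the function names `heuristic_sites` and `random_site` are hypothetical, the Delaunay triangulation itself is assumed to have been done already, and the uniform sampling uses the standard square-root barycentric trick.

```python
import math
import random
from itertools import combinations

Point = tuple[float, float, float]
Triangle = tuple[Point, Point, Point]


def heuristic_sites(triangle: Triangle) -> dict[str, list[Point]]:
    """Candidate sites for one triangle of surface atoms (hypothetical helper)."""
    # Atop sites sit directly on the three nodes.
    atop = list(triangle)
    # Bridge sites are the midpoints of the three edges.
    bridge = [
        tuple((a + b) / 2 for a, b in zip(p, q))
        for p, q in combinations(triangle, 2)
    ]
    # The hollow site is the centroid of the triangle.
    hollow = [tuple(sum(coords) / 3 for coords in zip(*triangle))]
    return {"atop": atop, "bridge": bridge, "hollow": hollow}


def random_site(triangle: Triangle, rng: random.Random) -> Point:
    """Draw one point uniformly over the area of the triangle."""
    p, q, r = triangle
    u, v = rng.random(), rng.random()
    s = math.sqrt(u)  # the square root makes the density uniform in area
    w_p, w_q, w_r = 1 - s, s * (1 - v), s * v  # barycentric weights, sum to 1
    return tuple(w_p * a + w_q * b + w_r * c for a, b, c in zip(p, q, r))


if __name__ == "__main__":
    tri = ((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0))
    print(heuristic_sites(tri))
    print(random_site(tri, random.Random(0)))
```

Each heuristic triangle contributes three atop, three bridge, and one hollow candidate; neighbouring triangles share nodes and edges, so a full implementation would also need to deduplicate sites.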

Workflow image

Usage

Here is a simple example using the ocdata workflow to place CO on Cu (1,1,1):

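# Import path assumed from the fairchem-data-oc package layout
from fairchem.data.oc.core import Adsorbate, AdsorbateSlabConfig, Bulk, Slab

bulk_src_id = "mp-30"
adsorbate_smiles = "*CO"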

bulk = Bulk(bulk_src_id_from_db=bulk_src_id, bulk_db_path="your-path-here.pkl")
adsorbate = Adsorbate(adsorbate_smiles_from_db=adsorbate_smiles, adsorbate_db_path="your-path-here.pkl")
slabs = Slab.from_bulk_get_specific_millers(bulk=bulk, specific_millers=(1,1,1))

# Perform heuristic placements
heuristic_adslabs = AdsorbateSlabConfig(slabs[0], adsorbate, mode="heuristic")

# Perform random site, heuristic placements
random_adslabs = AdsorbateSlabConfig(slabs[0], adsorbate, mode="random_site_heuristic_placement", num_sites=100)

If you want to use a bulk and/or adsorbate that is not in the provided databases, you may supply your own ase.Atoms objects:

bulk = Bulk(bulk_atoms=your_bulk_atoms)
adsorbate = Adsorbate(adsorbate_atoms=your_adsorbate_atoms)
slabs = Slab.from_bulk_get_all_slabs(bulk)

# Perform fully random placements
random_adslabs = AdsorbateSlabConfig(slabs[0], adsorbate, mode="random", num_sites=100)

If you would like to randomly choose a bulk, adsorbate, and slab:

bulk = Bulk()
adsorbate = Adsorbate()
slab = Slab.from_bulk_get_random_slab(bulk)

# Perform fully random placements
random_adslabs = AdsorbateSlabConfig(slab, adsorbate, mode="random", num_sites=100)

StructureGenerator API

We also provide a StructureGenerator helper class that wraps the core functionality described above for creating bulk/slab/adsorbate objects and for writing VASP input files and metadata for multiple placements of the adsorbate on the slab. A number of options configure input generation for different use cases. We list a few examples here.

Command Line Args

Input files:

--bulk_db (required): path to the bulk database file
--adsorbate_db: path to the adsorbate database file - required if adsorbate placement is to be performed.
--precomputed_slabs_dir: path to the precomputed slab directory, which saves cost/time if the slabs for each bulk have already been enumerated.

Bulk / Slab / Adsorbate specification

Option 1: provide indices. All three must be provided to generate adsorbate-slab configurations, otherwise only slab enumeration will be performed.

--adsorbate_index: index of the desired adsorbate in the database file.
--bulk_index: index of the desired bulk
--surface_index: index of the desired surface

Option 2: provide a set of indices (one of the following)

--indices_file: a file containing strings with the following format f"{adsorbate_idx}_{bulk_idx}_{surface_idx}". This will enumerate slabs as well as adsorbate-slab configurations.
--bulk_indices_file: a file containing bulk indices. This will only do slab enumeration.
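
As an illustration of the `--indices_file` format, here is a minimal sketch that writes and reads such a file. It assumes one `{adsorbate_idx}_{bulk_idx}_{surface_idx}` string per line, which is not stated above; all helper names are hypothetical and not part of the package.

```python
from pathlib import Path


def format_index(adsorbate_idx: int, bulk_idx: int, surface_idx: int) -> str:
    """Build one entry in the documented adsorbate_bulk_surface format."""
    return f"{adsorbate_idx}_{bulk_idx}_{surface_idx}"


def parse_index(line: str) -> tuple[int, int, int]:
    """Split one entry back into its three integer indices."""
    adsorbate_idx, bulk_idx, surface_idx = (int(x) for x in line.strip().split("_"))
    return adsorbate_idx, bulk_idx, surface_idx


def write_indices_file(path, triples) -> None:
    """Write one entry per line (the one-per-line layout is an assumption)."""
    Path(path).write_text("\n".join(format_index(*t) for t in triples) + "\n")


def read_indices_file(path) -> list[tuple[int, int, int]]:
    """Read all non-empty lines back as index triples."""
    lines = Path(path).read_text().splitlines()
    return [parse_index(line) for line in lines if line.strip()]


if __name__ == "__main__":
    write_indices_file("your_index_file.txt", [(0, 0, 0), (1, 5, 2)])
    print(read_indices_file("your_index_file.txt"))
```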

Slab enumeration

--max_miller: the max miller index of slabs to be generated (e.g. 1, 2, or 3)

Adsorbate Placement

--seed: random seed for sampling/random sites generation.
--heuristic_placements: to be provided if heuristic placements are desired.
--random_placements: to be provided if random sites are desired. You may do both heuristic and random placements in the same run.
--full_random_rotations: to be provided in addition to --random_placements if fully random placements are desired, as opposed to small wobbles around the x and y axes.
--random_sites: the number of sites per slab, which should be provided if --random_placements is used.
--num_augmentations: the number of random adsorbate configurations per site (defaults to 1).
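
One plausible reading of how these flags relate to the three placement modes from the Workflow section is sketched below. The mapping is inferred from the flag descriptions only, and the function `placement_modes` is hypothetical; the actual script may combine the flags differently.

```python
def placement_modes(
    heuristic_placements: bool,
    random_placements: bool,
    full_random_rotations: bool,
) -> list[str]:
    """Map CLI flags to AdsorbateSlabConfig mode names (inferred, not confirmed)."""
    modes = []
    if heuristic_placements:
        # Atop/bridge/hollow sites, binding atom placed at the site.
        modes.append("heuristic")
    if random_placements:
        # Random sites; rotations are either fully random or small wobbles.
        if full_random_rotations:
            modes.append("random")
        else:
            modes.append("random_site_heuristic_placement")
    return modes


if __name__ == "__main__":
    print(placement_modes(True, True, False))
```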

Multiprocessing, when given a file of indices

--chunks: for multi-node processing, number of chunks to split inputs across.
--chunk_index: for multi-node processing, index of chunk to process.
--workers: number of workers for multiprocessing within one job
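
To make the roles of `--chunks` and `--chunk_index` concrete, here is a minimal sketch of splitting a list of inputs into chunks and selecting one of them. The contiguous, near-equal split is an assumption; how the script actually partitions its inputs is not stated here, and `split_into_chunks` is a hypothetical helper.

```python
def split_into_chunks(items: list, chunks: int) -> list[list]:
    """Split items into `chunks` contiguous parts whose sizes differ by at most one."""
    if chunks < 1:
        raise ValueError("chunks must be a positive integer")
    base, extra = divmod(len(items), chunks)
    parts, start = [], 0
    for i in range(chunks):
        # The first `extra` parts each take one additional item.
        size = base + (1 if i < extra else 0)
        parts.append(items[start:start + size])
        start += size
    return parts


if __name__ == "__main__":
    inputs = [f"0_{bulk_idx}_0" for bulk_idx in range(10)]
    # Each job (for example one per node) would process only its own chunk.
    chunk_index = 1
    print(split_into_chunks(inputs, chunks=3)[chunk_index])
```

Under this reading, each job is launched with the same `--chunks` value and a different `--chunk_index`, and `--workers` controls the parallelism inside that job.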

Outputs

--output_dir: directory to save outputs
--no_vasp: for VASP input files, only write POSCAR and do not write INCAR, KPOINTS, or POTCAR
--verbose: if detailed info should be logged

Usage

python structure_generator.py \
  --bulk_db databases/pkls/bulks.pkl \
  --adsorbate_db databases/pkls/adsorbates.pkl  \
  --output_dir outputs/ \
  --adsorbate_index 0 \
  --bulk_index 0 \
  --surface_index 0 \
  --heuristic_placements

python structure_generator.py \
  --bulk_db databases/pkls/bulks.pkl \
  --adsorbate_db databases/pkls/adsorbates.pkl  \
  --indices_file your_index_file.txt \
  --seed 0 \
  --random_placements \
  --random_sites 100

Databases for bulks and adsorbates

Bulks

A database of bulk materials taken from existing databases (e.g. Materials Project) and relaxed with consistent RPBE settings may be found in databases/pkls/bulks.pkl. If the file is missing, run the command python src/fairchem/core/scripts/download_large_files.py oc from the root of the fairchem repo. To preview which bulks are available, view the mapping between indices and bulks (bulk id and composition): https://dl.fbaipublicfiles.com/opencatalystproject/data/input_generation/mapping_bulks_2021sep20.txt

Adsorbates

A database of adsorbates may be found in ocdata/databases/pkls/adsorbates.pkl. Alternatively, the latest version may be downloaded from https://dl.fbaipublicfiles.com/opencatalystproject/data/input_generation/adsorbate_db_2021apr28.pkl (MD5 checksum: 975e00a62c7b634b245102e42167b3fb). To preview which adsorbates are available, view the mapping between indices and adsorbates (SMILES): https://dl.fbaipublicfiles.com/opencatalystproject/data/input_generation/mapping_adsorbates_2020may12.txt
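
A downloaded adsorbate database can be checked against the MD5 checksum quoted above. The following is a minimal sketch using only the standard library; the helper names and the local file name are hypothetical.

```python
import hashlib

# Checksum quoted above for adsorbate_db_2021apr28.pkl
EXPECTED_MD5 = "975e00a62c7b634b245102e42167b3fb"


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_md5(path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the file was saved.
    print(verify_md5("adsorbate_db_2021apr28.pkl"))
```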

Previous snapshots of the codebase

OC20 was generated with an older version of the bulks and this repository. If you would like to exactly reproduce that work, see README_legacy_OC20.md.
OC22 was generated from the OC22_dataset branch of this repository.

License

ocdata is released under the MIT license.

Citation

If you use this codebase in your work, please consider citing:

@article{ocp_dataset,
    author = {Chanussot*, Lowik and Das*, Abhishek and Goyal*, Siddharth and Lavril*, Thibaut and Shuaibi*, Muhammed and Riviere, Morgane and Tran, Kevin and Heras-Domingo, Javier and Ho, Caleb and Hu, Weihua and Palizhati, Aini and Sriram, Anuroop and Wood, Brandon and Yoon, Junwoong and Parikh, Devi and Zitnick, C. Lawrence and Ulissi, Zachary},
    title = {Open Catalyst 2020 (OC20) Dataset and Community Challenges},
    journal = {ACS Catalysis},
    year = {2021},
    doi = {10.1021/acscatal.0c04525},
}

The Open Catalyst 2020 (OC20) and Open Catalyst 2022 (OC22) datasets are licensed under a Creative Commons Attribution 4.0 License.

Release history

This version

1.0.2

Aug 22, 2025

1.0.1

Jun 4, 2025

0.2.0

Dec 3, 2024

0.1.0

Sep 13, 2024

0.0.1

May 17, 2024

0.0.1b0 pre-release

May 15, 2024

0.0.0b0 pre-release

May 13, 2024

Download files

Source Distribution

fairchem_data_oc-1.0.2.tar.gz (177.8 kB)

Uploaded Aug 22, 2025 Source

Built Distribution

fairchem_data_oc-1.0.2-py2.py3-none-any.whl (187.3 kB)

Uploaded Aug 22, 2025 Python 2, Python 3

File details

Details for the file fairchem_data_oc-1.0.2.tar.gz.

File metadata

Download URL: fairchem_data_oc-1.0.2.tar.gz
Upload date: Aug 22, 2025
Size: 177.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for fairchem_data_oc-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`c5184241f94f5d0714a2bc79cc255d44bb85317f8be21d7a30e972711e525558`
MD5	`8e455cd8a471f1e963f526cff0def96d`
BLAKE2b-256	`6fe43bb30e0e2d71296e4a37afa721ee802a3260b1e75eb626474cbdf5dce1db`

Provenance

The following attestation bundles were made for fairchem_data_oc-1.0.2.tar.gz:

Publisher: release.yml on facebookresearch/fairchem

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fairchem_data_oc-1.0.2.tar.gz
- Subject digest: c5184241f94f5d0714a2bc79cc255d44bb85317f8be21d7a30e972711e525558
- Sigstore transparency entry: 423266179
- Sigstore integration time: Aug 22, 2025
Source repository:
- Permalink: facebookresearch/fairchem@0f1c9d265ab9ede7dddd81fd54922a11c5b99fa4
- Branch / Tag: refs/tags/fairchem_data_oc-1.0.2
- Owner: https://github.com/facebookresearch
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@0f1c9d265ab9ede7dddd81fd54922a11c5b99fa4
- Trigger Event: release

File details

Details for the file fairchem_data_oc-1.0.2-py2.py3-none-any.whl.

File metadata

Download URL: fairchem_data_oc-1.0.2-py2.py3-none-any.whl
Upload date: Aug 22, 2025
Size: 187.3 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for fairchem_data_oc-1.0.2-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`04e1ce28eae1c71eecfd1f5131929ad19f9bf3e10ac1832ffcc67ef85ab78187`
MD5	`72977effcc09a1ed3077b366c8b2ad2d`
BLAKE2b-256	`cae0237f9777e6de5a9a2d97c1870225207fdaaac6826df5e4739a86bc945406`

Provenance

The following attestation bundles were made for fairchem_data_oc-1.0.2-py2.py3-none-any.whl:

Publisher: release.yml on facebookresearch/fairchem

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fairchem_data_oc-1.0.2-py2.py3-none-any.whl
- Subject digest: 04e1ce28eae1c71eecfd1f5131929ad19f9bf3e10ac1832ffcc67ef85ab78187
- Sigstore transparency entry: 423266194
- Sigstore integration time: Aug 22, 2025
Source repository:
- Permalink: facebookresearch/fairchem@0f1c9d265ab9ede7dddd81fd54922a11c5b99fa4
- Branch / Tag: refs/tags/fairchem_data_oc-1.0.2
- Owner: https://github.com/facebookresearch
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@0f1c9d265ab9ede7dddd81fd54922a11c5b99fa4
- Trigger Event: release
