Machine Learned Molecular Mechanics Force Field

Grappa - Machine Learned MM Parameterization

A machine learned molecular mechanics force field using a deep graph attentional network
(code supporting https://arxiv.org/abs/2404.00050)

Abstract

Simulating large molecular systems over long timescales requires force fields that are both accurate and efficient. In recent years, E(3) equivariant neural networks have lifted the tension between computational efficiency and accuracy of force fields, but they are still several orders of magnitude more expensive than established molecular mechanics (MM) force fields. Here, we propose Grappa, a machine learning framework to predict MM parameters from the molecular graph, employing a graph attentional neural network and a transformer with symmetry-preserving positional encoding. The resulting Grappa force field outperforms tabulated and machine-learned MM force fields in terms of accuracy at the same computational efficiency and can be used in existing Molecular Dynamics (MD) engines like GROMACS and OpenMM. It predicts energies and forces of small molecules, peptides, RNA and - showcasing its extensibility to uncharted regions of chemical space - radicals at state-of-the-art MM accuracy. We demonstrate Grappa's transferability to macromolecules in MD simulations, from a small fast-folding protein up to a whole virus particle. Our force field sets the stage for biomolecular simulations closer to chemical accuracy, but with the same computational cost as established protein force fields.

Grappa Overview

Grappa predicts MM parameters in two steps. First, atom embeddings are computed from the molecular graph with a graph neural network. Then, a transformer with symmetric positional encoding, followed by permutation-invariant pooling, maps the embeddings to MM parameters with the desired permutation symmetries. Once the MM parameters are predicted, the potential energy surface can be evaluated with MM efficiency for different spatial conformations, e.g. in GROMACS or OpenMM.
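
To make the symmetry constraint concrete, here is a minimal, purely illustrative sketch (not Grappa's actual implementation) of permutation-invariant pooling: averaging over all symmetry-equivalent orderings of an atom tuple makes the output invariant under those orderings by construction.

import torch

def symmetric_pool(embeddings: torch.Tensor, permutations) -> torch.Tensor:
    # Concatenate the atom embeddings in each symmetry-equivalent ordering
    # and average; the result is invariant under those permutations.
    stacked = torch.stack([embeddings[list(p)].flatten() for p in permutations])
    return stacked.mean(dim=0)

# A bond parameter must be invariant under swapping its two atoms, (i, j) -> (j, i):
bond_embeddings = torch.randn(2, 64)  # hypothetical embedding dimension
invariant_features = symmetric_pool(bond_embeddings, permutations=[(0, 1), (1, 0)])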

Usage

The current version of Grappa only predicts bonded parameters; nonbonded parameters like partial charges and Lennard-Jones parameters are assigned by a traditional force field of choice. The input to Grappa is therefore a representation of the system of interest that already contains the nonbonded parameters. Currently, Grappa is compatible with GROMACS and OpenMM.

For complete example scripts, see examples/usage.

GROMACS

In GROMACS, Grappa can be used as a command line application that takes the path to a topology file and writes the bonded parameters to a new topology file.

# parametrize the system with a traditional forcefield:
gmx pdb2gmx -f your_protein.pdb -o your_protein.gro -p topology.top -ignh

# create a new topology file with the bonded parameters from Grappa, specifying the tag of the grappa model:
grappa_gmx -f topology.top -o topology_grappa.top -t grappa-1.4 -p

# (you can create a plot of the parameters for inspection using the -p flag)

# continue with the usual GROMACS workflow (solvation etc.)

OpenMM

To use Grappa in OpenMM, parametrize your system with a traditional force field, from which the nonbonded parameters are taken, and then pass it to Grappa's OpenMM wrapper class:

from openmm.app import ForceField, Topology
from grappa import OpenmmGrappa

topology = ... # load your system as openmm.Topology

classical_ff = ForceField('amber99sbildn.xml', 'tip3p.xml')
system = classical_ff.createSystem(topology)

# load the pretrained ML model from a tag. Currently, possible tags are 'grappa-1.4', 'grappa-1.3' and 'latest'
grappa_ff = OpenmmGrappa.from_tag('grappa-1.4')

# parametrize the system using grappa.
system = grappa_ff.parametrize_system(system, topology)
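
The parametrized system can then be used like any other OpenMM system. A minimal sketch of a standard simulation setup (the positions variable is a placeholder for your system's coordinates):

from openmm import LangevinMiddleIntegrator
from openmm.app import Simulation
from openmm.unit import kelvin, picosecond, picoseconds

integrator = LangevinMiddleIntegrator(300*kelvin, 1/picosecond, 0.002*picoseconds)
simulation = Simulation(topology, system, integrator)
simulation.context.setPositions(positions)  # placeholder: coordinates of your system
simulation.minimizeEnergy()
simulation.step(1000)  # run 2 ps of MD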

There is also the option to obtain an openmm.app.ForceField that calls Grappa for bonded parameter prediction behind the scenes:

from openmm.app import ForceField, Topology
from grappa import as_openmm

topology = ... # load your system as openmm.Topology

grappa_ff = as_openmm('grappa-1.4', base_forcefield=['amber99sbildn.xml', 'tip3p.xml'])
assert isinstance(grappa_ff, ForceField)

system = grappa_ff.createSystem(topology)

Installation

For using Grappa in GROMACS or OpenMM, CPU mode is sufficient, since Grappa's inference runtime is usually small compared to the simulation runtime. For training, GPU mode is advised (see below).

CPU mode

Create a conda environment with python 3.10:

conda create -n grappa python=3.10 -y
conda activate grappa

In CPU mode, Grappa is available on PyPI:

pip install grappa-ff

Depending on the MD engine used, an installation of OpenMM, or of GROMACS and Kimmdy, is needed (see below).

Installation from source (CPU mode)

To install Grappa from source, clone the repository and install requirements and the package itself with pip:

git clone https://github.com/hits-mbm-dev/grappa.git
cd grappa

pip install -r installation_utils/cpu_requirements.txt
pip install -e .

Verify the installation by running

pytest

GROMACS

The creation of custom GROMACS topology files is handled by Kimmdy, which can be installed in the same environment as Grappa via pip:

pip install kimmdy==6.8.3

If Grappa was installed from source, verify the Grappa-gmx installation by running

pytest
pytest -m slow

OpenMM

OpenMM is not available via pip and has to be installed via conda in the same environment as Grappa:

conda install -c conda-forge openmm # optional: cudatoolkit=<YOUR CUDA>

Since the resolution of package dependencies can be slow in conda, it is recommended to install OpenMM first and then install Grappa.

If Grappa was installed from source, verify the Grappa-OpenMM installation by running

pytest
pytest -m slow

Installation in GPU mode

For training Grappa models, neither OpenMM nor Kimmdy are needed, only an environment with a working installation of PyTorch and DGL for the CUDA version of choice. Note that installing Grappa in GPU mode is only recommended if you intend to train a model. Instructions for installing DGL with CUDA can be found at installation_utils/README.md. In this environment, Grappa can be installed by running

pip install -r installation_utils/requirements.txt
pip install -e .

Verify the installation by running

pytest
pytest -m gpu

Pretrained models

Pretrained models can be obtained via grappa.utils.run_utils.model_from_tag with a tag (e.g. latest) that points to a version-dependent URL from which the model weights are downloaded. Available tags are listed in models/published_models.csv; an example can be found at examples/usage/openmm_wrapper.py.
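
A minimal sketch of loading model weights by tag, following the function named above:

from grappa.utils.run_utils import model_from_tag

model = model_from_tag('latest')  # downloads the weights from the version-dependent URL on first use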

For full reproducibility, the respective dataset partition and the configuration file used for training are also included in the released checkpoints and can be found at models/<tag>/config.yaml and models/<tag>/split.json after downloading the respective model (see examples/reproducibility). In the case of grappa-1.4, the training is equivalent to running

python experiments/train.py data=grappa-1.4 model=default experiment=default

Tag | Description
grappa-1.4.0 | Covers peptides, small molecules, RNA. Used for the protein and peptide simulations reported in the paper.
grappa-1.4.1-radical | Covers peptides, small molecules, RNA, peptide radicals.
grappa-1.4.1-light | Lightweight model with far fewer parameters, for testing. Covers peptides, small molecules, RNA, peptide radicals.

Datasets

Datasets of DGL graphs representing molecules can be obtained with the grappa.data.Dataset.from_tag constructor. An example can be found at examples/usage/dataset.py; available tags are listed in data/published_datasets.csv.
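
A minimal sketch, following the constructor named above (the tag is one from the table in this section):

from grappa.data import Dataset

dataset = Dataset.from_tag('spice-dipeptide')  # downloads and caches the DGL graphs on first use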

To re-create the benchmark experiment, the train/val/test split from Espaloma is also needed. It can be generated by running dataset_creation/get_espaloma_split/save_split.py, which creates a file espaloma_split.json containing lists of SMILES strings for each of the sub-datasets. These are used to classify molecules as train/val/test molecules upon loading the dataset in the train scripts from experiments/benchmark.
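
For illustration, a minimal sketch of how such a split file can be used to classify a molecule by its SMILES string (the file layout is assumed from the description above):

import json

with open('espaloma_split.json') as f:
    split = json.load(f)  # assumed layout: {'train': [...], 'val': [...], 'test': [...]} of SMILES strings

def split_of(smiles: str) -> str:
    # Return the sub-dataset ('train', 'val' or 'test') in which the molecule appears.
    for name, smiles_list in split.items():
        if smiles in smiles_list:
            return name
    raise KeyError(f'{smiles} not found in any split')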

For the creation of custom datasets, take a look at the tutorials examples/dataset_creation/create_dataset.py and examples/dataset_creation/uncommon_molecule_dataset.py.

Tag | Description
spice-pubchem | Small molecule dataset from Espaloma. Sampled from MD.
rna-nucleoside | Nucleoside dataset from Espaloma. Sampled from MD.
gen2 | Small molecule dataset from Espaloma. Sampled from optimization trajectories.
spice-des-monomers | Small molecule dataset from Espaloma. Sampled from MD.
spice-dipeptide | Dipeptide dataset from Espaloma. Sampled from MD.
rna-diverse | RNA dataset from Espaloma. Sampled from MD.
gen2-torsion | Small molecule dataset from Espaloma. Sampled from torsion scans.
pepconf-dlc | Peptide dataset from Espaloma. Sampled from optimization trajectories.
protein-torsion | Peptide dataset from Espaloma. Sampled from torsion scans.
rna-trinucleotide | Trinucleotide dataset from Espaloma. Sampled from MD.
espaloma_split | Defines the train/val/test split used for training Espaloma 0.3.0.
spice-pubchem-filtered | spice-pubchem without molecules with QM forces over 500 kcal/mol/Å.
spice-dipeptide-amber99 | spice-dipeptide, but with nonbonded parameters from amber99.
spice-dipeptide-charmm36 | spice-dipeptide, but with nonbonded parameters from charmm36.
protein-torsion-amber99 | protein-torsion, but with nonbonded parameters from amber99.
protein-torsion-charmm36 | protein-torsion, but with nonbonded parameters from charmm36.
dipeptides-hyp-dop-300K-amber99 | Dipeptides with HYP and DOP residues at 300 K, with amber99SB-ILDN* nonbonded parameters. Sampled from MD.
uncapped-300K-openff-1.2.0 | Uncapped peptides at 300 K, with OpenFF 1.2.0/AM1-BCC nonbonded parameters. Sampled from MD.
peptide-radical-MD | Radical peptides with states sampled from MD.
peptide-radical-scan | Radical peptides with states sampled from torsion scans.
peptide-radical-opt | Radical peptides with states sampled from optimization trajectories.

Espaloma datasets from: https://pubs.rsc.org/en/content/articlehtml/2024/sc/d4sc00690a

Training

Grappa models can be trained with a configuration specified via Hydra by running

python experiments/train.py data.data_module.datasets=[spice-dipeptide]

With Hydra, configuration files can be defined in a modular way. For Grappa, there are the configuration types model, data and experiment, for each of which default values can be overwritten on the command line or in a separate configuration file. For example, to train a model with fewer node features, one can run

python experiments/train.py model.graph_node_features=32

and for training on the datasets of grappa-1.4 (defined in configs/data/grappa-1.4.0), one can run

python experiments/train.py data=grappa-1.4 model=default experiment=default

To start training from pretrained model weights, run e.g.

python experiments/train.py experiment.ckpt_path=models/grappa-1.3.0/checkpoint.ckpt

Training is logged in wandb and can be safely interrupted by pressing Ctrl+C at any time. Checkpoints with the best validation loss are saved in the ckpt/<project>/<name>/<data> directory.

For evaluation, run

python experiments/evaluate.py evaluate.ckpt_path=<path_to_checkpoint>

or, to compare against classical force fields whose predictions are stored in the dataset, create configs/evaluate/your_config.yaml and run

python experiments/evaluate.py evaluate=your_config

A checkpoint can also be downloaded from a tag. By default, the dataset config of the checkpoint is used for evaluation, but one can override the respective config args to evaluate solely on custom datasets:

python experiments/evaluate.py evaluate.ckpt_path=grappa-1.4.0 evaluate.datasets=[] evaluate.pure_test_datasets=[<your_dataset_tag>]

Using own trained models

To use a locally trained model, load it from the Lightning checkpoint when initializing the Grappa wrapper class. For example, in OpenMM:

from grappa import OpenmmGrappa
grappa_ff = OpenmmGrappa.from_ckpt('path/to/your/checkpoint.ckpt')

You can also simply put the checkpoint and a config.yaml file in the repository grappa/models/<your_model_tag> and use <your_model_tag> as a tag for loading the model.

Common pitfalls

Deployment

D.1 CUDA errors

Install Grappa in CPU mode when using it as an OpenMM or GROMACS force field; a GPU is not necessary for inference, only for training. If you intend to both train and deploy Grappa, it is easiest to keep two separate environments: one for training, with Grappa in GPU mode and without OpenMM or Kimmdy installed, and one for dataset curation and deployment, with Grappa in CPU mode.

Training

T.1 Delete cached datasets upon changes

Grappa caches datasets in compressed form at data/dgl_datasets/<dataset-name>. If you change the underlying .npz files that define the dataset (at data/datasets/<dataset-name>/*.npz), make sure to delete the respective cache, e.g. as sketched below.
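
A minimal sketch of clearing a stale cache (the dataset name is a placeholder to substitute):

import shutil

shutil.rmtree('data/dgl_datasets/<dataset-name>')  # placeholder: replace with the actual dataset name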
