Graph Attentional Protein Parametrization (GrAPPa)
A machine-learned molecular mechanics force field using a deep graph attentional network
(code supporting https://arxiv.org/abs/2404.00050)
Abstract
Simulating large molecular systems over long timescales requires force fields that are both accurate and efficient. In recent years, E(3) equivariant neural networks have lifted the tension between computational efficiency and accuracy of force fields, but they are still several orders of magnitude more expensive than classical molecular mechanics (MM) force fields.
Here, we propose a novel machine learning architecture to predict MM parameters from the molecular graph, employing a graph attentional neural network and a transformer with symmetry-preserving positional encoding. The resulting force field, Grappa, outperforms established and other machine-learned MM force fields in accuracy at the same computational efficiency and can be used in existing Molecular Dynamics (MD) engines like GROMACS and OpenMM. It predicts energies and forces of small molecules, peptides, RNA and, showcasing its extensibility to uncharted regions of chemical space, radicals at state-of-the-art MM accuracy. We demonstrate Grappa's transferability to macromolecules in MD simulations, during which large proteins are kept stable and small proteins can fold. Our force field sets the stage for biomolecular simulations close to chemical accuracy, at the same computational cost as established protein force fields.
Grappa Overview
Grappa first predicts node embeddings from the molecular graph. In a second step, it predicts MM parameters for each n-body interaction from the embeddings of the contributing nodes, respecting the necessary permutation symmetry.
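As a concrete illustration of this permutation symmetry: for a bond between atoms i and j, the predicted parameters must be identical regardless of the order in which the two atoms are listed. The following is a minimal sketch of one standard way to enforce such an invariance, summing an arbitrary pair function over both atom orderings (a toy predictor in plain Python with made-up embeddings, not Grappa's actual transformer with symmetry-preserving positional encoding):

```python
def score(a, b):
    """An arbitrary, deliberately non-symmetric function of an ordered
    pair of node embeddings (stands in for a learned network)."""
    return sum(x * x * y for x, y in zip(a, b))

def predict_bond_params(emb_i, emb_j):
    """Toy bond-parameter predictor: summing over both atom orderings
    makes the output invariant under swapping i and j."""
    return score(emb_i, emb_j) + score(emb_j, emb_i)

# Made-up node embeddings for two bonded atoms:
emb_a = [0.1, 0.5, -0.3]
emb_b = [0.7, -0.2, 0.4]

# Swapping the atom order leaves the predicted parameter unchanged.
assert predict_bond_params(emb_a, emb_b) == predict_bond_params(emb_b, emb_a)
```

The same idea generalizes to angles and torsions, where the embeddings of all contributing nodes are combined in a way that respects the symmetries of the respective interaction.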
Usage
The current version of Grappa only predicts bonded parameters; the nonbonded parameters, such as partial charges and Lennard-Jones parameters, are taken from a traditional force field of choice. The input to Grappa is therefore a representation of the system of interest that already contains information on the nonbonded parameters. Currently, Grappa is compatible with GROMACS and OpenMM.
For complete example scripts, see examples/usage.
GROMACS
In GROMACS, Grappa can be used as a command-line application that receives the path to a topology file and writes the bonded parameters to a new topology file.
# parametrize the system with a traditional forcefield:
gmx pdb2gmx -f your_protein.pdb -o your_protein.gro -p topology.top -ignh
# create a new topology file with the bonded parameters from Grappa, specifying the tag of the grappa model:
grappa_gmx -f topology.top -o topology_grappa.top -t grappa-1.3 -p
# (you can create a plot of the parameters for inspection using the -p flag)
# continue with the usual GROMACS workflow (solvation etc.)
OpenMM
To use Grappa in OpenMM, parametrize your system with a traditional force field, from which the nonbonded parameters are taken, and then pass it to Grappa's OpenMM wrapper class:
from openmm.app import ForceField, Topology
from grappa import OpenmmGrappa
topology = ... # load your system as openmm.Topology
classical_ff = ForceField('amber99sbildn.xml', 'tip3p.xml')
system = classical_ff.createSystem(topology)
# load the pretrained ML model from a tag. Currently, possible tags are 'grappa-1.3' and 'latest'
grappa_ff = OpenmmGrappa.from_tag('grappa-1.3')
# parametrize the system using grappa.
system = grappa_ff.parametrize_system(system, topology, charge_model='amber99')
There is also the option to obtain an openmm.app.ForceField that calls Grappa for bonded parameter prediction behind the scenes:
from openmm.app import ForceField, Topology
from grappa import as_openmm
topology = ... # load your system as openmm.Topology
grappa_ff = as_openmm('grappa-1.3', base_forcefield=['amber99sbildn.xml', 'tip3p.xml'])
assert isinstance(grappa_ff, ForceField)
system = grappa_ff.createSystem(topology)
Installation
For using Grappa in GROMACS or OpenMM, installing Grappa in CPU mode is sufficient, since Grappa's inference runtime is usually small compared to the simulation runtime.
CPU mode
To install Grappa in CPU mode, clone the repository and install the requirements and the package itself with pip:
git clone https://github.com/hits-mbm-dev/grappa.git
cd grappa
conda create -n grappa python=3.10 -y
conda activate grappa
pip install -r installation/cpu_requirements.txt
pip install -e .
Verify the installation by running
python tests/test_installation.py
GROMACS
The creation of custom GROMACS topology files is handled by Kimmdy, which can be installed in the same environment as Grappa via pip:
pip install kimmdy==6.8.3
OpenMM
Unfortunately, OpenMM is not available on pip and has to be installed via conda in the same environment as Grappa:
conda install -c conda-forge openmm
Since the resolution of package dependencies can be slow in conda, it is recommended to install OpenMM first and then install Grappa.
GPU mode
For training Grappa models, neither OpenMM nor Kimmdy are needed; only an environment with a working installation of PyTorch and DGL for the CUDA version of choice is required. Instructions for installing DGL with CUDA can be found at installation/README.md.
In this environment, Grappa can be installed by
pip install -r installation/requirements.txt
pip install -e .
Verify the installation by running
python tests/test_installation.py
Pretrained Models
Pretrained models can be obtained with grappa.utils.run_utils.model_from_tag, passing a tag (e.g. latest) that points to a version-dependent URL from which the model weights are downloaded. Available models and their tags are listed in models/published_models.csv. An example can be found at examples/usage/openmm_wrapper.py.
For full reproducibility, the respective partition of the dataset and the configuration file used for training are also included in the released checkpoints and can be found at models/tag/config.yaml and models/tag/split.json after downloading the respective model (see examples/reproducibility). In the case of grappa-1.3, this is equivalent to running
python experiments/train.py data=grappa-1.3 model=default experiment=default
Datasets
Datasets of DGL graphs representing molecules can be obtained with the grappa.data.Dataset.from_tag constructor. An example can be found at examples/usage/dataset.py; available tags are listed in data/published_datasets.csv.
To re-create the benchmark experiment, the train/val/test split from Espaloma is also needed. It can be generated by running dataset_creation/get_espaloma_split/save_split.py, which creates a file espaloma_split.json containing lists of SMILES strings for each of the sub-datasets. These are used to classify molecules as train/val/test molecules upon loading the dataset in the training scripts from experiments/benchmark.
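To make this classification step concrete, here is a minimal sketch of how such a split file could be consumed. The example SMILES strings and the exact JSON layout are assumptions for illustration; see save_split.py for the actual format:

```python
import json

# Hypothetical example contents of espaloma_split.json; the real file is
# written by dataset_creation/get_espaloma_split/save_split.py and is
# assumed here to map each sub-dataset name to a list of SMILES strings.
split = json.loads('{"train": ["CCO", "CCN"], "val": ["CCC"], "test": ["c1ccccc1"]}')
# In practice one would read the file instead:
#   with open("espaloma_split.json") as f:
#       split = json.load(f)

def classify(smiles: str, split: dict) -> str:
    """Assign a molecule to train/val/test based on its SMILES string."""
    for name, smiles_list in split.items():
        if smiles in smiles_list:
            return name
    raise KeyError(f"SMILES {smiles!r} not found in any split")

assert classify("CCC", split) == "val"
```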
The datasets 'dipeptides-300K-...', 'dipeptides-1000K-...', 'uncapped_...', 'hyp-dop_...' and 'dipeptides_radical-300K' were generated using scripts at grappa-data-creation.
For the creation of custom datasets, take a look at the tutorials examples/dataset_creation/create_dataset.py and examples/dataset_creation/uncommon_molecule_dataset.py.
Training
Grappa models can be trained with a configuration specified via Hydra by running
python experiments/train.py
With Hydra, configuration files can be defined in a modular way. For Grappa, there are the configuration types model, data and experiment, for each of which default values can be overwritten on the command line or in a separate configuration file. For example, to train a model with fewer node features, one can run
python experiments/train.py model.graph_node_features=32
and for training on the datasets of grappa-1.3 (defined in configs/data/grappa-1.3), one can run
python experiments/train.py data=grappa-1.3 model=default experiment=default
To start training from pretrained model weights, run e.g.
python experiments/train.py experiment.ckpt_path=models/grappa-1.3.0/checkpoint.ckpt
Training is logged in wandb and can be safely interrupted by pressing ctrl+c at any time. Checkpoints with the best validation loss are saved in the ckpt/&lt;project&gt;/&lt;name&gt;/&lt;data&gt; directory.
For evaluation, run
python experiments/evaluate.py evaluate.ckpt_path=<path_to_checkpoint>
or, for comparing with classical force fields whose predictions are stored in the dataset, create configs/evaluate/your_config.yaml and run
python experiments/evaluate.py evaluate=your_config