Chemical shift predictor

Graph neural network for predicting NMR chemical shifts

This library provides the code and a pre-trained model for predicting NMR chemical shifts of protein structures and organic molecules. It relies on the nmrdata package, which includes embeddings and NMR parameters.

Install

Install using pip

pip install nmrgnn

Colab

To use this package without installing it, use the Colab notebook.

Command Line Usage

Available commands are

  • nmrgnn eval-struct to predict the chemical shifts of a structure, using the MDAnalysis library as the coordinate reader
  • nmrgnn train to train a model
  • nmrgnn hyper to tune hyperparameters
  • nmrgnn eval-tfrecords to evaluate a model on records in the format produced by the nmrdata package

Predict NMR Chemical Shifts

Note: This model was trained on structures with no solvent, so remove any solvent before use. For small molecules, the model was trained mostly on data from aqueous solutions. Depending on your solvent and reference, you should expect agreement only in the relative chemical shifts between atoms.
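
As a minimal sketch of the solvent-removal step, the helper below drops solvent records from the text of a PDB file. It is not part of nmrgnn: the function name strip_solvent and the list of solvent residue names are assumptions for illustration, so adjust the names to match your files.

```python
# Hypothetical helper, not part of nmrgnn: drop solvent records from PDB text.
# The residue names below are an assumption; extend them for your own files.
SOLVENT_RESNAMES = {"HOH", "WAT", "SOL", "TIP3"}


def strip_solvent(pdb_text, solvent=SOLVENT_RESNAMES):
    """Return PDB text without ATOM/HETATM records whose residue is solvent."""
    kept = []
    for line in pdb_text.splitlines():
        is_atom = line.startswith(("ATOM", "HETATM"))
        # Columns 18-20 hold the residue name. Column 21 is normally blank,
        # but some tools use it for four-letter names such as TIP3.
        if is_atom and line[17:21].strip() in solvent:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Write the returned text to a new file and pass that file to nmrgnn eval-struct. Non-atom records such as CONECT entries are kept unchanged, so check them if your file lists bonds to solvent atoms.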

To predict NMR chemical shifts via the MDAnalysis library as a reader:

nmrgnn eval-struct [struct-file] [output-csv]

where struct-file is a PDB file or another structure format that MDAnalysis can read. Example:

nmrgnn eval-struct 108M.pdb 108M-predicted.csv

For a trajectory, try

nmrgnn eval-struct 108M.pdb 108M.trr 108M-predicted.csv --stride 5

which computes shifts every 5 frames.

Warning about Peaks

If you receive a warning about poor peaks, your protein structure likely has no hydrogens. You can add them with online tools, or fix the file quickly with OpenMM using these commands:

conda install -y -c omnia openmm
pip install nmrdata[parse]@git+https://github.com/ur-whitelab/nmrdata.git
nmrparse clean-pdb [your-pdb] [your-pdb]-H.pdb
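
Before running the commands above, it can help to confirm that hydrogens really are missing. The sketch below counts hydrogen records in the text of a PDB file. The function count_hydrogens is a hypothetical helper, not part of nmrgnn or nmrdata, and its fallback to the atom name is a heuristic that assumes protein-style atom names.

```python
def count_hydrogens(pdb_text):
    """Count ATOM/HETATM records that look like hydrogens."""
    count = 0
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Columns 77-78 hold the element symbol when the file provides it.
        element = line[76:78].strip().upper()
        if not element:
            # Heuristic fallback: use the first letter of the atom name
            # (columns 13-16), skipping leading digits as in "1HB".
            element = line[12:16].strip().lstrip("0123456789")[:1].upper()
        if element == "H":
            count += 1
    return count
```

A count of zero for a protein structure suggests the file needs hydrogens added before prediction.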

Library Usage

Available functions are

  • load_model to load the included pre-trained model, or a trained model from a path you specify
  • universe2graph to convert an MDAnalysis universe into a tuple of atoms, neighbor list, edges, and inverse_degree
  • check_peaks to estimate the validity of predicted peaks

The example below predicts peaks and estimates (True/False) whether the peaks are valid. Peaks can be invalid because the elements are not included in the training data (e.g., oxygen shifts), because the chemistry is unusual, or because you forgot to remove solvent.

import MDAnalysis as md
import nmrgnn

model = nmrgnn.load_model()
u = md.Universe('108M.pdb')
g = nmrgnn.universe2graph(u)
peaks = model(g)
# check_peaks only uses first element of tuple (atom identities)
confident = nmrgnn.check_peaks(g[0], peaks)

Do not trust peaks from the model without checking them.
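
One way to act on the check is to mask out the peaks that fail it. The sketch below assumes that the predicted peaks and the check_peaks result can be converted to plain Python lists of equal length, with one True/False entry per atom. That layout is an assumption, and the values are stand-ins rather than real model output. The helper mask_unreliable is hypothetical and not part of nmrgnn.

```python
import math


def mask_unreliable(peaks, confident):
    """Replace peaks flagged as unreliable with NaN, keeping atom order."""
    if len(peaks) != len(confident):
        raise ValueError("peaks and confident must have the same length")
    return [p if ok else math.nan for p, ok in zip(peaks, confident)]


# Stand-in values; in practice these would come from the model and check_peaks.
peaks = [8.21, 4.35, 176.2]
confident = [True, False, True]
masked = mask_unreliable(peaks, confident)
```

Keeping NaN placeholders instead of deleting entries preserves the alignment between peaks and atoms.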

Analyzing Trajectories

Here is an example of analyzing a trajectory:

import MDAnalysis as md
import nmrgnn

model = nmrgnn.load_model()

u = md.Universe(PATH_TO_FILES)
for ts in u.trajectory:
    x = nmrgnn.universe2graph(u)
    peaks = model(x)
    confident = nmrgnn.check_peaks(x[0], peaks)
    # do something with peaks and confident
    ...
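
As one possible "do something with peaks" step, the sketch below averages each atom's shift over frames while skipping values that failed the check. It assumes each frame's peaks and check result have been converted to plain lists of equal length, which is an assumption about the output layout. The function average_over_frames is hypothetical, and the numbers are stand-ins.

```python
def average_over_frames(frames):
    """Average each atom's shift over frames, skipping unreliable values.

    frames is a list of (peaks, confident) pairs, one pair per frame.
    Atoms that are never flagged as reliable get None.
    """
    n_atoms = len(frames[0][0])
    totals = [0.0] * n_atoms
    counts = [0] * n_atoms
    for peaks, confident in frames:
        for i, (p, ok) in enumerate(zip(peaks, confident)):
            if ok:
                totals[i] += p
                counts[i] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]


# Stand-in data for two frames of a three-atom system.
frames = [
    ([8.0, 4.0, 120.0], [True, True, False]),
    ([8.5, 4.5, 118.0], [True, False, False]),
]
averages = average_over_frames(frames)
```

In the loop above, you would append one (peaks, confident) pair per frame and call the function once after the loop finishes.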

Citation

Please cite Predicting Chemical Shifts with Graph Neural Networks

@article{yang2021predicting,
  title={Predicting Chemical Shifts with Graph Neural Networks},
  author={Yang, Ziyue and Chakraborty, Maghesree and White, Andrew D},
  journal={Chemical Science},
  year={2021},
  publisher={Royal Society of Chemistry}
}

Model Performance

Here is the performance of the included model on proteins (P prefix) and organic molecules (Mol prefix). r is the correlation coefficient and rmsd is the root-mean-square deviation. These results differ from the values in the paper because they are evaluated on whole proteins instead of 256-atom fragments.

Metric N baseline
Mol-H-r 307 0.9591749434360993
Mol-H-rmsd 307 0.39710393617916234
P-C-r 6701 0.864163
P-H-r 7747 0.72265
P-N-r 7640 0.890842
P-CA-r 8305 0.97374
P-CB-r 6827 0.990706
P-CD-r 739 0.996123
P-CD1-r 961 0.999515
P-CD2-r 609 0.999223
P-CE-r 340 0.991736
P-CE1-r 261 0.958121
P-CE2-r 173 0.943739
P-CE3-r 37 -0.215088
P-CG-r 1674 0.998763
P-CG1-r 589 0.93124
P-CG2-r 839 0.829016
P-CH2-r 43 0.158363
P-CZ-r 125 0.984575
P-CZ2-r 45 0.311805
P-CZ3-r 37 0.164961
P-HA-r 5565 0.839377
P-HA2-r 462 0.495514
P-HA3-r 449 0.262298
P-HB-r 960 0.958713
P-HB2-r 3427 0.901358
P-HB3-r 3255 0.901234
P-HD1-r 383 0.44733
P-HD11-r 753 0.615756
P-HD12-r 753 0.585852
P-HD13-r 753 0.609181
P-HD2-r 1043 0.988991
P-HD21-r 428 0.617599
P-HD22-r 428 0.651927
P-HD23-r 428 0.605888
P-HD3-r 637 0.95089
P-HE-r 93 0.396258
P-HE1-r 413 0.879142
P-HE2-r 561 0.98963
P-HE3-r 293 0.985685
P-HG-r 389 0.810401
P-HG1-r 11 0.0653286
P-HG11-r 350 0.572609
P-HG12-r 350 0.498696
P-HG13-r 350 0.558426
P-HG2-r 1317 0.867619
P-HG21-r 936 0.689592
P-HG22-r 936 0.674086
P-HG23-r 936 0.662057
P-HG3-r 1200 0.856177
P-HH-r 1 nan
P-HH2-r 51 0.217372
P-HZ-r 134 0.407285
P-HZ2-r 54 0.419415
P-HZ3-r 45 0.318577
P-ND1-r 9 0.184443
P-ND2-r 173 0.320299
P-NE-r 88 0.0135033
P-NE1-r 64 0.0998792
P-NE2-r 149 0.972614
P-NH1-r 3 -0.914066
P-NH2-r 3 -0.276087
P-NZ-r 1 nan
P-C-rmsd 6701 1.22819
P-H-rmsd 7747 0.279766
P-N-rmsd 7640 6.65505
P-CA-rmsd 8305 1.3298
P-CB-rmsd 6827 3.10571
P-CD-rmsd 739 10.3192
P-CD1-rmsd 961 2.74597
P-CD2-rmsd 609 4.35399
P-CE-rmsd 340 1.14623
P-CE1-rmsd 261 4.69154
P-CE2-rmsd 173 4.82229
P-CE3-rmsd 37 3.0327
P-CG-rmsd 1674 1.63828
P-CG1-rmsd 589 1.558
P-CG2-rmsd 839 1.87753
P-CH2-rmsd 43 1.95861
P-CZ-rmsd 125 4.32496
P-CZ2-rmsd 45 1.22984
P-CZ3-rmsd 37 1.99567
P-HA-rmsd 5565 0.0903255
P-HA2-rmsd 462 0.119584
P-HA3-rmsd 449 0.234069
P-HB-rmsd 960 0.103812
P-HB2-rmsd 3427 0.10552
P-HB3-rmsd 3255 0.117287
P-HD1-rmsd 383 0.114696
P-HD11-rmsd 753 0.0699893
P-HD12-rmsd 753 0.0744762
P-HD13-rmsd 753 0.0711484
P-HD2-rmsd 1043 0.105893
P-HD21-rmsd 428 0.0737762
P-HD22-rmsd 428 0.0689306
P-HD23-rmsd 428 0.0764191
P-HD3-rmsd 637 0.0869007
P-HE-rmsd 93 0.422132
P-HE1-rmsd 413 0.376196
P-HE2-rmsd 561 0.0861489
P-HE3-rmsd 293 0.0855213
P-HG-rmsd 389 0.118694
P-HG1-rmsd 11 10.3704
P-HG11-rmsd 350 0.0504736
P-HG12-rmsd 350 0.0552385
P-HG13-rmsd 350 0.0516929
P-HG2-rmsd 1317 0.0654069
P-HG21-rmsd 936 0.0634577
P-HG22-rmsd 936 0.0650697
P-HG23-rmsd 936 0.0679991
P-HG3-rmsd 1200 0.0775636
P-HH-rmsd 1 4.07231
P-HH2-rmsd 51 0.0862706
P-HZ-rmsd 134 0.147387
P-HZ2-rmsd 54 0.13507
P-HZ3-rmsd 45 0.083249
P-ND1-rmsd 9 1576.13
P-ND2-rmsd 173 6.56618
P-NE-rmsd 88 231.589
P-NE1-rmsd 64 4.51713
P-NE2-rmsd 149 13.9975
P-NH1-rmsd 3 5.76985
P-NH2-rmsd 3 0.91028
P-NZ-rmsd 1 165.069
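
For readers who want to compute the same two metrics on their own predictions, here is a minimal sketch. It assumes r is the Pearson correlation coefficient, which the text above does not state explicitly. The function names are illustrative and not part of nmrgnn.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between predicted and observed shifts."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    denom = math.sqrt(var_p * var_o)
    # With a single point or constant values the correlation is undefined.
    return cov / denom if denom else math.nan


def rmsd(predicted, observed):
    """Root-mean-square deviation between predicted and observed shifts."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
```

Under this definition a single data point has no defined correlation, which is consistent with the nan entries in the rows where N is 1.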
