Chemical shift prediction dataset

Data for NMR GNN

This package contains the parsing scripts and data used for our GNN chemical shift prediction model.

Install

pip install nmrgnn-data

Working in Python

Here's an example of how to load and work with the data in Python. The records are loaded as a TensorFlow dataset, but they can be iterated over in a plain for loop as shown below.

import nmrdata
dataset = nmrdata.load_records('data/metabolite-records.tfrecord')
for record in dataset:
    # get single record
    break
print(record.keys())

output:

dict_keys(['natoms', 'nneigh', 'features', 'nlist', 'positions', 'peaks', 'mask', 'name', 'class', 'index'])

Access positions as a numpy array

record['positions'].numpy()

output:

array([[ 0.83740795,  0.09760247,  0.2959486 ],
       [-0.562893  ,  0.00262405, -0.00434441],
       [-1.0725924 , -0.37873718,  0.9061929 ],
       [-0.75536764, -0.72710234, -0.8159687 ],
       [-1.0367495 ,  0.9557108 , -0.27988592],
       [ 1.2855262 , -0.8334997 ,  0.10487328],
       [ 1.3046683 ,  0.8834019 , -0.20681578]], dtype=float32)

Get chemical shifts

record['peaks'].numpy()

output:

array([0.  , 0.  , 2.59, 2.59, 2.59, 0.  , 0.  ], dtype=float32)
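In the output above, most atoms show a shift of 0 and only three carry a value. The following sketch pairs each labeled atom with its coordinates, using plain Python lists copied from the outputs shown earlier. It assumes that a shift of exactly 0 means "no label"; the record's mask field is probably the more reliable indicator, but its semantics are not described here. The helper name labeled_atoms is hypothetical and not part of nmrdata.

```python
# Values copied from the example outputs above, i.e. what
# record['positions'].numpy().tolist() and record['peaks'].numpy().tolist()
# would give for that record.
positions = [
    [0.83740795, 0.09760247, 0.2959486],
    [-0.562893, 0.00262405, -0.00434441],
    [-1.0725924, -0.37873718, 0.9061929],
    [-0.75536764, -0.72710234, -0.8159687],
    [-1.0367495, 0.9557108, -0.27988592],
    [1.2855262, -0.8334997, 0.10487328],
    [1.3046683, 0.8834019, -0.20681578],
]
peaks = [0.0, 0.0, 2.59, 2.59, 2.59, 0.0, 0.0]


def labeled_atoms(positions, peaks):
    """Return (atom_index, shift, xyz) for every atom with a nonzero shift.

    Assumption: a shift of exactly 0 marks an unlabeled atom.
    """
    return [
        (i, shift, xyz)
        for i, (shift, xyz) in enumerate(zip(peaks, positions))
        if shift != 0.0
    ]


for index, shift, xyz in labeled_atoms(positions, peaks):
    print(index, shift, xyz)
```

If the mask field does mark labeled atoms, filtering on it instead of on the shift value would avoid dropping an atom whose true shift happens to be 0.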

Numpy Error

If you see this error:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Try reinstalling NumPy:

pip uninstall -y numpy && pip install numpy
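This error usually means a compiled extension was built against a different NumPy version than the one currently installed. Before and after reinstalling, it can help to record which versions are present. The sketch below uses only the standard library; the function name installed_versions and the default package list are illustrative choices, not part of nmrdata.

```python
from importlib import metadata


def installed_versions(packages=("numpy", "tensorflow", "nmrgnn-data")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```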

Parsing Scripts

To install with the parsing functionality, run:

conda install -c omnia openmm
pip install nmrgnn-data[parse]

Working with Data

Every command below accepts the --help argument, which prints additional information.

Find pairs

Find pairs of neighboring atoms that both have chemical shifts, sorted by distance:

nmrdata find-pairs structure-test.tfrecords-data.tfrecord ALA-H ALA-N
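To illustrate the idea of listing labeled atom pairs by distance, here is a minimal sketch in plain Python. It is not the implementation behind find-pairs: it ignores atom-name filters such as ALA-H and ALA-N, does not read the record's nlist, and treats a shift of 0 as "unlabeled". The function name and the cutoff parameter are hypothetical, and the coordinates are toy values.

```python
import math
from itertools import combinations


def shifted_pairs_by_distance(positions, peaks, cutoff=3.0):
    """List (distance, i, j) for labeled atom pairs closer than cutoff, nearest first.

    Assumptions: a shift of 0 means unlabeled, and `cutoff` is a hypothetical
    distance threshold in the same units as `positions`.
    """
    labeled = [i for i, shift in enumerate(peaks) if shift != 0.0]
    pairs = []
    for i, j in combinations(labeled, 2):
        distance = math.dist(positions[i], positions[j])
        if distance < cutoff:
            pairs.append((distance, i, j))
    return sorted(pairs)


# Toy coordinates and shifts; atom 1 has no shift and is skipped.
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.5, 0.0]]
peaks = [8.1, 0.0, 120.3, 4.2]

for distance, i, j in shifted_pairs_by_distance(positions, peaks):
    print(f"{i}-{j}: {distance:.2f}")
```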

Count Names

Get class/atom name counts:

nmrdata count-names structure-test.tfrecords-data.tfrecord

Validate

Check that records are consistent with embeddings:

nmrdata validate-embeddings structure-test.tfrecords-data.tfrecord

Check that neighbor lists are consistent with embeddings:

nmrdata validate-nlist structure-test.tfrecords-data.tfrecord

Check that peaks are reasonable (no NaNs, no extreme values, no bad masks):

nmrdata validate-peaks structure-test.tfrecords-data.tfrecord
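The kind of checks described for peaks can be sketched in a few lines of Python. This is an illustration only, not the code behind validate-peaks. It assumes that mask is truthy where an atom has a label, and max_abs_shift is a hypothetical bound on plausible shift values; the function name check_peaks is also hypothetical.

```python
import math


def check_peaks(peaks, mask, max_abs_shift=1000.0):
    """Return a list of human-readable problems found in one record.

    Assumptions: `mask` is truthy where an atom has a label, and
    `max_abs_shift` is a hypothetical bound on plausible shift values.
    """
    if len(peaks) != len(mask):
        return ["peaks and mask differ in length"]
    problems = []
    for i, shift in enumerate(peaks):
        if math.isnan(shift):
            problems.append(f"atom {i}: NaN shift")
        elif abs(shift) > max_abs_shift:
            problems.append(f"atom {i}: extreme shift {shift}")
    if not any(mask):
        problems.append("mask selects no atoms")
    return problems


# Example: the third atom has a NaN shift.
print(check_peaks([0.0, 2.59, float("nan")], [0, 1, 1]))
```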

Output Labels

To extract labels ordered by PDB and residue:

nmrdata write-peak-labels test-structure-shift-data.tfrecord test-structure-shift-record-info.txt labels.txt

Making New Data

See the commands nmrparse shiftml, nmrparse metabolites, and nmrparse shiftx, which are parsers for various databases.

From RefDB Files

This requires a pickled Python object called data.pb in the directory. It is a list of dicts, each containing pdb_file (path to the PDB file), pdb (PDB ID), corr (path to the .corr file), and chain (which chain to use). Set chain to _ to use the first chain.

nmrparse parse-refdb directory name --pdb_filter exclude_ids.txt
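A data.pb file with the structure described above could be produced along these lines. This is a sketch under assumptions: the helper name write_refdb_index is hypothetical, the entry shown uses placeholder file names and a placeholder PDB ID, and whether the paths should be absolute or relative to the directory is not stated here.

```python
import pickle
from pathlib import Path

REQUIRED_KEYS = {"pdb_file", "pdb", "corr", "chain"}


def write_refdb_index(directory, entries):
    """Pickle a list of entry dicts to <directory>/data.pb and return the path."""
    entries = list(entries)
    for entry in entries:
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(
                f"entry {entry.get('pdb')!r} is missing {sorted(missing)}"
            )
    path = Path(directory) / "data.pb"
    with open(path, "wb") as handle:
        pickle.dump(entries, handle)
    return path


# Hypothetical entry: the ID and file names are placeholders.
entries = [
    {"pdb_file": "1abc.pdb", "pdb": "1ABC", "corr": "1abc.corr", "chain": "_"},
]
```

Calling write_refdb_index("directory", entries) would then write the file into the directory passed to nmrparse parse-refdb.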

Citation

Please cite Predicting Chemical Shifts with Graph Neural Networks:

@article{yang2021predicting,
  title={Predicting chemical shifts with graph neural networks},
  author={Yang, Ziyue and Chakraborty, Maghesree and White, Andrew D},
  journal={Chemical science},
  volume={12},
  number={32},
  pages={10802--10809},
  year={2021},
  publisher={Royal Society of Chemistry}
}
