Chemical shift prediction dataset
Data for NMR GNN
This package contains the parsing scripts and data used for our GNN chemical shift predictor model.
Install
pip install nmrgnn-data
Working in Python
Here's an example of how to load and work with the data in Python. The records are loaded as a TensorFlow dataset, which can be iterated over in a for loop as shown below.
import nmrdata
dataset = nmrdata.load_records('data/metabolite-records.tfrecord')
for record in dataset:
    # get single record
    break
print(record.keys())
output:
dict_keys(['natoms', 'nneigh', 'features', 'nlist', 'positions', 'peaks', 'mask', 'name', 'class', 'index'])
Access positions as a numpy array
record['positions'].numpy()
output:
array([[ 0.83740795, 0.09760247, 0.2959486 ],
[-0.562893 , 0.00262405, -0.00434441],
[-1.0725924 , -0.37873718, 0.9061929 ],
[-0.75536764, -0.72710234, -0.8159687 ],
[-1.0367495 , 0.9557108 , -0.27988592],
[ 1.2855262 , -0.8334997 , 0.10487328],
[ 1.3046683 , 0.8834019 , -0.20681578]], dtype=float32)
Get chemical shifts
record['peaks'].numpy()
output:
array([0. , 0. , 2.59, 2.59, 2.59, 0. , 0. ], dtype=float32)
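As a rough illustration of working with these arrays, the sketch below copies the positions and peaks printed above into plain NumPy arrays, so it runs without the dataset file. It assumes that a zero entry in peaks means no shift was recorded for that atom (the record's mask field may be the more reliable indicator; that is not confirmed here), and it finds the closest other atom to each labeled one.

```python
import numpy as np

# Values copied from the example output above.
positions = np.array([
    [ 0.83740795,  0.09760247,  0.2959486 ],
    [-0.562893  ,  0.00262405, -0.00434441],
    [-1.0725924 , -0.37873718,  0.9061929 ],
    [-0.75536764, -0.72710234, -0.8159687 ],
    [-1.0367495 ,  0.9557108 , -0.27988592],
    [ 1.2855262 , -0.8334997 ,  0.10487328],
    [ 1.3046683 ,  0.8834019 , -0.20681578],
], dtype=np.float32)
peaks = np.array([0.0, 0.0, 2.59, 2.59, 2.59, 0.0, 0.0], dtype=np.float32)

# Assumption: a zero entry means "no shift recorded" for that atom.
labeled = np.flatnonzero(peaks != 0)

# Pairwise Euclidean distances, shape (natoms, natoms).
diff = positions[:, None, :] - positions[None, :, :]
dist = np.linalg.norm(diff, axis=-1)

# Ignore self-distances when searching for the nearest atom.
np.fill_diagonal(dist, np.inf)
nearest = dist.argmin(axis=1)

for i in labeled:
    j = nearest[i]
    print(f"atom {i}: shift {peaks[i]:.2f}, nearest atom {j} at distance {dist[i, j]:.3f}")
```

The same indexing works on record['positions'].numpy() and record['peaks'].numpy() from a loaded record.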
Numpy Error
If you see this error:
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
Try reinstalling numpy:
pip uninstall -y numpy && pip install numpy
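This error generally appears when a compiled extension was built against a different NumPy version than the one installed. As a quick check after reinstalling, a snippet like the following reports which NumPy build Python actually imports. It only reports the version and location; it does not fix anything.

```python
import numpy as np

# Show the version and install location of the NumPy that Python imports.
# A stale or duplicate install in another site-packages directory would
# show up here as an unexpected path.
print("numpy version:", np.__version__)
print("loaded from:", np.__file__)
```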
Parsing Scripts
To install with the parsing functionality, run:
conda install -c omnia openmm
pip install nmrgnn-data[parse]
Working with Data
Every command below prints additional usage information when run with the --help
argument.
Find pairs
Find pairs of atoms with chemical shifts that are neighbors and sort them based on distance.
nmrdata find-pairs structure-test.tfrecords-data.tfrecord ALA-H ALA-N
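To make the idea concrete, here is a minimal NumPy sketch of finding neighboring pairs between two groups of atoms and sorting them by distance. This is not the implementation behind find-pairs; the function name sorted_pairs, the cutoff parameter, and the toy coordinates are all assumptions made for illustration.

```python
import numpy as np

def sorted_pairs(positions, idx_a, idx_b, cutoff):
    """Return (i, j, distance) for atoms i in idx_a and j in idx_b that are
    closer than cutoff, ordered from nearest to farthest."""
    pos_a = positions[idx_a]
    pos_b = positions[idx_b]
    # Distance between every atom in group A and every atom in group B.
    dist = np.linalg.norm(pos_a[:, None, :] - pos_b[None, :, :], axis=-1)
    rows, cols = np.nonzero(dist < cutoff)
    order = np.argsort(dist[rows, cols])
    return [
        (int(idx_a[r]), int(idx_b[c]), float(dist[r, c]))
        for r, c in zip(rows[order], cols[order])
    ]

# Hypothetical toy coordinates: four atoms, two in each group.
positions = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [5.0, 5.0, 5.0],
])
h_atoms = np.array([0, 3])  # stand-in for atoms named like ALA-H
n_atoms = np.array([1, 2])  # stand-in for atoms named like ALA-N

pairs = sorted_pairs(positions, h_atoms, n_atoms, cutoff=3.0)
for i, j, d in pairs:
    print(i, j, d)
```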
Count Names
Get class/atom name counts:
nmrdata count-names structure-test.tfrecords-data.tfrecord
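Conceptually, this is a frequency count over atom labels. The sketch below shows the idea with the standard library; the list of names is hypothetical, written in the same RESIDUE-ATOM style as the find-pairs arguments, and it does not reproduce how the command reads names from a record file.

```python
from collections import Counter

# Hypothetical atom labels; in practice these would come from the records.
names = ["ALA-H", "ALA-N", "ALA-H", "GLY-H", "ALA-N", "ALA-H"]

counts = Counter(names)

# Print labels from most to least frequent.
for name, n in counts.most_common():
    print(name, n)
```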
Validate
Check that records are consistent with embeddings
nmrdata validate-embeddings structure-test.tfrecords-data.tfrecord
Check that neighbor lists are consistent with embeddings
nmrdata validate-nlist structure-test.tfrecords-data.tfrecord
Check that peaks are reasonable (no nans, no extreme values, no bad masks)
nmrdata validate-peaks structure-test.tfrecords-data.tfrecord
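The kinds of checks described for peaks can be sketched in a few lines of NumPy. This is only an illustration, not the logic of validate-peaks: the function name check_peaks, the max_abs threshold, and the definition of a bad mask (anything other than 0 or 1, or a shape mismatch) are assumptions.

```python
import numpy as np

def check_peaks(peaks, mask, max_abs=1000.0):
    """Return a list of human-readable problems; an empty list means none found."""
    peaks = np.asarray(peaks, dtype=float)
    mask = np.asarray(mask, dtype=float)
    problems = []
    if peaks.shape != mask.shape:
        # The remaining checks assume matching shapes, so stop here.
        return ["peaks and mask have different shapes"]
    nan_entries = np.isnan(peaks)
    if nan_entries.any():
        problems.append("peaks contain NaN")
    # Assumed threshold: shifts beyond max_abs are treated as extreme.
    if (np.abs(peaks[~nan_entries]) > max_abs).any():
        problems.append("peaks contain extreme values")
    # Assumed convention: the mask is binary.
    if not np.isin(mask, (0.0, 1.0)).all():
        problems.append("mask has values other than 0 and 1")
    return problems

good = check_peaks([0.0, 0.0, 2.59], [0, 0, 1])
bad = check_peaks([np.nan, 5000.0, 1.0], [0, 2, 1])
print(good)
print(bad)
```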
Output Labels
To extract labels ordered by PDB and residue:
nmrdata write-peak-labels test-structure-shift-data.tfrecord test-structure-shift-record-info.txt labels.txt
Making New Data
See the commands nmrparse shiftml, nmrparse metabolites, and nmrparse shiftx, which are parsers for various databases.
From RefDB Files
This requires a pickled Python object called data.pb to be in the directory. It is a list of dicts, each containing pdb_file (path to the PDB file), pdb (PDB ID), corr (path to the .corr file), and chain (which chain to use). chain can be _ to indicate that the first chain should be used.
nmrparse parse-refdb directory name --pdb_filter exclude_ids.txt
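For reference, a data.pb file with the structure described above could be written with the standard pickle module, as in the sketch below. The entries, file paths, and PDB IDs are hypothetical placeholders, and the sketch writes to a temporary directory; whether paths should be absolute or relative to the directory is not stated here, so treat that as an open assumption.

```python
import os
import pickle
import tempfile

# Hypothetical entries: one dict per structure, with the four keys
# described above. "_" for chain means "use the first chain".
entries = [
    {"pdb_file": "structures/1abc.pdb", "pdb": "1ABC",
     "corr": "shifts/1abc.corr", "chain": "A"},
    {"pdb_file": "structures/2xyz.pdb", "pdb": "2XYZ",
     "corr": "shifts/2xyz.corr", "chain": "_"},
]

# Write the list to data.pb inside a scratch directory.
directory = tempfile.mkdtemp()
path = os.path.join(directory, "data.pb")
with open(path, "wb") as f:
    pickle.dump(entries, f)

# Read it back to confirm the round trip.
with open(path, "rb") as f:
    loaded = pickle.load(f)
print(len(loaded), "entries; keys:", sorted(loaded[0]))
```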
Citation
Please cite Predicting Chemical Shifts with Graph Neural Networks
@article{yang2021predicting,
title={Predicting chemical shifts with graph neural networks},
author={Yang, Ziyue and Chakraborty, Maghesree and White, Andrew D},
journal={Chemical science},
volume={12},
number={32},
pages={10802--10809},
year={2021},
publisher={Royal Society of Chemistry}
}
File details
Details for the file nmrgnn-data-1.1.0.tar.gz.
File metadata
- Download URL: nmrgnn-data-1.1.0.tar.gz
- Upload date:
- Size: 28.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 682aefe62a985d7d28a9277391e4e56ffe6703b18699c5a083956b17c80bd601 |
MD5 | 3d8e5f9361db093f7e06b2cfdd7c21c2 |
BLAKE2b-256 | 64af167a60bd49edea01ca496593223368dd950407ecde488ccfe33e46fc6f25 |
File details
Details for the file nmrgnn_data-1.1.0-py3-none-any.whl.
File metadata
- Download URL: nmrgnn_data-1.1.0-py3-none-any.whl
- Upload date:
- Size: 32.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 6d52a49dba24d54cc4479fc8ae431ce4ad41087eb28baddbb63adee0778281f2 |
MD5 | 35f9fa1ffc5c10193c2e7ee9274a55b6 |
BLAKE2b-256 | f3df08d4feff1a9c22212ffb53c3f063ea19b5b05fb450d360300b0c13d2e94e |