DISTEVAL: For inter-residue protein distance evaluation
DISTEVAL: Protein distance evaluation
Project abstract
Background: Protein inter-residue contact and distance prediction are two key intermediate steps essential to accurate protein structure prediction. Distance prediction comes in two forms: real-valued distances and 'binned' distograms, which are a more finely grained variant of the binary contact prediction problem. The latter has been introduced as a new challenge in the 14th Critical Assessment of Techniques for Protein Structure Prediction (CASP14) 2020 experiment. Despite the recent proliferation of methods for predicting distances, few methods exist for evaluating these predictions. Currently only numerical metrics, which evaluate the entire prediction at once, are used. These give no insight into the structural details of a prediction. For this reason, new methods and tools are needed.
Results: We have developed a web server for evaluating predicted inter-residue distances. Our server, DISTEVAL, accepts predicted contacts, distances, and a true structure as optional inputs to generate informative heatmaps, chord diagrams, and 3D models. All of these outputs facilitate visual and qualitative assessment. The server also evaluates predictions quantitatively, using metrics such as mean absolute error, root mean squared error, and contact precision.
Conclusions: The visualizations generated by DISTEVAL complement each other and collectively serve as a powerful tool for both quantitative and qualitative assessments of predicted contacts and distances, even in the absence of a true 3D structure.
Webserver
http://deep.cs.umsl.edu/disteval/
Distance/contact evaluation using disteval.py
Download
Download from https://github.com/ba-lab/disteval/releases
Prerequisites
- Python 3
- NumPy
- scikit-learn
Installation with pip
pip install disteval
Test
Example 0. See help
disteval -h
Download the test files from
https://github.com/ba-lab/disteval/blob/main/test/
Example 1. Evaluate a predicted RR contacts file
disteval -n ./test/1guuA.pdb -c ./test/1guuA.contact.rr
Expected output:
Evaluating contacts..
min-seq-sep: 12 xL: Top-L/5 {'precision': 1.0, 'count': 9}
min-seq-sep: 12 xL: Top-L {'precision': 1.0, 'count': 9}
min-seq-sep: 12 xL: Top-NC {'precision': 1.0, 'count': 9}
min-seq-sep: 24 xL: Top-L/5 {'precision': 1.0, 'count': 1}
min-seq-sep: 24 xL: Top-L {'precision': 1.0, 'count': 1}
min-seq-sep: 24 xL: Top-NC {'precision': 1.0, 'count': 1}
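The precision values above can be illustrated with a minimal sketch. It is not disteval's own code: the function name `topk_contact_precision`, the 8 Å contact threshold, and the ranking by predicted probability are assumptions made for illustration. It keeps only residue pairs at least `min_seq_sep` apart in sequence, takes the `k` most confident predictions, and reports the fraction that are true contacts.

```python
import numpy as np

def topk_contact_precision(pred_prob, true_dist, k, min_seq_sep=12, threshold=8.0):
    """Precision of the k most confident predicted contacts (illustrative only).

    pred_prob : (L, L) predicted contact probabilities
    true_dist : (L, L) native distances in angstroms (NaN where undefined)
    """
    n = pred_prob.shape[0]
    # Upper-triangle pairs (i, j) with j - i >= min_seq_sep
    i, j = np.triu_indices(n, k=min_seq_sep)
    # Skip pairs whose native distance is unknown
    valid = ~np.isnan(true_dist[i, j])
    i, j = i[valid], j[valid]
    # Rank pairs from most to least confident and keep the top k
    order = np.argsort(-pred_prob[i, j], kind="stable")[:k]
    # Assumed contact definition: native distance below the threshold
    hits = true_dist[i[order], j[order]] < threshold
    precision = float(hits.mean()) if hits.size else float("nan")
    return {"precision": precision, "count": int(hits.size)}

# Toy 16-residue example with two native long-range contacts
L = 16
true_dist = np.full((L, L), 20.0)
true_dist[0, 12] = 5.0
true_dist[1, 13] = 6.0
pred_prob = np.full((L, L), 0.1)
pred_prob[0, 12], pred_prob[1, 13], pred_prob[2, 14] = 0.9, 0.8, 0.7
print(topk_contact_precision(pred_prob, true_dist, k=2))
```

In this toy case the two most confident pairs are both native contacts, so the top-2 precision is 1.0; asking for a third pair adds a false positive.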
Example 2. Evaluate a predicted distance map
disteval -n ./test/1guuA.pdb -d ./test/1guuA.predicted.npy
Expected output:
Evaluating distances..
min-seq-sep: 12 xL: Top-L/5 {'mae': 0.9403, 'mse': 1.5143, 'rmse': 1.2306, 'count': 10}
min-seq-sep: 12 xL: Top-L {'mae': 1.7522, 'mse': 5.6841, 'rmse': 2.3841, 'count': 50}
min-seq-sep: 12 xL: Top-NC {'mae': 1.9263, 'mse': 6.6872, 'rmse': 2.586, 'count': 603}
min-seq-sep: 24 xL: Top-L/5 {'mae': 1.8154, 'mse': 4.6469, 'rmse': 2.1557, 'count': 10}
min-seq-sep: 24 xL: Top-L {'mae': 2.1541, 'mse': 8.1816, 'rmse': 2.8603, 'count': 50}
min-seq-sep: 24 xL: Top-NC {'mae': 2.4536, 'mse': 9.6231, 'rmse': 3.1021, 'count': 295}
Evaluating contacts..
min-seq-sep: 12 xL: Top-L/5 {'precision': 0.9, 'count': 10}
min-seq-sep: 12 xL: Top-L {'precision': 0.6, 'count': 30}
min-seq-sep: 12 xL: Top-NC {'precision': 0.6, 'count': 30}
min-seq-sep: 24 xL: Top-L/5 {'precision': 0.5, 'count': 10}
min-seq-sep: 24 xL: Top-L {'precision': 0.38462, 'count': 13}
min-seq-sep: 24 xL: Top-NC {'precision': 0.38462, 'count': 13}
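The distance metrics reported above (mae, mse, rmse) can be sketched as follows. This is an illustration under stated assumptions, not disteval's implementation: the helper name `topk_distance_errors` is hypothetical, and it assumes that pairs are ranked by shortest predicted distance before the top `k` are compared against the native distances.

```python
import numpy as np

def topk_distance_errors(pred_dist, true_dist, k, min_seq_sep=12):
    """MAE, MSE and RMSE over the k shortest predicted distances (illustrative only)."""
    n = pred_dist.shape[0]
    # Upper-triangle pairs (i, j) with j - i >= min_seq_sep
    i, j = np.triu_indices(n, k=min_seq_sep)
    # Keep pairs where both predicted and native distances are defined
    valid = ~np.isnan(pred_dist[i, j]) & ~np.isnan(true_dist[i, j])
    i, j = i[valid], j[valid]
    # Assumed ranking: shortest predicted distance first
    order = np.argsort(pred_dist[i, j], kind="stable")[:k]
    err = pred_dist[i[order], j[order]] - true_dist[i[order], j[order]]
    if err.size == 0:
        return {"mae": float("nan"), "mse": float("nan"), "rmse": float("nan"), "count": 0}
    mse = float(np.mean(err ** 2))
    return {
        "mae": float(np.mean(np.abs(err))),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "count": int(err.size),
    }

# Toy 14-residue example: only three pairs are at least 12 residues apart
L = 14
pred_dist = np.full((L, L), 20.0)
true_dist = np.full((L, L), 20.0)
pred_dist[0, 12], pred_dist[0, 13], pred_dist[1, 13] = 5.0, 7.0, 9.0
true_dist[0, 12], true_dist[0, 13], true_dist[1, 13] = 6.0, 7.0, 12.0
print(topk_distance_errors(pred_dist, true_dist, k=2))
```

With `k=2` the selected errors are -1 Å and 0 Å, giving an MAE of 0.5 and an MSE of 0.5. RMSE is always the square root of MSE, which is also how the rmse and mse columns in the expected output relate.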
Example 3. Evaluate trRosetta prediction
disteval -n ./test/1guuA.pdb -r ./test/1guuA.npz
Expected output:
Evaluating distances..
min-seq-sep: 12 xL: Top-L/5 {'mae': 0.5485, 'mse': 0.5375, 'rmse': 0.7331, 'count': 10}
min-seq-sep: 12 xL: Top-L {'mae': 0.6789, 'mse': 0.7678, 'rmse': 0.8762, 'count': 50}
min-seq-sep: 12 xL: Top-NC {'mae': 1.2951, 'mse': 3.8733, 'rmse': 1.9681, 'count': 741}
min-seq-sep: 24 xL: Top-L/5 {'mae': 0.537, 'mse': 0.4237, 'rmse': 0.6509, 'count': 10}
min-seq-sep: 24 xL: Top-L {'mae': 0.6691, 'mse': 0.6725, 'rmse': 0.8201, 'count': 50}
min-seq-sep: 24 xL: Top-NC {'mae': 1.2281, 'mse': 3.2863, 'rmse': 1.8128, 'count': 351}
Evaluating contacts..
min-seq-sep: 12 xL: Top-L/5 {'precision': 1.0, 'count': 10}
min-seq-sep: 12 xL: Top-L {'precision': 0.8, 'count': 30}
min-seq-sep: 12 xL: Top-NC {'precision': 0.8, 'count': 30}
min-seq-sep: 24 xL: Top-L/5 {'precision': 1.0, 'count': 10}
min-seq-sep: 24 xL: Top-L {'precision': 0.84615, 'count': 13}
min-seq-sep: 24 xL: Top-NC {'precision': 0.84615, 'count': 13}
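A trRosetta prediction is a binned distogram, so it has to be reduced to real-valued distances or contact probabilities before it can be scored. The sketch below shows one possible reduction and is not necessarily what disteval does internally. It assumes the commonly described trRosetta layout: 37 bins per residue pair, where bin 0 means "no contact" (beyond 20 Å) and bins 1 to 36 cover 2 to 20 Å in 0.5 Å steps. The function name `distogram_to_maps` is hypothetical.

```python
import numpy as np

def distogram_to_maps(probs, first_edge=2.0, bin_width=0.5, contact_cutoff=8.0):
    """Reduce an (L, L, 37) distogram to expected distances and contact probabilities.

    Assumed layout: bin 0 = no contact; bins 1.. cover first_edge upward in bin_width steps.
    """
    probs = np.asarray(probs, dtype=float)
    in_range = probs[..., 1:]                      # drop the "no contact" bin
    n_bins = in_range.shape[-1]
    centers = first_edge + bin_width * (np.arange(n_bins) + 0.5)
    mass = in_range.sum(axis=-1)                   # probability of being within range
    with np.errstate(invalid="ignore", divide="ignore"):
        # Expected distance, conditioned on the pair being within range;
        # NaN where no probability mass falls inside the binned range
        expected = (in_range * centers).sum(axis=-1) / mass
    # Contact probability: mass in bins whose upper edge is <= contact_cutoff
    n_contact_bins = int(round((contact_cutoff - first_edge) / bin_width))
    contact_prob = in_range[..., :n_contact_bins].sum(axis=-1)
    return expected, contact_prob

# Synthetic 2-residue distogram for demonstration
probs = np.zeros((2, 2, 37))
probs[0, 1, 1] = 1.0                       # all mass in the 2.0-2.5 A bin
probs[1, 0, 0], probs[1, 0, 36] = 0.5, 0.5 # half no-contact, half in the 19.5-20.0 A bin
expected, contact_prob = distogram_to_maps(probs)
print(expected[0, 1], contact_prob[0, 1])
```

Taking the expected value over bin centers is only one choice; using the most probable bin instead gives a distance map that is less smooth but not pulled toward the middle of the range.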
Example 4. Evaluate a CASP14 RR file
wget http://deep.cs.umsl.edu/disteval/static/data/casp14/T1024/RaptorX_RR1
wget http://deep.cs.umsl.edu/disteval/static/data/casp14/casp14_pdbs/T1024.pdb
disteval -n ./T1024.pdb -c ./RaptorX_RR1
Expected output:
Evaluating distances..
min-seq-sep: 12 xL: Top-L/5 {'mae': 1.7837, 'mse': 4.9053, 'rmse': 2.2148, 'count': 78}
min-seq-sep: 12 xL: Top-L {'mae': 2.4797, 'mse': 13.0069, 'rmse': 3.6065, 'count': 392}
min-seq-sep: 12 xL: Top-NC {'mae': 3.6061, 'mse': 16.4059, 'rmse': 4.0504, 'count': 5459}
min-seq-sep: 24 xL: Top-L/5 {'mae': 1.7837, 'mse': 4.9053, 'rmse': 2.2148, 'count': 78}
min-seq-sep: 24 xL: Top-L {'mae': 2.4398, 'mse': 12.8404, 'rmse': 3.5834, 'count': 392}
min-seq-sep: 24 xL: Top-NC {'mae': 3.6114, 'mse': 16.4634, 'rmse': 4.0575, 'count': 4906}
Evaluating contacts..
min-seq-sep: 12 xL: Top-L/5 {'precision': 0.9359, 'count': 78}
min-seq-sep: 12 xL: Top-L {'precision': 0.82143, 'count': 392}
min-seq-sep: 12 xL: Top-NC {'precision': 0.68562, 'count': 633}
min-seq-sep: 24 xL: Top-L/5 {'precision': 0.9359, 'count': 78}
min-seq-sep: 24 xL: Top-L {'precision': 0.80357, 'count': 392}
min-seq-sep: 24 xL: Top-NC {'precision': 0.68631, 'count': 577}
Evaluation through 3D modeling using disteval.py
Prerequisites
- Install csh
sudo apt install csh
- Download 'dssp-2.0.4-linux-amd64' from https://osf.io/qydjv/
chmod +x dssp-2.0.4-linux-amd64
- Download TM-score from https://zhanglab.ccmb.med.umich.edu/TM-score/TMscore.gz
wget https://zhanglab.ccmb.med.umich.edu/TM-score/TMscore.gz
gunzip TMscore.gz
chmod +x TMscore
- DISTFOLD
- Follow instructions here to download DISTFOLD, an updated version of CONFOLD.
Test
Example 1. Predicted contacts (RR file) & Secondary structure
disteval -f ./test/1guuA.fasta -n ./test/1guuA.pdb -c ./test/1guuA.contact.rr -s ./test/1guuA.ss -o ./build-1guuA -b
Expected output:
TM-score RMSD GDT-TS MODEL
0.297 10.100 0.385 1guuA_11.pdb
0.320 7.729 0.460 1guuA_8.pdb
...
0.465 3.935 0.630 1guuA_model1.pdb
0.483 5.776 0.600 1guuA_model2.pdb
0.550 4.534 0.665 1guuA_5.pdb
Example 2. Predicted distance map (up to 12Å) without local distances & Secondary structure
disteval -f ./test/1guuA.fasta -n ./test/1guuA.pdb -d ./test/1guuA.predicted.npy -s ./test/1guuA.ss -o ./build-1guuA -b -m 6 -t 12
Expected output:
TM-score RMSD GDT-TS MODEL
0.107 37.610 0.155 extended.pdb
0.630 3.016 0.745 1guuA_11.pdb
...
0.681 2.528 0.785 1guuA_6.pdb
0.681 2.489 0.790 1guuA_9.pdb
Example 3. Predicted distance map (up to 12Å) including local distances
disteval -f ./test/1guuA.fasta -n ./test/1guuA.pdb -d ./test/1guuA.predicted.npy -s ./test/1guuA.ss -o ./build-1guuA -b -m 2 -t 12
Expected output:
TM-score RMSD GDT-TS MODEL
0.107 37.610 0.155 extended.pdb
0.253 10.230 0.340 1guuA_11.pdb
...
0.681 3.349 0.775 1guuA_13.pdb
0.684 2.330 0.795 1guuA_3.pdb
Example 4. Reconstruction using a native (true) distance map
disteval -f ./test/1guuA.fasta -n ./test/1guuA.pdb -o ./build-1guuA -p -b -m 2 -t 18
Expected output:
TM-score RMSD GDT-TS MODEL
0.107 37.610 0.155 extended.pdb
...
0.987 0.265 1.000 1guuA_model2.pdb
0.991 0.214 1.000 1guuA_16.pdb
Example 5. Distances predicted by trRosetta method
disteval -f ./test/1guuA.fasta -n ./test/1guuA.pdb -r ./test/1guuA.npz -o ./build-1guuA -b -m 2 -t 12
Expected output:
TM-score RMSD GDT-TS MODEL
0.107 37.610 0.155 extended.pdb
0.268 9.724 0.375 1guuA_14.pdb
...
0.876 0.979 0.940 1guuA_model1.pdb
0.880 1.151 0.950 1guuA_16.pdb
Using as a Library
Usage
Example 1. Convert PDB file to distance map
from disteval import pdb2dmap
pdb2dmap('path_to_pdb_file')
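For readers unfamiliar with distance maps, the following sketch shows the underlying computation on plain coordinates. It does not reproduce the library function: the name `coords_to_dmap` is hypothetical, and it assumes one representative atom per residue has already been extracted (commonly Cβ, with Cα for glycine).

```python
import numpy as np

def coords_to_dmap(coords):
    """Pairwise Euclidean distance map from an (L, 3) coordinate array (illustrative only)."""
    coords = np.asarray(coords, dtype=float)
    # Broadcasting gives an (L, L, 3) array of coordinate differences
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Three points chosen so the distances are easy to check by hand
coords = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [3.0, 4.0, 12.0]]
dmap = coords_to_dmap(coords)
print(dmap)
```

The result is a symmetric L x L matrix with zeros on the diagonal; here the off-diagonal distances are 5, 12, and 13 Å.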
Example 2. Convert a trRosetta prediction file (.npz) to a distance map
from disteval import trrosetta2maps
trrosetta2maps('path_to_trRosetta_npz_file')
For other functions
Please check https://github.com/ba-lab/disteval/blob/main/disteval.py
Contact
Badri Adhikari
adhikarib@umsl.edu
University of Missouri-St. Louis
Published By
Bikash Shrestha
bsmmy@umsystem.edu
University of Missouri-St. Louis