ANSURR uses backbone chemical shifts to validate the accuracy of NMR protein structures.
ANSURR | Accuracy of NMR Structures Using RCI and Rigidity v2.0.43
ANSURR uses backbone chemical shifts to validate the accuracy of NMR protein structures, as described at https://www.nature.com/articles/s41467-020-20177-1. This repository contains the code required to install and run ANSURR on Linux or a Mac. ANSURR v1.2.1 is also available on NMRbox (https://nmrbox.org/software/ansurr). Please let me know if you have any issues.
Installation
ANSURR v2 is installed using pip (https://packaging.python.org/en/latest/tutorials/installing-packages/).
pip install ansurr
You will also need Java to re-reference chemical shifts using PANAV (recommended): https://java.com/en/download/help/download_options.html
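Because re-referencing depends on Java, it can be useful to check for it before running with -r. The following is a minimal sketch, assuming the runtime is exposed as an executable named java on your PATH. The helper name java_available is hypothetical and not part of ANSURR.

```python
import shutil


def java_available(executable: str = "java") -> bool:
    """Return True if an executable with this name is found on PATH.

    Assumption: the Java runtime needed by PANAV is reachable as `java`.
    """
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if java_available():
        print("java found: re-referencing with -r should be possible")
    else:
        print("java not found: install it, or run ANSURR without -r")
```

This only confirms that an executable is present; it does not check the Java version.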
Running ANSURR
ANSURR requires two input files: an NMR protein structure in PDB format and a shifts file in NEF or NMR-STAR v3 format. To re-reference chemical shifts using PANAV before running ANSURR (recommended):
ansurr -p xxxx.pdb -s xxxx.nef -r
To run without re-referencing chemical shifts:
ansurr -p xxxx.pdb -s xxxx.nef
Options:
-p  input PDB file
-s  input shifts file
-h  print the help message
-l  include free ligands when computing flexibility
-m  only output the ANSURR scores in a text file
-n  include non-standard residues when computing flexibility. Note that RCI will not be calculated for non-standard residues and so they will not be used to compute validation scores. Regardless, including non-standard residues is a good idea to avoid breaks in the protein structure which would otherwise make those regions too floppy.
-o  combine chains into a single structure when calculating flexibility. This is useful when the structure is an oligomer, as oligomerisation will often result in changes in flexibility.
-r  re-reference chemical shifts using PANAV before running ANSURR (recommended)
-q  suppress output to the terminal
-v  print version details
-w  compute ANSURR scores for the well-defined residues identified by CYRANGE. These scores are computed using a separate benchmark for well-defined residues.
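When running ANSURR over many structures, it can help to assemble the command line in a script. The sketch below builds the argument list from the flags documented above and hands it to subprocess. The function names build_ansurr_command and run_ansurr are hypothetical, and the sketch assumes that -r, -o and -w can be combined freely, which is not stated explicitly here.

```python
import subprocess
from typing import List


def build_ansurr_command(pdb: str, shifts: str, rereference: bool = True,
                         oligomer: bool = False,
                         well_defined: bool = False) -> List[str]:
    """Build an argument list using only the flags listed under Options."""
    cmd = ["ansurr", "-p", pdb, "-s", shifts]
    if rereference:
        cmd.append("-r")  # re-reference shifts with PANAV (needs Java)
    if oligomer:
        cmd.append("-o")  # combine chains into a single structure
    if well_defined:
        cmd.append("-w")  # also score CYRANGE well-defined residues
    return cmd


def run_ansurr(pdb: str, shifts: str, **options: bool) -> None:
    """Run ANSURR; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_ansurr_command(pdb, shifts, **options), check=True)
```

For example, run_ansurr("xxxx.pdb", "xxxx.nef") would execute the recommended command with re-referencing, provided ansurr is installed and on your PATH.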
Output
A directory called <yourpdbfile>_<yourshiftfile> is made to save the output generated. This directory will be overwritten if you run ANSURR again with input files with the same names as before. This directory contains two directories called ANSURR_output and other_output.
ANSURR_output contains:
scores.out - a text file with the validation scores for each model
<yourpdbfile>_<yourshiftfile>_ansurr.nef - a NEF file with most output generated by ANSURR
<yourpdbfile>_<yourshiftfile>_ansurr.json - a JSON file with most output generated by ANSURR
<yourpdbfile>_<yourshiftfile>.png - a graphical summary of the validation scores for each model
out/ - text files for each model which detail the following for each residue: flexibility predicted by RCI, flexibility predicted by FIRST, secondary structure according to DSSP, well-defined regions of the ensemble according to CYRANGE, backbone chemical shift completeness and which atom types have chemical shift data
figs/ - plots of protein flexibility predicted by RCI (blue) and FIRST (orange) for each model. Alpha helical and beta sheet secondary structure are indicated by red and blue dots, respectively. Green dots indicate regions that are well-defined according to CYRANGE. Black crosses indicate residues with no chemical shift data (not used to compute validation scores).
other_output contains output from various programs run as part of ANSURR:
PANAV/ - re-referenced chemical shifts
RCI/ - flexibility predicted from chemical shifts using RCI
extracted_pdbs/ - PDB files for each model extracted from the NMR structure
DSSP/ - secondary structure for each model according to the program DSSP
FIRST/ - flexibility predicted for each model using FIRST
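The output layout described above can be sketched as a small path helper for scripts that collect results. This is an illustration only. It assumes that <yourpdbfile> and <yourshiftfile> are the input file names without their extensions, and that the output directory is created in the current working directory; neither detail is confirmed here. The function name expected_output_paths is hypothetical.

```python
from pathlib import Path
from typing import Dict


def expected_output_paths(pdb: str, shifts: str, base: str = ".") -> Dict[str, Path]:
    """Return the paths where ANSURR output is expected for a pair of inputs."""
    # Assumption: the directory name joins the two file stems (no extensions).
    name = f"{Path(pdb).stem}_{Path(shifts).stem}"
    # Assumption: the directory is created under `base` (default: current directory).
    root = Path(base) / name
    ansurr_out = root / "ANSURR_output"
    return {
        "root": root,
        "scores": ansurr_out / "scores.out",
        "nef": ansurr_out / f"{name}_ansurr.nef",
        "json": ansurr_out / f"{name}_ansurr.json",
        "summary_png": ansurr_out / f"{name}.png",
        "per_model_text": ansurr_out / "out",
        "per_model_figs": ansurr_out / "figs",
        "other_output": root / "other_output",
    }
```

Checking that expected_output_paths("xxxx.pdb", "xxxx.nef")["scores"] exists after a run is a quick way to confirm that the naming assumption holds on your installation.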
Help
Contact Nick Fowler (njfowler.com) for support, queries or suggestions.
Known Issues
- The Mac version of ANSURR gives slightly different ANSURR scores (average difference of less than 1) due to using a different compiler to compile the underlying C++ code for computing rigidity. I'm working on a fix.
- Secondary structure is currently not computed in the Mac version.
Acknowledgements
Random Coil Index (RCI) | Berjanskii, M.V. & Wishart, D.S. A simple method to predict protein flexibility using secondary chemical shifts. Journal of the American Chemical Society 127, 14970-14971 (2005).
Floppy Inclusions and Rigid Substructure Topography (FIRST) | Jacobs, D.J., Rader, A.J., Kuhn, L.A. & Thorpe, M.F. Protein flexibility predictions using graph theory. Proteins-Structure Function and Genetics 44, 150-165 (2001).
Probabilistic Approach to NMR Assignment and Validation (PANAV) | Bowei Wang, Yunjun Wang and David S. Wishart. "A probabilistic approach for validating protein NMR chemical shift assignments". Journal of Biomolecular NMR. Volume 47, Number 2 / June 2010: 85-99
DSSP | A series of PDB related databases for everyday needs. Wouter G Touw, Coos Baakman, Jon Black, Tim AH te Beek, E Krieger, Robbie P Joosten, Gert Vriend. Nucleic Acids Research 2015 January; 43(Database issue): D364-D368. | Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Kabsch W, Sander C, Biopolymers. 1983 22 2577-2637.
adjustText - automatic label placement for matplotlib | https://github.com/Phlya/adjustText
CYRANGE | D.K. Kirchner & P. Güntert, BMC Bioinformatics 2011, 12 170.