
Predicting DSSP Scores

Project description

PredictDSSP: a PARROT BRNN for predicting DSSP scores

PredictDSSP is a Python package for predicting DSSP scores using a PARROT-generated BRNN classifier. For more on PARROT, see https://github.com/idptools/parrot

PredictDSSP is usable from the command-line or as a Python API.

Background

PredictDSSP was trained on per-residue DSSP scores. DSSP scores classify secondary structure where

0 is helicity

1 is beta strand/sheet

2 is coil
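The integer labels above can be mapped to readable names with a small lookup. The following is a minimal sketch, assuming per-residue predictions are available as a sequence of integers. The helper `labels_to_string` and the one-letter codes H/E/C are illustrative conventions and not part of PredictDSSP.

```python
# Class indices as described above. The one-letter codes are a common
# shorthand chosen for illustration, not something PredictDSSP defines.
DSSP_CLASSES = {0: "helix", 1: "beta strand/sheet", 2: "coil"}
DSSP_LETTERS = {0: "H", 1: "E", 2: "C"}


def labels_to_string(labels):
    """Convert per-residue class indices (0, 1, 2) to a compact string."""
    return "".join(DSSP_LETTERS[int(label)] for label in labels)


# Hypothetical predictions for a 10-residue sequence.
example_labels = [2, 2, 0, 0, 0, 0, 2, 1, 1, 2]
print(labels_to_string(example_labels))  # CCHHHHCEEC
```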

For the final network, a balanced dataset of per-residue scores from 74,777 sequences was used. The network consists of 2 layers and a hidden size of 20. The network name is dssp_2022_01_07_CB_thresh_0p8_hs20_nl2.pt. Networks with fewer layers or a smaller hidden size were less accurate, while additional layers and larger hidden sizes did not improve accuracy beyond that of this network.

How accurate is this predictor?

Evaluation of the network on 4,352 sequences that were not used for training found an overall accuracy of 74.002% and a Matthews correlation coefficient (MCC) of 0.60345. The breakdown of predictions for helicity (0), beta strand/sheet (1), and coil (2) is as follows:

percent correct predicting 0 = 75.16%

percent correct predicting 1 = 68.66%

percent correct predicting 2 = 76.954%

We have found that the predictor performs relatively well overall but does not seem to perform well in disordered regions.
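For readers who want to compute comparable metrics on their own data, the sketch below derives overall accuracy, per-class percent correct, and a multiclass MCC from true and predicted labels. It rests on assumptions: the multiclass MCC here is the standard generalization (Gorodkin's R_K), which may or may not be the exact variant used for the figures above, and "percent correct" is interpreted as per-class recall. The toy labels are made up for illustration.

```python
import math


def confusion_matrix(true_labels, pred_labels, n_classes=3):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, pred_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of residues whose predicted class matches the true class."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[k] / sum(row) if sum(row) else 0.0 for k, row in enumerate(matrix)]


def multiclass_mcc(matrix):
    """Multiclass Matthews correlation coefficient (Gorodkin's R_K)."""
    n = len(matrix)
    s = sum(sum(row) for row in matrix)                 # total residues
    c = sum(matrix[k][k] for k in range(n))             # correct predictions
    t = [sum(matrix[k]) for k in range(n)]              # true count per class
    p = [sum(matrix[i][k] for i in range(n)) for k in range(n)]  # predicted count
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = math.sqrt(
        (s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t))
    )
    return numerator / denominator if denominator else 0.0


# Toy labels, invented for illustration only.
true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
pred_labels = [0, 0, 2, 1, 1, 0, 2, 2, 1]
matrix = confusion_matrix(true_labels, pred_labels)
print(f"accuracy = {accuracy(matrix):.3f}, MCC = {multiclass_mcc(matrix):.3f}")
```

On this toy data each class has two of three residues correct, so the accuracy is 2/3 and the MCC is 0.5.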

Installation

To clone the GitHub repository and gain the ability to modify a local copy of the code, run

$ git clone https://github.com/ryanemenecker/PredictDSSP.git
$ cd PredictDSSP
$ pip install .

You can now use PredictDSSP from Python or the command-line.

Usage from Python

First import PredictDSSP:

import PredictDSSP as dssp

Predicting DSSP scores

Next, make DSSP score predictions:

dssp.predict_dssp('MQWESSASSSWQQQQGGGGSAFACACAAFAAAAAA')

Additional usage

You can also get the raw values from the predictions by setting raw_vals=True. These are returned as a list of lists, one inner list per residue. The first element in each inner list is the helical probability, the second is the probability of forming a beta strand/sheet, and the last is the probability of being coil.

Example

dssp.predict_dssp('MQWESSASSSWQQQQGGGGSAFACACAAFAAAAAA', raw_vals=True)
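Given raw values in the layout described above, one way to recover a single class per residue is to take the index of the largest probability. This is a minimal sketch under that assumption; the helper `most_likely_classes` and the toy values are hypothetical, and how PredictDSSP itself converts raw values to classes is not stated here.

```python
def most_likely_classes(raw_values):
    """Return the index of the highest-probability class for each residue."""
    return [max(range(len(probs)), key=probs.__getitem__) for probs in raw_values]


# Hypothetical raw values for four residues, ordered [helix, beta, coil].
toy_raw_values = [
    [0.80, 0.05, 0.15],
    [0.10, 0.70, 0.20],
    [0.20, 0.20, 0.60],
    [0.45, 0.10, 0.45],  # tie: max() keeps the first maximum, i.e. helix
]
print(most_likely_classes(toy_raw_values))  # [0, 1, 2, 0]
```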

Graphing DSSP scores

To graph DSSP scores:

dssp.graph_dssp('MQWESSASSSWQQQQGGGGSAFACACAAFAAAAAA')

optional arguments

To set the title, set 'title' equal to your desired title. Ex:

dssp.graph_dssp('MQWESSASSSWQQQQGGGGSAFACACAAFAAAAAA', title='mygraph')

To exclude predictions for regions that are disordered, set exclude_disorder=True. Ex:

dssp.graph_dssp('MQWESSASSSWQQQQGGGGSAFACACAAFAAAAAA', exclude_disorder=True)

To omit the disorder bars from the graph when excluding disorder, set no_disorder_bars=True. Ex:

dssp.graph_dssp('MQWESSASSSWQQQQGGGGSAFACACAAFAAAAAA', exclude_disorder=True, no_disorder_bars=True)

To change the disorder threshold, set 'dis_threshhold' to your desired threshold. Disorder scores are calculated using metapredict, and a higher threshold requires higher confidence to denote a region as disordered. Default is 0.3. Ex:

dssp.graph_dssp('MQWESSASSSWQQQQGGGGSAFACACAAFAAAAAA', exclude_disorder=True, dis_threshhold=0.5)
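The effect of the threshold can be pictured as a per-residue mask. The sketch below is an illustration only: `mask_disordered` is a hypothetical helper, the disorder scores are made up (in PredictDSSP they come from metapredict), and both the `>=` comparison and the use of `None` for excluded residues are assumptions rather than the package's documented behavior.

```python
def mask_disordered(dssp_labels, disorder_scores, dis_threshhold=0.3):
    """Replace a prediction with None wherever the disorder score reaches the threshold."""
    if len(dssp_labels) != len(disorder_scores):
        raise ValueError("labels and disorder scores must have the same length")
    return [
        None if score >= dis_threshhold else label
        for label, score in zip(dssp_labels, disorder_scores)
    ]


# Hypothetical per-residue predictions and disorder scores.
toy_labels = [0, 0, 2, 2, 1]
toy_disorder = [0.10, 0.25, 0.35, 0.90, 0.40]

print(mask_disordered(toy_labels, toy_disorder))                      # default 0.3
print(mask_disordered(toy_labels, toy_disorder, dis_threshhold=0.5))  # stricter cutoff
```

With the higher threshold, fewer residues are treated as disordered, so more predictions are kept.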

To change the DPI of the generated figure, set 'DPI' to the desired DPI. Default is 150. Ex:

dssp.graph_dssp('MQWESSASSSWQQQQGGGGSAFACACAAFAAAAAA', DPI=300)

To save the graph, set output_file to the path followed by the output file name. Ex:

dssp.graph_dssp('MQWESSASSSWQQQQGGGGSAFACACAAFAAAAAA', output_file='/my/file/path/graph_name.png')

Predicting DSSP scores from fasta

To predict DSSP scores from a fasta file:

dssp.predict_dssp_fasta('/path/to/my/fasta/file/my_file.fasta')

optional arguments

To write the predictions to an output file instead of returning them immediately, set output_file to your file path followed by the file name. Ex:

dssp.predict_dssp_fasta('/path/to/my/fasta/file/my_file.fasta', output_file = '/where/to/save/file/my_dssp_predictions.csv')
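For readers unfamiliar with the input format, a FASTA file pairs header lines starting with `>` with one or more sequence lines. The sketch below shows one way to read such text into a dictionary. `read_fasta` is a hypothetical helper written for illustration; PredictDSSP's own reader may behave differently, for example in how it handles headers.

```python
import io


def read_fasta(handle):
    """Parse FASTA text from an open file handle into {header: sequence}."""
    sequences = {}
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            sequences[header] = ""
        elif header is not None:
            sequences[header] += line  # sequences may span several lines
    return sequences


# In-memory stand-in for a FASTA file, so the example needs no file on disk.
toy_fasta = io.StringIO(
    ">seq1 example\nMQWESSASSSWQQQQ\nGGGGSAFACACAAFAAAAAA\n>seq2\nMKVLA\n"
)
records = read_fasta(toy_fasta)
for name, seq in records.items():
    print(name, len(seq))
```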

Graphing DSSP scores from fasta

To graph DSSP scores from sequences in a FASTA file:

dssp.graph_dssp_fasta('/path/to/my/fasta/file/my_file.fasta')

By default it will return individual graphs. If you have a large fasta file, you will want to save the graphs or you will have to close each graph individually as it is generated.

optional arguments

To save the graphs to a specified directory, set 'output_dir' to the file path of your directory. Ex:

dssp.graph_dssp_fasta('/path/to/my/fasta/file/my_file.fasta', output_dir = '/where/to/save/files/')

To exclude predictions for regions that are disordered, set exclude_disorder=True. Ex:

dssp.graph_dssp_fasta('/path/to/my/fasta/file/my_file.fasta', exclude_disorder=True)

To omit the disorder bars from the graph when excluding disorder, set no_disorder_bars=True. Ex:

dssp.graph_dssp_fasta('/path/to/my/fasta/file/my_file.fasta', exclude_disorder=True, no_disorder_bars=True)

To change the disorder threshold, set 'dis_threshhold' to your desired threshold. Disorder scores are calculated using metapredict, and a higher threshold requires higher confidence to denote a region as disordered. Default is 0.3. Ex:

dssp.graph_dssp_fasta('/path/to/my/fasta/file/my_file.fasta', exclude_disorder=True, dis_threshhold=0.5)

To change the DPI of the generated figure, set 'DPI' to the desired DPI. Default is 150. Ex:

dssp.graph_dssp_fasta('/path/to/my/fasta/file/my_file.fasta', DPI=300)

To set the output file type, set 'output_filetype' to your desired filetype. Ex:

dssp.graph_dssp_fasta('/path/to/my/fasta/file/my_file.fasta', output_filetype='pdf')

Predicting DSSP scores from uniprot ID

You can get DSSP score predictions by just inputting a Uniprot ID.

dssp.predict_dssp_uniprot('UniprotID')

Ex:

dssp.predict_dssp_uniprot('Q8RYC8')

Graphing DSSP scores from uniprot ID

You can generate DSSP score graphs using a uniprot ID.

dssp.graph_dssp_uniprot('UniprotID')

Ex:

dssp.graph_dssp_uniprot('Q8RYC8')

optional arguments

To set the title, set 'title' equal to your desired title. Ex:

dssp.graph_dssp_uniprot('Q8RYC8', title='mygraph')

To exclude predictions for regions that are disordered, set exclude_disorder=True. Ex:

dssp.graph_dssp_uniprot('Q8RYC8', exclude_disorder=True)

To omit the disorder bars from the graph when excluding disorder, set no_disorder_bars=True. Ex:

dssp.graph_dssp_uniprot('Q8RYC8', exclude_disorder=True, no_disorder_bars=True)

To change the disorder threshold, set 'dis_threshhold' to your desired threshold. Disorder scores are calculated using metapredict, and a higher threshold requires higher confidence to denote a region as disordered. Default is 0.3. Ex:

dssp.graph_dssp_uniprot('Q8RYC8', exclude_disorder=True, dis_threshhold=0.5)

To change the DPI of the generated figure, set 'DPI' to the desired DPI. Default is 150. Ex:

dssp.graph_dssp_uniprot('Q8RYC8', DPI=300)

To save the graph, set output_file to the path followed by the output file name. Ex:

dssp.graph_dssp_uniprot('Q8RYC8', output_file='/my/file/path/graph_name.png')

Usage from the command-line

Graphing DSSP scores from the command-line using a Uniprot ID

dssp-uniprot generates a graph from a uniprot ID.

Example:

$ dssp-uniprot Q8RYC8

optional arguments

-D or --dpi lets you set the DPI of the generated graph.

$ dssp-uniprot Q8RYC8 -D 300

-e or --exclude_disorder lets you exclude disordered regions from predictions.

$ dssp-uniprot Q8RYC8 -e

-o or --output-file lets you save the graph to a specific location.

$ dssp-uniprot Q8RYC8 -o /my/path/to/file/my_graph.png

-t or --title lets you set the title of the graph.

$ dssp-uniprot Q8RYC8 -t MyCoolGraph

-c or --cutoff lets you set the disorder cutoff value used to decide whether a region is considered disordered.

$ dssp-uniprot Q8RYC8 -c 0.5

Generating DSSP scores from a FASTA file from the command-line

dssp-fasta generates DSSP scores from a specified FASTA file. By default, it saves the scores to your current directory as dssp_scores.csv

Example:

$ dssp-fasta /path/to/my/file/sequences_file.fasta

optional arguments

-o or --output-file lets you save the scores to a specific location.

$ dssp-fasta /path/to/my/file/sequences_file.fasta -o /where/to/save/these/scores/dssp_scores.csv
