A simplified implementation of DSSP algorithm for PyTorch and NumPy

PyDSSP

What's this?

DSSP (Dictionary of Secondary Structure of Proteins) is a popular algorithm for assigning secondary structure to a protein backbone structure [Kabsch and Sander, 1983]. This repository is a Python implementation of the DSSP algorithm that simplifies some parts of it.

General Info

  • It's NOT a complete implementation of the original DSSP, as some parts have been simplified (see "Differences from the original DSSP" below). However, on average more than 97% of secondary structure assignments agree with the original.
  • The algorithm used to identify hydrogen-bonded residue pairs is exactly the same as in the original DSSP algorithm, but is extended to output the hydrogen-bond-pair matrix as continuous values in the range [0,1].
  • Thanks to this continuous extension, the hydrogen-bond-pair matrix is differentiable when the input is a torch.Tensor.

Install

install through PyPI

pip install pydssp

to install the latest version

pip install git+https://github.com/ShintaroMinami/PyDSSP.git

install by git clone

git clone https://github.com/ShintaroMinami/PyDSSP.git
cd PyDSSP
python setup.py install

How to use

To use the pydssp script

If you have already installed pydssp, the pydssp command should be available.

pydssp input_01.pdb input_02.pdb ... input_N.pdb -o output.result

The output.result file is plain text, with one line per input file, and looks as follows:

-EEEEE-E--EEEEEE---EEEE-HHHH--EEEE--------- input_01.pdb
-HHHHHHHHHHHHHH----HHHHHHHHHHHHHHHHHHH--- input_02.pdb
-EEEE-----EEEE----EEEE--E---EEE-----EEE-EEE-- input_03.pdb
...
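
The result file can be read back with a few lines of Python. The following is a minimal sketch, assuming each line holds the secondary structure string followed by whitespace and the input file name, as in the sample above, and that file names contain no spaces. The helper name parse_pydssp_result is hypothetical and not part of PyDSSP.

```python
def parse_pydssp_result(text):
    """Parse pydssp result text into {file_name: secondary_structure_string}."""
    results = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: the file name is the last whitespace-separated field.
        parts = line.rsplit(maxsplit=1)
        if len(parts) != 2:
            continue  # skip lines that do not match the expected layout
        ss, name = parts
        results[name] = ss
    return results


sample = """\
-EEEEE-E--EEEEEE---EEEE-HHHH--EEEE--------- input_01.pdb
-HHHHHHHHHHHHHH----HHHHHHHHHHHHHHHHHHH--- input_02.pdb
"""
for name, ss in parse_pydssp_result(sample).items():
    # Report the length and the helix fraction of each assignment.
    print(name, len(ss), f"{ss.count('H') / len(ss):.2f}")
```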

To use as a Python module

Import & test coordinates

# Import
import torch
import pydssp

# Sample coordinates
batch, length, atoms, xyz = 10, 100, 4, 3
## atoms should be 4 (N, CA, C, O) or 5 (N, CA, C, O, H)
coord = torch.randn([batch, length, atoms, xyz]) # batch-dim is optional
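
Random coordinates only exercise the API; real input needs the N, CA, C, O coordinates of each residue. The sketch below shows one way to collect them from PDB text using the standard fixed-column ATOM record layout. It assumes well-formed ATOM lines and a single model, and the helper name backbone_from_pdb_text is hypothetical, not a PyDSSP function.

```python
BACKBONE = ("N", "CA", "C", "O")


def backbone_from_pdb_text(pdb_text):
    """Return nested lists shaped [length][4][3] with N, CA, C, O coordinates."""
    residues = {}  # (chain, residue number, insertion code) -> {atom name: [x, y, z]}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()  # atom name, columns 13-16
        if name not in BACKBONE:
            continue
        key = (line[21], int(line[22:26]), line[26])
        xyz = [float(line[30:38]), float(line[38:46]), float(line[46:54])]
        # setdefault keeps the first coordinates seen for an atom
        # (e.g. the first alternate location).
        residues.setdefault(key, {}).setdefault(name, xyz)
    # Keep only residues with all four backbone atoms, in file order.
    return [
        [atoms[n] for n in BACKBONE]
        for atoms in residues.values()
        if all(n in atoms for n in BACKBONE)
    ]
```

The nested list can then be converted with torch.tensor(...) to obtain a tensor of shape (length, 4, 3), which matches the 4-atom layout described above without the optional batch dimension.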

To get hydrogen-bond matrix: pydssp.get_hbond_map()

hbond_matrix = pydssp.get_hbond_map(coord)

print(hbond_matrix.shape) # should be (batch, length, length)
  • For hbond_matrix[b, i, j], index 'i' is for the donor (N-H) and 'j' is for the acceptor (C=O).
  • The output matrix consists of continuous values in the range [0,1], which is defined as follows.

$HbondMat(i,j) = (1+\sin((-0.5-E(i,j)-margin)/margin*\pi/2))/2$

Here $E$ is the electrostatic energy defined by Kabsch and Sander (1983), and $margin$ (default 1.0) controls the smoothness of the transition.
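
To get a feel for the shape of this function, the formula can be evaluated for a few energies. This is a minimal sketch of the expression as written above, not PyDSSP's code. The function name hbond_value and the parameter name cutoff (for the constant -0.5) are hypothetical, and clamping the sine argument so that the result saturates at 0 and 1 is an assumption made here.

```python
import math


def hbond_value(energy, cutoff=-0.5, margin=1.0):
    """Evaluate the smooth hydrogen-bond score for a single energy value."""
    # Scaled argument of the sine, following the formula term by term.
    x = (cutoff - energy - margin) / margin
    # Assumption: clamp to [-1, 1] so the score stays monotonic within [0, 1].
    x = max(-1.0, min(1.0, x))
    return (1.0 + math.sin(x * math.pi / 2.0)) / 2.0


for e in (-0.5, -1.0, -1.5, -2.0, -2.5):
    print(f"E = {e:5.1f}  ->  {hbond_value(e):.3f}")
```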

If you'd like to get the same hbond assignment as DSSP, threshold the matrix at 0.5.

dssp_hbond_matrix = pydssp.get_hbond_map(coord) > 0.5

To get secondary structure assignment: pydssp.assign()

dssp = pydssp.assign(coord, out_type='c3')
## output is batched np.ndarray of C3 annotation, like ['-', 'H', 'H', ..., 'E', '-']

# To get secondary str. as index
dssp = pydssp.assign(coord, out_type='index')
## 0: loop,  1: alpha-helix,  2: beta-strand

# To get secondary str. as onehot representation
dssp = pydssp.assign(coord, out_type='onehot')
## dim-0: loop,  dim-1: alpha-helix,  dim-2: beta-strand
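
The three output types carry the same information in different encodings. The sketch below converts between them in pure Python, using the index convention listed above (0: loop, 1: alpha-helix, 2: beta-strand). The helper names are hypothetical, and the functions work on plain strings and lists rather than on the arrays returned by pydssp.assign().

```python
C3_ALPHABET = "-HE"  # position in this string = index: loop, alpha-helix, beta-strand


def c3_to_index(c3):
    """Map a C3 string such as '-HHE-' to a list of class indices."""
    return [C3_ALPHABET.index(s) for s in c3]


def index_to_onehot(indices, num_classes=3):
    """Map class indices to one-hot rows of length num_classes."""
    return [[1 if i == k else 0 for k in range(num_classes)] for i in indices]


def onehot_to_c3(onehot):
    """Map one-hot rows back to a C3 string via the position of the maximum."""
    return "".join(C3_ALPHABET[row.index(max(row))] for row in onehot)


indices = c3_to_index("-HHE-")
onehot = index_to_onehot(indices)
print(indices)
print(onehot_to_c3(onehot))
```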

Differences from the original DSSP

This implementation is simplified from the original DSSP algorithm. The differences from the original DSSP are as follows:

  • The implementation omits β-bulge annotation, so a β-bulge is assigned as a loop instead of a β-strand.
  • Parameters for adding hydrogen atoms are slightly different from the original DSSP, which may cause small differences in hydrogen bond annotation.
  • Only C3 type assignment ('-', 'H', and 'E') is supported, not C8 type (B, E, G, H, I, S, T, and ' ').

Despite the above simplifications, the C3 type annotation still matches the original DSSP for more than 97% of residues on average.
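
Comparing a C3 assignment with original DSSP output requires reducing the C8 labels to three classes first. The sketch below computes per-residue agreement under one common reduction (H, G, I to 'H'; E, B to 'E'; everything else to '-'). This mapping is an assumption: this document does not state which reduction underlies the 97% figure, and the helper names are hypothetical.

```python
# Assumption: one common C8 -> C3 reduction, not necessarily the one
# used to obtain the agreement figure quoted above.
C8_TO_C3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_c8(c8):
    """Reduce a C8 string to C3; labels not in the table become loop ('-')."""
    return "".join(C8_TO_C3.get(s, "-") for s in c8)


def agreement(c3_pred, c8_ref):
    """Fraction of residues where the C3 prediction matches the reduced reference."""
    ref = reduce_c8(c8_ref)
    if len(ref) != len(c3_pred):
        raise ValueError("both assignments must cover the same residues")
    if not ref:
        raise ValueError("assignments must not be empty")
    matches = sum(a == b for a, b in zip(c3_pred, ref))
    return matches / len(ref)


print(agreement("-HHHH-EE--", " HHGGEEBTS"))
```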

Reference

@article{kabsch1983dictionary,
  title={Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features},
  author={Kabsch, Wolfgang and Sander, Christian},
  journal={Biopolymers: Original Research on Biomolecules},
  volume={22},
  number={12},
  pages={2577--2637},
  year={1983},
  publisher={Wiley Online Library}
}
