A transmembrane helix finder.

Introduction

pyTMHMM is a Python 3.5+/Cython implementation of the transmembrane helix predictor using a hidden Markov model (TMHMM) originally described in:

E.L. Sonnhammer, G. von Heijne, and A. Krogh. A hidden Markov model for predicting transmembrane helices in protein sequences. In J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen, editors, Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology, pages 175-182, Menlo Park, CA, 1998. AAAI Press. PMID 9783223

History

Dan Søndergaard is the original author of this package and his repository is now archived. Dan wrote this code for a few reasons:

  • the source code is not available as part of the publication
  • the downloadable binaries are Linux-only
  • the downloadable binaries may not be redistributed, so it's not possible to put them in a Docker image or a VM for other people to use
  • the need to predict transmembrane helices in a scripted, automated way

This Python implementation includes a parser for the undocumented file format used to describe the model and a fast Cython implementation of the Viterbi algorithm used to perform the annotation. The tool will output files similar to the files produced by the original TMHMM implementation.
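As a rough illustration of the Viterbi decoding step mentioned above, the sketch below runs the algorithm on a toy two-state model in pure Python. It is not the package's Cython code and not the TMHMM model: the states ("M" for membrane, "o" for outside), the two-letter alphabet ("h" for hydrophobic, "p" for polar) and all probabilities are made up for the example.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path as a string (toy implementation)."""
    if not observations:
        return ""

    def log(p):
        # Work in log space to avoid underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # scores[s]: best log-probability of any path that ends in state s.
    scores = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: scores[p] + log(trans_p[p][s]))
            new_scores[s] = (
                scores[best_prev] + log(trans_p[best_prev][s]) + log(emit_p[s][obs])
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


# Made-up toy model, for illustration only.
STATES = ("M", "o")
START = {"M": 0.1, "o": 0.9}
TRANS = {"M": {"M": 0.8, "o": 0.2}, "o": {"M": 0.2, "o": 0.8}}
EMIT = {"M": {"h": 0.9, "p": 0.1}, "o": {"h": 0.2, "p": 0.8}}

print(viterbi("pphhhhpp", STATES, START, TRANS, EMIT))
```

The real model has many more states and reads its parameters from the model file, but the dynamic-programming recurrence and the traceback follow the same pattern.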

Incompatibilities

  • The original TMHMM implementation handles ambiguous characters and gaps in an undocumented way. pyTMHMM does not attempt to handle such characters at all and will fail when it encounters them. A possible workaround is to replace those characters with something else, chosen using expert/domain knowledge, before running the prediction.
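One way to apply such a workaround is to clean each sequence before prediction. The sketch below is only an example: the `sanitize` helper, the substitution table and the decision to drop gap characters and upper-case the input are all assumptions made for illustration, not behaviour of pyTMHMM or of the original TMHMM. Pick replacements that make sense for your own data.

```python
# Hypothetical substitution table: illustrative choices only, not prescribed
# by pyTMHMM or TMHMM.
REPLACEMENTS = {
    "B": "D",  # Asp or Asn -> Asp
    "Z": "E",  # Glu or Gln -> Glu
    "J": "L",  # Leu or Ile -> Leu
    "U": "C",  # selenocysteine -> Cys
    "O": "K",  # pyrrolysine -> Lys
    "X": "A",  # unknown residue -> Ala
}
GAP_CHARS = "-.*"


def sanitize(sequence):
    """Drop gaps/whitespace and substitute ambiguous residue codes."""
    cleaned = []
    for residue in sequence.upper():
        if residue in GAP_CHARS or residue.isspace():
            continue  # remove gap, stop and whitespace characters
        cleaned.append(REPLACEMENTS.get(residue, residue))
    return "".join(cleaned)
```

Note that removing gaps shifts the coordinates of everything after them, so keep the original sequence if you need to map predictions back.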

Installation

This package supports Python 3.5 or greater. Install with:

> pip install pyTMHMM

Usage

> pyTMHMM -h
usage: pyTMHMM [-h] -f SEQUENCE_FILE [-m MODEL_FILE] [-p]

required arguments:
-f SEQUENCE_FILE, --file SEQUENCE_FILE
                    path to file in fasta format with sequences

optional arguments:
-h, --help          show this help message and exit
-m MODEL_FILE, --model MODEL_FILE
                    path to the model to use (default: TMHMM2.0.model)
-p, --plot          plot posterior probabilies

The -p/--plot option requires matplotlib.

The input sequence file should have one or more sequences in Fasta format, for example:

> head PAR3_HUMAN.fasta
>sp|O00254|PAR3_HUMAN Proteinase-activated receptor 3 OS=Homo sapiens OX=9606 GN=F2RL2 PE=1 SV=1
MKALIFAAAGLLLLLPTFCQSGMENDTNNLAKPTLPIKTFRGAPPNSFEEFPFSALEGWT
GATITVKIKCPEESASHLHVKNATMGYLTSSLSTKLIPAIYLLVFVVGVPANAVTLWMLF
FRTRSICTTVFYTNLAIADFLFCVTLPFKIAYHLNGNNWVFGEVLCRATTVIFYGNMYCS
ILLLACISINRYLAIVHPFTYRGLPKHTYALVTCGLVWATVFLYMLPFFILKQEYYLVQP
DITTCHDVHNTCESSSPFQLYYFISLAFFGFLIPFVLIIYCYAAIIRTLNAYDHRWLWYV
KASLLILVIFTICFAPSNIILIIHHANYYYNNTDGLYFIYLIALCLGSLNSCLDPFLYFL
MSKTRNHSTAYLTK
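If you want to feed sequences from such a file to your own scripts, a minimal Fasta reader can be sketched as follows. The `read_fasta` helper is hypothetical and not part of pyTMHMM; it assumes the simple layout shown above, with a header line starting with ">" followed by one or more sequence lines.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of Fasta lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)
```

Because an open file is an iterable of lines, `read_fasta(open("PAR3_HUMAN.fasta"))` would yield one pair for the example above.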

Example command:

> pyTMHMM -f PAR3_HUMAN.fasta

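This produces three files for each sequence in the Fasta file, each named after the sequence id: a summary file (.summary), an annotation file (.annotation) and a posterior probabilities file (.plot).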

Summary file

The coordinates of the predicted domains:

> cat 'sp|O00254|PAR3_HUMAN.summary'
0 97 outside
98 120 transmembrane helix
121 128 inside
129 151 transmembrane helix
152 165 outside
166 188 transmembrane helix
189 207 inside
208 230 transmembrane helix
231 259 outside
260 282 transmembrane helix
283 302 inside
303 322 transmembrane helix
323 336 outside
337 359 transmembrane helix
360 373 inside
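The summary format is easy to read back into a script. The sketch below assumes each line holds a start coordinate, an end coordinate and a label that may contain spaces, as in the example above; `parse_summary` is a hypothetical helper, not part of pyTMHMM. In this example the coordinates appear to be 0-based and inclusive, since the 374-residue sequence ends at 373.

```python
def parse_summary(lines):
    """Return a list of (start, end, label) tuples from .summary lines."""
    regions = []
    for line in lines:
        parts = line.split(maxsplit=2)  # the label itself may contain spaces
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        start, end, label = parts
        regions.append((int(start), int(end), label.strip()))
    return regions
```

For example, the number of predicted helices is the number of tuples whose label is "transmembrane helix".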

Annotation file

An annotated sequence in Fasta-like format:

> cat 'sp|O00254|PAR3_HUMAN.annotation'
>sp|O00254|PAR3_HUMAN Proteinase-activated receptor 3 OS=Homo sapiens OX=9606 GN=F2RL2 PE=1 SV=1
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOMMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiMMMMMMMMMMMMMMMMMMMMMMMoooooo
ooooooooMMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMMMoooooo
oooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiMMMMMMMMMMMMM
MMMMMMMooooooooooooooMMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiii
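The annotation string carries the same information as the summary: each run of identical characters is one region. In the example above, both "O" and "o" cover regions that the summary reports as outside. The sketch below collapses an annotation string (with the wrapped lines already joined) into runs; `annotation_to_segments` is a hypothetical helper, and the 0-based inclusive coordinates are chosen to match the summary example.

```python
from itertools import groupby


def annotation_to_segments(annotation):
    """Collapse an annotation string into (start, end, label) runs."""
    segments, start = [], 0
    for label, run in groupby(annotation):
        length = len(list(run))
        segments.append((start, start + length - 1, label))
        start += length
    return segments
```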

Posterior probabilities file

A file containing the posterior probabilities for each label.

> head 'sp|O00254|PAR3_HUMAN.plot'
inside membrane outside
0.6417636608794935 0.0 0.3582363391205064
0.693933311909457 0.006819179965744769 0.2992475081247982
0.3041488405999551 0.36045181385397806 0.3353993455460668
0.15867304975718463 0.5320740444690139 0.3092529057738015
0.011878169861623369 0.8126781067794638 0.1754437233589128
0.009103844612501565 0.7722962064006578 0.21859994898684057
0.0008287471596339259 0.6966223976666195 0.3025488551737467
0.0007860447761827514 0.7122010989508554 0.2870128562729619
0.0006349307902653272 0.712364526792757 0.28700054241697776
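To work with these values outside of the built-in plot, the file can be read with a few lines of Python. This sketch assumes the layout shown above, a header line followed by one whitespace-separated row per residue; `parse_posteriors` is a hypothetical helper, not part of pyTMHMM.

```python
def parse_posteriors(lines):
    """Return (column_names, rows) from the lines of a .plot file."""
    rows = iter(lines)
    header = next(rows, "").split()
    table = [
        tuple(float(value) for value in line.split())
        for line in rows
        if line.strip()  # ignore blank lines
    ]
    return header, table
```

Each row should sum to approximately 1, which makes a cheap sanity check after parsing.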

Plot

If the -p flag is set, a plot in PDF format is also produced.

Figure: TM domains in PAR3_HUMAN (doc/PAR3_HUMAN.pdf)

API

You can also use pyTMHMM as a library:

import pyTMHMM
annotation, posterior = pyTMHMM.predict(sequence_string)

This returns the annotation as a string and the posterior probabilities for each label as a numpy array with shape (len(sequence), 3), where columns 0, 1 and 2 correspond to being inside, transmembrane and outside, respectively.

If you don't need the posterior probabilities, set compute_posterior=False to save computation:

annotation = pyTMHMM.predict(
    sequence_string, compute_posterior=False
)
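As an example of using the posterior array, the sketch below picks the most probable label at each position. It relies only on the column order described above; the `most_probable_labels` helper and the label characters "i", "M" and "o" (borrowed from the annotation file example) are assumptions for illustration. The function accepts any sequence of three-value rows, so a numpy array or a plain list of lists both work.

```python
# Column order as described above: inside, transmembrane, outside.
LABELS = ("i", "M", "o")


def most_probable_labels(posterior):
    """Return one label character per row, chosen by highest probability."""
    return "".join(
        LABELS[max(range(len(LABELS)), key=lambda col: row[col])]
        for row in posterior
    )
```

This per-position choice is not the same as the Viterbi annotation returned by predict, which scores whole paths, so the two strings can differ, especially near region boundaries.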
