A transmembrane helix finder.

Reason this release was yanked:

Installation bug

Project description

Introduction

pyTMHMM is a Python 3.5+ implementation of the transmembrane helix predictor using a hidden Markov model (TMHMM) originally described in:

E.L. Sonnhammer, G. von Heijne, and A. Krogh. A hidden Markov model for predicting transmembrane helices in protein sequences. In J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen, editors, Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology, pages 175-182, Menlo Park, CA, 1998. AAAI Press. PMID 9783223

History

Dan Søndergaard is the original author of this package and his repository is now archived. Dan wrote this code for a few reasons:

  • the source code is not available as part of the publication
  • the downloadable binaries are Linux-only
  • the downloadable binaries may not be redistributed, so it's not possible to put them in a Docker image or a VM for other people to use
  • the need to predict transmembrane helices in a scripted, automated way

This Python implementation includes a parser for the undocumented file format used to describe the model and a fast Cython implementation of the Viterbi algorithm used to perform the annotation. The tool will output files similar to the files produced by the original TMHMM implementation.
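For readers unfamiliar with the Viterbi algorithm mentioned above, here is a minimal pure-Python sketch of it on a toy two-state model. It is not the package's Cython code and does not use the TMHMM model: the states, the two-letter alphabet, and all probabilities below are made up for illustration only.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path for a non-empty observation sequence."""

    def log(p):
        # Work in log-space to avoid underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + log(trans_p[r][s]))
            scores[s] = best[-1][prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy model (hypothetical numbers): "M" prefers the hydrophobic letter L,
# "i" prefers the charged letter K, and both states tend to persist.
STATES = ("i", "M")
START = {"i": 0.5, "M": 0.5}
TRANS = {"i": {"i": 0.9, "M": 0.1}, "M": {"i": 0.1, "M": 0.9}}
EMIT = {"i": {"L": 0.2, "K": 0.8}, "M": {"L": 0.8, "K": 0.2}}

print("".join(viterbi("KKLLLLLKK", STATES, START, TRANS, EMIT)))  # iiMMMMMii
```

The real model has many more states and a 20-letter alphabet, but the dynamic-programming recurrence and the traceback have the same shape.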

Incompatibilities

  • The original TMHMM implementation handles ambiguous characters and gaps in an undocumented way. pyTMHMM does not attempt to handle such characters at all and will fail on them. A possible workaround is to replace those characters with something else, chosen based on expert/domain knowledge.
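One way to apply such a workaround before calling the tool is sketched below. It assumes that the supported alphabet is the 20 standard amino acid letters and that upper-casing the input is acceptable; neither is stated by the package. The helper names and the replacement mapping in the example are hypothetical, and the mapping must come from your own domain knowledge.

```python
# Assumption: only the 20 standard amino acid letters are supported.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def find_unsupported(sequence):
    """Return (position, character) pairs that fall outside the standard alphabet."""
    return [
        (i, c)
        for i, c in enumerate(sequence.upper())
        if c not in STANDARD_AMINO_ACIDS
    ]


def replace_unsupported(sequence, replacements):
    """Replace unsupported characters using a caller-supplied mapping.

    Raises ValueError when a character has no replacement, so nothing is
    substituted silently.
    """
    cleaned = []
    for i, c in enumerate(sequence.upper()):
        if c in STANDARD_AMINO_ACIDS:
            cleaned.append(c)
        elif c in replacements:
            cleaned.append(replacements[c])
        else:
            raise ValueError(f"no replacement given for {c!r} at position {i}")
    return "".join(cleaned)


# Illustrative mapping only: drop gaps and turn X into alanine.
print(find_unsupported("MKX-LL"))
print(replace_unsupported("MKX-LL", {"X": "A", "-": ""}))
```

Note that dropping gap characters shifts the coordinates of everything after them, which matters when mapping predictions back onto the original sequence.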

Installation

This package supports Python 3.5 or greater. Install with:

$ pip install pyTMHMM

Usage

$ pyTMHMM -h
  usage: pyTMHMM [-h] -f SEQUENCE_FILE [-m MODEL_FILE] [-p]

  optional arguments:
    -h, --help            show this help message and exit
    -f SEQUENCE_FILE, --file SEQUENCE_FILE
                          path to file in fasta format with sequences
    -m MODEL_FILE, --model MODEL_FILE
                          path to the model to use
    -p, --plot            plot posterior probabilies

The -p/--plot option requires matplotlib.

The input sequence file should have one or more sequences in FASTA format, for example:

>B9DFX7|1B|HMA8_ARATH Copper-transporting ATPase PAA2, chloroplastic  [Arabidopsis thaliana ]
MASNLLRFPLPPPSSLHIRPSKFLVNRCFPRLRRSRIRRHCSRPFFLVSNSVEISTQSFESTESSIESVKSITSDTPIL
LDVSGMMCGGCVARVKSVLMSDDRVASAVVNMLTETAAVKFKPEVEVTADTAESLAKRLTESGFEAKRRVSGMGVAENV
KKWKEMVSKKEDLLVKSRNRVAFAWTLVALCCGSHTSHILHSLGIHIAHGGIWDLLHNSYVKGGLAVGALLGPGRELLF
DGIKAFGKRSPNMNSLVGLGSMAAFSISLISLVNPELEWDASFFDEPVMLLGFVLLGRSLEERAKLQASTDMNELLSLI
STQSRLVITSSDNNTPVDSVLSSDSICINVSVDDIRVGDSLLVLPGETFPVDGSVLAGRSVVDESMLTGESLPVFKEEG
CSVSAGTINWDGPLRIKASSTGSNSTISKIVRMVEDAQGNAAPVQRLADAIAGPFVYTIMSLSAMTFAFWYYVGSHIFP
DVLLNDIAGPDGDALALSLKLAVDVLVVSCPCALGLATPTAILIGTSLGAKRGYLIRGGDVLERLASIDCVALDKTGTL
TEGRPVVSGVASLGYEEQEVLKMAAAVEKTATHPIAKAIVNEAESLNLKTPETRGQLTEPGFGTLAEIDGRFVAVGSLE
WVSDRFLKKNDSSDMVKLESLLDHKLSNTSSTSRYSKTVVYVGREGEGIIGAIAISDCLRQDAEFTVARLQEKGIKTVL
LSGDREGAVATVAKNVGIKSESTNYSLSPEKKFEFISNLQSSGHRVAMVGDGINDAPSLAQADVGIALKIEAQENAASN
AASVILVRNKLSHVVDALSLAQATMSKVYQNLAWAIAYNVISIPIAAGVLLPQYDFAMTPSLSGGLMALSSIFVVSNSL
LLQLHKSETSKNSL

Example command:

$ pyTMHMM -f test.fa

This produces three files for each sequence: a summary file, an annotation file, and a posterior probabilities file.

Summary file

The coordinates of the predicted domains:

$ cat 'B9DFX7|1B|HMA8_ARATH.summary'
0-444: outside
445-467: transmembrane helix
468-820: inside
821-843: transmembrane helix
844-852: outside
853-870: transmembrane helix
871-882: inside
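
For scripted use, a summary in this layout can be turned into tuples with a few lines of Python. This is a sketch based only on the example output shown here; the function name is hypothetical, and the coordinates in the example appear to be 0-based and inclusive.

```python
import re

# One line per domain, e.g. "445-467: transmembrane helix".
SUMMARY_LINE = re.compile(r"^(\d+)-(\d+):\s*(.+)$")


def parse_summary(text):
    """Return a list of (start, end, label) tuples from summary file text."""
    domains = []
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:  # skip blank or unrecognized lines
            start, end, label = match.groups()
            domains.append((int(start), int(end), label))
    return domains


example = """0-444: outside
445-467: transmembrane helix
468-820: inside
"""
domains = parse_summary(example)
helices = [d for d in domains if d[2] == "transmembrane helix"]
print(len(helices), helices)
```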

Annotation file

An annotated sequence in FASTA-like format:

$ cat 'B9DFX7|1B|HMA8_ARATH.annotation'
>B9DFX7|1B|HMA8_ARATH Copper-transporting ATPase PAA2, chloroplastic  [Arabidopsis thaliana ]
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMMMMMMMMMMMMMMMMMiiiiii
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMMMoooooooooMMMMMMMMMMMMMMMM
MMiiiiiiiiiiii
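
The annotation string and the summary coordinates carry the same information. The sketch below collapses an annotation string into segments using run-length grouping. The label mapping is inferred from the example output, where both O and o are reported as outside in the summary; the function name is hypothetical.

```python
from itertools import groupby

# Inferred from the example: O and o are both summarized as "outside".
LABELS = {
    "i": "inside",
    "M": "transmembrane helix",
    "o": "outside",
    "O": "outside",
}


def annotation_to_segments(annotation):
    """Collapse an annotation string into (start, end, label) segments.

    Coordinates are 0-based and inclusive, matching the example summary.
    Unknown annotation characters raise KeyError.
    """
    segments = []
    start = 0
    for label, run in groupby(annotation, key=LABELS.__getitem__):
        length = sum(1 for _ in run)
        segments.append((start, start + length - 1, label))
        start += length
    return segments


print(annotation_to_segments("OOOMMii"))
```

When reading an annotation file, join the sequence lines (everything after the header line) into one string before passing it in.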

Posterior probabilities file

A file containing the posterior probabilities for each label for plotting.

$ cat 'B9DFX7|1B|HMA8_ARATH.plot'
inside membrane outside
0.20341044516 0.0 0.79658955484
0.210104176071 2.77194446172e-08 0.78989579621
0.189291062167 3.11365191554e-08 0.810708906697
0.253334801857 7.17866017044e-07 0.746664480277
0.126185012808 1.34197873962e-05 0.873801567405
...
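
A file in this layout (a header row followed by whitespace-separated numbers) can be read with the standard library alone. The sketch below assumes exactly that layout, as seen in the excerpt; the function name is hypothetical. The "..." in the excerpt marks omitted rows and is not expected in a real file.

```python
def parse_plot(text):
    """Return one dict per residue, mapping each label to its posterior probability."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, map(float, row))) for row in rows]


example = """inside membrane outside
0.20341044516 0.0 0.79658955484
0.210104176071 2.77194446172e-08 0.78989579621
"""
rows = parse_plot(example)
# Most probable label at each position.
print([max(row, key=row.get) for row in rows])
```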

If the -p flag is set, a plot in PDF format is also produced, following the same naming scheme as the other output files.

API

You can also use pyTMHMM as a library:

import pyTMHMM
annotation, posterior = pyTMHMM.predict(sequence_string)

This returns the annotation as a string and the posterior probabilities for each label as a numpy array with shape (len(sequence), 3), where columns 0, 1 and 2 correspond to being inside, transmembrane and outside, respectively.

If you don't need the posterior probabilities, set compute_posterior=False to save computation:

annotation = pyTMHMM.predict(
    sequence_string, compute_posterior=False
)
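
To annotate every sequence in a FASTA file from Python, the library call can be combined with a small reader. This is a sketch: read_fasta and annotate_fasta are hypothetical helpers, not part of pyTMHMM, and only the predict call with compute_posterior=False is taken from the description above.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def annotate_fasta(path):
    """Return {header: annotation string} for every sequence in a FASTA file."""
    # Imported here so that read_fasta can be used without pyTMHMM installed.
    import pyTMHMM

    with open(path) as handle:
        return {
            header: pyTMHMM.predict(sequence, compute_posterior=False)
            for header, sequence in read_fasta(handle)
        }


records = list(read_fasta([">a desc", "MK", "LL", "", ">b", "AA"]))
print(records)
```

Calling annotate_fasta("test.fa") would then mirror the command-line example, minus the output files.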
