A transmembrane helix finder.

Introduction

This repository houses a Python 3.5+ implementation of the transmembrane helix hidden Markov model (TMHMM) originally described in:

E. L.L. Sonnhammer, G. von Heijne, and A. Krogh. A hidden Markov model for predicting transmembrane helices in protein sequences. In J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen, editors, Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology, pages 175-182, Menlo Park, CA, 1998. AAAI Press.

Why?

I did this for a few reasons:

  • the source code is not available as part of the publication,
  • the downloadable binaries are Linux-only,
  • the downloadable binaries may not be redistributed, so it's not possible to put them in a Docker image or a VM for other people to use,
  • I needed to predict transmembrane helices on a large dataset, which rules out the web service.

This Python implementation includes a parser for the undocumented file format used to describe the model and a pretty fast Cython implementation of the Viterbi algorithm used to perform the annotation. The tool will output files similar to the files produced by the original TMHMM implementation.

Incompatibilities

  • The original TMHMM implementation handles ambiguous characters and gaps in an undocumented way. tmhmm.py, however, does not attempt to handle such characters at all and will fail on them. A possible workaround is to replace those characters with something else, chosen using expert/domain knowledge. For details, see issue #9.
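
The replacement workaround can be sketched as a pre-processing step that runs before the sequence is handed to tmhmm.py. This helper is not part of tmhmm.py: the name replace_ambiguous, the substitution table, and the decision to drop gap characters are all assumptions made for illustration, and it assumes the model's alphabet is the 20 standard amino acids. Choose the actual replacements based on your own data.

```python
# Hypothetical pre-processing helper; not part of tmhmm.py.
# Assumption: the model only knows the 20 standard amino acids.
STANDARD_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")

# Example substitutions only (an assumption). Pick replacements that make
# biological sense for your own data set.
EXAMPLE_SUBSTITUTIONS = {"B": "D", "Z": "E", "U": "C", "O": "K"}


def replace_ambiguous(sequence, substitutions=EXAMPLE_SUBSTITUTIONS):
    """Return an upper-cased sequence with non-standard characters replaced.

    Gap characters ('-' and '.') are dropped. Any other character that is
    neither a standard residue nor a key of `substitutions` raises
    ValueError, so nothing is replaced silently.
    """
    cleaned = []
    for char in sequence.upper():
        if char in STANDARD_RESIDUES:
            cleaned.append(char)
        elif char in "-.":
            continue  # drop gaps (an assumption, see above)
        elif char in substitutions:
            cleaned.append(substitutions[char])
        else:
            raise ValueError("no substitution defined for {!r}".format(char))
    return "".join(cleaned)
```

Raising on unknown characters rather than guessing keeps the decision about each replacement explicit.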

Installation

This package supports Python 3.5, 3.6, and 3.7. Install with:

$ pip install tmhmm.py

Only Linux is supported at the moment.

Usage

$ tmhmm -h
  usage: tmhmm [-h] -f SEQUENCE_FILE [-m MODEL_FILE] [-p]

  optional arguments:
    -h, --help            show this help message and exit
    -f SEQUENCE_FILE, --file SEQUENCE_FILE
                          path to file in fasta format with sequences
    -m MODEL_FILE, --model MODEL_FILE
                          path to the model to use
    -p, --plot            plot posterior probabilies

The -p/--plot option will only be available if matplotlib is installed and importable.

Say we have the following sequence in FASTA format in a file called test.fa:

>B9DFX7|1B|HMA8_ARATH Copper-transporting ATPase PAA2, chloroplastic  [Arabidopsis thaliana ]
MASNLLRFPLPPPSSLHIRPSKFLVNRCFPRLRRSRIRRHCSRPFFLVSNSVEISTQSFESTESSIESVKSITSDTPIL
LDVSGMMCGGCVARVKSVLMSDDRVASAVVNMLTETAAVKFKPEVEVTADTAESLAKRLTESGFEAKRRVSGMGVAENV
KKWKEMVSKKEDLLVKSRNRVAFAWTLVALCCGSHTSHILHSLGIHIAHGGIWDLLHNSYVKGGLAVGALLGPGRELLF
DGIKAFGKRSPNMNSLVGLGSMAAFSISLISLVNPELEWDASFFDEPVMLLGFVLLGRSLEERAKLQASTDMNELLSLI
STQSRLVITSSDNNTPVDSVLSSDSICINVSVDDIRVGDSLLVLPGETFPVDGSVLAGRSVVDESMLTGESLPVFKEEG
CSVSAGTINWDGPLRIKASSTGSNSTISKIVRMVEDAQGNAAPVQRLADAIAGPFVYTIMSLSAMTFAFWYYVGSHIFP
DVLLNDIAGPDGDALALSLKLAVDVLVVSCPCALGLATPTAILIGTSLGAKRGYLIRGGDVLERLASIDCVALDKTGTL
TEGRPVVSGVASLGYEEQEVLKMAAAVEKTATHPIAKAIVNEAESLNLKTPETRGQLTEPGFGTLAEIDGRFVAVGSLE
WVSDRFLKKNDSSDMVKLESLLDHKLSNTSSTSRYSKTVVYVGREGEGIIGAIAISDCLRQDAEFTVARLQEKGIKTVL
LSGDREGAVATVAKNVGIKSESTNYSLSPEKKFEFISNLQSSGHRVAMVGDGINDAPSLAQADVGIALKIEAQENAASN
AASVILVRNKLSHVVDALSLAQATMSKVYQNLAWAIAYNVISIPIAAGVLLPQYDFAMTPSLSGGLMALSSIFVVSNSL
LLQLHKSETSKNSL

We can then run tmhmm.py on this file using the following command:

$ tmhmm -m TMHMM2.0.model -f test.fa

This produces a bunch of files. One is the summary:

$ cat 'B9DFX7|1B|HMA8_ARATH.summary'
0-444: outside
445-467: transmembrane helix
468-820: inside
821-843: transmembrane helix
844-852: outside
853-870: transmembrane helix
871-882: inside
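
For downstream processing, a summary file can be read back into Python. The following is a minimal sketch based only on the example output above; parse_summary and count_helices are hypothetical helpers, not part of tmhmm.py. Judging from the example, the coordinates appear to be zero-based and inclusive.

```python
import re

# Matches lines such as "445-467: transmembrane helix".
SUMMARY_LINE = re.compile(r"^(\d+)-(\d+): (.+)$")


def parse_summary(lines):
    """Parse summary lines into (start, end, label) tuples."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = SUMMARY_LINE.match(line)
        if match is None:
            raise ValueError("unexpected summary line: {!r}".format(line))
        start, end, label = match.groups()
        regions.append((int(start), int(end), label))
    return regions


def count_helices(regions):
    """Count the regions labelled as transmembrane helices."""
    return sum(1 for _, _, label in regions if label == "transmembrane helix")
```

Passing an open file handle for the summary file as lines works, since the function only iterates over it.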

An annotation in FASTA format:

$ cat 'B9DFX7|1B|HMA8_ARATH.annotation'
>B9DFX7|1B|HMA8_ARATH Copper-transporting ATPase PAA2, chloroplastic  [Arabidopsis thaliana ]
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMMMMMMMMMMMMMMMMMiiiiii
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMMMoooooooooMMMMMMMMMMMMMMMM
MMiiiiiiiiiiii
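
The annotation string and the summary carry the same information. As a sketch of how they relate, the hypothetical helper below (not part of tmhmm.py) collapses an annotation string into regions. The label mapping is inferred from the example output, where both "O" and "o" correspond to "outside" in the summary.

```python
from itertools import groupby

# Mapping inferred from the example output (an assumption).
LABEL_NAMES = {
    "i": "inside",
    "M": "transmembrane helix",
    "o": "outside",
    "O": "outside",
}


def annotation_to_regions(annotation):
    """Collapse an annotation string into (start, end, label) tuples.

    Coordinates are zero-based and inclusive. `annotation` must be one
    string without the FASTA header or line breaks.
    """
    regions = []
    position = 0
    # Group by label name so adjacent 'O' and 'o' runs merge into one region.
    for name, run in groupby(annotation, key=LABEL_NAMES.__getitem__):
        length = sum(1 for _ in run)
        regions.append((position, position + length - 1, name))
        position += length
    return regions
```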

And finally, a file containing the posterior probabilities of each label at each position, for plotting:

$ cat 'B9DFX7|1B|HMA8_ARATH.plot'
inside membrane outside
0.20341044516 0.0 0.79658955484
0.210104176071 2.77194446172e-08 0.78989579621
0.189291062167 3.11365191554e-08 0.810708906697
0.253334801857 7.17866017044e-07 0.746664480277
0.126185012808 1.34197873962e-05 0.873801567405
...
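
The plot file is plain whitespace-separated text, so it can be read without extra dependencies. The sketch below assumes the layout shown above (one header line with the label names, then one row per position); read_plot and most_probable_labels are hypothetical helpers, not part of tmhmm.py. Note that the most probable label at each position need not agree with the Viterbi annotation.

```python
def read_plot(lines):
    """Return (labels, rows) from the lines of a .plot file.

    `labels` is the list of column names from the header line and `rows`
    is a list of lists of floats, one per sequence position.
    """
    lines = [line for line in lines if line.strip()]
    labels = lines[0].split()
    rows = [[float(value) for value in line.split()] for line in lines[1:]]
    return labels, rows


def most_probable_labels(labels, rows):
    """Return the label with the highest posterior at each position."""
    return [labels[row.index(max(row))] for row in rows]
```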

If the -p flag is set, a plot in PDF format is also produced, following the same naming scheme as the other output files.

API

You can also use tmhmm.py as a library:

import tmhmm
annotation, posterior = tmhmm.predict(sequence, 'mymodel.model')

This returns the annotation as a string and the posterior probabilities for each label as a numpy array with shape (len(sequence), 3), where columns 0, 1, and 2 correspond to being inside, transmembrane, and outside, respectively.

If you don't need the posterior probabilities, set compute_posterior=False. This saves quite a lot of computation:

annotation, posterior = tmhmm.predict(
    sequence, 'mymodel.model', compute_posterior=False
)
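
To run the library over every sequence in a FASTA file, something along these lines may work. It is a sketch, not part of tmhmm.py: read_fasta and predict_all are hypothetical helpers, and the prediction function is passed in as an argument so the helpers themselves do not depend on tmhmm.py.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def predict_all(lines, predict):
    """Apply `predict` to each sequence and return {header: result}."""
    return {header: predict(seq) for header, seq in read_fasta(lines)}


# Usage with tmhmm.py (requires the package and a model file):
#
#     import tmhmm
#     with open('test.fa') as handle:
#         results = predict_all(
#             handle, lambda seq: tmhmm.predict(seq, 'mymodel.model'))
```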
