🐍🟡♦️🟦 pyHMMER

Cython bindings and Python interface to HMMER3.

🗺️ Overview

HMMER is a biological sequence analysis tool that uses profile hidden Markov models to search for sequence homologs. HMMER3 is maintained by members of the Eddy/Rivas Laboratory at Harvard University.

pyhmmer is a Python module, implemented using the Cython language, that provides bindings to HMMER3. It directly interacts with the HMMER internals, which has the following advantages over CLI wrappers (like hmmer-py):

  • single dependency: If your software or your analysis pipeline is distributed as a Python package, you can add pyhmmer as a dependency to your project, and stop worrying about the HMMER binaries being properly set up on the end-user machine.
  • no intermediate files: Everything happens in memory, in Python objects you have control over, making it easier to format your inputs to pass to HMMER without needing to write them to a file. Output retrieval is also done in memory, through instances of the pyhmmer.plan7.TopHits class.
  • no input formatting: The Easel object model is exposed in the pyhmmer.easel module, so you can build a Sequence object yourself and pass it to the HMMER pipeline. This is useful if your sequences are already loaded in memory, for instance because you obtained them from another Python library (such as Pyrodigal or Biopython).
  • no output formatting: HMMER3 is notorious for its numerous output files and its fixed-width tabular output, which is hard to parse (even Bio.SearchIO.HmmerIO struggles on some sequences).
  • efficient: Using pyhmmer to launch hmmsearch on sequences and HMMs in disk storage is typically not slower than directly using the hmmsearch binary (see the Benchmarks section). pyhmmer.hmmsearch uses a different parallelisation strategy from the hmmsearch binary shipped with HMMER, which helps get the most out of multiple CPUs.

This library is still a work-in-progress, and in a very experimental stage, but it should already pack enough features to run simple biological analyses involving hmmsearch.

🔧 Installing

pyhmmer can be installed from PyPI, which hosts some pre-built CPython wheels for x86-64 Linux, as well as the code required to compile from source with Cython:

$ pip install pyhmmer

Compilation for UNIX PowerPC is not tested in CI, but should work out of the box. Other architectures (e.g. Arm) and OSes (e.g. Windows) are not supported by HMMER.
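
The platform notes above can be summarised as a small decision helper. This is only a sketch: `install_route` is a hypothetical name that is not part of pyhmmer, and the machine strings it checks (`x86_64`, `amd64`, `arm…`, `aarch64`) are assumptions about what `platform.machine()` commonly reports.

```python
import platform


def install_route(system=None, machine=None):
    """Classify a platform as 'wheel', 'source' or 'unsupported'.

    Hypothetical helper that mirrors the paragraph above; it is not
    part of pyhmmer and does not run any installation itself.
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    # Windows and Arm are listed as unsupported by HMMER.
    if system == "windows" or machine.startswith(("arm", "aarch64")):
        return "unsupported"
    # Pre-built wheels are only mentioned for x86-64 Linux.
    if system == "linux" and machine in ("x86_64", "amd64"):
        return "wheel"
    # Everything else (e.g. UNIX PowerPC) would compile from source.
    return "source"


print(install_route())
```

Calling `install_route()` without arguments inspects the current machine; passing explicit values makes it easy to check other targets.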

A Bioconda package is also available, but only for Linux:

$ conda install -c bioconda pyhmmer

📖 Documentation

A complete API reference can be found in the online documentation, or directly from the command line using pydoc:

$ pydoc pyhmmer.easel
$ pydoc pyhmmer.plan7

💡 Example

Use pyhmmer to run hmmsearch, and obtain an iterable over TopHits that can be used for further sorting/querying in Python:

import pyhmmer

# Load the target sequences and digitize them with the guessed alphabet.
with pyhmmer.easel.SequenceFile("938293.PRJEB85.HG003687.faa") as file:
    alphabet = file.guess_alphabet()
    sequences = [seq.digitize(alphabet) for seq in file]

# Search every HMM against the in-memory sequences (one TopHits per HMM).
with pyhmmer.plan7.HMMFile("Pfam.hmm") as hmms:
    all_hits = list(pyhmmer.hmmsearch(hmms, sequences, cpus=4))
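
Once `all_hits` has been collected, further sorting and querying is plain Python. The sketch below assumes that iterating a TopHits object yields hits exposing `name`, `score` and `evalue` attributes; the `significant_hits` helper, the `FakeHit` stand-in and the 1e-5 threshold are illustrative choices, not part of pyhmmer.

```python
from collections import namedtuple


def significant_hits(all_hits, max_evalue=1e-5):
    """Flatten per-HMM hit collections, keep hits at or below the
    E-value threshold, and sort them by decreasing score."""
    kept = [hit for hits in all_hits for hit in hits if hit.evalue <= max_evalue]
    return sorted(kept, key=lambda hit: hit.score, reverse=True)


# Stand-in objects so the sketch runs without HMMER data; real code
# would pass the `all_hits` list built by pyhmmer.hmmsearch instead.
FakeHit = namedtuple("FakeHit", ["name", "score", "evalue"])
demo_hits = [
    [FakeHit(b"seq1", 120.5, 1e-30), FakeHit(b"seq2", 12.0, 0.3)],
    [FakeHit(b"seq3", 80.2, 1e-12)],
]

top = significant_hits(demo_hits)
for hit in top:
    print(hit.name, hit.score, hit.evalue)
```

Because the helper only relies on attribute access, it works on any objects with the same three attributes.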

Processing happens in parallel using Python threads, and a TopHits object is yielded for every HMM passed in the input iterable. Note that for optimal performance, you should pass the number of physical cores to the cpus argument of the pyhmmer.hmmsearch function, as HMMER requires too many SIMD registers to benefit from hyperthreading.
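
The standard library only reports logical CPUs, so a value for `cpus` has to be estimated. The following is a minimal sketch assuming two hardware threads per physical core, which is common on hyperthreaded x86 CPUs but not guaranteed; `physical_core_estimate` is a hypothetical helper, and a library such as psutil (`psutil.cpu_count(logical=False)`) can give an exact figure where it is available.

```python
import os


def physical_core_estimate(logical=None, threads_per_core=2):
    """Estimate the number of physical cores from the logical CPU count.

    threads_per_core=2 is an assumption; pass 1 when hyperthreading
    is disabled or absent.
    """
    if logical is None:
        # os.cpu_count() may return None when the count is unknown.
        logical = os.cpu_count() or 1
    return max(1, logical // threads_per_core)


cpus = physical_core_estimate()
print(f"suggested cpus argument: {cpus}")
```

The resulting value can then be passed as `cpus=cpus` to `pyhmmer.hmmsearch`.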

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

⏱️ Benchmarks

Benchmarks were run on an i7-8550U CPU running at 1.80GHz, using a FASTA file containing 2100 protein sequences (tests/data/seqs/938293.PRJEB85.HG003687.faa) and a subset of the Pfam HMM library containing 2873 domains. Each command was run 20 times.

| Command | # CPUs | mean (s) | σ (ms) | min (s) | max (s) | Speedup |
|---|---|---|---|---|---|---|
| python -m pyhmmer hmmsearch | 4 | 20.706 | 316 | 19.960 | 42.457 | x1.00 |
| python -m pyhmmer hmmsearch | 2 | 24.076 | 842 | 22.289 | 21.118 | x1.16 |
| hmmsearch | 2 | 35.046 | 161 | 34.734 | 35.183 | x1.69 |
| hmmsearch | 4 | 37.721 | 78 | 37.605 | 37.847 | x1.82 |
| python -m pyhmmer hmmsearch | 1 | 39.022 | 1346 | 36.081 | 40.644 | x1.88 |
| hmmsearch | 1 | 44.360 | 243 | 44.184 | 45.018 | x2.14 |
| hmmscan | 2 | 102.248 | 381 | 101.479 | 102.765 | x4.93 |
| hmmscan | 4 | 106.779 | 375 | 106.197 | 107.482 | x5.15 |
| hmmscan | 1 | 107.945 | 326 | 107.460 | 108.502 | x5.21 |
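
The Speedup column appears to be each mean time divided by the fastest mean (pyhmmer with 4 CPUs). That reading is inferred from the numbers rather than stated, and the last digit may differ from the table because the published means are already rounded. A short sketch of the arithmetic:

```python
# Mean wall-clock times in seconds, copied from the table above.
mean_seconds = {
    ("pyhmmer hmmsearch", 4): 20.706,
    ("pyhmmer hmmsearch", 2): 24.076,
    ("hmmsearch", 2): 35.046,
    ("hmmsearch", 4): 37.721,
    ("pyhmmer hmmsearch", 1): 39.022,
    ("hmmsearch", 1): 44.360,
    ("hmmscan", 2): 102.248,
    ("hmmscan", 4): 106.779,
    ("hmmscan", 1): 107.945,
}

# The fastest configuration is the reference (ratio 1.00).
fastest = min(mean_seconds.values())
relative = {key: mean / fastest for key, mean in mean_seconds.items()}

# Print the configurations from fastest to slowest.
for (command, cpus), ratio in sorted(relative.items(), key=lambda kv: kv[1]):
    print(f"{command:<20} {cpus} CPUs  x{ratio:.2f}")
```

Read this way, a larger factor means a slower configuration relative to the reference row.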

⚖️ License

This library is provided under the MIT License. The HMMER3 and Easel code is available under the BSD 3-clause license. See vendor/hmmer/LICENSE and vendor/easel/LICENSE for more information.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original HMMER authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.
