
Cython bindings and Python interface to HMMER3.


🐍🟡♦️🟦 pyHMMER



🗺️ Overview

HMMER is a biological sequence analysis tool that uses profile hidden Markov models to search for sequence homologs. HMMER3 is maintained by members of the Eddy/Rivas Laboratory at Harvard University.

pyhmmer is a Python module, implemented using the Cython language, that provides bindings to HMMER3. It interacts directly with the HMMER internals, which gives it the following advantages over CLI wrappers (like hmmer-py):

  • single dependency: If your software or your analysis pipeline is distributed as a Python package, you can add pyhmmer as a dependency to your project and stop worrying about the HMMER binaries being properly set up on the end-user machine.
  • no intermediate files: Everything happens in memory, in Python objects you have control over, so you can pass your inputs to HMMER without writing them to a temporary file. Output retrieval is also done in memory, via instances of the pyhmmer.plan7.TopHits class.
  • no input formatting: The Easel object model is exposed in the pyhmmer.easel module, so you can build a Sequence object yourself and pass it to the HMMER pipeline. This is useful if your sequences are already loaded in memory, for instance because you obtained them from another Python library (such as Pyrodigal or Biopython).
  • no output formatting: HMMER3 is notorious for its numerous output files and its fixed-width tabular output, which is hard to parse (even Bio.SearchIO.HmmerIO struggles with some sequences).
  • efficient: Using pyhmmer to launch hmmsearch on sequences and HMMs stored on disk is typically faster than running the hmmsearch binary directly (see the Benchmarks section). pyhmmer.hmmsearch uses a different parallelisation strategy from the hmmsearch binary shipped with HMMER, which helps get the most out of multiple CPUs.
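To make the "no output formatting" point concrete, here is a minimal sketch of what parsing a single line of `hmmsearch --tblout` output by hand involves. It assumes the HMMER3 per-sequence table layout (18 whitespace-separated fields followed by a free-text target description that may itself contain spaces); `TblHit` and `parse_tblout_line` are hypothetical names, not part of pyhmmer, and only a few of the columns are kept.

```python
from typing import NamedTuple, Optional


class TblHit(NamedTuple):
    target: str
    query: str
    evalue: float
    score: float
    bias: float
    description: str


def parse_tblout_line(line: str) -> Optional[TblHit]:
    """Parse one line of `hmmsearch --tblout` output (assumed HMMER3 layout)."""
    if not line.strip() or line.startswith("#"):
        return None  # skip blank lines and comment/header lines
    # 18 whitespace-delimited fields, then a description that may contain spaces:
    # a plain split() would break the description into several fields.
    fields = line.rstrip("\n").split(maxsplit=18)
    description = fields[18] if len(fields) > 18 else ""
    return TblHit(
        target=fields[0],          # target name
        query=fields[2],           # query name
        evalue=float(fields[4]),   # full-sequence E-value
        score=float(fields[5]),    # full-sequence bit score
        bias=float(fields[6]),     # full-sequence bias
        description=description,   # free text, may contain spaces
    )
```

Bounding the split with `maxsplit=18` keeps the description intact; with pyhmmer, the same information is available directly as attributes of Python objects instead.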

This library is still a work in progress and at an experimental stage, but it should already provide enough features to run biological analyses involving hmmsearch or phmmer.

🔧 Installing

pyhmmer can be installed from PyPI, which hosts some pre-built CPython wheels for x86-64 Linux, as well as the code required to compile from source with Cython:

$ pip install pyhmmer

Compilation for UNIX PowerPC is not tested in CI, but should work out of the box. Other architectures (e.g. Arm) and OSes (e.g. Windows) are not supported by HMMER.

A Bioconda package is also available, but only for Linux:

$ conda install -c bioconda pyhmmer

📖 Documentation

A complete API reference can be found in the online documentation, or directly from the command line using pydoc:

$ pydoc pyhmmer.easel
$ pydoc pyhmmer.plan7

💡 Example

Use pyhmmer to run hmmsearch, and obtain an iterable over TopHits that can be used for further sorting/querying in Python:

import pyhmmer

# load the target sequences and convert them to digital mode
with pyhmmer.easel.SequenceFile("938293.PRJEB85.HG003687.faa") as file:
    alphabet = file.guess_alphabet()
    sequences = [seq.digitize(alphabet) for seq in file]

# search every HMM of the library against the digitized sequences, using 4 threads
with pyhmmer.plan7.HMMFile("Pfam.hmm") as hmms:
    all_hits = list(pyhmmer.hmmsearch(hmms, sequences, cpus=4))

Processing happens in parallel using Python threads, and a TopHits object is yielded for every HMM passed in the input iterable.
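The general shape of this strategy (a pool of threads, one result yielded per query, in input order) can be sketched in plain Python. This is not pyhmmer's implementation: `search_one` is a hypothetical toy scorer that counts occurrences of a query motif in each target, standing in for the HMMER pipeline, and results are plain lists of `(name, score)` pairs rather than TopHits objects.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator, List, Sequence, Tuple

# Hypothetical stand-ins: a query is a motif string, a target is (name, sequence),
# and a hit is (target name, score). In pyhmmer, queries would be HMMs and
# each result a TopHits object.
Target = Tuple[str, str]
Hit = Tuple[str, float]


def search_one(query: str, targets: Sequence[Target]) -> List[Hit]:
    """Toy scorer: count non-overlapping occurrences of `query` in each target."""
    hits = [(name, float(seq.count(query))) for name, seq in targets]
    # keep targets with at least one occurrence, best-scoring first
    return sorted((hit for hit in hits if hit[1] > 0), key=lambda hit: hit[1], reverse=True)


def parallel_search(
    queries: Iterable[str], targets: Sequence[Target], cpus: int = 4
) -> Iterator[List[Hit]]:
    """Search every query against all targets on a thread pool, yielding one result per query."""
    with ThreadPoolExecutor(max_workers=cpus) as pool:
        # Executor.map yields results in the same order as the input queries
        yield from pool.map(lambda query: search_one(query, targets), queries)
```

Threads only give a real speed-up when the per-query work runs outside the GIL (for example in compiled extension code); the pure-Python toy scorer above will not get faster with more threads.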

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

⏱️ Benchmarks

Benchmarks were run on an i7-10710U CPU running at 1.10 GHz with 6 physical / 12 logical cores, using a FASTA file containing 2100 protein sequences extracted from the genome of Anaerococcus provencensis (938293.PRJEB85.HG003687.faa) and version 33.1 of the Pfam HMM library, containing 18,259 domains. Commands were run 4 times on a warm SSD. Solid lines show the times for pressed HMMs, and dashed lines the times for HMMs in text format.

[Figure: Benchmarks]
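A repeated-timing harness in the spirit of this methodology (several timed runs, preceded by an untimed warm-up run) might look like the sketch below. It is not the project's benchmark code: `bench` is a hypothetical helper, and using a warm-up run to reach a warm cache is an assumption about how such measurements can be taken.

```python
import statistics
import time
from typing import Callable, Dict, List


def bench(fn: Callable[[], object], repeat: int = 4, warmup: int = 1) -> Dict[str, float]:
    """Call `fn` `warmup` times untimed, then `repeat` times timed (wall clock, seconds)."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    for _ in range(warmup):
        fn()  # untimed run(s), e.g. to get input files into the OS page cache
    times: List[float] = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return {
        "min": min(times),
        "mean": statistics.mean(times),
        # the sample standard deviation needs at least two runs
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
    }
```

Reporting the minimum alongside the mean helps separate the intrinsic cost of a run from occasional interference by other processes.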

Raw numbers can be found in the benches folder. They suggest that phmmer should be run with the number of logical cores, while hmmsearch should be run with the number of physical cores (or fewer). One possible explanation is that HMMER's platform-specific code requires too many SIMD registers per thread to benefit from simultaneous multithreading.
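This rule of thumb can be written down as a small helper. It is a hypothetical sketch, not part of pyhmmer: `suggested_cpus` relies on `os.cpu_count()`, which reports logical CPUs, and assumes 2 hardware threads per physical core (as on the i7-10710U used for these benchmarks), because the standard library does not report the number of physical cores.

```python
import os
from typing import Optional


def suggested_cpus(program: str, logical: Optional[int] = None, threads_per_core: int = 2) -> int:
    """Heuristic from the benchmarks: phmmer -> logical cores, hmmsearch -> physical cores.

    `threads_per_core` is an assumption (2-way SMT); adjust it for your CPU.
    """
    if logical is None:
        logical = os.cpu_count() or 1  # os.cpu_count() can return None
    if program == "phmmer":
        return logical
    if program == "hmmsearch":
        # approximate the number of physical cores, never going below 1
        return max(1, logical // threads_per_core)
    raise ValueError(f"unknown program: {program!r}")
```

The result can then be passed as the `cpus` argument of `pyhmmer.hmmsearch`, as in the example earlier in this README.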

🔍 See Also

If, despite all the advantages listed earlier, you would rather use HMMER through its CLI, this package will not be of much help. In that case, you should check out the hmmer-py package developed by Danilo Horta at the EMBL-EBI.

⚖️ License

This library is provided under the MIT License. The HMMER3 and Easel code is available under the BSD 3-clause license. See vendor/hmmer/LICENSE and vendor/easel/LICENSE for more information.

This project is in no way affiliated with, sponsored, or otherwise endorsed by the original HMMER authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.


Download files


Source Distribution

pyhmmer-0.2.1.tar.gz (2.1 MB)

Tags: Source

Built Distributions


pyhmmer-0.2.1-cp39-cp39-manylinux2010_x86_64.whl (4.0 MB)

Tags: CPython 3.9, manylinux: glibc 2.12+ x86-64

pyhmmer-0.2.1-cp39-cp39-manylinux1_x86_64.whl (4.0 MB)

Tags: CPython 3.9

pyhmmer-0.2.1-cp38-cp38-manylinux2010_x86_64.whl (4.1 MB)

Tags: CPython 3.8, manylinux: glibc 2.12+ x86-64

pyhmmer-0.2.1-cp38-cp38-manylinux1_x86_64.whl (4.1 MB)

Tags: CPython 3.8

pyhmmer-0.2.1-cp37-cp37m-manylinux2010_x86_64.whl (3.8 MB)

Tags: CPython 3.7m, manylinux: glibc 2.12+ x86-64

pyhmmer-0.2.1-cp37-cp37m-manylinux1_x86_64.whl (3.8 MB)

Tags: CPython 3.7m

pyhmmer-0.2.1-cp36-cp36m-manylinux2010_x86_64.whl (3.9 MB)

Tags: CPython 3.6m, manylinux: glibc 2.12+ x86-64

pyhmmer-0.2.1-cp36-cp36m-manylinux1_x86_64.whl (3.9 MB)

Tags: CPython 3.6m

File details

Details for the file pyhmmer-0.2.1.tar.gz.

File metadata

  • Download URL: pyhmmer-0.2.1.tar.gz
  • Upload date:
  • Size: 2.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for pyhmmer-0.2.1.tar.gz
Algorithm Hash digest
SHA256 676e79bdd525a25046d3bee12affbbd379a81b89193657cac1f6449cbc064d2d
MD5 01cdceed6986afcfe6a9801c789a2641
BLAKE2b-256 c846fffe6de4f62fff6e0f9a75f7b9bd0b2ce227acceb74e5410953b40f56663


File details

Details for the file pyhmmer-0.2.1-cp39-cp39-manylinux2010_x86_64.whl.

File metadata

  • Download URL: pyhmmer-0.2.1-cp39-cp39-manylinux2010_x86_64.whl
  • Upload date:
  • Size: 4.0 MB
  • Tags: CPython 3.9, manylinux: glibc 2.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for pyhmmer-0.2.1-cp39-cp39-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 22f18450a837e206b8346b206a86337dbe13d11de906a659b146ff6e22630c93
MD5 c9eff19557ed478b3292ba6c3b311ded
BLAKE2b-256 864e019bdee89691d26bb66acba19b354cf88270396db6000362069b0d2f86b8


File details

Details for the file pyhmmer-0.2.1-cp39-cp39-manylinux1_x86_64.whl.

File metadata

  • Download URL: pyhmmer-0.2.1-cp39-cp39-manylinux1_x86_64.whl
  • Upload date:
  • Size: 4.0 MB
  • Tags: CPython 3.9
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for pyhmmer-0.2.1-cp39-cp39-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 2a54ae3d76e030fa5016c87bd18c1ebaf166a446a944bc7894d7dc058239cae3
MD5 29950a6d4aaa1f26870e476b66b840f1
BLAKE2b-256 c64a1f95ca9c7f09f3a37c25e1096431c711f250902c2d4755d7bc3cf0ae200c


File details

Details for the file pyhmmer-0.2.1-cp38-cp38-manylinux2010_x86_64.whl.

File metadata

  • Download URL: pyhmmer-0.2.1-cp38-cp38-manylinux2010_x86_64.whl
  • Upload date:
  • Size: 4.1 MB
  • Tags: CPython 3.8, manylinux: glibc 2.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for pyhmmer-0.2.1-cp38-cp38-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 7658431b9114e26dc5ffdf6377b1b93e86dfa39a464df054aca3fe7ac957138e
MD5 f309940b9f9db1b3cb2e1dcd70b0fe69
BLAKE2b-256 c533388b9ab97e2e203b771ee35c840220aea998d326bf0101e00769101253bc


File details

Details for the file pyhmmer-0.2.1-cp38-cp38-manylinux1_x86_64.whl.

File metadata

  • Download URL: pyhmmer-0.2.1-cp38-cp38-manylinux1_x86_64.whl
  • Upload date:
  • Size: 4.1 MB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for pyhmmer-0.2.1-cp38-cp38-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 59a12e8631fac56ff7da3bcc24583a6961f32d42b2d72644d27561b7175f7615
MD5 0937b3a6b4120ad42c5cf50b94fcde38
BLAKE2b-256 903d7ea26b8136b23b600c9269c60cffc119889cbdc5bed174485cf864517875


File details

Details for the file pyhmmer-0.2.1-cp37-cp37m-manylinux2010_x86_64.whl.

File metadata

  • Download URL: pyhmmer-0.2.1-cp37-cp37m-manylinux2010_x86_64.whl
  • Upload date:
  • Size: 3.8 MB
  • Tags: CPython 3.7m, manylinux: glibc 2.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for pyhmmer-0.2.1-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 b42d8883707dc9deb40192b9e8f7faa8d14fdf678190d878e2bc862723c273a8
MD5 cde87a6ad223b0e977967b8195fafe29
BLAKE2b-256 ee086e9cd70c963df09e626a0aa4ca3e0db88cf06aac837521fdaf4105f2a4cc


File details

Details for the file pyhmmer-0.2.1-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

  • Download URL: pyhmmer-0.2.1-cp37-cp37m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 3.8 MB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for pyhmmer-0.2.1-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 cc1eea8ab080f06f9599d00ebf7bdbd256232211841983dd151e106aa504c249
MD5 77cc35fe5fdc2ae357a83faa7cd691f3
BLAKE2b-256 497cd3f741a6021ad34a0330c4b025dcc5b6a149e453d19b1a6955e1a6840e69


File details

Details for the file pyhmmer-0.2.1-cp36-cp36m-manylinux2010_x86_64.whl.

File metadata

  • Download URL: pyhmmer-0.2.1-cp36-cp36m-manylinux2010_x86_64.whl
  • Upload date:
  • Size: 3.9 MB
  • Tags: CPython 3.6m, manylinux: glibc 2.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for pyhmmer-0.2.1-cp36-cp36m-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 aa4e462b35bebd4d4139ec3ff2f4861b2fc9212ddbdd5f292924a9179c705cca
MD5 f71756513f5cbe75180e22f08e2f4878
BLAKE2b-256 9143c2a90dd0e06d7dc09d952d80410751b02282b4ca4280fde3bde442795ce0


File details

Details for the file pyhmmer-0.2.1-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: pyhmmer-0.2.1-cp36-cp36m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 3.9 MB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for pyhmmer-0.2.1-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 f5d27aa2adc0c632bfc06af97d8ba2934c7fdff3cf224e12f2f29d56b18ea872
MD5 4e97df12d075a982654f8edc153216ab
BLAKE2b-256 3eedb6538d599c328f880b949b57c432be2701ba5a6f1c032eb055d3acafd32b
