Phage bioinformatics utilities (seqclust runner and friends).


Combating Phage Genomes


phu - Phage Utilities

phu (phage utilities, or phutilities) is a modular toolkit for viral genomics workflows. It provides command-line tools for common steps in phage bioinformatics pipelines, wrapping complex utilities behind a consistent and intuitive interface.

Installation

You can install phu using mamba or conda from the bioconda channel:

mamba create -n phu bioconda::phu

Usage

As a command-line tool, phu follows a modular structure. You can access different functionalities through subcommands. The general syntax is:

phu <command> [options]

Commands

  • screen: Screen contigs for specific protein families using HMMER on predicted coding sequences.
  • jack: Iteratively screen contigs from one or more seed proteins with jackhmmer and combine the seed hits.
  • cluster: Cluster viral sequences into species or other operational taxonomic units (OTUs).
  • simplify-taxa: Simplify vContact taxonomy prediction columns into compact lineage codes.
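The `phu <command> [options]` pattern can be illustrated with a minimal sketch built on Python's argparse. This is not phu's actual implementation: only the four subcommand names come from the list above, while the program name `phu-sketch` and the helper functions are hypothetical.

```python
import argparse

# Subcommand names are taken from the command list above.
# The help strings are paraphrases; everything else is illustrative.
COMMANDS = {
    "screen": "Screen contigs for protein families with HMMER.",
    "jack": "Iteratively screen contigs from seed proteins with jackhmmer.",
    "cluster": "Cluster viral sequences into species or other OTUs.",
    "simplify-taxa": "Simplify vContact taxonomy columns into lineage codes.",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one sub-parser per command (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="phu-sketch")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, help_text in COMMANDS.items():
        subparsers.add_parser(name, help=help_text)
    return parser


def parse_command(argv):
    """Return the name of the subcommand selected by the argument list."""
    return build_parser().parse_args(argv).command
```

In a layout like this, each subcommand owns its options, so adding a new tool does not disturb the existing ones.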

Cache Handling

phu caches predicted proteins for both screen and jack so repeated runs can reuse the same translated proteins when the prediction inputs have not changed. Search settings such as HMM files, seed markers, combine mode, and output folder do not affect the cache.

The cache is rebuilt when you change the contig input, --mode, --ttable, or a length filter. For phu screen, the length filter is --min-protein-len-aa. For phu jack, both --min-gene-len and --min-protein-len-aa are part of the cache key.
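As a rough illustration of how a cache key with these invalidation rules could work, the sketch below hashes the contig file together with the prediction settings. It is an assumption-laden example, not phu's real code: the function name `prediction_cache_key`, the use of SHA-256, and the JSON encoding of the parameters are all hypothetical.

```python
import hashlib
import json
from typing import Optional


def prediction_cache_key(
    contigs_path,
    mode: str,
    ttable: int,
    min_protein_len_aa: int,
    min_gene_len: Optional[int] = None,
) -> str:
    """Derive a key from the inputs that affect protein prediction.

    Hypothetical helper: phu's actual key derivation may differ.
    """
    digest = hashlib.sha256()
    # Hash the contig file contents so that editing the input changes the key.
    with open(contigs_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    # Only prediction settings go into the key. Search settings such as
    # HMM files or the output folder are deliberately left out.
    params = {
        "mode": mode,
        "ttable": ttable,
        "min_protein_len_aa": min_protein_len_aa,
        "min_gene_len": min_gene_len,  # stays None where it does not apply
    }
    digest.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()
```

Two runs with the same contigs and prediction settings produce the same key and can share cached proteins. Changing any one of those inputs produces a different key and forces a fresh prediction.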

To remove previously cached predictions, run phu --clean-cache.

See the full cache guide in Cache Handling.

Contributing

We welcome contributions to phu! Please follow these steps:

  1. Fork the repository.
  2. Create a new branch for your feature or bugfix.
  3. Make your changes and commit them.
  4. Submit a pull request describing your changes.

Developers

You can also install the development version of phu directly from GitHub:

git clone https://github.com/camilogarciabotero/phu.git
cd phu
pip install -e .

phu is also available on PyPI:

pip install phu

References

This program relies on several key tools and libraries. Please acknowledge them when using phu:

  • vclust: A high-performance clustering tool for viral sequences.

Zielezinski A, Gudyś A, Barylski J, Siminski K, Rozwalak P, Dutilh BE, Deorowicz S. Ultrafast and accurate sequence alignment and clustering of viral genomes. Nat Methods. https://doi.org/10.1038/s41592-025-02701-7

  • seqkit: A toolkit for FASTA/Q file manipulation.

Wei Shen*, Botond Sipos, and Liuyang Zhao. 2024. SeqKit2: A Swiss Army Knife for Sequence and Alignment Processing. iMeta e191. doi:10.1002/imt2.191.

  • Prodigal: A gene prediction tool for prokaryotic genomes.

Hyatt, D., Chen, G. L., LoCascio, P. F., Land, M. L., Larimer, F. W., & Hauser, L. J. (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC bioinformatics, 11(1), 119. https://doi.org/10.1186/1471-2105-11-119

  • pyrodigal: Python bindings and interface to Prodigal.

Larralde, M. (2022). Pyrodigal: Python bindings and interface to Prodigal, an efficient method for gene prediction in prokaryotes. Journal of Open Source Software, 7(72), 4296. https://doi.org/10.21105/joss.04296

  • HMMER: A suite of tools for sequence analysis using profile hidden Markov models.

Eddy, S. R. (2011). Accelerated Profile HMM Searches. PLoS Computational Biology, 7(10), e1002195. https://doi.org/10.1371/journal.pcbi.1002195

  • pyHMMER: Python bindings for HMMER.

Larralde, M., & Zeller, G. (2023). PyHMMER: a Python library binding to HMMER for efficient sequence analysis. Bioinformatics, 39(5). https://doi.org/10.1093/bioinformatics/btad214
