Phables: from fragmented assemblies to high-quality bacteriophage genomes

Phables is a tool for resolving bacteriophage genomes using phage bubbles in viral metagenomic data. It models cyclic phage-like components in the viral metagenomic assembly as flow networks, formulates each as a minimum flow decomposition (MFD) problem, and resolves the genomic paths corresponding to the resulting flow paths. Phables uses the Minimum Flow Decomposition via Integer Linear Programming (MFD-ILP) implementation to obtain the flow paths.
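To make the idea of a flow decomposition concrete, here is a minimal sketch on a toy "bubble" graph. It is not the MFD-ILP method that Phables uses: it is a simple greedy heuristic that repeatedly peels off the source-to-sink path carrying the most remaining flow, and it does not guarantee a minimum number of paths. The graph, node names, flow values and the function name `greedy_flow_decomposition` are all made up for illustration.

```python
from collections import defaultdict


def greedy_flow_decomposition(edge_flow, source, sink):
    """Split a flow on a DAG into weighted source-to-sink paths (greedy heuristic).

    edge_flow maps (u, v) -> non-negative integer flow and is assumed to
    satisfy flow conservation at every node except source and sink.
    Returns a list of (path, weight) pairs.
    """
    remaining = dict(edge_flow)
    out_edges = defaultdict(list)
    for u, v in edge_flow:
        out_edges[u].append(v)

    paths = []
    while True:
        # Walk from the source, always following the edge with most flow left.
        path = [source]
        node = source
        while node != sink:
            candidates = [v for v in out_edges[node] if remaining[(node, v)] > 0]
            if not candidates:
                break
            node = max(candidates, key=lambda v: remaining[(path[-1], v)])
            path.append(node)
        if node != sink:
            break  # no flow left to route

        # The path can carry as much flow as its tightest edge allows.
        steps = list(zip(path, path[1:]))
        weight = min(remaining[step] for step in steps)
        for step in steps:
            remaining[step] -= weight
        paths.append((path, weight))
    return paths


# Toy bubble: two branches (a and b) between s and c, with hypothetical coverages.
toy_flow = {
    ("s", "a"): 5,
    ("s", "b"): 3,
    ("a", "c"): 5,
    ("b", "c"): 3,
    ("c", "t"): 8,
}
paths = greedy_flow_decomposition(toy_flow, "s", "t")
for path, weight in paths:
    print(" -> ".join(path), "weight", weight)
```

Each returned path can be read as one candidate genome through the bubble, with the weight standing in for its coverage. An exact minimum decomposition generally needs an optimisation approach such as the ILP formulation mentioned above.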

For detailed instructions on installation and usage, please refer to the documentation hosted at Read the Docs.

NEW: Phables is now available on bioconda at https://anaconda.org/bioconda/phables and on PyPI at https://pypi.org/project/phables/. Feel free to pick your package manager, but we recommend that you use conda.

Setting up Phables

Option 1: Installing Phables using conda (recommended)

You can install Phables from bioconda at https://anaconda.org/bioconda/phables. Make sure you have conda installed.

# create conda environment and install phables
conda create -n phables -c conda-forge -c anaconda -c bioconda phables

# activate environment
conda activate phables

Now you can go to Setting up Gurobi to configure Gurobi.

Option 2: Installing Phables using pip

You can install Phables from PyPI at https://pypi.org/project/phables/. Make sure you have pip and mamba installed.

pip install phables

Now you can go to Setting up Gurobi to configure Gurobi.

Setting up Gurobi

The MFD implementation uses the linear programming solver Gurobi. The Phables conda environment and pip setup do not include Gurobi, so you have to install it using one of the following commands, depending on your package manager.

# conda
conda install -c gurobi gurobi

# pip
pip install gurobipy

To solve large models without any model size limitations, you have to activate the (academic) license once Gurobi is installed and add the key using the following command. You only have to do this once.

grbgetkey <KEY>

You can refer to further instructions at https://www.gurobi.com/academia/academic-program-and-licenses/.
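If you want to confirm from Python that the Gurobi bindings are visible in the active environment, a small check like the following may help. This is only a sketch: it uses the standard library to look for the `gurobipy` package and does not verify that a license has been activated. The helper name `gurobi_status` is made up for this example.

```python
from importlib import metadata, util


def gurobi_status():
    """Return (installed, version) for the gurobipy Python package."""
    if util.find_spec("gurobipy") is None:
        return False, None
    try:
        return True, metadata.version("gurobipy")
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata was found under this name.
        return True, None


installed, version = gurobi_status()
print(f"gurobipy installed: {installed}, version: {version}")
```

If this reports that `gurobipy` is not installed, re-run one of the install commands above inside the same environment you use for Phables.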

Test the installation

After setting up, run the following command to print out the Phables help message.

phables --help
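The same check can be scripted, for example as part of a setup test. The sketch below looks for the `phables` executable on your `PATH` and, if it is found, captures the output of `phables --help`. The function name `phables_help` is hypothetical, and nothing here depends on Phables internals.

```python
import shutil
import subprocess


def phables_help(executable="phables"):
    """Return the help text of the given command, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--help"], capture_output=True, text=True, check=False
    )
    # Some tools print help to stderr, so fall back to it if stdout is empty.
    return result.stdout or result.stderr


help_text = phables_help()
if help_text is None:
    print("phables was not found on PATH; is the environment activated?")
else:
    print(help_text)
```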

Quick Start Guide

Phables is powered by Snaketool, which packs all the setup, testing, preprocessing and running steps into an easy-to-use pipeline.

Set up the databases

# Download and set up the databases - you only have to do this once
phables install

Run on test data

phables test

Run on your own data

# Run Phables
# locally: using 8 threads (default is 1 thread)
phables run --input assembly_graph.gfa --reads fastq/ --threads 8

Please refer to the documentation hosted at Read the Docs for further information on how to run Phables.
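Before launching a run on your own data, it can be useful to confirm that the assembly graph passed to `--input` looks like a GFA file with segments and links. The following is a minimal sketch that assumes a GFA 1 style file, where `S` lines hold segments and `L` lines hold links. It is not part of Phables, and the function name `summarise_gfa` is made up for this example.

```python
def summarise_gfa(lines):
    """Count segments and links in an iterable of GFA 1 lines."""
    segments = 0
    links = 0
    total_bp = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            segments += 1
            # A sequence of "*" means the bases are not stored in the file.
            if fields[2] != "*":
                total_bp += len(fields[2])
        elif fields[0] == "L":
            links += 1
    return {"segments": segments, "links": links, "total_bp": total_bp}


# Tiny made-up graph: two segments joined in a cycle.
example = [
    "H\tVN:Z:1.0\n",
    "S\t1\tACGTAC\n",
    "S\t2\tGGA\n",
    "L\t1\t+\t2\t+\t0M\n",
    "L\t2\t+\t1\t+\t0M\n",
]
summary = summarise_gfa(example)
print(summary)
```

For a real file you could call it as `summarise_gfa(open("assembly_graph.gfa"))`. A graph with zero links has no connected components to resolve, so it is worth checking the assembler output before running Phables.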

Issues and Questions

Phables is still under testing. If you want to test (or break) Phables, give it a try and report any issues and suggestions under Phables Issues.

If you have any questions, please have a look at the Phables FAQ page. If your question is not answered there, feel free to post it under Phables Issues.

Contributing to Phables

Are you interested in contributing to the Phables project? If so, you can check out the contributing guidelines in CONTRIBUTING.md.

Acknowledgement

Phables uses the Gurobi implementation of MFD-ILP and code snippets from STRONG, METAMVGL, GraphBin, MetaCoAG and Hecatomb.

Citation

The Phables manuscript is currently under review and the preprint is available on bioRxiv at DOI: 10.1101/2023.04.04.535632.

If you use Phables in your work, please cite Phables as:

Vijini Mallawaarachchi, Michael J Roach, Bhavya Papudeshi, Sarah K Giles, Susanna R Grigson, Przemyslaw Decewicz, George Bouras, Ryan D Hesse, Laura K Inglis, Abbey LK Hutton, Elizabeth A Dinsdale, Robert A Edwards. Phables: from fragmented assemblies to high-quality bacteriophage genomes. bioRxiv 2023.04.04.535632; doi: https://doi.org/10.1101/2023.04.04.535632.

@article{Mallawaarachchi2023.04.04.535632,
	author = {Vijini Mallawaarachchi and Michael J Roach and Bhavya Papudeshi and Sarah K Giles and Susanna R Grigson and Przemyslaw Decewicz and George Bouras and Ryan D Hesse and Laura K Inglis and Abbey LK Hutton and Elizabeth A Dinsdale and Robert A Edwards},
	title = {Phables: from fragmented assemblies to high-quality bacteriophage genomes},
	elocation-id = {2023.04.04.535632},
	year = {2023},
	doi = {10.1101/2023.04.04.535632},
	publisher = {Cold Spring Harbor Laboratory},
	abstract = {Microbial communities found within the human gut have a strong influence on human health. Intestinal bacteria and viruses influence gastrointestinal diseases such as inflammatory bowel disease. Viruses infecting bacteria, known as bacteriophages, play a key role in modulating bacterial communities within the human gut. However, the identification and characterisation of novel bacteriophages remain a challenge. Available tools use similarities between sequences, nucleotide composition, and the presence of viral genes/proteins. Most available tools consider individual contigs to determine whether they are of viral origin. As a result of the challenges in viral assembly, fragmentation of viral genomes can occur, leading to the need for new approaches in viral identification. We introduce Phables, a new computational method to resolve bacteriophage genomes from fragmented viral metagenomic assemblies. Phables identifies bacteriophage-like components in the assembly graph, models each component as a flow network, and uses graph algorithms and flow decomposition techniques to identify genomic paths. Experimental results of viral metagenomic samples obtained from different environments show that over 80\% of the bacteriophage genomes resolved by Phables have high quality and are longer than the individual contigs identified by existing viral identification tools.},
	URL = {https://www.biorxiv.org/content/early/2023/04/04/2023.04.04.535632},
	eprint = {https://www.biorxiv.org/content/early/2023/04/04/2023.04.04.535632.full.pdf},
	journal = {bioRxiv}
}

Also, please cite the following tools/databases used by Phables.

  • Roach MJ, Pierce-Ward NT, Suchecki R, Mallawaarachchi V, Papudeshi B, et al. Ten simple rules and a template for creating workflows-as-applications. PLOS Computational Biology 18(12) (2022): e1010705. https://doi.org/10.1371/journal.pcbi.1010705
  • Terzian P, Olo Ndela E, Galiez C, Lossouarn J, Pérez Bucio RE, Mom R, Toussaint A, Petit MA, Enault F. PHROG: families of prokaryotic virus proteins clustered using remote homology. NAR Genomics and Bioinformatics, Volume 3, Issue 3, lqab067 (2021). https://doi.org/10.1093/nargab/lqab067
  • Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol 35, 1026–1028 (2017). https://doi.org/10.1038/nbt.3988
  • Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100 (2018). https://doi.org/10.1093/bioinformatics/bty191
  • Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools, Bioinformatics, Volume 25, Issue 16, Pages 2078–2079 (2009). https://doi.org/10.1093/bioinformatics/btp352
  • Hagberg AA, Schult DA, Swart PJ. Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference (SciPy2008), Gaël Varoquaux, Travis Vaught, and Jarrod Millman (Eds), (Pasadena, CA USA), pp. 11–15 (2008).
