ProPhyle metagenomic classifier
Introduction
ProPhyle is a k-mer-based metagenomic classifier built on the Burrows-Wheeler Transform (BWT). Its indexing strategy propagates k-mers bottom-up through the phylogenetic tree and assembles them into contigs at each node; matching is then done by standard full-text search over a BWT index. The k-mers shared between NGS reads and the genomes in the index determine which nodes are the best candidates for classifying each read. More information about the indexing scheme can be found in our poster.
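The bottom-up propagation described above can be illustrated with a toy sketch. This is not ProPhyle's implementation: it assumes that a k-mer shared by all children of a node is moved up to that node, and it skips contig assembly and the BWT index entirely. The tree, the genome sequences, and the names `kmers` and `propagate` are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def propagate(node, children, node_kmers):
    """Move k-mers shared by all children of `node` up to `node` (assumed rule)."""
    kids = children.get(node, [])
    if not kids:
        return  # leaf: its k-mer set is already stored in node_kmers
    for kid in kids:
        propagate(kid, children, node_kmers)
    # k-mers present in every child belong to the parent ...
    shared = set.intersection(*(node_kmers[kid] for kid in kids))
    node_kmers[node] = shared
    # ... and are removed from the children, so each k-mer is stored once per subtree
    for kid in kids:
        node_kmers[kid] -= shared


# Hypothetical toy tree: root -> (n1, C), n1 -> (A, B)
children = {"root": ["n1", "C"], "n1": ["A", "B"]}
genomes = {"A": "ACGTA", "B": "ACGTT", "C": "ACGGG"}

K = 3
node_kmers = {name: kmers(seq, K) for name, seq in genomes.items()}
propagate("root", children, node_kmers)

for name in ("root", "n1", "A", "B", "C"):
    print(name, sorted(node_kmers[name]))
```

In this toy case the k-mer common to all three genomes ends up at the root, the one shared only by A and B ends up at their parent, and each leaf keeps only the k-mers unique to it.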
Compared to other state-of-the-art classifiers, ProPhyle provides several unique features:
Low memory requirements. Compared to Kraken, ProPhyle has a 9x smaller memory footprint for index construction and a 5x smaller footprint for querying.
Flexibility. ProPhyle is easy to use with any user-provided phylogenetic tree and reference genomes.
Standard bioinformatics formats. Newick/NHX is used for representing phylogenetic trees and SAM for reporting the assignments.
Lossless k-mer indexing. ProPhyle stores a list of all genomes containing each k-mer. It can therefore be accurate even with trees containing similar genomes (e.g., phylogenetic trees for a single species).
Deterministic behavior. ProPhyle is a fully deterministic classifier with a mathematically well-defined behavior.
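To illustrate why keeping the full list of genomes per k-mer helps with similar genomes, here is a minimal sketch using a plain Python dictionary rather than ProPhyle's BWT index. The two "strain" sequences, the read, and the function names `build_index` and `score_read` are hypothetical, and the simple hit counting is an assumption, not ProPhyle's actual assignment rule.

```python
from collections import Counter, defaultdict


def build_index(genomes, k):
    """Map every k-mer to the set of all genomes that contain it."""
    index = defaultdict(set)
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def score_read(read, index, k):
    """Count, for each genome, how many k-mers of the read it contains."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        for name in index.get(read[i:i + k], ()):
            hits[name] += 1
    return hits


# Two hypothetical, nearly identical genomes that differ only at the end
genomes = {
    "strain_1": "ACGTACGGA",
    "strain_2": "ACGTACGTT",
}
K = 4
index = build_index(genomes, K)

# A read covering the region where the strains differ
hits = score_read("TACGGA", index, K)
print(hits.most_common())
```

Because the index keeps both strains for the shared k-mer and only `strain_1` for the distinguishing ones, the read can still be told apart at the strain level.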
For information about how to use ProPhyle, see the main ProPhyle documentation.
Quick example
Clone the ProPhyle repository and add it to PATH:
$ git clone --recursive http://github.com/karel-brinda/prophyle
$ export PATH=$(pwd)/prophyle/prophyle:$PATH
Download the RefSeq bacterial database:
$ prophyle download bacteria
To quickly test ProPhyle functionality, create an index for a randomly sampled 10% of the genomes from the E. coli subtree of the NCBI taxonomy (with k=31):
$ prophyle index -s 0.1 ~/prophyle/bacteria.nw@561 _index_ecoli
Classify your reads:
$ prophyle classify _index_ecoli reads.fq > result.sam
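Since the assignments are reported in SAM, the result can be summarized with a few lines of Python. The sketch below relies only on the generic SAM layout (header lines start with `@`, the third tab-separated column is the reference name, and `*` marks a record without a reference); that ProPhyle reports the assigned node in that column is an assumption here, and the sample records and node names are made up.

```python
from collections import Counter


def tally_assignments(sam_lines):
    """Count SAM records per reference name (third column, RNAME)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a valid alignment record
        counts[fields[2]] += 1
    return counts


# Made-up records in the standard 11-column SAM layout
sample = [
    "@HD\tVN:1.4\n",
    "read1\t0\tnode_A\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t0\tnode_A\t1\t60\t4M\t*\t0\t0\tTTGA\tIIII\n",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII\n",
]

counts = tally_assignments(sample)
for name, n in counts.most_common():
    print(name, n)
```

To run it on a real output file, pass an open file handle instead of the sample list, for example `tally_assignments(open("result.sam"))`.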