
Create phylogenetic trees from metagenomic reports.


Gracken

Gracken is a tool for creating phylogenetic trees from Bracken/Kraken2 reports. It supports both the NCBI and GTDB taxonomies (for GTDB-based databases, see Struo2). It works by pruning the reference tree (the GTDB tree, or the tree implied by the NCBI taxonomy) down to the species found in the report files. Gracken outputs a Newick-formatted tree and an OTU table in a format suitable for use with phyloseq.
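To make the pruning step concrete, here is a minimal Python sketch of restricting a rooted tree to a chosen set of leaves. It is not gracken's own code: the `Node` class and the `prune` and `to_newick` helpers are hypothetical, and the sketch assumes that a node left with a single child should be merged into that child, adding the two branch lengths so that distances between the remaining leaves are preserved.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A rooted tree node; `length` is the length of the edge above it."""
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def prune(node: Node, keep: set[str]) -> Node | None:
    """Return a copy of the tree restricted to the leaves named in `keep`.

    Subtrees without kept leaves are dropped, and a node left with a single
    child is merged into that child, summing the two edge lengths.
    """
    if not node.children:
        return Node(node.name, node.length) if node.name in keep else None
    kept = [p for p in (prune(c, keep) for c in node.children) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        only = kept[0]
        return Node(only.name, only.length + node.length, only.children)
    return Node(node.name, node.length, kept)


def to_newick(root: Node) -> str:
    """Serialize a tree to Newick, writing edge lengths for all non-root nodes."""
    def fmt(n: Node, is_root: bool = False) -> str:
        inner = "(" + ",".join(fmt(c) for c in n.children) + ")" if n.children else ""
        return inner + n.name + ("" if is_root else f":{n.length:g}")
    return fmt(root, is_root=True) + ";"


# Example: keep A, C and D from ((A:1,B:2)x:0.5,(C:1,D:1)y:0.5);
tree = Node(children=[
    Node("x", 0.5, [Node("A", 1), Node("B", 2)]),
    Node("y", 0.5, [Node("C", 1), Node("D", 1)]),
])
print(to_newick(prune(tree, {"A", "C", "D"})))  # (A:1.5,(C:1,D:1)y:0.5);
```

Dropping B leaves its parent x with a single child, so x is merged into A and A's branch becomes 1 + 0.5 = 1.5.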

Gracken was inspired by this comment on the KrakenTools repository.

Installation

If you have uv installed, you can run gracken directly using uvx:

uvx gracken --help

or install it using either uv or pipx:

# with uv
uv tool install gracken

# with pipx
pipx install gracken

# test that it works
gracken --help

Usage

Gracken requires Bracken or Kraken2 report files. For GTDB taxonomy, you'll also need the taxonomy and tree files matching the database you used.

usage: gracken [-h] [--version] --input_dir INPUT_DIR [--bac_taxonomy BAC_TAXONOMY] [--ar_taxonomy AR_TAXONOMY]
               [--bac_tree BAC_TREE] [--ar_tree AR_TREE] [--out_prefix OUT_PREFIX] [--mode {bracken,kraken2}]
               [--keep-spaces] [--taxonomy {gtdb,ncbi}] [--full-taxonomy]

Creates a phylogenetic tree and OTU table from Bracken/Kraken2 reports by pruning GTDB/NCBI trees

options:
  -h, --help            show this help message and exit
  --version, -v         show program version and exit
  --input_dir INPUT_DIR, -i INPUT_DIR
                        directory containing Bracken/Kraken2 report files
  --bac_taxonomy BAC_TAXONOMY
                        path to GTDB bacterial taxonomy file
  --ar_taxonomy AR_TAXONOMY
                        path to GTDB archaeal taxonomy file
  --bac_tree BAC_TREE   path to GTDB bacterial tree file
  --ar_tree AR_TREE     path to GTDB archaeal tree file
  --out_prefix OUT_PREFIX, -o OUT_PREFIX
                        prefix for output files (default: output). Creates <prefix>.tree and <prefix>.otu.csv
  --mode {bracken,kraken2}, -m {bracken,kraken2}
                        input file format (default: bracken)
  --keep-spaces         keep spaces in species names (default: False)
  --taxonomy {gtdb,ncbi}, -t {gtdb,ncbi}
                        taxonomy source to use (default: ncbi).
  --full-taxonomy, -f   include full taxonomy info in OTU table (default: False)
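The files in --input_dir are the per-sample reports. As an illustration only, the sketch below pulls species-level counts out of a Kraken2-style report, which has six tab-separated columns (percentage, clade reads, directly assigned reads, rank code, taxid, indented name); Bracken can also write a report in this layout. Which report file and which count column gracken actually reads is not stated here, so the `species_counts` helper and its use of the clade-read column are assumptions.

```python
def species_counts(report_text: str) -> dict[str, int]:
    """Collect species-level rows (rank code "S") from a Kraken2-style report.

    Assumes the standard six tab-separated columns: percentage, clade reads,
    directly assigned reads, rank code, taxid and indented name. Reports with
    extra columns (e.g. from kraken2 --report-minimizer-data) are not handled.
    """
    counts: dict[str, int] = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 6:
            continue  # skip blank lines and other layouts
        rank, name = fields[3].strip(), fields[5].strip()
        if rank == "S":  # strain-level codes such as "S1" are ignored
            counts[name] = int(fields[1])  # clade reads: the species plus its strains
    return counts


# Usage, with a hypothetical report file name:
# with open("sample1.kreport") as fh:
#     print(species_counts(fh.read()))
```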

GTDB Taxonomy Example

First, download the taxonomy and tree files for the GTDB release corresponding to your Bracken reports from the GTDB website. For example, if you used GTDB release 95, you would download these files: bac120_taxonomy_r95.tsv, ar122_taxonomy_r95.tsv, bac120_r95.tree, and ar122_r95.tree.

gracken --input_dir bracken_reports \
  --bac_taxonomy bac120_taxonomy_r95.tsv --ar_taxonomy ar122_taxonomy_r95.tsv \
  --bac_tree bac120_r95.tree --ar_tree ar122_r95.tree \
  --out_prefix my_tree --taxonomy gtdb

This command will:

  1. Read Bracken reports from the bracken_reports directory.
  2. Use the specified GTDB taxonomy and tree files.
  3. Create two output files: my_tree.tree (the phylogenetic tree) and my_tree.otu.csv (the OTU table).
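The leaves of the GTDB tree files are genome accessions (identifiers prefixed with RS_ or GB_) rather than species names, so each species in the reports has to be matched to an accession through the taxonomy file, whose lines pair an accession with a semicolon-separated lineage ending in s__<species>. The sketch below shows one way to do that matching. The helper names are hypothetical, and it assumes the species names in your reports match the s__ names in the taxonomy file, which may not hold for every database.

```python
def species_to_accessions(taxonomy_tsv: str) -> dict[str, list[str]]:
    """Map each GTDB species name (the s__ rank) to its genome accessions.

    Each line holds an accession, a tab, and a lineage such as
    "d__Bacteria;p__...;s__Escherichia coli".
    """
    species: dict[str, list[str]] = {}
    for line in taxonomy_tsv.splitlines():
        if not line.strip():
            continue
        accession, lineage = line.split("\t", 1)
        # "d__Bacteria" -> ("d", "Bacteria"), and so on for every rank
        ranks = dict(part.split("__", 1) for part in lineage.strip().split(";"))
        name = ranks.get("s", "")
        if name:
            species.setdefault(name, []).append(accession.strip())
    return species


def leaves_for_species(wanted: list[str], species: dict[str, list[str]],
                       tree_leaves: set[str]) -> dict[str, str]:
    """For each wanted species, pick the first accession that is a leaf of the tree.

    Species with no accession in the tree are left out of the result.
    """
    chosen: dict[str, str] = {}
    for name in wanted:
        for accession in species.get(name, []):
            if accession in tree_leaves:
                chosen[name] = accession
                break
    return chosen
```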

NCBI Taxonomy Example

NCBI is the default taxonomy. In this mode the NCBI taxdump is downloaded automatically, so no taxonomy or tree files are needed. Simply run:

gracken --input_dir bracken_reports --out_prefix my_ncbi_tree --mode bracken

This command will:

  1. Read Bracken reports from the bracken_reports directory.
  2. Automatically download the necessary NCBI taxdump.
  3. Create two output files: my_ncbi_tree.tree (the phylogenetic tree) and my_ncbi_tree.otu.csv (the OTU table).
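The NCBI taxonomy has no branch lengths, so a tree can be built from the parent links in the taxdump's nodes.dmp file, whose fields are separated by a tab, a pipe and a tab, with the taxid and parent taxid first. The sketch below is a hypothetical illustration of that idea, not gracken's implementation: it keeps only the requested taxids and their ancestors, collapses single-child ranks, and labels nodes with taxids (mapping them to names would also need names.dmp). It assumes every requested taxid is present in nodes.dmp and is a leaf, such as a species.

```python
from collections import defaultdict


def parse_parents(nodes_dmp: str) -> dict[str, str]:
    """Map taxid -> parent taxid from the text of NCBI nodes.dmp."""
    parents: dict[str, str] = {}
    for line in nodes_dmp.splitlines():
        fields = line.split("\t|\t")  # nodes.dmp fields are separated by tab, pipe, tab
        if len(fields) >= 2:
            parents[fields[0].strip()] = fields[1].strip()
    return parents


def induced_newick(parents: dict[str, str], taxids: list[str], root: str = "1") -> str:
    """Newick topology of `taxids` plus their ancestors, labelled with taxids.

    Ranks left with a single child are collapsed; no branch lengths are written.
    """
    children = defaultdict(set)
    for taxid in taxids:
        node = taxid
        while parents[node] != node:  # the NCBI root (taxid 1) is its own parent
            children[parents[node]].add(node)
            node = parents[node]

    def fmt(taxid: str) -> str:
        kids = sorted(children.get(taxid, ()))
        if not kids:
            return taxid
        if len(kids) == 1:
            return fmt(kids[0])  # collapse a single-child rank
        return "(" + ",".join(fmt(k) for k in kids) + ")" + taxid

    return fmt(root) + ";"
```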

Output Files

Gracken will output two files:

  • <prefix>.otu.csv: A CSV file containing the OTU table. By default, the first column contains the species names and the remaining columns contain the counts for each species in each sample (sample names are derived from the report file names). With the --full-taxonomy option, the full taxonomy columns come first, followed by the species column and then the sample counts.

  • <prefix>.tree: A Newick-formatted tree file.
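The OTU table can be read with nothing more than Python's csv module. The hypothetical `read_otu_table` helper below mirrors the R example further down: it treats every column after the species column as a sample, so it works with or without --full-taxonomy, and reads empty or NA cells as zero. The column name species comes from that R example; treating the counts as whole numbers is an assumption.

```python
import csv
import io


def read_otu_table(text: str) -> tuple[list[str], dict[str, list[int]]]:
    """Return (sample names, {species: counts}) from the text of an OTU CSV.

    Every column after the "species" column is treated as a sample, so any
    taxonomy columns written before it (--full-taxonomy) are skipped.
    Empty or NA cells are read as zero counts.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    col = header.index("species")
    samples = header[col + 1:]
    table: dict[str, list[int]] = {}
    for row in reader:
        if not row:
            continue
        # counts are assumed to be whole numbers; round() also accepts "12.0"
        table[row[col]] = [
            0 if cell in ("", "NA") else round(float(cell)) for cell in row[col + 1:]
        ]
    return samples, table


# Usage:
# with open("output.otu.csv", newline="") as fh:
#     samples, table = read_otu_table(fh.read())
```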

Phyloseq Example

You can use the output OTU file and tree with phyloseq and ape. Here's an example R script that works whether or not you used --full-taxonomy:

library(ape)
library(phyloseq)

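# read the OTU table; check.names = FALSE keeps sample names exactly as in the CSV header
otu_tbl <- read.csv("output.otu.csv", stringsAsFactors = FALSE, check.names = FALSE)

# keep only the count columns (everything after "species"), so this works with or without --full-taxonomy;
# drop = FALSE keeps a matrix even when there is only one sample
otu_mat <- as.matrix(otu_tbl[, (which(names(otu_tbl) == "species") + 1):ncol(otu_tbl), drop = FALSE])
rownames(otu_mat) <- otu_tbl$species
# replace missing counts (NA) with zero
otu_mat[is.na(otu_mat)] <- 0
otu_table_ps <- otu_table(otu_mat, taxa_are_rows = TRUE)

# load the tree
tree_ps <- phy_tree(read.tree("output.tree"))

# create the phyloseq object
ps <- phyloseq(otu_table_ps, tree_ps)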
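phyloseq pairs the rows of the OTU table with the tips of the tree by name, so it can be worth checking that the two files agree before building the object. Below is a minimal Python check; the helper names are hypothetical, and it assumes unquoted Newick labels (quoted labels are not handled).

```python
import re


def newick_leaf_names(newick: str) -> set[str]:
    """Leaf labels of a Newick string, assuming unquoted labels.

    A leaf label directly follows "(" or "," and runs up to the next
    ":", ",", ")", "(" or ";". Internal node labels follow ")" and are skipped.
    """
    names = (m.group(1).strip() for m in re.finditer(r"[(,]([^(),:;]+)", newick))
    return {name for name in names if name}


def check_consistency(otu_species: set[str], newick: str) -> tuple[set[str], set[str]]:
    """Return (species missing from the tree, tree tips missing from the OTU table)."""
    leaves = newick_leaf_names(newick)
    return otu_species - leaves, leaves - otu_species


# Usage:
# with open("output.tree") as fh:
#     only_in_otu, only_in_tree = check_consistency(set(table), fh.read())
```

Both returned sets should be empty when the OTU table and the tree describe the same species.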



Download files


Source Distribution

gracken-0.1.1.tar.gz (9.6 kB)

Uploaded Source

Built Distribution


gracken-0.1.1-py3-none-any.whl (9.1 kB)

Uploaded Python 3

File details

Details for the file gracken-0.1.1.tar.gz.

File metadata

  • Download URL: gracken-0.1.1.tar.gz
  • Upload date:
  • Size: 9.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for gracken-0.1.1.tar.gz
Algorithm Hash digest
SHA256 078c9b12c7453692627e6a9878deb4d21dcc6c0b993474d01238c35b7ae14ef3
MD5 8a29d56db5d25feb2deb6b31c325f8cf
BLAKE2b-256 f0959b0a7a24f34299fe5f739cd2887ec8fc775f143b5d584e2af3cd54414b5b


File details

Details for the file gracken-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: gracken-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 9.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for gracken-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ca0791f5f77068b95ead092986c3fb537f5d56534c3abe9a81799237969e66c9
MD5 1d2621492c5df4cfa84f7c6c41e19970
BLAKE2b-256 7dd26e7a00bfce3a68f7ad4355bbfa39aa9079a9b1b87f3dd22e51ce01628ad0

