Create phylogenetic trees from metagenomic reports.

These details have not been verified by PyPI

Development Status
- 3 - Alpha
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

Gracken

Gracken is a tool for creating phylogenetic trees from Bracken/Kraken2 reports. Both NCBI and GTDB taxonomies (see Struo2) are supported. It accomplishes this by pruning the NCBI/GTDB-provided tree using the species information from the report files. Gracken outputs a Newick-formatted tree and an OTU file with a format suitable for use with phyloseq.
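The pruning idea can be pictured with a small Python sketch. It is not Gracken's implementation: the `Node` class and the `prune`/`to_newick` helpers are hypothetical. Collapsing an internal node left with a single child, and summing the two branch lengths, is also an assumption about how pruned branches are tidied up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal rooted tree node; leaves carry a species name (hypothetical)."""
    name: str = ""
    length: float = 0.0  # length of the branch leading to this node
    children: list[Node] = field(default_factory=list)


def prune(node: Node, keep: set[str]) -> Node | None:
    """Return a copy of `node` restricted to leaves whose names are in `keep`.

    Subtrees without kept leaves are dropped; an internal node left with a
    single child is collapsed into that child, summing the branch lengths.
    """
    if not node.children:
        return Node(node.name, node.length) if node.name in keep else None
    kept = [p for p in (prune(c, keep) for c in node.children) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        only = kept[0]
        return Node(only.name, only.length + node.length, only.children)
    return Node(node.name, node.length, kept)


def to_newick(node: Node, is_root: bool = True) -> str:
    """Serialize to Newick without quoting (names must be Newick-safe)."""
    text = node.name
    if node.children:
        inner = ",".join(to_newick(c, is_root=False) for c in node.children)
        text = f"({inner}){node.name}"
    return text + (";" if is_root else f":{node.length:g}")
```

For a tree `((A:1,B:2):0.5,C:3);`, keeping only `A` and `C` removes `B` and folds its parent's 0.5 branch into `A`, giving `(A:1.5,C:3);`.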

Gracken was inspired by this comment on the KrakenTools repository.

Installation

If you have uv installed, you can run gracken directly using uvx:

uvx gracken --help

or install it using either uv or pipx:

# with uv
uv tool install gracken

# with pipx
pipx install gracken

# test that it works
gracken --help

Usage

Gracken requires Bracken or Kraken2 report files. For GTDB taxonomy, you'll also need the taxonomy and tree files matching the database you used.

usage: gracken [-h] [--version] --input_dir INPUT_DIR [--bac_taxonomy BAC_TAXONOMY] [--ar_taxonomy AR_TAXONOMY]
               [--bac_tree BAC_TREE] [--ar_tree AR_TREE] [--out_prefix OUT_PREFIX] [--mode {bracken,kraken2}]
               [--keep-spaces] [--taxonomy {gtdb,ncbi}] [--full-taxonomy]

Creates a phylogenetic tree and OTU table from Bracken/Kraken2 reports by pruning GTDB/NCBI trees

options:
  -h, --help            show this help message and exit
  --version, -v         show program version and exit
  --input_dir INPUT_DIR, -i INPUT_DIR
                        directory containing Bracken/Kraken2 report files
  --bac_taxonomy BAC_TAXONOMY
                        path to GTDB bacterial taxonomy file
  --ar_taxonomy AR_TAXONOMY
                        path to GTDB archaeal taxonomy file
  --bac_tree BAC_TREE   path to GTDB bacterial tree file
  --ar_tree AR_TREE     path to GTDB archaeal tree file
  --out_prefix OUT_PREFIX, -o OUT_PREFIX
                        prefix for output files (default: output). Creates <prefix>.tree and <prefix>.otu.csv
  --mode {bracken,kraken2}, -m {bracken,kraken2}
                        input file format (default: bracken)
  --keep-spaces         keep spaces in species names (default: False)
  --taxonomy {gtdb,ncbi}, -t {gtdb,ncbi}
                        taxonomy source to use (default: ncbi).
  --full-taxonomy, -f   include full taxonomy info in OTU table (default: False)
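On the input side, the sketch below pulls species-level counts out of a Kraken2-style report. It assumes the standard tab-separated layout (percentage, clade reads, direct reads, optional minimizer columns, rank code, taxid, indented name) and uses clade reads as the count. `species_counts` is a hypothetical helper. Replacing spaces with underscores is only an assumed analogue of the `--keep-spaces` default, not a description of what Gracken writes.

```python
from __future__ import annotations

import csv
from collections.abc import Iterable


def species_counts(lines: Iterable[str], keep_spaces: bool = False) -> dict[str, int]:
    """Collect species-level read counts from Kraken2-style report lines.

    Assumed layout: percentage, clade reads, direct reads, optional minimizer
    columns, rank code, taxid, indented name (all tab-separated).
    """
    counts: dict[str, int] = {}
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rank, name = fields[-3].strip(), fields[-1].strip()
        if rank != "S":
            continue  # species rows only; sub-ranks such as S1 are skipped
        if not keep_spaces:
            name = name.replace(" ", "_")  # assumed naming convention
        # Clade reads (second column) include reads assigned below the species.
        counts[name] = counts.get(name, 0) + int(fields[1])
    return counts
```

Because it accepts any iterable of lines, an open file handle works directly, e.g. `with open(path) as fh: counts = species_counts(fh)`.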

GTDB Taxonomy Example

First, download from the GTDB website the taxonomy and tree files for the GTDB release that your Kraken2/Bracken database was built from. For example, if you used GTDB release 95, you would download these files: bac120_taxonomy_r95.tsv, ar122_taxonomy_r95.tsv, bac120_r95.tree, and ar122_r95.tree.

gracken --input_dir bracken_reports --bac_taxonomy bac120_taxonomy_r95.tsv --ar_taxonomy ar122_taxonomy_r95.tsv --bac_tree bac120_r95.tree --ar_tree ar122_r95.tree --out_prefix my_tree --taxonomy gtdb

This command will:

- Read Bracken reports from the bracken_reports directory.
- Use the specified GTDB taxonomy and tree files.
- Create two output files: my_tree.tree (the phylogenetic tree) and my_tree.otu.csv (the OTU table).
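The sketch below shows how a species named in a report could be matched to a tip of the GTDB tree before pruning. It assumes, as in the GTDB release files, that the taxonomy TSV maps each genome accession to a semicolon-separated lineage ending in `s__<species>` and that tree tips are labelled with accessions. `parse_gtdb_taxonomy` and `representative_tips` are hypothetical helpers, not Gracken's code.

```python
from __future__ import annotations

from collections.abc import Iterable


def parse_gtdb_taxonomy(lines: Iterable[str]) -> dict[str, str]:
    """Map genome accession -> species name from GTDB taxonomy TSV lines."""
    acc_to_species: dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        accession, lineage = line.split("\t", 1)
        # "d__Bacteria;...;s__Escherichia coli" -> {"d": "Bacteria", ..., "s": "Escherichia coli"}
        ranks = dict(part.strip().split("__", 1) for part in lineage.split(";"))
        acc_to_species[accession] = ranks.get("s", "")
    return acc_to_species


def representative_tips(
    acc_to_species: dict[str, str], tree_tips: set[str], wanted: set[str]
) -> dict[str, str]:
    """For each wanted species, pick one accession that is also a tree tip."""
    chosen: dict[str, str] = {}
    for accession, species in acc_to_species.items():
        if species in wanted and accession in tree_tips:
            chosen.setdefault(species, accession)  # keep the first match
    return chosen
```

Restricting the search to tree tips matters because a taxonomy file lists many genomes per species, while the tree holds only a subset of them.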

NCBI Taxonomy Example

In NCBI mode, the taxdump is automatically downloaded. Simply run:

gracken --input_dir bracken_reports --out_prefix my_ncbi_tree --mode bracken

This command will:

- Read Bracken reports from the bracken_reports directory.
- Automatically download the necessary NCBI taxdump.
- Create two output files: my_ncbi_tree.tree (the phylogenetic tree) and my_ncbi_tree.otu.csv (the OTU table).
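In NCBI mode the tree topology can come from the taxonomy itself. The minimal sketch below assumes the `nodes.dmp` file from the NCBI taxdump (`|`-separated fields starting with taxid, parent taxid and rank) and that taxid 1 is the root. It keeps only the lineages of the requested taxids and collapses ranks with a single descendant. The helper names are hypothetical. The sketch's output has no branch lengths, because `nodes.dmp` records only parent-child relationships.

```python
from __future__ import annotations

from collections.abc import Iterable


def parse_nodes(lines: Iterable[str]) -> tuple[dict[int, int], dict[int, str]]:
    """Read taxid -> parent and taxid -> rank from NCBI nodes.dmp lines."""
    parent: dict[int, int] = {}
    rank: dict[int, str] = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3 or not fields[0]:
            continue
        taxid = int(fields[0])
        parent[taxid] = int(fields[1])
        rank[taxid] = fields[2]
    return parent, rank


def induced_children(taxids: Iterable[int], parent: dict[int, int]) -> dict[int, set[int]]:
    """Children map restricted to the lineages of `taxids` (root: parent == self)."""
    children: dict[int, set[int]] = {}
    for node in taxids:
        while parent[node] != node:
            children.setdefault(parent[node], set()).add(node)
            node = parent[node]
    return children


def taxonomy_newick(taxids: Iterable[int], parent: dict[int, int], root: int = 1) -> str:
    """Newick topology (taxids as labels) with single-child ranks collapsed."""
    children = induced_children(taxids, parent)

    def walk(node: int) -> str:
        kids = sorted(children.get(node, ()))
        if not kids:
            return str(node)
        if len(kids) == 1:
            return walk(kids[0])  # collapse a rank with one descendant
        return "(" + ",".join(walk(k) for k in kids) + f"){node}"

    return walk(root) + ";"
```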

Output Files

Gracken will output two files:

- <prefix>.otu.csv: A CSV file containing the OTU table. The first column contains the species names, and the remaining columns contain the counts for each species in each sample (sample names are derived from the report file names). If you used the --full-taxonomy option, the leading columns instead hold the full taxonomy for each species, and the sample counts follow the species column.
- <prefix>.tree: A Newick-formatted tree file.
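To work with the OTU table in Python rather than R, you can use the stdlib-only sketch below. Like the phyloseq example, it assumes the count columns are the ones to the right of the `species` column, and it treats empty or `NA` cells as zero. `read_otu_table` is a hypothetical helper.

```python
from __future__ import annotations

import csv
from collections.abc import Iterable


def read_otu_table(lines: Iterable[str]) -> tuple[list[str], dict[str, dict[str, float]]]:
    """Return (sample names, {species: {sample: count}}) from OTU CSV lines."""
    reader = csv.reader(lines)
    header = next(reader)
    sp = header.index("species")  # count columns start right after this one
    samples = header[sp + 1:]
    table: dict[str, dict[str, float]] = {}
    for row in reader:
        if not row:
            continue
        cells = row[sp + 1:]
        # Empty or NA cells are treated as zero, as in the phyloseq example.
        table[row[sp]] = {
            s: float(v) if v not in ("", "NA") else 0.0 for s, v in zip(samples, cells)
        }
    return samples, table
```

Locating `species` by name, rather than by position, lets the same code handle tables written with or without --full-taxonomy.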

Phyloseq Example

You can use the output OTU file and tree with phyloseq and ape. Here's an example R script which works regardless of whether you used --full-taxonomy or not:

library(ape)
library(phyloseq)

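# read the OTU table (check.names = FALSE keeps sample names exactly as written),
# then convert the count columns after "species" to a matrix with species as row names
otu_tbl <- read.csv("output.otu.csv", stringsAsFactors = FALSE, check.names = FALSE)
otu_mat <- as.matrix(otu_tbl[, (which(names(otu_tbl) == "species") + 1):ncol(otu_tbl), drop = FALSE])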
rownames(otu_mat) <- otu_tbl$species
otu_mat[is.na(otu_mat)] <- 0
otu_table_ps <- otu_table(otu_mat, taxa_are_rows = TRUE)

# load the tree
tree_ps <- phy_tree(read.tree('output.tree'))

# create the phyloseq object
ps <- phyloseq(otu_table_ps, tree_ps)
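phyloseq pairs OTU rows with tree tips by name, so mismatched names (for example spaces versus underscores) can cause taxa to be left out. The Python sketch below is a quick pre-check. It assumes unquoted Newick labels, so quoted labels would keep their quotes. `newick_tip_labels` and `compare_taxa` are hypothetical helpers.

```python
from __future__ import annotations

import re

# A leaf label is the text right after "(" or "," up to ":", ",", ")" or ";".
# Internal-node labels follow ")" and are therefore not matched.
_TIP = re.compile(r"[(,]\s*([^(),:;\s][^(),:;]*)")


def newick_tip_labels(newick: str) -> set[str]:
    """Leaf labels of an unquoted Newick string."""
    return {m.group(1).strip() for m in _TIP.finditer(newick)}


def compare_taxa(tree_labels: set[str], otu_names: set[str]) -> tuple[list[str], list[str]]:
    """Return (names only in the OTU table, names only in the tree)."""
    return sorted(otu_names - tree_labels), sorted(tree_labels - otu_names)
```

Two empty lists from `compare_taxa` mean every OTU row has a matching tip and vice versa.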

Release history

0.1.1

Mar 21, 2025

0.1.0

Feb 9, 2025

Download files

Source Distribution

gracken-0.1.1.tar.gz (9.6 kB)

Uploaded Mar 21, 2025 Source

Built Distribution

gracken-0.1.1-py3-none-any.whl (9.1 kB)

Uploaded Mar 21, 2025 Python 3

File details

Details for the file gracken-0.1.1.tar.gz.

File metadata

Download URL: gracken-0.1.1.tar.gz
Upload date: Mar 21, 2025
Size: 9.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for gracken-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`078c9b12c7453692627e6a9878deb4d21dcc6c0b993474d01238c35b7ae14ef3`
MD5	`8a29d56db5d25feb2deb6b31c325f8cf`
BLAKE2b-256	`f0959b0a7a24f34299fe5f739cd2887ec8fc775f143b5d584e2af3cd54414b5b`

File details

Details for the file gracken-0.1.1-py3-none-any.whl.

File metadata

Download URL: gracken-0.1.1-py3-none-any.whl
Upload date: Mar 21, 2025
Size: 9.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for gracken-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ca0791f5f77068b95ead092986c3fb537f5d56534c3abe9a81799237969e66c9`
MD5	`1d2621492c5df4cfa84f7c6c41e19970`
BLAKE2b-256	`7dd26e7a00bfce3a68f7ad4355bbfa39aa9079a9b1b87f3dd22e51ce01628ad0`
