Skip to main content

A collection of scripts designed to process Kraken2 reports and convert them into CSV format.

Project description

KrakenParser: Convert Kraken2 Reports to CSV

Overview

KrakenParser is a collection of scripts designed to process Kraken2 reports and convert them into CSV format. This pipeline extracts taxonomic abundance data at six levels:

  • Phylum
  • Class
  • Order
  • Family
  • Genus
  • Species

You can run the entire pipeline with a single command, or use the scripts individually depending on your needs.

🔗 Please visit KrakenParser wiki page

Output example

Total abundance output

counts_phylum.csv parsed from 7 kraken2 reports of metagenomic samples using KrakenParser:

Sample_id,Calditrichota,Caldisericota,Thermosulfidibacterota,Elusimicrobiota,Candidatus Fervidibacterota,Lentisphaerota,Kiritimatiellota,Vulcanimicrobiota,Thermodesulfobiota,Atribacterota,Dictyoglomota,Nitrospinota,Chrysiogenota,Coprothermobacterota,Aquificota,Thermotogota,Bdellovibrionota,Nitrospirota,Deferribacterota,Synergistota,Myxococcota,Acidobacteriota,Candidatus Bipolaricaulota,Candidatus Saccharibacteria,Candidatus Absconditabacteria,Fusobacteriota,Spirochaetota,Candidatus Omnitrophota,Chlamydiota,Verrucomicrobiota,Planctomycetota,Thermodesulfobacteriota,Campylobacterota,Candidatus Cloacimonadota,Fibrobacterota,Gemmatimonadota,Balneolota,Rhodothermota,Ignavibacteriota,Chlorobiota,Bacteroidota,Deinococcota,Thermomicrobiota,Armatimonadota,Chloroflexota,Cyanobacteriota,Mycoplasmatota,Actinomycetota,Bacillota,Pseudomonadota,Heterolobosea,Parabasalia,Fornicata,Evosea,Bacillariophyta,Cercozoa,Euglenozoa,Apicomplexa,Microsporidia,Basidiomycota,Ascomycota,Nanoarchaeota,Candidatus Micrarchaeota,Candidatus Thermoplasmatota,Candidatus Lokiarchaeota,Nitrososphaerota,Euryarchaeota,Thermoproteota,Hofneiviricota,Artverviricota,Nucleocytoviricota,Cossaviricota,Kitrinoviricota,Negarnaviricota,Lenarviricota,Pisuviricota,Peploviricota,Uroviricota
X1,0,0,0,0,0,0,0,0,1,1,1,1,2,3,4,5,7,8,9,17,23,25,5,13,22,47,54,1,6,27,31,128,151,2,6,13,1,3,7,44,14991,7,9,11,61,414,449,3551,55304,438645,0,0,0,0,0,0,1,22,0,4,15,0,0,0,0,0,3,191,0,0,1,88,0,0,0,161,0,1241
X2,1,4,14,20,5,12,15,6,8,15,2,15,109,68,182,97,79,196,70,272,331,149,36,77,35,562,1237,21,33,129,427,1044,543,8,98,25,16,45,11,1043,41374,160,28,161,1348,1196,2709,15864,431170,2747842,22,7,301,373,134,136,107,3239,54,1151,2905,0,0,3,5,6,7,410,0,0,0,736,0,3,11,26,1,1552
...
X8,1,19,0,47,0,1,6,20,28,0,1,1,47,7,336,110,30,32,10,93,85,48,9,7,7,154,386,0,14,19,106,358,242,14,5,134,15,11,7,18,54057,106,10,24,212,340,1128,16220,567908,650264,95,4,193,402,314,300,187,4376,37,9796,8653,0,1,0,1,5,23,1778,1,1,0,1,1,4,66,30,4,1263
X9,0,3,2,16,7,1,23,12,10,9,1,2,134,40,390,289,29,372,27,81,150,90,9,88,32,287,881,14,33,60,319,1045,328,15,22,22,10,72,8,63,35301,127,15,48,412,935,2343,11500,380765,2613854,0,0,0,0,0,0,5,74,0,38,40,3,0,0,0,1,3,275,0,0,0,0,0,2,118,25,0,1675

Relative abundance output

ra_phylum.csv calculated from 7 kraken2 reports of metagenomic samples using KrakenParser:

Sample_id,taxon,rel_abund_perc
X1,Pseudomonadota,85.03558294577552
X1,Bacillota,10.72121619814011
X1,Other (<4.0%),4.243200856084384
X2,Pseudomonadota,84.28702055549813
X2,Bacillota,13.225663867469137
X2,Other (<4.0%),2.487315577032736
...
X8,Pseudomonadota,49.25373021277305
X8,Bacillota,43.01574040339849
X8,Bacteroidota,4.094504530639667
X8,Other (<4.0%),3.6360248531887933
X9,Pseudomonadota,85.62839981589192
X9,Bacillota,12.473649123439218
X9,Other (<4.0%),1.8979510606688494

α-diversity output

alpha_div.csv calculated from 7 kraken2 reports of metagenomic samples using KrakenParser:

Sample,Shannon,Pielou,Chao1
X1,3.911345447107001,0.5269245043289149,2274.533185840708
X2,3.9944130792536563,0.4906424221265042,4155.0
...
X8,3.442077115880119,0.42753293021330063,4177.251358695652
X9,4.033664950188261,0.5050385978575492,3492.16

β-diversity output

beta_div_bray.csv calculated from 7 kraken2 reports of metagenomic samples using KrakenParser:

,X1,X2,...,X8,X9
X1,0.0,0.398,...,0.61,0.353
X2,0.398,0.0,...,0.723,0.388
...
X8,0.61,0.723,...,0.0,0.665
X9,0.353,0.388,...,0.665,0.0

beta_div_jaccard.csv calculated from 7 kraken2 reports of metagenomic samples using KrakenParser:

,X1,X2,...,X8,X9
X1,0.0,0.7073170731707317,...,0.8223938223938224,0.7232472324723247
X2,0.7073170731707317,0.0,...,0.835016835016835,0.7352941176470589
...
X8,0.8223938223938224,0.835016835016835,...,0.0,0.8066914498141264
X9,0.7232472324723247,0.7352941176470589,...,0.8066914498141264,0.0

Visualization examples gallery

Stacked Barplot Streamgraph
kpstbar kpstream
Stacked Barplot + Streamgraph Clustermap
combined_white kpclust

Quick Start (Full Pipeline)

To run the full pipeline, use the following command:

KrakenParser --complete -i data/kreports
#Having troubles? Run KrakenParser --complete -h

This will:

  1. Convert Kraken2 reports to MPA format
  2. Combine MPA files into a single file
  3. Extract taxonomic levels into separate text files
  4. Process extracted text files
  5. Convert them into CSV format
  6. Calculate relative abundance
  7. Calculate α & β-diversities

Input Requirements

  • The Kraken2 reports must be inside a subdirectory (e.g., data/kreports).
  • The script automatically creates output directories and processes the data.

Installation

pip install krakenparser

Using Individual Modules

You can also run each step manually if needed.

Step 1: Convert Kraken2 Reports to MPA Format

KrakenParser --kreport2mpa -i data/kreports -o data/mpa
#Having troubles? Run KrakenParser --kreport2mpa -h

This script converts Kraken2 .kreport files into MPA format using KrakenTools.

Step 2: Combine MPA Files

KrakenParser --combine_mpa -i data/mpa/* -o data/COMBINED.txt
#Having troubles? Run KrakenParser --combine_mpa -h

This merges multiple MPA files into a single combined file.

Step 3: Extract Taxonomic Levels

KrakenParser --deconstruct -i data/COMBINED.txt -o data/counts
#Having troubles? Run KrakenParser --deconstruct -h

If user wants to inspect Viruses domain separately:

KrakenParser --deconstruct_viruses -i data/COMBINED.txt -o data/counts_viruses
#Having troubles? Run KrakenParser --deconstruct_viruses -h

This step extracts only species-level data (excluding human reads).

Step 4: Process Extracted Taxonomic Data

KrakenParser --process -i data/COMBINED.txt -o data/counts/txt/counts_phylum.txt
#Having troubles? Run KrakenParser --process -h

Repeat on other 5 taxonomical levels (class, order, family, genus, species) or wrap up KrakenParser --process to a loop!

This script cleans up taxonomic names (removes prefixes, replaces underscores with spaces).

Step 5: Convert TXT to CSV

KrakenParser --txt2csv -i data/counts/txt/counts_phylum.txt -o data/counts/csv/counts_phylum.csv
#Having troubles? Run KrakenParser --txt2csv -h

Repeat on other 5 taxonomical levels (class, order, family, genus, species) or wrap up KrakenParser --txt2csv to a loop!

This converts the processed text files into structured CSV format.

Step 6: Calculate relative abundance

KrakenParser --relabund -i data/counts/csv/counts_phylum.csv -o data/counts/csv_relabund/counts_phylum.csv
#Having troubles? Run KrakenParser --relabund -h

Repeat on other 5 taxonomical levels (class, order, family, genus, species) or wrap up KrakenParser --relabund to a loop!

This calculates relative abundance and saves as CSV format.

If user wants to group low abundant taxa in "Other" group:

KrakenParser --relabund -i data/counts/csv/counts_phylum.csv -o data/counts/csv_relabund/counts_phylum.csv --other 3.5
#Having troubles? Run KrakenParser --relabund -h

This will group all the taxa that have abundance <3.5 into "Other <3.5%" group. Other parameters are welcome!

Step 7: Calculate α & β-diversities

KrakenParser --diversity -i data/counts/csv/counts_species.csv -o data/diversity
#Having troubles? Run KrakenParser --diversity -h

This calculates α & β-diversities and saves them as CSV format to directory provided in the output.

If user wants to use another depth for β-diversity calculations:

KrakenParser --diversity -i data/counts/csv/counts_species.csv -o data/diversity --depth 750
#Having troubles? Run KrakenParser --diversity -h

Other parameters are welcome!

Arguments Breakdown

KrakenParser (Main Pipeline)

  • Automates the entire workflow.
  • Takes one argument: the path to Kraken2 reports (data/kreports).
  • Runs all the scripts in sequence.

--kreport2mpa (Step 1)

  • Converts Kraken2 reports to MPA format.
  • Uses KrakenTools/kreport2mpa.py.

--combine_mpa (Step 2)

  • Combines multiple MPA files into one.
  • Uses KrakenTools/combine_mpa.py.

--deconstruct & --deconstruct_viruses (Step 3)

  • Extracts phylum, class, order, family, genus, species into separate text files.
  • Removes human-related reads (--deconstruct only).

--process (Step 4)

  • Cleans and formats extracted taxonomic data.
  • Removes prefixes (s__, g__, etc.), replaces underscores with spaces.

--txt2csv (Step 5)

  • Converts cleaned text files to CSV.
  • Transposes data so that sample names become rows.

--relabund (Step 6)

  • Calculates relative abundance based on total abundance CSV.
  • Optionally can group low abundant taxa.

--diversity (Step 7)

  • Calculates α & β-diversities based on total species abundance CSV.
  • Shannon, Pielou & Chao1 indices for α-diversity
  • Bray-Curtis & Jaccard indices for β-diversity
  • Uses 1000 depth for β-diversity as default (can be adjusted with -d)

Example Output Structure

After running the full pipeline, the output directory will look like this:

data/
├─ kreports/               # Input Kraken2 reports
├─ mpa/                    # Converted MPA files
├─ COMBINED.txt            # Merged MPA file
├─ counts/
│  ├─ txt/                 # Extracted taxonomic levels in TXT
│  │  ├─ counts_species.txt
│  │  ├─ counts_genus.txt
│  │  ├─ counts_family.txt
│  │  ├─ ...
│  └─ csv/                 # Total abundance CSV output
│     ├─ counts_species.csv
│     ├─ counts_genus.csv
│     ├─ counts_family.csv
│     ├─ ...
├─ rel_abund/              # Relative abundance CSV output
│  ├─ ra_species.csv
│  ├─ ra_genus.csv
│  ├─ ra_family.csv
│  ├─ ...
└─ diversity/
   ├─ alpha_div.csv
   ├─ beta_div_bray.csv
   └─ beta_div_jaccard.csv

Conclusion

KrakenParser provides a simple and automated way to convert Kraken2 reports into usable CSV files for downstream analysis. You can run the full pipeline with a single command or use individual scripts as needed.

For any issues or feature requests, feel free to open an issue on GitHub!

🚀 Happy analyzing!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

krakenparser-0.6.1.tar.gz (19.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

krakenparser-0.6.1-py3-none-any.whl (32.7 kB view details)

Uploaded Python 3

File details

Details for the file krakenparser-0.6.1.tar.gz.

File metadata

  • Download URL: krakenparser-0.6.1.tar.gz
  • Upload date:
  • Size: 19.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for krakenparser-0.6.1.tar.gz
Algorithm Hash digest
SHA256 4518335c4ef4aac3a6df87e5c6bdd8ac2753a0d656c2238ffc2c54c32d7e57f1
MD5 7eb9d7db844c481e8fece69bb7205176
BLAKE2b-256 118f9fcad210c4de772b69e43bf1c9bbb0882ad5e88d861a9b91f173caf68cad

See more details on using hashes here.

File details

Details for the file krakenparser-0.6.1-py3-none-any.whl.

File metadata

  • Download URL: krakenparser-0.6.1-py3-none-any.whl
  • Upload date:
  • Size: 32.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for krakenparser-0.6.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b82252ba936b3fc6a8fe8e84b015574e336db105a1e6d14364b63044b741083c
MD5 ebb93f6ae2735d003c6654cd332a1ed3
BLAKE2b-256 55b85443f18d7a2baef5e42629f3dabd31e054e0b9da6eaa7ddd15ee2f152db3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page