KrakenParser: Convert Kraken2 Reports to CSV

Overview

KrakenParser is a collection of scripts designed to process Kraken2 reports and convert them into CSV format. This pipeline extracts taxonomic abundance data at six levels:

  • Phylum
  • Class
  • Order
  • Family
  • Genus
  • Species

You can run the entire pipeline with a single command, or use the scripts individually depending on your needs.

🔗 Please visit KrakenParser wiki page

Output example

Total abundance output

counts_phylum.csv parsed from 9 Kraken2 reports of metagenomic samples using KrakenParser:

Sample_id,Calditrichota,Caldisericota,Thermosulfidibacterota,Elusimicrobiota,Candidatus Fervidibacterota,Lentisphaerota,Kiritimatiellota,Vulcanimicrobiota,Thermodesulfobiota,Atribacterota,Dictyoglomota,Nitrospinota,Chrysiogenota,Coprothermobacterota,Aquificota,Thermotogota,Bdellovibrionota,Nitrospirota,Deferribacterota,Synergistota,Myxococcota,Acidobacteriota,Candidatus Bipolaricaulota,Candidatus Saccharibacteria,Candidatus Absconditabacteria,Fusobacteriota,Spirochaetota,Candidatus Omnitrophota,Chlamydiota,Verrucomicrobiota,Planctomycetota,Thermodesulfobacteriota,Campylobacterota,Candidatus Cloacimonadota,Fibrobacterota,Gemmatimonadota,Balneolota,Rhodothermota,Ignavibacteriota,Chlorobiota,Bacteroidota,Deinococcota,Thermomicrobiota,Armatimonadota,Chloroflexota,Cyanobacteriota,Mycoplasmatota,Actinomycetota,Bacillota,Pseudomonadota,Heterolobosea,Parabasalia,Fornicata,Evosea,Bacillariophyta,Cercozoa,Euglenozoa,Apicomplexa,Microsporidia,Basidiomycota,Ascomycota,Nanoarchaeota,Candidatus Micrarchaeota,Candidatus Thermoplasmatota,Candidatus Lokiarchaeota,Nitrososphaerota,Euryarchaeota,Thermoproteota,Hofneiviricota,Artverviricota,Nucleocytoviricota,Cossaviricota,Kitrinoviricota,Negarnaviricota,Lenarviricota,Pisuviricota,Peploviricota,Uroviricota
X1,0,0,0,0,0,0,0,0,1,1,1,1,2,3,4,5,7,8,9,17,23,25,5,13,22,47,54,1,6,27,31,128,151,2,6,13,1,3,7,44,14991,7,9,11,61,414,449,3551,55304,438645,0,0,0,0,0,0,1,22,0,4,15,0,0,0,0,0,3,191,0,0,1,88,0,0,0,161,0,1241
X2,1,4,14,20,5,12,15,6,8,15,2,15,109,68,182,97,79,196,70,272,331,149,36,77,35,562,1237,21,33,129,427,1044,543,8,98,25,16,45,11,1043,41374,160,28,161,1348,1196,2709,15864,431170,2747842,22,7,301,373,134,136,107,3239,54,1151,2905,0,0,3,5,6,7,410,0,0,0,736,0,3,11,26,1,1552
...
X8,1,19,0,47,0,1,6,20,28,0,1,1,47,7,336,110,30,32,10,93,85,48,9,7,7,154,386,0,14,19,106,358,242,14,5,134,15,11,7,18,54057,106,10,24,212,340,1128,16220,567908,650264,95,4,193,402,314,300,187,4376,37,9796,8653,0,1,0,1,5,23,1778,1,1,0,1,1,4,66,30,4,1263
X9,0,3,2,16,7,1,23,12,10,9,1,2,134,40,390,289,29,372,27,81,150,90,9,88,32,287,881,14,33,60,319,1045,328,15,22,22,10,72,8,63,35301,127,15,48,412,935,2343,11500,380765,2613854,0,0,0,0,0,0,5,74,0,38,40,3,0,0,0,1,3,275,0,0,0,0,0,2,118,25,0,1675

Relative abundance output

ra_phylum.csv calculated from 9 Kraken2 reports of metagenomic samples using KrakenParser:

Sample_id,taxon,rel_abund_perc
X1,Pseudomonadota,85.03558294577552
X1,Bacillota,10.72121619814011
X1,Other (<4.0%),4.243200856084384
X2,Pseudomonadota,84.28702055549813
X2,Bacillota,13.225663867469137
X2,Other (<4.0%),2.487315577032736
...
X8,Pseudomonadota,49.25373021277305
X8,Bacillota,43.01574040339849
X8,Bacteroidota,4.094504530639667
X8,Other (<4.0%),3.6360248531887933
X9,Pseudomonadota,85.62839981589192
X9,Bacillota,12.473649123439218
X9,Other (<4.0%),1.8979510606688494

α-diversity output

alpha_div.csv calculated from 9 Kraken2 reports of metagenomic samples using KrakenParser:

Sample,Shannon,Pielou,Chao1
X1,3.911345447107001,0.5269245043289149,2274.533185840708
X2,3.9944130792536563,0.4906424221265042,4155.0
...
X8,3.442077115880119,0.42753293021330063,4177.251358695652
X9,4.033664950188261,0.5050385978575492,3492.16
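
As a rough illustration of what the three columns above measure, the following sketch computes Shannon, Pielou and Chao1 for a single sample from a list of taxon counts. It is not KrakenParser's own code: the function name `alpha_diversity`, the use of the natural logarithm, and the bias-corrected Chao1 variant are assumptions made for this example.

```python
import math
from collections import Counter


def alpha_diversity(counts):
    """Return (Shannon, Pielou, Chao1) for one sample's taxon counts.

    Assumptions for this sketch: natural-log Shannon and the
    bias-corrected Chao1 estimator.
    """
    observed = [c for c in counts if c > 0]
    total = sum(observed)
    s_obs = len(observed)  # number of taxa actually seen

    # Shannon index: H = -sum(p_i * ln p_i)
    shannon = -sum((c / total) * math.log(c / total) for c in observed)

    # Pielou evenness: J = H / ln(S); set to 0 here when S <= 1
    pielou = shannon / math.log(s_obs) if s_obs > 1 else 0.0

    # Chao1 (bias-corrected): S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    # where F1 and F2 are the numbers of singletons and doubletons
    freq = Counter(observed)
    f1, f2 = freq[1], freq[2]
    chao1 = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    return shannon, pielou, chao1


# Toy counts, not taken from the samples above
print(alpha_diversity([10, 5, 1, 1, 2, 0]))
```

A perfectly even community gives a Pielou value of 1, and a sample without singletons gives a Chao1 equal to the observed richness.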

β-diversity output

beta_div_bray.csv calculated from 9 Kraken2 reports of metagenomic samples using KrakenParser:

,X1,X2,...,X8,X9
X1,0.0,0.398,...,0.61,0.353
X2,0.398,0.0,...,0.723,0.388
...
X8,0.61,0.723,...,0.0,0.665
X9,0.353,0.388,...,0.665,0.0

beta_div_jaccard.csv calculated from 9 Kraken2 reports of metagenomic samples using KrakenParser:

,X1,X2,...,X8,X9
X1,0.0,0.7073170731707317,...,0.8223938223938224,0.7232472324723247
X2,0.7073170731707317,0.0,...,0.835016835016835,0.7352941176470589
...
X8,0.8223938223938224,0.835016835016835,...,0.0,0.8066914498141264
X9,0.7232472324723247,0.7352941176470589,...,0.8066914498141264,0.0
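
To make the two matrices above easier to interpret, here is a minimal sketch of Bray-Curtis dissimilarity and Jaccard distance between count vectors. The function names are hypothetical and this is not KrakenParser's implementation; in particular, the sketch skips the rarefaction step that the pipeline applies before computing β-diversity.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(a) + sum(b)
    return num / den if den else 0.0


def jaccard_distance(a, b):
    """Jaccard distance on presence/absence: 1 - |A and B| / |A or B|."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, y in enumerate(b) if y > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


def distance_matrix(samples, metric):
    """Build a nested dict {row: {col: distance}} for all sample pairs."""
    names = list(samples)
    return {r: {c: metric(samples[r], samples[c]) for c in names} for r in names}


# Toy data: two samples over the same three taxa
toy = {"A": [1, 2, 0], "B": [0, 2, 3]}
print(distance_matrix(toy, bray_curtis))
print(distance_matrix(toy, jaccard_distance))
```

Both measures are 0 for identical samples and approach 1 as samples share less, which is why the diagonals of the matrices are 0.0.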

Visualization examples gallery

Example plots (images not shown): Stacked Barplot, Streamgraph, Stacked Barplot + Streamgraph, Clustermap.

Quick Start (Full Pipeline)

To run the full pipeline, use the following command:

KrakenParser --complete -i data/kreports -o results/
# Having trouble? Run KrakenParser --complete -h

For reproducible β-diversity (rarefaction is stochastic by default):

KrakenParser --complete -i data/kreports -o results/ -s 42

This will:

  1. Convert Kraken2 reports to MPA format
  2. Combine MPA files into a single file
  3. Extract taxonomic levels into separate text files
  4. Process extracted text files
  5. Convert them into CSV format
  6. Calculate relative abundance
  7. Calculate α & β-diversities
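
When the pipeline is launched from a Python workflow rather than a shell, the command line can be assembled programmatically. The sketch below only builds and prints the command; `build_complete_cmd` is a hypothetical helper, and it uses only the flags documented in this README (`--complete`, `-i`, `-o`, `-s`, `--keep-human`).

```python
import shlex


def build_complete_cmd(input_dir, output_dir=None, seed=None, keep_human=False):
    """Assemble a `KrakenParser --complete` command as an argument list."""
    cmd = ["KrakenParser", "--complete", "-i", str(input_dir)]
    if output_dir is not None:
        cmd += ["-o", str(output_dir)]
    if keep_human:
        cmd.append("--keep-human")
    if seed is not None:
        cmd += ["-s", str(seed)]
    return cmd


cmd = build_complete_cmd("data/kreports", "results/", seed=42)
print(shlex.join(cmd))

# To actually run it (requires KrakenParser to be installed and on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when directory names contain spaces.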

Installation

pip install krakenparser

Before Visualization: Grouping Low-Abundance Taxa

The full pipeline automatically calculates relative abundance. Before passing data to visualization, it is strongly recommended to re-run --relabund with the -O flag — this collapses all taxa below the chosen threshold into a single "Other" group, producing much cleaner and more readable plots.

KrakenParser --relabund -i data/counts/counts_species.csv -o data/rel_abund/ra_species.csv -O 4

This groups every taxon with relative abundance < 4 % into Other (<4.0%). Adjust the threshold to your data.
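
The grouping rule can be illustrated with a few lines of Python. This is a sketch of the idea, not KrakenParser's code: `relative_abundance` is a hypothetical function, the input is a plain dict of counts for one sample, and the counts are toy values.

```python
def relative_abundance(counts, other_threshold=None):
    """Return [(taxon, percent), ...] for one sample, sorted by percent.

    Taxa strictly below `other_threshold` percent are summed into a
    single "Other (<X%)" row, mirroring the behaviour described for -O.
    """
    total = sum(counts.values())
    if total == 0:
        return []
    perc = {taxon: 100 * c / total for taxon, c in counts.items()}
    if other_threshold is None:
        kept, other = perc, 0.0
    else:
        kept = {t: p for t, p in perc.items() if p >= other_threshold}
        other = sum(p for p in perc.values() if p < other_threshold)
    rows = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
    if other > 0:
        rows.append((f"Other (<{float(other_threshold)}%)", other))
    return rows


# Toy counts for a single sample
toy = {"A": 50, "B": 43, "C": 4, "D": 3}
for taxon, pct in relative_abundance(toy, other_threshold=4):
    print(taxon, pct)
```

Because the percentages of the grouped taxa are summed rather than dropped, each sample still adds up to 100 %.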

Note: The pipeline-generated rel_abund/ra_*.csv files (no -O) preserve the full unfiltered data — use them for statistical analysis. Use the -O variant specifically for visualization.


Using Individual Modules (Advanced)

Each step of the pipeline can also be run individually. This is useful for re-running a single step, debugging, or integrating KrakenParser into a custom workflow.

Step 1: Convert Kraken2 Reports to MPA Format

# Batch mode (directory)
KrakenParser --kreport2mpa -i data/kreports -o data/intermediate/mpa
# Single file
KrakenParser --kreport2mpa -r data/kreports/sample.kreport -o data/intermediate/mpa/sample.MPA.TXT
# Having trouble? Run KrakenParser --kreport2mpa -h

Converts Kraken2 .kreport files into MPA format.
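
For readers unfamiliar with the two formats, the sketch below shows the general idea of the conversion: a Kraken2 report encodes the taxonomy through name indentation, while MPA format writes each taxon as a `|`-separated lineage with rank prefixes. This is an illustrative reimplementation under assumptions, not the code KrakenParser runs: it assumes the standard six-column tab-separated report, two spaces of indentation per level, and lowercase rank-code prefixes. The demo report is toy data.

```python
MAIN_RANKS = {"D", "K", "P", "C", "O", "F", "G", "S"}


def kreport_to_mpa(lines):
    """Convert Kraken2 report lines into (lineage, clade_reads) pairs."""
    stack = []  # (depth, label or None) for the current path in the tree
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        clade_reads = int(fields[1])
        rank = fields[3]
        raw_name = fields[5]
        if rank in ("U", "R"):  # skip unclassified and root
            continue
        depth = (len(raw_name) - len(raw_name.lstrip(" "))) // 2
        # Drop siblings and deeper nodes left over from the previous branch
        while stack and stack[-1][0] >= depth:
            stack.pop()
        name = raw_name.strip().replace(" ", "_")
        # Intermediate ranks (e.g. R1, G1) keep the depth bookkeeping
        # correct but are left out of the lineage string
        label = f"{rank.lower()}__{name}" if rank in MAIN_RANKS else None
        stack.append((depth, label))
        if label is not None:
            lineage = "|".join(lab for _, lab in stack if lab)
            out.append((lineage, clade_reads))
    return out


# Toy report, not real data
demo_report = [
    " 10.00\t10\t10\tU\t0\tunclassified",
    " 90.00\t90\t0\tR\t1\troot",
    " 90.00\t90\t0\tR1\t131567\t  cellular organisms",
    " 90.00\t90\t0\tD\t2\t    Bacteria",
    " 60.00\t60\t0\tP\t1224\t      Pseudomonadota",
    " 60.00\t60\t60\tC\t1236\t        Gammaproteobacteria",
    " 30.00\t30\t30\tP\t1239\t      Bacillota",
]
for lineage, reads in kreport_to_mpa(demo_report):
    print(f"{lineage}\t{reads}")
```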

Step 2: Combine MPA Files

KrakenParser --combine_mpa -i data/intermediate/mpa/* -o data/intermediate/COMBINED.txt
# Having trouble? Run KrakenParser --combine_mpa -h

Merges multiple MPA files into a single combined table.
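
Conceptually, the merge is an outer join on the lineage string: every lineage seen in any sample gets a row, and samples that lack it get a zero. The sketch below shows that idea on in-memory dicts. `combine_mpa` here is a hypothetical Python function, and the `#Classification` header label is an assumption rather than a documented detail of KrakenParser's output.

```python
def combine_mpa(per_sample):
    """Merge {sample: {lineage: count}} into a list of table rows."""
    samples = list(per_sample)
    lineages = sorted({lin for table in per_sample.values() for lin in table})
    rows = [["#Classification"] + samples]  # header label is an assumption
    for lin in lineages:
        # Missing lineages are filled with 0 for that sample
        rows.append([lin] + [per_sample[s].get(lin, 0) for s in samples])
    return rows


# Toy per-sample tables
toy = {
    "X1": {"d__Bacteria": 90, "d__Bacteria|p__Bacillota": 30},
    "X2": {"d__Bacteria": 50, "d__Bacteria|p__Pseudomonadota": 40},
}
for row in combine_mpa(toy):
    print("\t".join(str(cell) for cell in row))
```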

Step 3: Extract Taxonomic Levels

KrakenParser --deconstruct -i data/intermediate/COMBINED.txt -o data/intermediate
# Having trouble? Run KrakenParser --deconstruct -h

By default, human-related taxa (Homo sapiens, Hominidae, Primates, Mammalia, Chordata) are removed. To keep them:

KrakenParser --deconstruct -i data/intermediate/COMBINED.txt -o data/intermediate --keep-human

To inspect the Viruses domain separately:

KrakenParser --deconstruct_viruses -i data/intermediate/COMBINED.txt -o data/counts_viruses
# Having trouble? Run KrakenParser --deconstruct_viruses -h

Step 4: Process Extracted Taxonomic Data

KrakenParser --process -i data/intermediate/COMBINED.txt -o data/intermediate/txt/counts_phylum.txt
# Having trouble? Run KrakenParser --process -h

Repeat for the other five taxonomic levels (class, order, family, genus, species), or wrap KrakenParser --process in a loop.

Cleans up taxonomic names: removes prefixes (s__, g__, etc.) and replaces underscores with spaces.
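
The name clean-up described above amounts to two string operations. A minimal sketch, assuming the prefix is always a single lowercase letter followed by two underscores; `clean_taxon_name` is a hypothetical name, not a KrakenParser function:

```python
import re

# One lowercase rank letter followed by "__" at the start of the name
_RANK_PREFIX = re.compile(r"^[a-z]__")


def clean_taxon_name(raw):
    """Strip a rank prefix such as s__ or g__, then turn underscores into spaces."""
    return _RANK_PREFIX.sub("", raw).replace("_", " ")


for raw in ("s__Escherichia_coli", "p__Candidatus_Saccharibacteria", "Bacillota"):
    print(raw, "->", clean_taxon_name(raw))
```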

Step 5: Convert TXT to CSV

KrakenParser --txt2csv -i data/intermediate/txt/counts_phylum.txt -o data/counts/counts_phylum.csv
# Having trouble? Run KrakenParser --txt2csv -h

Repeat for the other five taxonomic levels, or wrap the command in a loop. This step transposes the data so that each sample becomes a row.
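
The transposition can be pictured with the sketch below. It assumes, purely for illustration, that the processed txt file is tab-separated with a header row of sample names and one row per taxon; the actual intermediate layout is not documented here. `transpose_table` is a hypothetical helper.

```python
import csv
import io


def transpose_table(txt, id_header="Sample_id"):
    """Turn a taxa-by-samples TSV string into a samples-by-taxa CSV string."""
    rows = [line.split("\t") for line in txt.splitlines() if line]
    header, body = rows[0], rows[1:]
    samples = header[1:]           # first header cell labels the taxon column
    taxa = [r[0] for r in body]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([id_header] + taxa)
    for j, sample in enumerate(samples, start=1):
        # Column j of every taxon row becomes one output row
        writer.writerow([sample] + [r[j] for r in body])
    return out.getvalue()


toy_txt = "taxon\tX1\tX2\nBacillota\t3\t4\nPseudomonadota\t5\t6\n"
print(transpose_table(toy_txt), end="")
```

The result has the same shape as the counts_phylum.csv example: a Sample_id column followed by one column per taxon.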

Step 6: Calculate Relative Abundance

KrakenParser --relabund -i data/counts/counts_phylum.csv -o data/rel_abund/ra_phylum.csv
# Having trouble? Run KrakenParser --relabund -h

Repeat for the other five taxonomic levels, or wrap the command in a loop.

With "Other" grouping:

KrakenParser --relabund -i data/counts/counts_phylum.csv -o data/rel_abund/ra_phylum.csv -O 3.5

Groups all taxa with abundance < 3.5 % into Other (<3.5%).
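
The per-level repetition in Steps 4 to 6 can be scripted. The sketch below generates the eighteen commands (three steps times six levels) using the same paths as the examples in this section; it only prints them. `per_level_commands` and the `base` parameter are hypothetical, and the commented-out `subprocess` call assumes KrakenParser is installed.

```python
import shlex

LEVELS = ["phylum", "class", "order", "family", "genus", "species"]


def per_level_commands(base="data"):
    """Build the --process, --txt2csv and --relabund commands for every level."""
    combined = f"{base}/intermediate/COMBINED.txt"
    cmds = []
    for level in LEVELS:
        txt = f"{base}/intermediate/txt/counts_{level}.txt"
        counts = f"{base}/counts/counts_{level}.csv"
        ra = f"{base}/rel_abund/ra_{level}.csv"
        cmds.append(["KrakenParser", "--process", "-i", combined, "-o", txt])
        cmds.append(["KrakenParser", "--txt2csv", "-i", txt, "-o", counts])
        cmds.append(["KrakenParser", "--relabund", "-i", counts, "-o", ra])
    return cmds


for cmd in per_level_commands():
    print(shlex.join(cmd))
    # To execute instead of printing:
    # import subprocess; subprocess.run(cmd, check=True)
```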

Step 7: Calculate α & β-Diversities

KrakenParser --diversity -i data/counts/counts_species.csv -o data/diversity
# Having trouble? Run KrakenParser --diversity -h

With a custom rarefaction depth:

KrakenParser --diversity -i data/counts/counts_species.csv -o data/diversity -d 750

For reproducible results (rarefaction uses random subsampling — fix the seed to get the same matrix every run):

KrakenParser --diversity -i data/counts/counts_species.csv -o data/diversity -s 42
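
Why a seed matters can be seen in a small rarefaction sketch: each sample is randomly subsampled, without replacement, down to a common depth, so different random draws give slightly different count vectors. The code below is an illustration only. `rarefy` is a hypothetical function, and returning `None` for samples shallower than the requested depth is an assumption, not documented KrakenParser behaviour.

```python
import random


def rarefy(counts, depth, seed=None):
    """Subsample `depth` reads without replacement from a count vector."""
    if sum(counts) < depth:
        return None  # assumption: too-shallow samples are not rarefied
    rng = random.Random(seed)
    # Expand counts into one entry per read, labelled by taxon index
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    rarefied = [0] * len(counts)
    for i in rng.sample(pool, depth):
        rarefied[i] += 1
    return rarefied


toy = [50, 30, 20]
print(rarefy(toy, 10, seed=42))
print(rarefy(toy, 10, seed=42))  # same seed, same subsample
print(rarefy(toy, 10))           # unseeded, may differ between runs
```

Expanding the counts into a pool of individual reads is fine for a toy example but memory-hungry for real read depths.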

Arguments Breakdown

--complete (Full Pipeline)

  • Requires -i: path to the Kraken2 reports directory (e.g., data/kreports).
  • Optional -o: output directory (default: parent of -i).
  • Optional --keep-human: retain human-related taxa (default: filtered out).
  • Optional -s INT: random seed for reproducible β-diversity rarefaction (default: random).

--kreport2mpa (Step 1)

  • Batch mode: -i DIR -o DIR — converts all files in a directory.
  • Single-file mode: -r FILE -o FILE.

--combine_mpa (Step 2)

  • -i FILE [FILE ...]: one or more MPA files.
  • -o FILE: output merged table.

--deconstruct & --deconstruct_viruses (Step 3)

  • Extracts phylum, class, order, family, genus, species into separate text files.
  • --deconstruct removes human-related taxa by default; use --keep-human to retain them.
  • --deconstruct_viruses extracts only the Viruses domain.

--process (Step 4)

  • Removes prefixes (s__, g__, etc.), replaces underscores with spaces.
  • -i: COMBINED.txt (source for sample-name header); -o: target txt file.

--txt2csv (Step 5)

  • Transposes a processed txt file into a CSV with sample names as rows.

--relabund (Step 6)

  • Calculates relative abundance from a total-counts CSV.
  • -O FLOAT: group taxa below FLOAT % into Other (<FLOAT%).

--diversity (Step 7)

  • Shannon, Pielou & Chao1 for α-diversity.
  • Bray-Curtis & Jaccard for β-diversity.
  • -d INT: rarefaction depth for β-diversity (default: 1000).
  • -s INT: random seed for reproducible rarefaction (default: random — results vary between runs).

Example Output Structure

After running the full pipeline, the output directory will look like this:

results/
├─ counts/                 # Total abundance CSV output
│  ├─ counts_species.csv
│  ├─ counts_genus.csv
│  ├─ ...
│  └─ counts_phylum.csv
├─ rel_abund/              # Relative abundance CSV output
│  ├─ ra_species.csv
│  ├─ ra_genus.csv
│  ├─ ...
│  └─ ra_phylum.csv
├─ diversity/              # Diversity metrics
│  ├─ alpha_div.csv
│  ├─ beta_div_bray.csv
│  └─ beta_div_jaccard.csv
└─ intermediate/           # Intermediate files
   ├─ mpa/                 # Converted MPA files
   │  ├─ {sample}.txt
   │  └─ ...
   ├─ COMBINED.txt         # Merged MPA table
   └─ txt/                 # Extracted taxonomic levels in TXT
      ├─ counts_species.txt
      ├─ counts_genus.txt
      ├─ ...
      └─ counts_phylum.txt
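
A quick way to confirm that a run finished is to check for the final output files in the tree above. The sketch below does this with `pathlib`. `expected_outputs` and `missing_outputs` are hypothetical helpers, and the file names for the levels hidden behind `...` are assumed to follow the same `counts_<level>.csv` and `ra_<level>.csv` pattern.

```python
from pathlib import Path

LEVELS = ["phylum", "class", "order", "family", "genus", "species"]


def expected_outputs(root):
    """List the final output files the full pipeline is expected to write."""
    root = Path(root)
    paths = [root / "counts" / f"counts_{lvl}.csv" for lvl in LEVELS]
    paths += [root / "rel_abund" / f"ra_{lvl}.csv" for lvl in LEVELS]
    paths += [
        root / "diversity" / name
        for name in ("alpha_div.csv", "beta_div_bray.csv", "beta_div_jaccard.csv")
    ]
    paths.append(root / "intermediate" / "COMBINED.txt")
    return paths


def missing_outputs(root):
    """Return the expected files that do not exist under `root`."""
    return [p for p in expected_outputs(root) if not p.is_file()]


for path in missing_outputs("results"):
    print("missing:", path)
```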

Conclusion

KrakenParser provides a simple and automated way to convert Kraken2 reports into usable CSV files for downstream analysis. You can run the full pipeline with a single command or use individual scripts as needed.

For any issues or feature requests, feel free to open an issue on GitHub!

🚀 Happy analyzing!
