
A collection of scripts designed to process Kraken2 reports and convert them into CSV format.

Reason this release was yanked:

Outdated


KrakenParser: Convert Kraken2 Reports to CSV

Overview

KrakenParser is a collection of scripts designed to process Kraken2 reports and convert them into CSV format. This pipeline extracts taxonomic abundance data at six levels:

  • Phylum
  • Class
  • Order
  • Family
  • Genus
  • Species

You can run the entire pipeline with a single command, or use the scripts individually depending on your needs.

Output example

Example: counts_phylum.csv produced by KrakenParser from 7 Kraken2 reports of metagenomic samples:

Sample_id,Euryarchaeota,Euglenozoa,Parabasalia,Apicomplexa,Basidiomycota,Ascomycota,Acidobacteriota,Bdellovibrionota,Chlorobiota,Ignavibacteriota,Planctomycetota,Spirochaetota,Thermotogota,Fusobacteriota,Cyanobacteriota,Mycoplasmatota,Actinomycetota,Pseudomonadota,Bacteroidota,Deferribacterota,Campylobacterota,Thermodesulfobacteriota,Bacillota,Negarnaviricota,Nucleocytoviricota,Uroviricota,Peploviricota
X1,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
X2,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,4,0,0,0,0
X3,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0,0,0,4,0,0,0,0
X4,1313,0,0,0,0,4,0,0,0,0,0,1,2,2,1,3,3,17,33,4,5,4,112,0,0,0,0
X5,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0
X6,0,0,0,0,0,0,0,0,0,0,1,1,0,1,1,0,0,3,3,0,3,2,13,0,0,0,1
X7,20,1,1,5,1,9,1,6,1,7,1,13,1,3,9,4,10,139,519,0,8,2,81,1,3,1,0

A table like counts_phylum.csv is straightforward to visualize as a relative abundance bar plot.
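Before plotting, each sample's counts are usually converted to proportions of that sample's total. The sketch below does this with the standard library only. It assumes the CSV layout shown above (a `Sample_id` column followed by one column per taxon); the function name `relative_abundance` is hypothetical and not part of KrakenParser.

```python
from __future__ import annotations

import csv
import io


def relative_abundance(csv_text: str) -> dict[str, dict[str, float]]:
    """Convert a Sample_id-by-taxon counts table into per-sample proportions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    result: dict[str, dict[str, float]] = {}
    for row in reader:
        sample = row.pop("Sample_id")
        counts = {taxon: int(value) for taxon, value in row.items()}
        total = sum(counts.values())
        # A sample with no reads keeps all-zero proportions instead of dividing by zero.
        result[sample] = {
            taxon: (count / total if total else 0.0) for taxon, count in counts.items()
        }
    return result
```

For example, `relative_abundance(open("data/counts/csv/counts_phylum.csv").read())` returns a mapping from each sample to its taxon proportions. If pandas is available, `df.div(df.sum(axis=1), axis=0)` on a frame indexed by `Sample_id` gives the same proportions, and `DataFrame.plot.bar(stacked=True)` draws a stacked bar plot from them.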

Quick Start (Full Pipeline)

To run the full pipeline, use the following command:

KrakenParser data/kreports

This will:

  1. Convert Kraken2 reports to MPA format
  2. Combine MPA files into a single file
  3. Extract taxonomic levels into separate text files
  4. Process extracted text files
  5. Convert them into CSV format

Input Requirements

  • The Kraken2 reports must be inside a subdirectory (e.g., data/kreports).
  • KrakenParser creates the output directories automatically.

Installation

pip install krakenparser

Using Individual Modules

You can also run each step manually if needed.

Step 1: Convert Kraken2 Reports to MPA Format

KrakenParser --kreport2mpa -i data/kreports -o data/mpa

This script converts Kraken2 .kreport files into MPA format using KrakenTools.
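KrakenParser delegates this step to KrakenTools, so the code below is not that script. It is a minimal sketch of the idea, assuming the standard Kraken2 report layout: tab-separated, clade read count in the second column, rank code third from last, and the taxon name last, indented by two spaces per tree level. The function name and the rank-code-to-prefix mapping are assumptions for illustration; KrakenTools' exact output (for example, how it labels domains) may differ.

```python
from __future__ import annotations

from collections.abc import Iterable

# Assumed mapping from Kraken2 rank codes to MPA-style prefixes (standard ranks only).
RANK_PREFIX = {"D": "d", "K": "k", "P": "p", "C": "c", "O": "o", "F": "f", "G": "g", "S": "s"}


def kreport_to_mpa(lines: Iterable[str]) -> list[tuple[str, int]]:
    """Turn Kraken2 report lines into (lineage, clade_read_count) pairs."""
    stack: list[tuple[int, str]] = []  # (indent depth, label) of the current ancestors
    result: list[tuple[str, int]] = []
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        clade_reads, rank, raw_name = int(fields[1]), fields[-3], fields[-1]
        name = raw_name.lstrip(" ")
        depth = (len(raw_name) - len(name)) // 2  # two spaces of indentation per level
        # Forget ancestors at the same depth or deeper: they are not above this node.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        if rank not in RANK_PREFIX:
            continue  # skip "U", "R" and intermediate ranks such as "R1" or "G1"
        label = f"{RANK_PREFIX[rank]}__{name.replace(' ', '_')}"
        stack.append((depth, label))
        result.append(("|".join(lab for _, lab in stack), clade_reads))
    return result
```

Writing each pair as `lineage<TAB>count` gives one MPA-style line per taxon.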

Step 2: Combine MPA Files

KrakenParser --combine_mpa -i data/mpa/* -o data/COMBINED.txt

This merges multiple MPA files into a single combined file.
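Conceptually, combining aligns the per-sample lineages into one table and fills in 0 where a lineage is absent from a sample. The sketch below assumes each sample has already been parsed into a lineage-to-count mapping; the function name, the `#Classification` header label and the row ordering are assumptions and may not match KrakenTools/combine_mpa.py exactly.

```python
from __future__ import annotations


def combine_mpa(samples: dict[str, dict[str, int]]) -> list[list[str]]:
    """Merge per-sample {lineage: count} maps into one tab-ready table."""
    names = list(samples)  # column order follows the input order
    lineages: list[str] = []
    seen: set[str] = set()
    for counts in samples.values():
        for lineage in counts:
            if lineage not in seen:  # keep first-seen order, no duplicates
                seen.add(lineage)
                lineages.append(lineage)
    table = [["#Classification", *names]]
    for lineage in lineages:
        # A lineage missing from a sample gets a count of 0 in that column.
        table.append([lineage, *(str(samples[n].get(lineage, 0)) for n in names)])
    return table
```

Joining each row with tabs and the rows with newlines yields a single combined text file.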

Step 3: Extract Taxonomic Levels

KrakenParser --deconstruct -i data/COMBINED.txt -o data/counts

This step extracts each taxonomic level (phylum, class, order, family, genus and species) into a separate text file, excluding human reads.
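One way to pull a single rank out of the combined table is to keep only the rows whose lineage ends at that rank, then shorten the lineage to its last element. The sketch below assumes the combined file is tab-separated (a header row, then a lineage followed by per-sample counts); the function name is hypothetical. Its human filter simply drops rows whose lineage mentions `Homo_sapiens`, which is an assumption: how KrakenParser itself removes human reads at higher ranks is not described here.

```python
from __future__ import annotations


def extract_rank(combined_lines: list[str], prefix: str,
                 exclude: tuple[str, ...] = ("Homo_sapiens",)) -> list[str]:
    """Keep the header plus rows whose lineage ends at the rank `prefix` (e.g. "p__")."""
    header, *rows = combined_lines
    kept = [header]
    for row in rows:
        lineage, _, counts = row.partition("\t")
        last = lineage.rsplit("|", 1)[-1]
        if not last.startswith(prefix):
            continue  # the lineage stops above or below the requested rank
        if any(name in lineage for name in exclude):
            continue  # hypothetical human filter: drop rows mentioning excluded taxa
        kept.append(f"{last}\t{counts}")
    return kept
```

Calling it once per prefix (`p__`, `c__`, `o__`, `f__`, `g__`, `s__`) produces one table per level.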

Step 4: Process Extracted Taxonomic Data

KrakenParser --process -i data/COMBINED.txt -o data/counts/txt/counts_phylum.txt
KrakenParser --process -i data/COMBINED.txt -o data/counts/txt/counts_class.txt
KrakenParser --process -i data/COMBINED.txt -o data/counts/txt/counts_order.txt
KrakenParser --process -i data/COMBINED.txt -o data/counts/txt/counts_family.txt
KrakenParser --process -i data/COMBINED.txt -o data/counts/txt/counts_genus.txt
KrakenParser --process -i data/COMBINED.txt -o data/counts/txt/counts_species.txt

This script cleans up taxonomic names (removes prefixes, replaces underscores with spaces).
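The name cleanup amounts to two string operations. A minimal sketch, assuming single-letter rank prefixes of the form `x__` as produced in the MPA step; the function name is hypothetical.

```python
import re

# Matches a one-letter rank prefix such as "s__" or "g__" at the start of a name.
_PREFIX = re.compile(r"^[a-z]__")


def clean_taxon_name(name: str) -> str:
    """Strip the rank prefix and turn underscores back into spaces."""
    return _PREFIX.sub("", name).replace("_", " ")
```

For example, `clean_taxon_name("s__Escherichia_coli")` returns `"Escherichia coli"`. Note that any underscore that was genuinely part of a taxon name is also turned into a space.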

Step 5: Convert TXT to CSV

KrakenParser --txt2csv -i data/counts/txt/counts_phylum.txt -o data/counts/csv/counts_phylum.csv
KrakenParser --txt2csv -i data/counts/txt/counts_class.txt -o data/counts/csv/counts_class.csv
KrakenParser --txt2csv -i data/counts/txt/counts_order.txt -o data/counts/csv/counts_order.csv
KrakenParser --txt2csv -i data/counts/txt/counts_family.txt -o data/counts/csv/counts_family.csv
KrakenParser --txt2csv -i data/counts/txt/counts_genus.txt -o data/counts/csv/counts_genus.csv
KrakenParser --txt2csv -i data/counts/txt/counts_species.txt -o data/counts/csv/counts_species.csv

This converts the processed text files into structured CSV format.
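The conversion is essentially a transpose: taxa become columns and samples become rows, giving the `Sample_id` layout shown in the output example at the top. A standard-library sketch, assuming the processed text file is tab-separated with a header row of sample names after one leading column; the function name is hypothetical.

```python
import csv
import io


def txt_to_csv(txt: str) -> str:
    """Transpose a tab-separated taxa-by-samples table into a samples-by-taxa CSV."""
    rows = list(csv.reader(io.StringIO(txt), delimiter="\t"))
    header, body = rows[0], rows[1:]
    taxa = [row[0] for row in body]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Sample_id", *taxa])
    # Column j of the input (j >= 1) holds the counts for sample header[j].
    for j, sample in enumerate(header[1:], start=1):
        writer.writerow([sample, *(row[j] for row in body)])
    return out.getvalue()
```

The `csv` writer quotes a taxon name only if it contains a comma, a quote or a line break, so names with spaces are written as-is.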

Arguments Breakdown

KrakenParser (Main Pipeline)

  • Automates the entire workflow.
  • Takes one argument: the path to the directory containing the Kraken2 reports (e.g., data/kreports).
  • Runs all the scripts in sequence.
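The sequence this command automates is meant to mirror the individual commands in Steps 1 to 5 above. The sketch below rebuilds that same sequence from Python, assuming the `data/` layout used throughout this README; the helper names are hypothetical, the README does not say whether the main command invokes exactly these commands internally, and running them requires KrakenParser on the PATH. The commands run through the shell so that the `data/mpa/*` pattern from Step 2 is expanded only after Step 1 has created the files.

```python
import subprocess

LEVELS = ["phylum", "class", "order", "family", "genus", "species"]


def pipeline_commands(data: str = "data") -> list[str]:
    """Shell commands equivalent to running Steps 1 to 5 one after another."""
    cmds = [
        f"KrakenParser --kreport2mpa -i {data}/kreports -o {data}/mpa",
        # The glob is left for the shell to expand at run time.
        f"KrakenParser --combine_mpa -i {data}/mpa/* -o {data}/COMBINED.txt",
        f"KrakenParser --deconstruct -i {data}/COMBINED.txt -o {data}/counts",
    ]
    for level in LEVELS:
        cmds.append(f"KrakenParser --process -i {data}/COMBINED.txt "
                    f"-o {data}/counts/txt/counts_{level}.txt")
    for level in LEVELS:
        cmds.append(f"KrakenParser --txt2csv -i {data}/counts/txt/counts_{level}.txt "
                    f"-o {data}/counts/csv/counts_{level}.csv")
    return cmds


def run_pipeline(data: str = "data") -> None:
    # Assumes `data` contains no spaces or shell metacharacters.
    for cmd in pipeline_commands(data):
        subprocess.run(cmd, shell=True, check=True)  # stop at the first failing step
```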

--kreport2mpa (Step 1)

  • Converts Kraken2 reports to MPA format.
  • Uses KrakenTools/kreport2mpa.py.

--combine_mpa (Step 2)

  • Combines multiple MPA files into one.
  • Uses KrakenTools/combine_mpa.py.

--deconstruct (Step 3)

  • Extracts phylum, class, order, family, genus, species into separate text files.
  • Removes human-related reads.

--process (Step 4)

  • Cleans and formats extracted taxonomic data.
  • Removes prefixes (s__, g__, etc.), replaces underscores with spaces.

--txt2csv (Step 5)

  • Converts cleaned text files to CSV.
  • Transposes data so that sample names become rows.

Example Output Structure

After running the full pipeline, the data directory will look like this:

data/
├─ kreports/           # Input Kraken2 reports
├─ mpa/                # Converted MPA files
├─ COMBINED.txt        # Merged MPA file
└─ counts/
   ├─ txt/             # Extracted taxonomic levels in TXT
   │  ├─ counts_species.txt
   │  ├─ counts_genus.txt
   │  ├─ counts_family.txt
   │  ├─ ...
   └─ csv/             # Final CSV output
      ├─ counts_species.csv
      ├─ counts_genus.csv
      ├─ counts_family.csv
      ├─ ...

Conclusion

KrakenParser provides a simple and automated way to convert Kraken2 reports into usable CSV files for downstream analysis. You can run the full pipeline with a single command or use individual scripts as needed.

For any issues or feature requests, feel free to open an issue on GitHub!

🚀 Happy analyzing!



Download files


Source Distribution

krakenparser-0.1.31.tar.gz (8.5 kB)

Uploaded Source

Built Distribution


KrakenParser-0.1.31-py3-none-any.whl (15.1 kB)

Uploaded Python 3

File details

Details for the file krakenparser-0.1.31.tar.gz.

File metadata

  • Download URL: krakenparser-0.1.31.tar.gz
  • Upload date:
  • Size: 8.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for krakenparser-0.1.31.tar.gz
Algorithm Hash digest
SHA256 54b746a6c3bf71dacbeed94b124a19ee9c2c3a45f811723fd61c2620ddc7f149
MD5 85c0c16625b9d50d9e5ff4dc0652dfd3
BLAKE2b-256 3a58981c8c4cab146e8ef9e6abe337af5a271d613e9b10a0c44e5081b4a5ec3f


File details

Details for the file KrakenParser-0.1.31-py3-none-any.whl.

File metadata

  • Download URL: KrakenParser-0.1.31-py3-none-any.whl
  • Upload date:
  • Size: 15.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for KrakenParser-0.1.31-py3-none-any.whl
Algorithm Hash digest
SHA256 0dbe290b0f348103489bed6e8ab3bfd4b28cd43d7bea9c7e65424535f26e9c9e
MD5 d60638d6b074ab036922276b5a3cb643
BLAKE2b-256 01f8d79f83258740295e684c3622f61d683b59be32680304ecf6d77c782b04a5

