
A multi-sample and multi-database taxonomic analysis using Kraken

Project description

Supported Python versions: 3.11 and 3.12

MULTITAX — Multi-database Taxonomic Classification pipeline

Overview:

Runs taxonomic analysis on a set of samples using sequana_taxonomy (Kraken2 under the hood), optionally followed by BLAST on unclassified reads.

Input:

A set of FastQ files (paired or single-end).

Output:

HTML report for each sample and a summary HTML report for all samples.

Status:

Production

Citation:

Cokelaer et al. (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, doi:10.21105/joss.00352

Pipeline DAG

Installation

pip install sequana-multitax

To upgrade an existing installation:

pip install sequana-multitax --upgrade

Quick Start

Step 1 — prepare the working directory:

sequana_multitax \
    --input-directory /path/to/reads \
    --databases /path/to/krakendb

This creates a multitax/ working directory containing config.yaml and a multitax.sh launch script.

Step 2 — review the configuration (optional but recommended):

cd multitax
cat config.yaml   # adjust parameters as needed

Step 3 — run the pipeline:

sh multitax.sh
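
Before launching, it can help to confirm that Step 1 produced both files. The sketch below is a hypothetical helper (check_workdir is not part of sequana_multitax); it only tests for the two file names mentioned above and assumes the default multitax/ directory name.

```shell
# Hypothetical helper, not part of sequana_multitax: verify that the working
# directory contains the two files created in Step 1.
check_workdir() {
    workdir=${1:-multitax}
    for required in config.yaml multitax.sh; do
        if [ ! -f "$workdir/$required" ]; then
            printf 'missing: %s\n' "$workdir/$required" >&2
            return 1
        fi
    done
    printf 'ready: %s\n' "$workdir"
}

# Usage (uncomment to launch only when the directory is complete):
# check_workdir multitax && (cd multitax && sh multitax.sh)
```

The function returns a non-zero status on the first missing file, so it can be chained with && in front of the launch command.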

Taxonomic database

You will need one or more Kraken2 databases. You can download a toy database for testing:

sequana_taxonomy --download toydb

The pipeline also requires a taxonomy file stored in ~/.config/sequana/taxonomy.dat. Download it once with:

sequana_multitax --update-taxonomy

Rerun this command whenever unknown taxon IDs appear in the HTML reports; this usually means the local taxonomy file is out of date.
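
As a rough aid, the following sketch reports whether the taxonomy file is absent, old, or recent. The helper name taxonomy_status and the 90-day threshold are assumptions made for illustration; the pipeline itself defines no such limit.

```shell
# Hypothetical helper, not part of sequana_multitax.
# $1 = taxonomy file (default: the location given above)
# $2 = age in days after which the file is reported as stale (assumed: 90)
taxonomy_status() {
    taxfile=${1:-$HOME/.config/sequana/taxonomy.dat}
    max_days=${2:-90}
    if [ ! -s "$taxfile" ]; then
        # File is absent or empty.
        printf 'missing\n'
    elif [ -n "$(find "$taxfile" -mtime +"$max_days")" ]; then
        # find prints the path only when it was modified more than max_days ago.
        printf 'stale\n'
    else
        printf 'fresh\n'
    fi
}

# Usage (uncomment to refresh the file only when needed):
# [ "$(taxonomy_status)" = "fresh" ] || sequana_multitax --update-taxonomy
```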

Multiple databases can be passed to run iterative classification:

sequana_multitax \
    --input-directory /path/to/reads \
    --databases /path/to/virusdb /path/to/bacteriadb
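
A mistyped database path is easier to catch before the pipeline starts. The sketch below assumes each path is a standard Kraken2 database directory, which contains the index files hash.k2d, opts.k2d and taxo.k2d. Both function names are hypothetical and not part of sequana_multitax.

```shell
# Hypothetical preflight check, not part of sequana_multitax. It assumes each
# path is a standard Kraken2 database directory holding three index files.
check_kraken2_db() {
    db=$1
    for index in hash.k2d opts.k2d taxo.k2d; do
        if [ ! -f "$db/$index" ]; then
            printf '%s: missing %s\n' "$db" "$index" >&2
            return 1
        fi
    done
    printf '%s: ok\n' "$db"
}

# Check every database in the order it will be passed to --databases,
# stopping at the first one that is incomplete.
check_databases() {
    for db_path in "$@"; do
        check_kraken2_db "$db_path" || return 1
    done
}

# Usage:
# check_databases /path/to/virusdb /path/to/bacteriadb
```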

Apptainer / Singularity

Every tool runs inside a pre-built container. Point --apptainer-prefix to a shared directory so images are downloaded once and reused across projects:

sequana_multitax \
    --input-directory /path/to/reads \
    --databases /path/to/krakendb \
    --apptainer-prefix ~/.sequana/apptainers

Pass extra bind mounts with --apptainer-args if your data lives outside $HOME:

--apptainer-args "-B /data:/data"
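
When several directories must be mounted, the option string can be assembled rather than typed by hand. This is a minimal sketch: bind_args is a hypothetical helper, it mounts each directory at the same path inside the container, and it does not handle paths that contain spaces.

```shell
# Hypothetical helper: turn a list of directories into repeated -B options.
bind_args() {
    binds=""
    for dir in "$@"; do
        binds="$binds -B $dir:$dir"
    done
    # Drop the leading space left by the first iteration.
    printf '%s\n' "${binds# }"
}

bind_args /data /scratch
# prints: -B /data:/data -B /scratch:/scratch
```

The result can then be passed as --apptainer-args "$(bind_args /data /scratch)".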

When running snakemake manually, include the apptainer options:

snakemake -s multitax.rules --configfile config.yaml --cores 4 \
    --use-apptainer \
    --apptainer-prefix ~/.sequana/apptainers \
    --apptainer-args "-B /home:/home"

HPC / SLURM cluster

On a cluster with SLURM, pass --profile slurm:

sequana_multitax \
    --input-directory /path/to/reads \
    --databases /path/to/krakendb \
    --profile slurm \
    --slurm-queue fast \
    --jobs 40 \
    --apptainer-prefix /shared/containers

BLAST on unclassified reads

Reads that remain unclassified after Kraken can optionally be BLASTed against a local database:

sequana_multitax \
    --input-directory /path/to/reads \
    --databases /path/to/krakendb \
    --store-unclassified \
    --do-blast-unclassified

This requires a local BLAST+ installation and a downloaded nt database.
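
A quick way to see whether BLAST+ is reachable is to look for one of its programs on the PATH. The sketch below uses a hypothetical helper, require_tool; checking for blastn (the nucleotide search program shipped with BLAST+) is an assumption about which executable the pipeline calls.

```shell
# Hypothetical preflight check: report whether a command is available on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s: found\n' "$1"
    else
        printf '%s: not found\n' "$1" >&2
        return 1
    fi
}

# Usage (uncomment to test for BLAST+ before enabling the BLAST step):
# require_tool blastn
```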

Pipeline overview

  1. Kraken2 — classify reads against one or more databases sequentially.

  2. Krona — interactive pie charts per sample.

  3. [Optional] BLAST — align unclassified reads against a nucleotide DB.

  4. MultiQC — aggregated summary report across all samples.

Each sample produces an HTML report with a static pie chart (species distribution; grey = unclassified) that links to an interactive Krona chart.

Sample pie chart

When multiple databases are provided, they are applied sequentially in the order given. The order matters: reads classified by the first database are removed before the second database is run.
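
The effect of database order can be illustrated with a toy model. This is not how Kraken2 works internally: here each "database" is simply a text file listing the read IDs it can classify, and classify_sequentially is a hypothetical function written for this example.

```shell
# Toy model only: a "database" is a file of read IDs it recognises.
# $1 = file of read IDs, remaining arguments = databases in application order.
classify_sequentially() {
    reads=$1
    shift
    remaining=$(mktemp)
    cp "$reads" "$remaining"
    for db in "$@"; do
        # Count reads still in the pool that this database recognises.
        hits=$(grep -c -x -F -f "$db" "$remaining" || true)
        printf '%s: %s\n' "$(basename "$db" .txt)" "$hits"
        # Remove them so the next database only sees what is left.
        next=$(mktemp)
        grep -v -x -F -f "$db" "$remaining" > "$next" || true
        mv "$next" "$remaining"
    done
    printf 'unclassified: %s\n' "$(wc -l < "$remaining" | tr -d ' ')"
    rm -f "$remaining"
}

demo=$(mktemp -d)
printf 'r1\nr2\nr3\nr4\n' > "$demo/reads.txt"
printf 'r1\nr2\n' > "$demo/virusdb.txt"      # recognises r1 and r2
printf 'r2\nr3\n' > "$demo/bacteriadb.txt"   # recognises r2 and r3

classify_sequentially "$demo/reads.txt" "$demo/virusdb.txt" "$demo/bacteriadb.txt"
# virusdb: 2, bacteriadb: 1, unclassified: 1
classify_sequentially "$demo/reads.txt" "$demo/bacteriadb.txt" "$demo/virusdb.txt"
# bacteriadb: 2, virusdb: 1, unclassified: 1
rm -rf "$demo"
```

Read r2 is recognised by both toy databases, so it is credited to whichever one runs first. The total number of unclassified reads is the same in both orders, but the per-database proportions differ.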

Configuration file

After running sequana_multitax, a config.yaml is created in the working directory. Key sections:

  • sequana_taxonomy — databases, confidence threshold, store_unclassified

  • blast — enable/disable BLAST on unclassified reads

  • multiqc — aggregated report settings

Full reference: config.yaml
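
To get a quick view of which sections a generated config.yaml contains, the top-level keys can be listed with sed. This is a generic sketch that makes no assumption about the file beyond standard YAML layout (top-level keys start in the first column); top_level_keys is a hypothetical helper name.

```shell
# Hypothetical helper: print the top-level keys of a YAML file, i.e. lines
# that start in the first column with "name:". Indented keys and comments
# are skipped.
top_level_keys() {
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\):.*$/\1/p' "$1"
}

# Usage:
# top_level_keys multitax/config.yaml
```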

Requirements

  • kraken2

  • sequana_taxonomy

  • krona

Changelog

Version

Description

0.15.0

  • Fix Snakemake 8 compatibility (use_singularity → use_apptainer)

  • Fix typo in rules (rule → rules for blast_fasta reference)

  • Add missing keep_kraken_output key in default config

  • Remove orphan dendogram section from schema

  • Update dependencies (snakemake, pulp, pytest)

  • Fix apptainer CI workflow

  • Simplify __init__.py version detection

0.14.1

  • fix dict-style config assignment (use dot-notation on _Namespace)

  • update README to follow sequana pipeline conventions

0.14.0

  • updated container and sequana to fix issue with sequential analysis (several DBs)

0.13.0

  • new containerisation

0.12.2

  • switch apptainer for sequana_taxonomy to the apptainer sequana

  • Fix version of sequana_wrappers to v23.12.5

  • add precommit

  • Fix dag to fix multiqc when using apptainers

0.12.1

  • update apptainers

0.12.0

  • Refactor to use new Click framework

0.11.1

  • add missing import in the main script

  • add wrapper version in config

0.11.0

  • use latest wrappers and graphviz apptainer

  • create and use a sequana-wrappers for the sequana_taxonomy rule

  • fix type when downloading taxonomy.dat

0.10.2

  • add singularity containers

0.10.1

  • fix blast run when no taxid is found and HTML report

0.10.0

  • uses new sequana wrappers and framework

  • add ability to run blast on unclassified reads

  • handle case of empty FastQ files

0.9.2

  • add --update-taxonomy DB option

  • add --store-unclassified option

0.9.1

  • fix a logger issue

0.9.0

  • fix plot summary dbs (sample names). Add options in schema+config file to tune the image if required.

  • HTML now includes links towards data that generates the top plots

  • fix case where zero sequences are found

  • check existence of input databases

  • add the --run argument

  • add multitax version in the header

  • add search box (Sequana feature) in the CSV tables

0.8.7

  • Update HTML report: fix the titles of images, include a table with DB proportions, and add text explaining the images and reports

0.8.6

  • A better report with new features from sequana.taxonomy

0.8.5

  • fix typo in doc, factorise multiqc rule

0.8.4

  • implement the --from-project option

0.8.3

  • add the confidence option in sequana_taxonomy rule

  • improve html report

  • uses new sequana framework to speed up --help calls

0.8.2

  • less stringent on requirements (mode warning)

  • fix input of the multiqc rule

0.8.1

Fix requirements.

0.8.0

First release.
