Skip to main content

QC on various type of pacbio data

Project description

https://badge.fury.io/py/sequana-pacbio-qc.svg JOSS (journal of open source software) DOI https://github.com/sequana/pacbio_qc/actions/workflows/main.yml/badge.svg Python 3.11 | 3.12

This is the pacbio_qc pipeline from the Sequana project

Overview:

Quality control and analysis for PacBio long-read sequencing data (BAM files). Generates comprehensive statistics on read quality, length distribution, and GC content, with optional taxonomic classification.

Input:

BAM files from PacBio sequencers (raw subreads, CCS, or processed reads)

Output:

Per-sample HTML reports with interactive visualizations, quality metrics, and optional taxonomic classification; comprehensive summary report with all samples

Status:

production

Documentation:

This README file, the Wiki from the github repository (link above) and https://sequana.readthedocs.io

Citation:

Cokelaer et al, (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI doi:10.21105/joss.00352

Installation

Install via pip:

pip install sequana_pacbio_qc

Optional dependencies:

  • kraken2: For taxonomic classification (optional, disabled by default)

  • graphviz: For DAG visualization

  • apptainer: For containerized execution of tools

Quick Start

# Display help
sequana_pacbio_qc --help

# Create pipeline in current directory
sequana_pacbio_qc --input-directory /path/to/bam/files

# With optional Kraken taxonomy
sequana_pacbio_qc --input-directory /path/to/bam/files --do-kraken --kraken-databases /path/to/kraken/db

# Using apptainer containers
sequana_pacbio_qc --input-directory /path/to/bam/files --apptainer-prefix ~/containers

This creates a pacbio_qc directory containing the pipeline and configuration files.

Execution

Execute the pipeline:

cd pacbio_qc
bash pacbio_qc.sh

Or with custom Snakemake parameters:

snakemake -s pacbio_qc.rules -c config.yaml --cores 4 --stats stats.txt

Or use the sequanix graphical interface.

Configuration

The pipeline uses config.yaml to control:

  • Input data: BAM file directory and pattern matching

  • Kraken: Optional taxonomic database paths (disabled by default)

  • MultiQC: QC report options

  • Apptainer: Container image URLs (optional)

Pipeline Overview

https://raw.githubusercontent.com/sequana/pacbio_qc/master/sequana_pipelines/pacbio_qc/dag.png

Workflow Details

The pipeline performs the following analyses on PacBio BAM files:

  1. Quality Metrics: Computes read length statistics, GC content distribution, and signal-to-noise ratios

  2. Visualizations: Generates histograms and scatter plots for quality assessment

  3. Per-Sample Reports: Creates individual HTML reports for each sample with:

    • Read length distribution histograms

    • GC content analysis

    • SNR (signal-to-noise ratio) metrics

    • Quality overview with sample statistics

  4. Taxonomy (Optional): Performs taxonomic classification using Kraken2 when enabled

  5. Summary Report: Generates a comprehensive HTML summary with:

    • Overview of pipeline and all samples

    • Summary statistics table with links to per-sample reports

    • MultiQC aggregated quality metrics

Note: Kraken2 databases are not provided with the pipeline. This step is optional and disabled by default.

Changelog

Version

Description

1.0.1

HTML reports with pipeline overview; race condition handling for parallel execution with –apptainer-prefix; improved CI/CD workflows

1.0.0

Uses latest wrappers and graphviz apptainers

0.11.0

Release to use latests sequana_pipetools framework

0.10.0

Update to use latest tools from sequana framework

0.9.0

First release of sequana_pacbio_qc using latest sequana rules and modules (0.9.5)

Contribute & Code of Conduct

To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

Rules and configuration details

Here is the latest documented configuration file to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sequana_pacbio_qc-1.1.0.tar.gz (32.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sequana_pacbio_qc-1.1.0-py3-none-any.whl (32.8 kB view details)

Uploaded Python 3

File details

Details for the file sequana_pacbio_qc-1.1.0.tar.gz.

File metadata

  • Download URL: sequana_pacbio_qc-1.1.0.tar.gz
  • Upload date:
  • Size: 32.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.4 CPython/3.11.15 Linux/6.14.5-100.fc40.x86_64

File hashes

Hashes for sequana_pacbio_qc-1.1.0.tar.gz
Algorithm Hash digest
SHA256 d06442ac9e59b89438e7905153df6ac8f4fdcd333d363e5edd34d3dc0f307713
MD5 de8d9f22a85ebcde6f1a54555525d35b
BLAKE2b-256 fbec65d0b9c87173bd9210ac1b70ec696b462770f2242f2adb53b73f0efab9e9

See more details on using hashes here.

File details

Details for the file sequana_pacbio_qc-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: sequana_pacbio_qc-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 32.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.4 CPython/3.11.15 Linux/6.14.5-100.fc40.x86_64

File hashes

Hashes for sequana_pacbio_qc-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a233498c438de3a72ee846d5b42059d4300bc3ef64b30802576d290a91b6982f
MD5 bd6c040e2065a92f201b0d13e8c0b920
BLAKE2b-256 06d50db13993ed41eadfd6497ab16445f7756fd93f64bd1d03712a8daec6f2a7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page