
Parallelised version of sequana_coverage standalone application


Supported Python versions: 3.8, 3.9, 3.10

This is the multicov pipeline from the Sequana project

Overview:

Parallelised version of sequana_coverage for multi-sample genomic coverage analysis and CNV detection

Input:

A set of BED files (3 or 4 columns: chromosome, position, coverage, optional filtered coverage)

Output:

Per-sample HTML coverage reports, a MultiQC report, and a summary.html with links to all reports

Status:

Production

Documentation:

This README file and https://sequana.readthedocs.io

Citation:

Dimitri Desvillechabrol, Christiane Bouchier, Sean Kennedy, Thomas Cokelaer. Sequana coverage: detection and characterization of genomic variations using running median and mixture models. GigaScience, Volume 7, Issue 12, December 2018, giy110, https://doi.org/10.1093/gigascience/giy110

and

Cokelaer et al. (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, https://doi.org/10.21105/joss.00352

Installation

If you already have all requirements, install the package using pip:

pip install sequana_multicov --upgrade

Usage

Scan BED files in a directory and set up the pipeline (replace DATAPATH with your input directory):

sequana_multicov --input-directory DATAPATH

To provide a reference FASTA file for GC content plots:

sequana_multicov --input-directory DATAPATH --reference-file genome.fa

To provide a GenBank annotation file for event annotation:

sequana_multicov --input-directory DATAPATH --annotation-file genome.gbk

This creates a multicov/ directory with the pipeline and configuration file. Execute the pipeline locally:

cd multicov
sh multicov.sh

If you are familiar with Snakemake, you can also run the pipeline directly:

snakemake -s multicov.rules --cores 4 --stats stats.txt

See .sequana/profile/config.yaml to tune Snakemake behaviour (cores, cluster settings, etc.).

Usage with apptainer

With apptainer, initiate the working directory as follows:

sequana_multicov --input-directory DATAPATH --use-apptainer

By default, images are downloaded into the working directory, but you can store them in a shared location instead:

sequana_multicov --input-directory DATAPATH --use-apptainer --apptainer-prefix ~/.sequana/apptainers

and then:

cd multicov
sh multicov.sh

Input format

BED files must have 3 or 4 tab-separated columns:

chr1    1    10
chr1    2    11
...
chr2    1    20
chr2    2    21
...

where the first column is the chromosome/contig name, the second is the position (1-based, sorted), and the third is the coverage depth. An optional fourth column may contain a filtered coverage signal (shown in reports but not used in the analysis).
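Before launching the pipeline, it can be useful to check that a file follows this layout. The following is a minimal sketch, not part of sequana_multicov: the function name `check_coverage_bed` is hypothetical, and it assumes tab-separated fields, numeric coverage values, and strictly increasing positions within each chromosome.

```python
import io


def check_coverage_bed(handle):
    """Validate a 3- or 4-column coverage BED stream and return the row count.

    Raises ValueError on the first malformed line.
    """
    n_rows = 0
    last_pos = {}  # last position seen for each chromosome/contig
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # blank lines are tolerated (an assumption of this sketch)
        fields = line.split("\t")
        if len(fields) not in (3, 4):
            raise ValueError(f"line {lineno}: expected 3 or 4 columns, got {len(fields)}")
        chrom = fields[0]
        pos = int(fields[1])  # raises ValueError if not an integer
        for value in fields[2:]:
            float(value)  # coverage columns must be numeric
        if pos < 1:
            raise ValueError(f"line {lineno}: positions are 1-based, got {pos}")
        if chrom in last_pos and pos <= last_pos[chrom]:
            raise ValueError(f"line {lineno}: positions are not sorted for {chrom}")
        last_pos[chrom] = pos
        n_rows += 1
    return n_rows


example = "chr1\t1\t10\nchr1\t2\t11\nchr2\t1\t20\nchr2\t2\t21\n"
print(check_coverage_bed(io.StringIO(example)))  # prints 4
```

For a real file, pass an open file handle instead of the `io.StringIO` object.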

If you only have BAM files, convert them with:

samtools depth -aa input.bam > output.bed

For a specific chromosome only:

samtools depth -aa -r chr1 input.bam > chr1.bed
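When many BAM files need converting, the two commands above can be wrapped in a small script. This is a sketch under assumptions: the helper names `depth_command` and `bam_to_bed` are hypothetical, the output file is named after the BAM file with a `.bed` extension, and `samtools` is expected to be on the PATH. Only the `samtools depth` options shown above are used.

```python
import subprocess
from pathlib import Path


def depth_command(bam, region=None):
    """Build the `samtools depth -aa` command, optionally restricted to a region."""
    cmd = ["samtools", "depth", "-aa"]
    if region is not None:
        cmd += ["-r", region]
    cmd.append(str(bam))
    return cmd


def bam_to_bed(bam, outdir, region=None):
    """Run samtools depth on one BAM file and write <name>.bed into outdir."""
    bam = Path(bam)
    out = Path(outdir) / (bam.stem + ".bed")
    with open(out, "w") as handle:
        # check=True raises CalledProcessError if samtools exits with an error
        subprocess.run(depth_command(bam, region), stdout=handle, check=True)
    return out


print(depth_command("input.bam", region="chr1"))
# prints ['samtools', 'depth', '-aa', '-r', 'chr1', 'input.bam']
```

Calling `bam_to_bed(path, outdir)` for each file returned by `Path(".").glob("*.bam")` would then fill `outdir` with BED files for `--input-directory`.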

For CRAM files, convert to BAM first:

samtools view -@ 4 -T reference.fa -b -o out.bam in.cram

Requirements

This pipeline requires the following executables:

  • sequana_coverage — from the Sequana package (installed automatically)

  • multiqc — aggregated HTML report across samples

Install all dependencies at once:

mamba env create -f environment.yml

Details

This pipeline runs sequana_coverage in parallel across all input BED files. For each sample it produces a standalone HTML report with:

  • coverage plots and running-median normalisation

  • ROI (region of interest) detection using z-score thresholds

  • CNV clustering

  • GC content overlay (when a reference FASTA is provided)

  • Event annotation (when a GenBank file is provided)
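The first two items can be illustrated with a toy example. This is a simplified sketch, not the sequana_coverage implementation: it divides the coverage by a running median and standardises the result with a plain mean and standard deviation, whereas the cited paper fits a mixture model. The function names, the window size, and the thresholds are illustrative assumptions.

```python
import statistics


def running_median(values, window):
    """Median over a centred window, truncated at both ends of the sequence."""
    half = window // 2
    return [
        statistics.median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def zscores(coverage, window=5):
    """Normalise coverage by its running median, then standardise the ratios."""
    trend = running_median(coverage, window)
    ratio = [c / t if t else 0.0 for c, t in zip(coverage, trend)]
    mu = statistics.fmean(ratio)
    sigma = statistics.pstdev(ratio)
    # A flat signal has sigma == 0; report z = 0 everywhere in that case.
    return [(r - mu) / sigma if sigma else 0.0 for r in ratio]


def regions_of_interest(coverage, window=5, low=-4.0, high=4.0):
    """Return 0-based indices whose z-score falls outside [low, high]."""
    return [
        i for i, z in enumerate(zscores(coverage, window))
        if z < low or z > high
    ]


coverage = [10] * 20 + [100] + [10] * 20  # one spike in a flat signal
print(regions_of_interest(coverage))  # prints [20]
```

The running median follows slow trends in the coverage, so the ratio stays close to 1 except where the depth changes abruptly.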

On success, a summary.html is generated listing all samples with direct links to their individual reports, plus a MultiQC report aggregating key statistics across samples.

For very large genomes the --binning and --chunksize options can be used to reduce memory usage.
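This README does not spell out what the two options do. As a general illustration of why they help, the sketch below assumes that binning means averaging consecutive positions and that chunking means processing a fixed number of rows at a time. Both function names are hypothetical and this is not the pipeline's own code.

```python
def bin_coverage(coverage, binning):
    """Average consecutive positions in non-overlapping blocks of `binning` values."""
    if binning < 1:
        raise ValueError("binning must be a positive integer")
    binned = []
    for i in range(0, len(coverage), binning):
        block = coverage[i:i + binning]  # the last block may be shorter
        binned.append(sum(block) / len(block))
    return binned


def iter_chunks(rows, chunksize):
    """Yield successive lists of at most `chunksize` rows from any iterable."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunksize:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # remaining rows that did not fill a whole chunk


print(bin_coverage([10, 20, 30, 40, 50], 2))  # prints [15.0, 35.0, 50.0]
print(list(iter_chunks(range(5), 2)))  # prints [[0, 1], [2, 3], [4]]
```

Binning by a factor of n reduces the number of values held in memory by roughly n, at the cost of resolution. Chunking keeps the resolution but bounds how many rows are held at once.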

Rules and configuration details

Here is the latest documented configuration file to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.

Changelog

Version 1.2.0

  • convert packaging from setup.py to pyproject.toml (Poetry)

  • add apptainer container for the sequana_coverage rule

  • add summary.html report with sample count and per-sample links

Version 1.1.0

  • set apptainer containers and use wrappers

Version 1.0.0

  • rename the pipeline to multicov

  • update to use the latest sequana_pipetools (v0.9.2)

Version 0.9.1

  • rename the genbank field to annotation and the window field to window_size

Version 0.9.0

  • first version

Contribute & Code of Conduct

To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
