Parallelised version of sequana_coverage standalone application
This is the multicov pipeline from the Sequana project
- Overview:
Parallelised version of sequana_coverage for multi-sample genomic coverage analysis and CNV detection
- Input:
A set of BED files (3 or 4 columns: chromosome, position, coverage, optional filtered coverage)
- Output:
Per-sample HTML coverage reports, a MultiQC report, and a summary.html with links to all reports
- Status:
Production
- Documentation:
This README file and https://sequana.readthedocs.io
- Citation:
Dimitri Desvillechabrol, Christiane Bouchier, Sean Kennedy, Thomas Cokelaer. Sequana coverage: detection and characterization of genomic variations using running median and mixture models. GigaScience, Volume 7, Issue 12, December 2018, giy110, https://doi.org/10.1093/gigascience/giy110
and
Cokelaer et al. (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, https://doi.org/10.21105/joss.00352
Installation
If you already have all requirements, install the package using pip:
pip install sequana_multicov --upgrade
Usage
Scan BED files in a directory and set up the pipeline (replace DATAPATH with your input directory):
sequana_multicov --input-directory DATAPATH
To provide a reference FASTA file for GC content plots:
sequana_multicov --input-directory DATAPATH --reference-file genome.fa
To provide a GenBank annotation file for event annotation:
sequana_multicov --input-directory DATAPATH --annotation-file genome.gbk
This creates a multicov/ directory with the pipeline and configuration file. Execute the pipeline locally:
cd multicov
sh multicov.sh
If you are familiar with Snakemake, you can also run the pipeline directly:
snakemake -s multicov.rules --cores 4 --stats stats.txt
See .sequana/profile/config.yaml to tune Snakemake behaviour (cores, cluster settings, etc.).
Usage with apptainer
With apptainer, initiate the working directory as follows:
sequana_multicov --input-directory DATAPATH --use-apptainer
Images are downloaded to the working directory, but you can store them in a shared location instead:
sequana_multicov --input-directory DATAPATH --use-apptainer --apptainer-prefix ~/.sequana/apptainers
and then:
cd multicov
sh multicov.sh
Input format
BED files must have 3 or 4 tab-separated columns:
chr1 1 10
chr1 2 11
...
chr2 1 20
chr2 2 21
...
where the first column is the chromosome/contig name, the second is the position (1-based, sorted), and the third is the coverage depth. An optional fourth column may contain a filtered coverage signal (shown in reports but not used in the analysis).
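Before launching the pipeline, it can be useful to check that a file follows the layout described above. The following is a minimal sketch and not part of sequana_multicov: the function name check_bed_lines is hypothetical, and it assumes that "sorted" means strictly increasing positions within each contig.

```python
from typing import Dict, Iterable, List, Tuple


def check_bed_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for lines that break the format."""
    problems: List[Tuple[int, str]] = []
    last_pos: Dict[str, int] = {}  # last position seen for each contig
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # blank lines are ignored
        fields = line.split("\t")
        if len(fields) not in (3, 4):
            problems.append((number, f"expected 3 or 4 columns, found {len(fields)}"))
            continue
        chrom, pos, *depths = fields
        try:
            pos_value = int(pos)
            for depth in depths:
                float(depth)  # coverage and optional filtered coverage
        except ValueError:
            problems.append((number, "position or coverage is not numeric"))
            continue
        if pos_value < 1:
            problems.append((number, "position must be 1-based"))
        elif pos_value <= last_pos.get(chrom, 0):
            # Assumption: positions must strictly increase within a contig.
            problems.append((number, "positions are not sorted within the contig"))
        last_pos[chrom] = pos_value
    return problems
```

To check a file, pass an open file handle, for example `check_bed_lines(open("sample.bed"))`; an empty list means no problem was found by this sketch.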
If you only have BAM files, convert them with:
samtools depth -aa input.bam > output.bed
For a specific chromosome only:
samtools depth -aa -r chr1 input.bam > chr1.bed
For CRAM files, convert to BAM first:
samtools view -@ 4 -T reference.fa -b -o out.bam in.cram
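When several BAM files need converting, the samtools depth -aa command shown above can be wrapped in a small loop. This is a minimal sketch rather than a feature of the pipeline: the helper names depth_command and convert_all and the directory layout are hypothetical, and it assumes samtools is available on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def depth_command(bam: Path) -> List[str]:
    """Build the same command as: samtools depth -aa input.bam"""
    return ["samtools", "depth", "-aa", str(bam)]


def convert_all(bam_dir: Path, bed_dir: Path) -> List[Path]:
    """Write one BED file per BAM file found in bam_dir (hypothetical layout)."""
    bed_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for bam in sorted(bam_dir.glob("*.bam")):
        bed = bed_dir / (bam.stem + ".bed")
        with bed.open("w") as handle:
            # Equivalent to redirecting stdout with '>' in the shell.
            subprocess.run(depth_command(bam), stdout=handle, check=True)
        outputs.append(bed)
    return outputs
```

The resulting directory can then be passed to sequana_multicov with --input-directory.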
Requirements
This pipeline requires the following executables:
sequana_coverage — from the Sequana package (installed automatically)
multiqc — aggregated HTML report across samples
Install all dependencies at once:
mamba env create -f environment.yml
Details
This pipeline runs sequana_coverage in parallel across all input BED files. For each sample it produces a standalone HTML report with:
coverage plots and running-median normalisation
ROI (region of interest) detection using z-score thresholds
CNV clustering
GC content overlay (when a reference FASTA is provided)
Event annotation (when a GenBank file is provided)
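To give an intuition for the running-median normalisation and z-score thresholding listed above, here is a deliberately simplified sketch. It is not the sequana_coverage implementation: the function names, window size and threshold are illustrative assumptions, and it uses a plain mean and standard deviation, whereas the cited GigaScience paper describes a mixture model.

```python
from statistics import mean, median, pstdev
from typing import List, Sequence


def running_median(values: Sequence[float], window: int) -> List[float]:
    """Median of a centred window at each position; windows shrink at the edges."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def flag_positions(coverage: Sequence[float], window: int = 5,
                   threshold: float = 4.0) -> List[int]:
    """Return indices whose normalised coverage has |z-score| > threshold."""
    if not coverage:
        return []
    trend = running_median(coverage, window)
    # Normalise by the local trend: values near 1 follow the running median.
    ratio = [c / t if t > 0 else 0.0 for c, t in zip(coverage, trend)]
    mu, sigma = mean(ratio), pstdev(ratio)
    if sigma == 0:
        return []  # perfectly flat signal, nothing to flag
    # Simplification: plain z-score instead of mixture-model parameters.
    return [i for i, r in enumerate(ratio) if abs((r - mu) / sigma) > threshold]
```

For example, `flag_positions([10] * 25 + [100] + [10] * 24)` returns `[25]`, the position of the isolated spike.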
On success, a summary.html is generated listing all samples with direct links to their individual reports, plus a MultiQC report aggregating key statistics across samples.
For very large genomes, the --binning and --chunksize options can be used to reduce memory usage.
Rules and configuration details
Here is the latest documented configuration file to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.
Changelog
| Version | Description |
|---|---|
| 1.2.0 | |
| 1.1.0 | |
| 1.0.0 | |
| 0.9.1 | |
| 0.9.0 | |
Contribute & Code of Conduct
To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.