sequana-coverage

Parallelised version of the sequana_coverage standalone application.


This is the coverage pipeline from the Sequana project.

Overview:

Parallelised version of sequana_coverage for large eukaryotic genomes.

Input:

A set of BAM or BED files. BED files must have 3 or 4 columns: the first column is the chromosome/contig name, the second column stores the positions and the third the coverage. An optional fourth column contains a filtered coverage (not used in the analysis but shown in the HTML reports).

Output:

A set of HTML reports, one per chromosome, and a MultiQC report.

Status:

production

Citation:

Dimitri Desvillechabrol, Christiane Bouchier, Sean Kennedy, Thomas Cokelaer. Sequana coverage: detection and characterization of genomic variations using running median and mixture models. GigaScience, Volume 7, Issue 12, December 2018, giy110, https://doi.org/10.1093/gigascience/giy110

and

Cokelaer et al. (2017). ‘Sequana’: a Set of Snakemake NGS pipelines. Journal of Open Source Software, 2(16), 352, https://doi.org/10.21105/joss.00352

Installation

You must install Sequana first:

pip install sequana

Then, just install this package:

pip install sequana_coverage

This gives an executable called sequana_pipelines_coverage. Note that it should not be confused with the original sequana_coverage standalone application from the Sequana library. Indeed, this pipeline calls sequana_coverage behind the scenes.

Usage

sequana_pipelines_coverage --help
sequana_pipelines_coverage --input-directory DATAPATH

By default, this looks for BED files. WARNING: these are BED3 files, meaning 3-column tab-separated files like this one:

chr1 1 10
chr1 2 11
...
chr1 N1 10
chr2 1 20
chr2 2 21
...
chr2 N2 20

where the first column stores the chromosome name, the second the position and the third the coverage itself. See the sequana_coverage documentation for details. If you have BAM files as input, we will do the conversion for you. In that case, use this option:

--input-pattern "*.bam"
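Before launching the pipeline, it may help to check which files a pattern such as "*.bam" will pick up. The sketch below defines a hypothetical helper, list_inputs (not part of Sequana), and assumes the pattern is matched against file names at the top level of the input directory only; the pipeline's own matching may differ.

```shell
# list_inputs DIR PATTERN  (hypothetical helper, for illustration only)
# Print the regular files at the top level of DIR whose names match the
# shell glob PATTERN (e.g. "*.bed" or "*.bam"), one per line, sorted.
# Assumption: sub-directories of DIR are not searched.
list_inputs() {
  find "$1" -maxdepth 1 -type f -name "$2" | sort
}

# Example: preview what --input-pattern "*.bam" would select
# list_inputs DATAPATH "*.bam"
```

Quoting the pattern matters in both places: it keeps the shell from expanding "*.bam" in the current directory before the tool sees it.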

The sequana_pipelines_coverage script creates a directory with the pipeline and its configuration file. You will then need to execute the pipeline:

cd coverage
sh coverage.sh  # for a local run

This launches a Snakemake pipeline. If you are familiar with Snakemake, you can retrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters:

snakemake -s coverage.rules --configfile config.yaml --cores 4 --stats stats.txt

Or use the Sequanix interface as follows:

sequanix -w analysis -i . -p coverage

Go to the second panel, then to Input data and Input directory. There, you must modify the pattern (an empty field by default, meaning that FastQ files are searched for) and set it to either:

*.bed

or:

*.bam

You are ready to go. Save the project and press Run. Once done, open the HTML report.

Requirements

This pipeline requires the following executables:

sequana_coverage from Sequana, which should be installed automatically.
multiqc
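A quick way to confirm that these executables are reachable before launching the pipeline is a small shell check. check_requirements is a hypothetical helper written for illustration; it only tests whether each name resolves on PATH, not which version is installed.

```shell
# check_requirements EXE...  (hypothetical helper, for illustration only)
# Print "missing: NAME" on stderr for every executable not found on PATH.
# Returns 0 if all executables are found, 1 otherwise.
check_requirements() {
  missing=0
  for exe in "$@"; do
    if ! command -v "$exe" >/dev/null 2>&1; then
      echo "missing: $exe" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
# check_requirements sequana_coverage multiqc
```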

Details

This pipeline runs sequana_coverage in parallel on the input BAM (or BED) files.

The coverage tool takes as input a BAM or a BED file. The BED file must have 3 or 4 columns as explained in the standalone application (sequana_coverage) documentation. In short, the first column is the chromosome name, the second column is the position (sorted) and the third column is the coverage (an optional fourth column would contain a coverage signal, which could be high quality coverage for instance).
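These column rules can be checked before a long run. The following sketch (check_bed is a hypothetical helper, not a Sequana command) uses awk to flag lines that do not have 3 or 4 columns and positions that do not increase within a chromosome; reading "sorted" as strictly increasing is an assumption.

```shell
# check_bed FILE  (hypothetical helper, for illustration only)
# Sanity checks on a BED coverage file:
#   - every line has 3 or 4 whitespace-separated columns
#   - positions (column 2) strictly increase within each chromosome
# Prints one message per problem; returns 1 if any problem was found.
check_bed() {
  awk '
    NF != 3 && NF != 4 {
      printf "line %d: expected 3 or 4 columns, found %d\n", NR, NF
      bad = 1
      next
    }
    $1 == chrom && $2 + 0 <= last {
      printf "line %d: position %s not greater than previous %s on %s\n", NR, $2, last, $1
      bad = 1
    }
    # remember the current chromosome and position for the next line
    { chrom = $1; last = $2 + 0 }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Example:
# check_bed chr1.bed && echo "chr1.bed looks fine"
```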

If you have only BAM files, you can convert them using the bioconvert tool or the command:

samtools depth -aa input.bam > output.bed
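Because -aa makes samtools depth report every position, including zero-depth ones, the number of lines per chromosome in the output should match the chromosome length. A minimal awk sketch to inspect a converted file, using a hypothetical bed_summary helper, prints for each chromosome the number of positions and the mean of the coverage column:

```shell
# bed_summary FILE  (hypothetical helper, for illustration only)
# For each chromosome of a BED3/BED4 coverage file, print (tab-separated):
#   chromosome name, number of positions, mean coverage (column 3)
# Chromosomes are listed in order of first appearance.
bed_summary() {
  awk '
    !($1 in n) { order[++k] = $1 }     # record first appearance
    { n[$1]++; sum[$1] += $3 }         # count positions, sum coverage
    END {
      for (i = 1; i <= k; i++) {
        c = order[i]
        printf "%s\t%d\t%.2f\n", c, n[c], sum[c] / n[c]
      }
    }
  ' "$1"
}

# Example:
# bed_summary output.bed
```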

If you have a CRAM file, convert it to BAM first:

samtools view -@ 4 -T reference.fa -b -o out.bam in.cram

For very large BAM/BED files, we recommend splitting the BED file by chromosome. For instance, for chromosome chr1, type:

samtools index in.bam   # region queries with -r need an indexed BAM
samtools depth -aa -r chr1 in.bam > chr1.bed
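If a genome-wide BED file already exists, it can also be split afterwards without going back to the BAM file. This is an alternative sketch, not the documented approach: split_bed_by_chrom is a hypothetical helper that assumes all lines of a chromosome are contiguous, as in samtools depth output.

```shell
# split_bed_by_chrom FILE OUTDIR  (hypothetical helper, for illustration only)
# Write each chromosome of a BED coverage file to OUTDIR/<chrom>.bed.
# Assumption: lines of one chromosome are contiguous. Each output file is
# closed when the chromosome changes, so many contigs do not exhaust the
# open-file limit; a chromosome appearing twice would be overwritten.
split_bed_by_chrom() {
  mkdir -p "$2"
  awk -v outdir="$2" '
    $1 != chrom {
      if (out != "") close(out)
      chrom = $1
      out = outdir "/" chrom ".bed"
    }
    { print > out }
  ' "$1"
}

# Example:
# split_bed_by_chrom output.bed split/
```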

Both the standalone application and the Snakemake pipeline can also take your BAM files as input and will convert them automatically into BED files.

Rules and configuration details

Here is the latest documented configuration file to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.

Changelog

Version	Description
0.9.1	renamed the genbank field to annotation and window to window_size
0.9.0	first version


Release history

0.9.1 (this version): Jul 19, 2020
0.9.0: Dec 11, 2019
0.8.0: Dec 7, 2019


Source Distribution

sequana_coverage-0.9.1.tar.gz (10.4 MB), uploaded Jul 19, 2020

Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3.post20200325 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.3

Hashes for sequana_coverage-0.9.1.tar.gz

Algorithm	Hash digest
SHA256	`ffdcac58be754d6b2ee32b6f93ffea30c9c2890bd845d6a13f0596458f416a66`
MD5	`58515040c78d8b2a8011fda1286a2970`
BLAKE2b-256	`03a50a8b255a39c1731bf7868d60abcaa713da8f5a725abfde96ea8dac702e7a`

