A multi-sample variant calling pipeline

Supported Python versions: 3.8, 3.9, 3.10

This is the variant_calling pipeline from the Sequana project.

Overview:

Variant calling from FASTQ files

Input:

FASTQ files from Illumina Sequencing instrument

Output:

VCF and HTML files

Status:

production

Citation:

Cokelaer et al. (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, https://doi.org/10.21105/joss.00352

Installation

If you already have all the requirements, you can install the package using pip:

pip install sequana_variant_calling --upgrade

Otherwise, you can create a sequana_variant_calling conda environment by executing:

conda env create -f environment.yml

and later activate the environment:

conda activate sequana_variant_calling

A third option is to install the pipeline with pip (see above) and use singularity, as explained below.

Usage

sequana_variant_calling --help
sequana_variant_calling --input-directory DATAPATH --reference-file measles.fa

This creates a directory variant_calling. You just need to execute the pipeline:

cd variant_calling
sh variant_calling.sh

This launches a snakemake pipeline. If you are familiar with snakemake, you can retrieve the pipeline itself and its configuration files, then execute the pipeline yourself with specific parameters:

snakemake -s variant_calling.rules --configfile config.yaml --cores 4 --stats stats.txt

Alternatively, use the Sequanix interface.

Usage with singularity

With singularity, initiate the working directory as follows:

sequana_variant_calling --use-singularity

Images are downloaded into the working directory, but you can store them in a global directory instead, e.g.:

sequana_variant_calling --use-singularity --singularity-prefix ~/.sequana/apptainers

and then:

cd variant_calling
sh variant_calling.sh

If you decide to use snakemake manually, do not forget to add the singularity options:

snakemake -s variant_calling.rules --configfile config.yaml --cores 4 --stats stats.txt --use-singularity --singularity-prefix ~/.sequana/apptainers --singularity-args "-B /home:/home"

Requirements

This pipeline requires the following executable(s):

  • bwa

  • freebayes

  • picard (picard-tools)

  • sambamba

  • minimap2

  • samtools

  • snpEff: version 5.0 or 5.1d is required (note the d); 5.1 does not work.
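
Before running the pipeline without singularity, it can help to confirm that the tools listed above are on the PATH. The following is a minimal sketch, not part of the pipeline: the function name missing_tools is hypothetical, and the executable names (in particular picard and snpEff) are assumptions that may differ depending on how the tools were packaged. It does not check the snpEff version.

```shell
#!/bin/sh
# Print the name of every given tool that is not found on the PATH.
# (missing_tools is a hypothetical helper, not a sequana command.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names are assumptions; adjust them to your installation.
missing=$(missing_tools bwa freebayes picard sambamba minimap2 samtools snpEff)

if [ -z "$missing" ]; then
    echo "All required executables were found."
else
    echo "Missing executables:"
    echo "$missing"
fi
```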

Pipeline workflow (DAG): https://raw.githubusercontent.com/sequana/sequana_variant_calling/main/sequana_pipelines/variant_calling/dag.png

Details

This Snakemake variant calling pipeline is based on a tutorial written by Erik Garrison. Input reads (paired or single) are mapped using bwa and sorted with sambamba-sort. PCR duplicates are marked with sambamba-markdup. Freebayes is used to detect SNPs and short INDELs. INDEL realignment and base quality recalibration are not necessary with Freebayes. For more information, please refer to a post by Brad Chapman on minimal BAM preprocessing methods.
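
The steps described above can be sketched as plain shell commands for a single paired-end sample. This is only an illustration under stated assumptions, not the pipeline's actual rules: the file names, the run and call_variants helpers, and the chosen options are hypothetical, and the real pipeline takes its parameters from config.yaml. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Hypothetical file names, used only for illustration.
REF=${REF:-measles.fa}
R1=${R1:-sample_R1.fastq.gz}
R2=${R2:-sample_R2.fastq.gz}
SAMPLE=${SAMPLE:-sample}
THREADS=${THREADS:-4}

# With DRY_RUN=1 (the default here) each command is printed, not executed.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$1"
    else
        sh -c "$1"
    fi
}

call_variants() {
    # Index the reference, then map the reads with bwa.
    run "bwa index $REF"
    run "bwa mem -t $THREADS $REF $R1 $R2 | samtools view -b -o $SAMPLE.bam -"
    # Sort the alignments and mark PCR duplicates with sambamba.
    run "sambamba sort -t $THREADS -o $SAMPLE.sorted.bam $SAMPLE.bam"
    run "sambamba markdup -t $THREADS $SAMPLE.sorted.bam $SAMPLE.markdup.bam"
    # Call SNPs and short INDELs with freebayes.
    run "freebayes -f $REF $SAMPLE.markdup.bam > $SAMPLE.vcf"
}

call_variants
```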

The pipeline provides an analysis of the mapping coverage using sequana coverage. It automatically detects and characterises regions of low and high genome coverage.

Detected variants are annotated with SnpEff if a GenBank file is provided. The pipeline builds the database automatically. Although most species should be handled automatically, some special cases, such as a particular codon table, will require editing the snpEff configuration file.
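
As an illustration of the kind of edit a particular codon table may call for, the sketch below appends a genome entry and a per-chromosome codon table entry in the snpEff.config syntax. Everything here is an assumption to adapt: the helper add_genome_entry, the genome name mygenome and the chromosome name chrM are hypothetical, the codon table name must match one declared in your snpEff.config, and the location of the configuration file used by the pipeline is not specified here. The example writes to a scratch file rather than to a real configuration.

```shell
#!/bin/sh
# Append a genome entry with a non-standard codon table to a snpEff config file.
# Arguments: config file, genome name, chromosome name, codon table name.
# (add_genome_entry is a hypothetical helper, not a snpEff or sequana command.)
add_genome_entry() {
    config=$1; genome=$2; chrom=$3; table=$4
    {
        printf '%s.genome : %s\n' "$genome" "$genome"
        printf '%s.%s.codonTable : %s\n' "$genome" "$chrom" "$table"
    } >> "$config"
}

# Demonstration on a scratch file; names are placeholders.
SCRATCH=$(mktemp)
add_genome_entry "$SCRATCH" mygenome chrM Vertebrate_Mitochondrial
cat "$SCRATCH"
rm -f "$SCRATCH"
```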

Finally, joint calling is also available and can be switched on if desired.
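
To give an idea of what joint calling means with freebayes, the sketch below builds a single freebayes command that receives several BAM files at once. It is an assumption about the general approach, not a copy of the pipeline's rule: the helper joint_call_cmd, the file names and the output name are hypothetical. Freebayes distinguishes samples through the read groups stored in the BAM files, so each BAM is assumed to carry its own sample tag. The command is only printed, not executed.

```shell
#!/bin/sh
# Build (and print) a freebayes command that calls variants jointly on all
# BAM files given after the reference. (joint_call_cmd is a hypothetical helper.)
joint_call_cmd() {
    ref=$1
    shift
    echo "freebayes -f $ref $* > joint_calling.vcf"
}

# Hypothetical inputs, for illustration only.
joint_call_cmd measles.fa sampleA.markdup.bam sampleB.markdup.bam
```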

Changelog

Version

Description

1.2.0

  • the -Xmx8g option added previously is not robust; it does not work with snpEff 5.1, for instance.

  • add minimap2 aligner

  • add --nanopore and --pacbio to automatically set minimap2 as the aligner and the minimap2 options (map-pb or map-ont)

  • add minimap2 container.

  • add missing resources in snpeff section

1.1.2

  • add -Xmx8g option in snpeff rule at the build stage.

  • add resources (8G) in the snpeff rule at run stage

  • fix missing output_directory in sequana_coverage rule

  • fix joint calling (regression) input function and inputs

1.1.1

  • Fix regression in coverage rule

1.1.0

  • add specific apptainer for freebayes (v1.2.0)

  • Update API to use click

1.0.2

  • Fixed failure in multiqc if coverage and snpeff are off

1.0.1

  • automatically fill the bwa index algorithm and fix bwa_index rule to use the options in the config file (not the hardcoded one)

1.0.0

  • use latest wrappers and graphviz apptainer

0.12.0

  • set all apptainer containers and add vcf to bcf conversions

  • Update rule sambamba to use latest wrappers

0.11.0

  • Add singularity containers

0.10.0

  • fully integrated sequana wrappers and simplification of HTML reports

0.9.10

  • Uses new sequana_pipetools and wrappers

0.9.5

  • fix typo in the onsuccess and update sequana requirements to use most up-to-date snakemake rules

0.9.4

  • fix typo related to the reference-file option: the new name was not changed everywhere in the pipeline.

0.9.3

  • use new framework (faster --help, --from-project option)

  • rename --reference into --reference-file and --annotation to --annotation-file

  • add custom summary page

  • add multiqc config file

0.9.2

  • snpeff output files are renamed sample.snpeff (instead of samplesnpeff)

  • add multiqc to show sequana_coverage and snpeff summary sections

  • cleanup onsuccess section

  • more options sanity checks and options (e.g.,

  • genbank_file renamed into annotation_file in the config

  • use --legacy in freebayes options

  • fix coverage section to use new sequana api

  • add the --do-coverage, --do-joint-calling options as well as --circular and --freebayes-ploidy

0.9.1

  • Fix input-readtag, which was not populated

0.9.0

First release

Contribute & Code of Conduct

To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
