A multi-sample variant calling pipeline
Project description
This is the variant_calling pipeline from the Sequana project.
- Overview: Variant calling from FASTQ files
- Input: FASTQ files from an Illumina sequencing instrument
- Output: VCF and HTML files
- Status: production
- Citation: Cokelaer et al. (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI https://doi.org/10.21105/joss.00352
Installation
If you already have all the requirements, you can install the package using pip:
pip install sequana_variant_calling --upgrade
Otherwise, you can create a sequana_variant_calling conda environment by executing:
conda env create -f environment.yml
and then activate the environment:
conda activate sequana_variant_calling
A third option is to install the pipeline with pip (see above) and use Singularity, as explained below.
Usage
sequana_variant_calling --help
sequana_variant_calling --input-directory DATAPATH --reference-file measles.fa
This creates a directory named variant_calling. You then just need to execute the pipeline:
cd variant_calling
sh variant_calling.sh
This launches a Snakemake pipeline. If you are familiar with Snakemake, you can retrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters:
snakemake -s variant_calling.rules --configfile config.yaml --cores 4 --stats stats.txt
Alternatively, use the Sequanix graphical interface.
Usage with Singularity
With Singularity, initialise the working directory as follows:
sequana_variant_calling --use-singularity
Images are downloaded into the working directory, but you can instead store them in a global directory, for example:
sequana_variant_calling --use-singularity --singularity-prefix ~/.sequana/apptainers
and then:
cd variant_calling
sh variant_calling.sh
If you decide to run snakemake manually, do not forget to add the Singularity options:
snakemake -s variant_calling.rules --configfile config.yaml --cores 4 --stats stats.txt --use-singularity --singularity-prefix ~/.sequana/apptainers --singularity-args "-B /home:/home"
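If you drive the pipeline from your own scripts, the manual invocation can be assembled programmatically. The sketch below is a hypothetical helper (`build_snakemake_command` is not part of Sequana) that rebuilds the flags used in this section, passing the configuration through `--configfile`. It assumes a Snakemake release that still accepts `--stats` and the `--singularity-*` options; recent releases have renamed or dropped some of them. Because the argument list is meant for `subprocess.run` without a shell, the helper expands `~` itself.

```python
import os
import shlex
from typing import List, Optional


def build_snakemake_command(
    cores: int = 4,
    use_singularity: bool = False,
    singularity_prefix: Optional[str] = None,
    singularity_args: str = "-B /home:/home",
) -> List[str]:
    """Return the argv to run the pipeline from inside the variant_calling directory."""
    cmd = [
        "snakemake",
        "-s", "variant_calling.rules",  # the pipeline's Snakefile
        "--configfile", "config.yaml",  # the pipeline's configuration
        "--cores", str(cores),
        "--stats", "stats.txt",
    ]
    if use_singularity:
        cmd.append("--use-singularity")
        if singularity_prefix:
            # No shell is involved, so "~" must be expanded here.
            cmd += ["--singularity-prefix", os.path.expanduser(singularity_prefix)]
        # Bind mounts are handed to singularity as one string argument.
        cmd += ["--singularity-args", singularity_args]
    return cmd


if __name__ == "__main__":
    argv = build_snakemake_command(
        use_singularity=True, singularity_prefix="~/.sequana/apptainers"
    )
    print(shlex.join(argv))
    # To actually launch it: subprocess.run(argv, check=True)
```

Returning a list rather than a single string avoids quoting problems with the `-B /home:/home` bind argument.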
Requirements
This pipeline requires the following executables:
bwa
freebayes
picard (picard-tools)
sambamba
minimap2
samtools
snpEff: version 5.0 or 5.1d is required (note the d); 5.1 does not work.
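A quick way to check these requirements before launching is sketched below. It only looks the listed names up on `PATH` with `shutil.which`; the names on your system may differ (for example `picard` versus a `picard-tools` wrapper). The snpEff version string has to be obtained separately, and `snpeff_version_ok` is a hypothetical helper that accepts only the two exact versions stated above, so it would also reject other 5.0 patch releases.

```python
import shutil
from typing import Dict, Iterable, Optional

# Executable names as listed in the Requirements section.
REQUIRED_EXECUTABLES = (
    "bwa", "freebayes", "picard", "sambamba", "minimap2", "samtools", "snpEff",
)

# snpEff versions stated to work; plain 5.1 is stated not to work.
SUPPORTED_SNPEFF_VERSIONS = {"5.0", "5.1d"}


def locate_executables(
    names: Iterable[str] = REQUIRED_EXECUTABLES,
) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


def snpeff_version_ok(version: str) -> bool:
    """Return True only for the exact snpEff versions listed as supported."""
    return version.strip() in SUPPORTED_SNPEFF_VERSIONS


if __name__ == "__main__":
    missing = [name for name, path in locate_executables().items() if path is None]
    print("missing executables:", ", ".join(missing) if missing else "none")
```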
Details
This Snakemake variant calling pipeline is based on a tutorial written by Erik Garrison. Input reads (paired or single-end) are mapped using bwa and sorted with sambamba sort. PCR duplicates are marked with sambamba markdup. Freebayes is used to detect SNPs and short INDELs; INDEL realignment and base quality recalibration are not necessary with Freebayes. For more information, please refer to a post by Brad Chapman on minimal BAM preprocessing methods.
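The per-sample chain described above (map, sort, mark duplicates, call) can be pictured as a sequence of shell commands. The sketch below is not the pipeline's actual rules: it assumes `bwa mem` for mapping, uses `samtools view -b` (a swapped-in choice) to turn bwa's SAM output into BAM, and all file names are hypothetical. The commands are returned as strings meant to be run through a shell, since the first one uses a pipe.

```python
import shlex
from typing import List


def index_commands(reference: str) -> List[str]:
    """One-off indexing of the reference for bwa and for freebayes (.fai)."""
    ref = shlex.quote(reference)
    return [f"bwa index {ref}", f"samtools faidx {ref}"]


def per_sample_commands(sample: str, reference: str, r1: str, r2: str = "",
                        threads: int = 4) -> List[str]:
    """Map, sort, mark duplicates and call variants for one sample."""
    q = shlex.quote
    reads = " ".join(q(r) for r in (r1, r2) if r)  # paired or single-end
    # Read group with a per-sample SM tag; bwa mem turns the literal \t into tabs.
    read_group = q(f"@RG\\tID:{sample}\\tSM:{sample}")
    raw, srt, dedup = (
        q(f"{sample}{suffix}") for suffix in (".bam", ".sorted.bam", ".markdup.bam")
    )
    return [
        f"bwa mem -t {threads} -R {read_group} {q(reference)} {reads}"
        f" | samtools view -b -o {raw} -",
        f"sambamba sort -t {threads} -o {srt} {raw}",
        f"sambamba markdup -t {threads} {srt} {dedup}",
        f"freebayes -f {q(reference)} {dedup} > {q(sample + '.vcf')}",
    ]


if __name__ == "__main__":
    for cmd in index_commands("ref.fa") + per_sample_commands(
        "A", "ref.fa", "A_R1.fastq.gz", "A_R2.fastq.gz"
    ):
        print(cmd)
```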
The pipeline provides an analysis of the mapping coverage using sequana coverage, which automatically detects and characterises genomic regions with low or high coverage.
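The idea behind the low/high coverage detection can be sketched in a few lines: remove the local trend with a running median, convert the normalised coverage to z-scores, and report runs of positions beyond a threshold. This is a simplification, not sequana coverage's implementation: sequana coverage estimates the central distribution with a Gaussian mixture model, whereas this sketch swaps in a plain mean and standard deviation, and the window (11) and threshold (±4) are illustrative values only.

```python
from statistics import mean, median, pstdev
from typing import List, Optional, Sequence, Tuple

Region = Tuple[int, int, str]  # (start, end inclusive, "low" or "high"), 0-based


def running_median(values: Sequence[float], window: int) -> List[float]:
    """Centred running median; the window is truncated at both ends."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1]) for i in range(len(values))
    ]


def flag_coverage_regions(coverage: Sequence[float], window: int = 11,
                          threshold: float = 4.0) -> List[Region]:
    """Return runs of positions whose normalised coverage is an outlier."""
    trend = running_median(coverage, window)
    # Coverage relative to the local trend; a zero local median gives 0.0.
    normed = [c / m if m > 0 else 0.0 for c, m in zip(coverage, trend)]
    mu, sd = mean(normed), pstdev(normed)
    zscores = [(x - mu) / sd if sd > 0 else 0.0 for x in normed]

    regions: List[Region] = []
    start, kind = 0, None  # type: int, Optional[str]
    for i, z in enumerate(zscores):
        label = "high" if z > threshold else "low" if z < -threshold else None
        if label != kind:  # a run of one label ends where the label changes
            if kind is not None:
                regions.append((start, i - 1, kind))
            start, kind = i, label
    if kind is not None:
        regions.append((start, len(zscores) - 1, kind))
    return regions


if __name__ == "__main__":
    depth = [100] * 50 + [0] * 3 + [100] * 47
    print(flag_coverage_regions(depth))
```

Normalising by a running median first means that slow, genome-wide trends in depth are not mistaken for anomalies; only sharp local deviations stand out.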
Detected variants are annotated with SnpEff if a GenBank file is provided; the pipeline builds the SnpEff database automatically. Although most species should be handled automatically, some special cases, such as a non-standard codon table, require editing the SnpEff configuration file.
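As an illustration of such an edit, the sketch below generates a stanza in the layout used by the entries shipped in `snpEff.config`: a `<genome>.genome : <description>` line plus one `<genome>.<sequence>.codonTable : <table>` line per sequence needing a non-standard table. The genome identifier, description and sequence name are hypothetical, the sequence names must match the identifiers in your reference, and the table name must be one defined in your snpEff version's configuration; please check against your snpEff installation before relying on it.

```python
from typing import Dict, Optional


def snpeff_config_entry(genome_id: str, description: str,
                        codon_tables: Optional[Dict[str, str]] = None) -> str:
    """Build a snpEff.config stanza declaring a genome and per-sequence codon tables.

    genome_id, description and the sequence names are placeholders chosen by the user.
    """
    lines = [f"{genome_id}.genome : {description}"]
    for sequence, table in (codon_tables or {}).items():
        lines.append(f"\t{genome_id}.{sequence}.codonTable : {table}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(snpeff_config_entry("my_genome", "My genome",
                              {"chr1": "Bacterial_and_Plant_Plastid"}), end="")
```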
Finally, joint calling is also available and can be switched on if desired.
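In joint calling, all samples are genotyped together so that every site is reported for every sample in a single multi-sample VCF. With Freebayes this amounts to passing all BAM files to one command, as sketched below; this helper is hypothetical, does not show how the pipeline switches the option on, and assumes each BAM carries a read group with a distinct SM tag (Freebayes uses SM to tell samples apart).

```python
import shlex
from typing import Sequence


def joint_calling_command(reference: str, bams: Sequence[str],
                          output: str = "joint.vcf") -> str:
    """Return one freebayes command that calls all samples together."""
    if not bams:
        raise ValueError("joint calling needs at least one BAM file")
    q = shlex.quote
    # freebayes accepts several BAM files; each should have a distinct SM tag.
    return " ".join(["freebayes", "-f", q(reference), *map(q, bams)]) + f" > {q(output)}"


if __name__ == "__main__":
    print(joint_calling_command("ref.fa", ["A.markdup.bam", "B.markdup.bam"]))
```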
Changelog
Version | Description |
---|---|
1.2.0 | |
1.1.2 | |
1.1.1 | |
1.1.0 | |
1.0.2 | |
1.0.1 | |
1.0.0 | |
0.12.0 | |
0.11.0 | |
0.10.0 | |
0.9.10 | |
0.9.5 | |
0.9.4 | |
0.9.3 | |
0.9.2 | |
0.9.1 | |
0.9.0 | First release |
Contribute & Code of Conduct
To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
Download files
Hashes for sequana_variant_calling-1.2.0.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | 47ba19db5d7709e8598f8fe84e5611a6850e8f1d0e97e8464dd85ca2978fc2c6 |
MD5 | f5bfee9a4c3c268855a02cc9b086f9ac |
BLAKE2b-256 | cb7c6b5e549e4ff2bccd6f900da34f76337f09b9125cbda9f230f8a6e138dc22 |
Hashes for sequana_variant_calling-1.2.0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 35d7d29a7e828c7af0646b63a8bf583fd9f675855d04a8d0058a01d4ff32dac8 |
MD5 | c03d347a8f8a0c381b35691155c034e5 |
BLAKE2b-256 | 3779fd9316345201561a7b138307875787d7f46f62ad8d49a64eced81d24d093 |
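To check a downloaded file against these digests, a minimal sketch with Python's standard `hashlib` is shown below. BLAKE2b-256 corresponds to BLAKE2b with a 32-byte digest. `verify_download` is a hypothetical helper that compares only the SHA256 values copied from the tables and looks them up by file name, so the downloaded file must keep its original name.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union

# SHA256 digests copied from the tables for release 1.2.0.
EXPECTED_SHA256: Dict[str, str] = {
    "sequana_variant_calling-1.2.0.tar.gz":
        "47ba19db5d7709e8598f8fe84e5611a6850e8f1d0e97e8464dd85ca2978fc2c6",
    "sequana_variant_calling-1.2.0-py3-none-any.whl":
        "35d7d29a7e828c7af0646b63a8bf583fd9f675855d04a8d0058a01d4ff32dac8",
}


def file_digests(path: Union[str, Path], chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Compute SHA256, MD5 and BLAKE2b-256 of a file, reading it in chunks."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def verify_download(path: Union[str, Path]) -> bool:
    """True if the file's SHA256 matches the published value for its file name."""
    path = Path(path)
    expected = EXPECTED_SHA256[path.name]  # KeyError for unknown file names
    return file_digests(path)["SHA256"] == expected


if __name__ == "__main__":
    print(verify_download("sequana_variant_calling-1.2.0.tar.gz"))
```

pip can also enforce hashes itself when they are pinned in a requirements file and installed with `--require-hashes`.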