An implementation of a whole exome analysis pipeline using the library Luigi for workflow management.
Wespipeline
An implementation of a whole exome analysis pipeline using `Luigi <https://github.com/spotify/luigi/>`_ for workflow management.
.. figure:: https://raw.githubusercontent.com/janchorizo/wespipeline/master/docs/steps.png
   :alt: Steps Logo
   :align: center
This package implements tasks for running a partial or complete variant calling analysis, with the advantages of a workflow manager: dependency resolution, execution planning, modularity, monitoring and execution history.
Documentation for the latest version is hosted on `readthedocs <https://wespipeline.readthedocs.io/en/latest/>`_.
Installation
^^^^^^^^^^^^
Wespipeline is available through pip, conda and manual installation. Install it from the package repositories with ``pip3 install wespipeline`` or ``conda install wespipeline``, or download the project and place it in a location accessible to Python.
Note that running the analysis requires additional dependencies. These are listed below and can be installed with the Anaconda distribution:
- Sequence retrieval: Sra Toolkit, Fastqc
- Reference genome retrieval: no dependencies needed
- Sequence alignment: Bwa
- Alignment processing: Bwa, Samtools
- Variant calling: Freebayes, Varscan, Gatk, Deepvariant
- Variant calling evaluation: Vcftools
.. code-block:: bash

    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
    bash ~/miniconda.sh -b -p $HOME/miniconda
    export PATH="$HOME/miniconda/bin:$PATH"
    source $HOME/miniconda/bin/activate &&
    conda config --add channels bioconda &&
    conda config --add channels conda-forge &&
    conda install -y samtools &&
    conda install -y bwa &&
    conda install -y picard &&
    conda install -y platypus-variant &&
    conda install -y varscan &&
    conda install -y freebayes &&
    conda install -y fastqc &&
    conda install -y sra-tools &&
    conda install -y vcftools
    rm ~/miniconda.sh
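Once the tools are installed, it can help to confirm that they are on ``PATH`` before scheduling any task. The following is a minimal sketch and not part of wespipeline: the helper ``missing_tools`` is hypothetical, and the executable names (for example ``fastq-dump`` for Sra Toolkit) are the names these tools commonly install under, which is an assumption that may not hold on every system.

```python
import shutil

# Assumed executable names for each step; adjust them to your installation.
STEP_TOOLS = {
    "Sequence retrieval": ["fastq-dump", "fastqc"],
    "Sequence alignment": ["bwa"],
    "Alignment processing": ["bwa", "samtools"],
    "Variant calling": ["freebayes"],
    "Variant calling evaluation": ["vcftools"],
}


def missing_tools(step_tools=STEP_TOOLS, which=shutil.which):
    """Return {step: [executables not found on PATH]} for incomplete steps."""
    missing = {}
    for step, tools in step_tools.items():
        absent = [tool for tool in tools if which(tool) is None]
        if absent:
            missing[step] = absent
    return missing


if __name__ == "__main__":
    # Print one line per step that lacks at least one executable.
    for step, tools in missing_tools().items():
        print(f"{step}: missing {', '.join(tools)}")
```

An empty result means every listed executable was found; steps whose tools are missing can still be skipped if only a partial analysis is needed.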
Getting started
^^^^^^^^^^^^^^^
Installing or downloading the package provides one higher-level task for each of the six steps of the analysis. Each of them can be scheduled like any other Luigi task:
.. code-block:: bash

    python3 -m luigi --module wespipeline.<module> <Taskname> --<Taskname>-param value
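The flag names in this template follow Luigi's command-line convention: a task parameter is addressed as ``--<Taskname>-<param>``, with underscores in the parameter name written as hyphens. As a rough illustration, the hypothetical helper below (``luigi_command`` is not part of wespipeline or Luigi) assembles such a command from a dictionary. The Python-style parameter names ``paired_end`` and ``accession_number`` are assumptions inferred from the flags used in this README.

```python
import shlex


def luigi_command(module, task, params, python="python3"):
    """Build the argument list for running a Luigi task from the shell.

    `params` maps Python-style parameter names (e.g. "paired_end") to values.
    """
    args = [python, "-m", "luigi", "--module", module, task]
    for name, value in params.items():
        # Luigi flags are prefixed with the task name and use hyphens.
        flag = f"--{task}-{name.replace('_', '-')}"
        # Booleans are written as lowercase true/false, as in this README.
        if isinstance(value, bool):
            value = str(value).lower()
        args += [flag, str(value)]
    return args


cmd = luigi_command(
    "wespipeline.fastq",
    "FastqRetrieval",
    {"paired_end": True, "accession_number": "SRR9209557"},
)
print(shlex.join(cmd))  # shlex.join requires Python 3.8 or later
```

The returned list can be passed directly to ``subprocess.run`` without going through a shell, which avoids quoting problems with URLs and paths.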
Download the sequences using the NCBI accession number.
.. code-block:: bash

    python3 -m luigi --module wespipeline.fastq FastqRetrieval \
        --FastqRetrieval-paired-end true \
        --FastqRetrieval-accession-number SRR9209557 \
        --FastqRetrieval-create-report true
Or from an external URL.
.. code-block:: bash

    python3 -m luigi --module wespipeline.fastq FastqRetrieval \
        --FastqRetrieval-paired-end true \
        --FastqRetrieval-compressed false \
        --FastqRetrieval-accession-number SRR9209557 \
        --FastqRetrieval-create-report true
Download the reference genome and create a report using FastqC.
.. code-block:: bash

    python3.6 -m luigi --module tasks.reference ReferenceRetrieval \
        --workers 3 \
        --ReferenceGenome-ref-url ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit \
        --ReferenceGenome-from2bit True \
        --GlobalParams-base-dir ./tfm_experiment \
        --GlobalParams-log-dir .logs \
        --GlobalParams-exp-name hg19
Or run the whole analysis, specifying the parameters for each of the steps.
.. code-block:: bash

    python3 -m luigi --module tasks.vcf VariantCalling \
        --workers 3 \
        --VariantCalling-use-platypus true \
        --VariantCalling-use-freebayes true \
        --VariantCalling-use-samtools false \
        --VariantCalling-use-gatk false \
        --VariantCalling-use-deepcalling false \
        --AlignProcessing-cpus 6 \
        --FastqAlign-cpus 6 \
        --FastqAlign-create-report True \
        --GetFastq-gz-compressed True \
        --GetFastq-fastq1-url ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/NIST7035_TAAGGCGA_L001_R1_001.fastq.gz \
        --GetFastq-fastq2-url ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/NIST7035_TAAGGCGA_L001_R2_001.fastq.gz \
        --GetFastq-from-ebi False \
        --GetFastq-paired-end True \
        --ReferenceGenomeRetrieval-ref-url ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit \
        --ReferenceGenomeRetrieval-from2bit True \
        --GlobalParams-base-dir ./tfm_experiment \
        --GlobalParams-log-dir .logs \
        --GlobalParams-exp-name hg19
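With this many flags, a configuration file can be easier to maintain than a long command line. Luigi reads a ``luigi.cfg`` file from the working directory, in which each section is a task name and each key a parameter name. The sketch below writes such a file with Python's ``configparser``. The helper ``write_luigi_cfg`` is hypothetical, and the underscore parameter names (``base_dir``, ``use_platypus``, and so on) are inferred from the flags above and are assumptions, not confirmed against the wespipeline source.

```python
import configparser

# Section = task name, key = parameter name (assumed from the CLI flags).
settings = {
    "GlobalParams": {
        "base_dir": "./tfm_experiment",
        "log_dir": ".logs",
        "exp_name": "hg19",
    },
    "FastqAlign": {"cpus": "6", "create_report": "True"},
    "VariantCalling": {"use_platypus": "true", "use_freebayes": "true"},
}


def write_luigi_cfg(settings, path="luigi.cfg"):
    """Write a Luigi configuration file with one section per task."""
    # interpolation=None keeps '%' characters in values from being interpreted.
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # preserve the case of parameter names
    for section, params in settings.items():
        config[section] = params
    with open(path, "w") as handle:
        config.write(handle)
    return path
```

Calling ``write_luigi_cfg(settings)`` in the directory where Luigi is launched would let the corresponding flags be dropped from the command line, provided the assumed parameter names match the task definitions.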
Tasks implemented
^^^^^^^^^^^^^^^^^
+-----------------+----------------------------+
| Module          | Task                       |
+=================+============================+
| reference       | ReferenceGenomeRetrieval   |
+-----------------+----------------------------+
| fastq           | FastqRetrieval             |
+-----------------+----------------------------+
| align           | FastqAlignment             |
+-----------------+----------------------------+
| processalign    | FastqProcessing            |
+-----------------+----------------------------+
| variantcalling  | VariantCalling             |
+-----------------+----------------------------+
| processalign    | VariantProcessing          |
+-----------------+----------------------------+
Acknowledgements
^^^^^^^^^^^^^^^^
Special thanks to ...