An implementation of a whole exome analysis pipeline using the library Luigi for workflow management.

Wespipeline

An implementation of a whole exome analysis pipeline using `Luigi <https://github.com/spotify/luigi/>`_ for workflow management.

.. figure:: https://raw.githubusercontent.com/janchorizo/wespipeline/master/docs/steps.png
   :alt: Steps Logo
   :align: center

This package provides tasks for running a partial or complete variant calling analysis, with the advantages of a workflow manager: dependency resolution, execution planning, modularity, monitoring and execution history.

Documentation for the latest version is hosted on `readthedocs <https://wespipeline.readthedocs.io/en/latest/>`_.

Installation
^^^^^^^^^^^^

Wespipeline is available through pip, conda and manual installation. Install it from the package repositories with ``pip3 install wespipeline`` or ``conda install -c jancho wespipeline``, or download the project and build from source with ``git clone https://github.com/Janchorizo/wespipeline.git && cd wespipeline && python3 setup.py install``.

Note that running the analysis requires additional dependencies, which vary with the steps executed and the parameters set for them. All possible dependencies are listed below and can be installed with the Anaconda distribution:

  • Sequence retrieval : Sra Toolkit, Fastqc
  • Reference genome retrieval : no additional dependency
  • Sequence alignment : Bwa
  • Alignment processing : Bwa, Samtools
  • Variant calling : Freebayes, Varscan, Gatk, Deepvariant
  • Variant calling evaluation : Vcftools
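
Before launching a step, it can help to confirm that its tools are reachable. The following is a minimal sketch, not part of wespipeline: the executable names (``fastq-dump``, ``fastqc``, ``bwa``, ``samtools``, ``freebayes``, ``vcftools``) are assumptions about how the tools are installed, and only one variant caller is listed because the callers needed depend on the parameters chosen.

```python
import shutil

# Executable names are assumptions for illustration; adjust them to match
# how each tool is actually installed on your system.
STEP_EXECUTABLES = {
    "Sequence retrieval": ["fastq-dump", "fastqc"],
    "Reference genome retrieval": [],
    "Sequence alignment": ["bwa"],
    "Alignment processing": ["bwa", "samtools"],
    "Variant calling": ["freebayes"],  # depends on the callers enabled
    "Variant calling evaluation": ["vcftools"],
}


def missing_executables(steps, which=shutil.which):
    """Return {step: [executables not found on PATH]} for the given steps."""
    missing = {}
    for step in steps:
        absent = [exe for exe in STEP_EXECUTABLES[step] if which(exe) is None]
        if absent:
            missing[step] = absent
    return missing


if __name__ == "__main__":
    # Report every step that has at least one tool missing from PATH.
    for step, absent in missing_executables(STEP_EXECUTABLES).items():
        print(f"{step}: missing {', '.join(absent)}")
```

The ``which`` argument exists only so the lookup can be replaced in tests; in normal use the default ``shutil.which`` searches the current ``PATH``.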

Besides the dependencies, conda can also install the wespipeline package itself. The following example installs the Miniconda distribution, the dependencies and the package:

.. code-block:: bash

    # Download and install Miniconda into $HOME/miniconda
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
    bash ~/miniconda.sh -b -p $HOME/miniconda
    export PATH="$HOME/miniconda/bin:$PATH"

    # Activate conda, add the channels and install the dependencies and the package
    source $HOME/miniconda/bin/activate &&
    conda config --add channels bioconda &&
    conda config --add channels conda-forge &&
    conda config --add channels jancho &&
    conda install -y samtools &&
    conda install -y bwa &&
    conda install -y picard &&
    conda install -y platypus-variant &&
    conda install -y varscan &&
    conda install -y freebayes &&
    conda install -y fastqc &&
    conda install -y sra-tools &&
    conda install -y wespipeline

    # Remove the installer
    rm ~/miniconda.sh

Getting started
^^^^^^^^^^^^^^^

Installing or downloading the package provides one higher-level task for each of the six steps of the analysis. Each of them can be scheduled like any other Luigi task:

.. code-block:: bash

    python3 -m luigi --module wespipeline.<module> <Taskname> --<Taskname>-param value
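
The same pattern can be assembled programmatically, which is convenient when many parameters are involved. The sketch below is an illustration only: ``luigi_command`` is a hypothetical helper, not part of wespipeline or Luigi, and it assumes that every parameter is passed as a ``--<Taskname>-<param-name> value`` pair as in the pattern above.

```python
import shlex


def luigi_command(module, task, params, python="python3"):
    """Build the argv list for running a wespipeline task through Luigi.

    `params` maps a task name to a dict of its parameters; each entry
    becomes a `--<Taskname>-<param-name> value` pair. Underscores in
    parameter names are written as dashes, as on the command line.
    """
    argv = [python, "-m", "luigi", "--module", f"wespipeline.{module}", task]
    for task_name, task_params in params.items():
        for name, value in task_params.items():
            flag = f"--{task_name}-{name.replace('_', '-')}"
            argv += [flag, str(value)]
    return argv


# Parameter values taken from the accession number example in this section.
command = luigi_command(
    "fastq",
    "FastqRetrieval",
    {
        "FastqRetrieval": {
            "paired_end": "true",
            "accession_number": "SRR9209557",
            "create_report": "true",
        }
    },
)

# Print the command as a single shell-quoted line.
print(shlex.join(command))
```

The resulting list can be passed to ``subprocess.run(command, check=True)`` to launch the task.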

Download the sequences using the NCBI accession number.

.. code-block:: bash

    python3 -m luigi --module wespipeline.fastq FastqRetrieval \
        --FastqRetrieval-paired-end true \
        --FastqRetrieval-accession-number SRR9209557 \
        --FastqRetrieval-create-report true

Or from an external URL.

.. code-block:: bash

    python3 -m luigi --module wespipeline.fastq FastqRetrieval \
        --FastqRetrieval-paired-end true \
        --FastqRetrieval-compressed false \
        --FastqRetrieval-accession-number SRR9209557 \
        --FastqRetrieval-create-report true

Download the reference genome and create a report using FastqC.

.. code-block:: bash

    python3 -m luigi --module wespipeline.reference ReferenceGenomeRetrieval \
        --workers 3 \
        --ReferenceGenome-ref-url ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit \
        --ReferenceGenome-from2bit True \
        --GlobalParams-base-dir ./tfm_experiment \
        --GlobalParams-log-dir .logs \
        --GlobalParams-exp-name hg19

Or run the whole analysis, specifying the parameters for each of the steps.

.. code-block:: bash

    python3 -m luigi --module wespipeline.variantcalling VariantCalling \
        --workers 3 \
        --VariantCalling-use-platypus true \
        --VariantCalling-use-freebayes true \
        --VariantCalling-use-samtools false \
        --VariantCalling-use-gatk false \
        --VariantCalling-use-deepcalling false \
        --AlignProcessing-cpus 6 \
        --FastqAlign-cpus 6 \
        --FastqAlign-create-report True \
        --GetFastq-gz-compressed True \
        --GetFastq-fastq1-url ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/NIST7035_TAAGGCGA_L001_R1_001.fastq.gz \
        --GetFastq-fastq2-url ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/NIST7035_TAAGGCGA_L001_R2_001.fastq.gz \
        --GetFastq-from-ebi False \
        --GetFastq-paired-end True \
        --ReferenceGenomeRetrieval-ref-url ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit \
        --ReferenceGenomeRetrieval-from2bit True \
        --GlobalParams-base-dir ./tfm_experiment \
        --GlobalParams-log-dir .logs \
        --GlobalParams-exp-name hg19
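
With this many parameters, a configuration file may be easier to maintain than a long command line. Luigi can read task parameters from a ``luigi.cfg`` file in the working directory, with one section per task and one option per parameter. The sketch below renders such a file for a few of the parameters above. It assumes that the wespipeline tasks take their parameters through this standard Luigi mechanism and that the Python parameter names are the command line names with underscores instead of dashes; neither is confirmed here, and ``render_config`` is a hypothetical helper.

```python
import configparser
import io

# A subset of the parameters from the command above, grouped by task name.
# Option names use underscores (assumed to be the Python parameter names).
settings = {
    "GlobalParams": {
        "base_dir": "./tfm_experiment",
        "log_dir": ".logs",
        "exp_name": "hg19",
    },
    "VariantCalling": {"use_platypus": "true", "use_freebayes": "true"},
    "FastqAlign": {"cpus": "6", "create_report": "True"},
}


def render_config(settings):
    """Render the settings as INI text suitable for saving as luigi.cfg."""
    # interpolation=None keeps '%' characters in URLs or paths literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_dict(settings)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_config(settings))
```

Saving the rendered text as ``luigi.cfg`` next to where the command is run would leave only the module, the task name and any overrides on the command line.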

Tasks implemented
^^^^^^^^^^^^^^^^^

+-----------------+----------------------------+
| Module          | Task                       |
+=================+============================+
| reference       | ReferenceGenomeRetrieval   |
+-----------------+----------------------------+
| fastq           | FastqRetrieval             |
+-----------------+----------------------------+
| align           | FastqAlignment             |
+-----------------+----------------------------+
| processalign    | FastqProcessing            |
+-----------------+----------------------------+
| variantcalling  | VariantCalling             |
+-----------------+----------------------------+
| processalign    | VariantProcessing          |
+-----------------+----------------------------+

Acknowledgements
^^^^^^^^^^^^^^^^

Special thanks to professor Luis Antonio Miguel Quintales for all the guidance and help provided during the development of this project.
