An automated tool for processing whole-exome sequencing data

Whole-exome sequencing has been widely used in clinical applications to identify the genetic causes of several diseases. HPexome automates many data processing tasks for exome-sequencing data analysis of large-scale cohorts. Given analysis-ready alignment files, it breaks the input data into small genomic regions so they can be processed efficiently in parallel on cluster-computing environments. It relies on the Queue workflow execution engine and the GATK variant calling tool, following GATK best practices, to output a high-confidence unified variant calling file. The workflow is shipped as a Python command-line tool, making it easy to install and use.

Requirements

  • BAM files must be sorted by coordinate. See the sort BAM files script.
  • BAM files must have @RG header lines with ID, SM, LB, PL and PU tags. See the fix RG tag script.
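
Both requirements can be checked from the BAM header before submitting a run. The following is a minimal sketch, not part of HPexome: `check_header` and `REQUIRED_RG_TAGS` are hypothetical names, and the function assumes the header has already been dumped to text (for example with `samtools view -H sample.bam`).

```python
# Hypothetical helper, not part of HPexome: checks a SAM/BAM header given as text.
REQUIRED_RG_TAGS = ("ID", "SM", "LB", "PL", "PU")


def check_header(header_text):
    """Return a list of human-readable problems found in the header text."""
    problems = []
    sort_order = None
    read_groups = []

    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "@HD":
            # The sort order is stored in the SO tag of the @HD line.
            for field in fields[1:]:
                if field.startswith("SO:"):
                    sort_order = field[3:]
        elif fields[0] == "@RG":
            # Collect the tag names (text before the first colon) of this read group.
            tags = {field.split(":", 1)[0] for field in fields[1:] if ":" in field}
            read_groups.append(tags)

    if sort_order != "coordinate":
        problems.append("sort order is %r, expected 'coordinate'" % sort_order)
    if not read_groups:
        problems.append("no @RG line found")
    for index, tags in enumerate(read_groups, start=1):
        missing = [tag for tag in REQUIRED_RG_TAGS if tag not in tags]
        if missing:
            problems.append("@RG line %d is missing: %s" % (index, ", ".join(missing)))
    return problems
```

An empty list means the header satisfies both requirements; otherwise each entry names one thing to fix with the scripts mentioned above.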

Example

The following command takes a list of analysis-ready BAM files stored in the alignment_files directory and reference genome files (version b37). It then breaks the input data into smaller parts (--scatter_count 16) and submits them to the SGE batch system (--job_runner GridEngine). All samples are merged into a single VCF file (--unified_vcf), and output files are written to the result_files directory.

hpexome \
    --bam alignment_files \
    --genome references/b37/human_g1k_v37_decoy.fasta  \
    --dbsnp references/b37/dbsnp_138.b37.vcf \
    --indels references/b37/Mills_and_1000G_gold_standard.indels.b37.vcf \
    --indels references/b37/1000G_phase1.indels.b37.vcf \
    --sites references/b37/1000G_phase1.snps.high_confidence.b37.vcf \
    --sites references/b37/1000G_omni2.5.b37.vcf \
    --unified_vcf \
    --scatter_count 16 \
    --job_runner GridEngine \
    result_files
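
To illustrate what a scatter count means, here is a minimal sketch of splitting a genome into roughly equal parts that could be processed as independent jobs. It is an illustration only: the `scatter` function is hypothetical, the contig names and lengths are made up, and the strategy Queue and GATK actually use to partition the data may differ.

```python
import math


def scatter(contigs, scatter_count):
    """Split (name, length) contigs into at most scatter_count parts.

    Each part is a list of (name, start, end) intervals with 1-based,
    inclusive coordinates, and the parts cover roughly the same number of bases.
    """
    total = sum(length for _, length in contigs)
    target = math.ceil(total / scatter_count)  # bases per part
    parts = [[]]
    room = target  # bases still available in the current part

    for name, length in contigs:
        start = 1
        while start <= length:
            if room == 0:
                # The current part is full, so open a new one.
                parts.append([])
                room = target
            end = min(length, start + room - 1)
            parts[-1].append((name, start, end))
            room -= end - start + 1
            start = end + 1
    return parts


# Toy genome with two contigs split into four parts.
for part in scatter([("1", 100), ("2", 60)], 4):
    print(part)
```

Each resulting part can then be handed to a separate cluster job, and the per-part outputs are merged afterwards.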

For more information see http://bcblab.org/hpexome.

Download files

Source Distribution: HPexome-1.1.2.tar.gz (6.0 kB)

Built Distribution: HPexome-1.1.2-py2-none-any.whl (19.0 kB, Python 2)