An automated tool for processing whole-exome sequencing data

Whole-exome sequencing is widely used in clinical applications to identify the genetic causes of disease. HPexome automates many data processing tasks in the exome-sequencing analysis of large-scale cohorts. Given analysis-ready alignment files, it breaks the input data into small genomic regions that are processed efficiently in parallel on cluster-computing environments. It relies on the Queue workflow execution engine and on the GATK variant calling tool and its best practices to output a high-confidence unified variant call file. The workflow ships as a Python command-line tool, which makes it easy to install and use.
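
The idea of breaking input data into small genomic regions can be sketched in a few lines of Python. This is only an illustration, not the way HPexome or Queue actually partitions work: the function name scatter_regions and the toy contig names and lengths are assumptions made for this example.

```python
def scatter_regions(contig_lengths, scatter_count):
    """Split contigs into roughly `scatter_count` regions of similar size.

    Returns (contig, start, end) tuples with 1-based inclusive coordinates.
    A region never spans two contigs, so the final number of regions can be
    slightly higher than `scatter_count`.
    """
    if scatter_count < 1:
        raise ValueError("scatter_count must be at least 1")
    total = sum(contig_lengths.values())
    # Target number of bases per region (ceiling division).
    chunk = max(1, -(-total // scatter_count))
    regions = []
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            end = min(start + chunk - 1, length)
            regions.append((contig, start, end))
            start = end + 1
    return regions


# Hypothetical toy contigs, not real b37 chromosome lengths.
for contig, start, end in scatter_regions({"chrA": 100, "chrB": 50}, 3):
    print("%s:%d-%d" % (contig, start, end))
```

Each region could then be handed to an independent cluster job, and the per-region results merged at the end.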

Requirements

  • BAM files must be sorted by coordinate. See the sort BAM files script.
  • BAM files must have @RG tags with ID, SM, LB, PL and PU information. See the fix RG tag script.
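
Both requirements can be checked before a run by inspecting the BAM header. The sketch below is not part of HPexome. It assumes the header text has already been extracted, for example with `samtools view -H sample.bam`, and the function name check_sam_header is hypothetical.

```python
REQUIRED_RG_FIELDS = ("ID", "SM", "LB", "PL", "PU")


def check_sam_header(header_text):
    """Return a list of problems found in a SAM/BAM header (empty if none)."""
    problems = []
    sort_order = None
    read_groups = 0
    for line in header_text.splitlines():
        fields = line.split("\t")
        # Header fields look like TAG:value; keep everything after the first colon.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if fields[0] == "@HD":
            sort_order = tags.get("SO")
        elif fields[0] == "@RG":
            read_groups += 1
            missing = [k for k in REQUIRED_RG_FIELDS if k not in tags]
            if missing:
                problems.append(
                    "@RG %s is missing: %s" % (tags.get("ID", "?"), ", ".join(missing))
                )
    if sort_order != "coordinate":
        problems.append("sort order is %r, expected 'coordinate'" % sort_order)
    if read_groups == 0:
        problems.append("no @RG line found")
    return problems


# Hypothetical header for illustration only.
example = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:rg1\tSM:sample1\tLB:lib1\tPL:ILLUMINA\tPU:unit1\n"
)
print(check_sam_header(example))
```

An empty list means the header satisfies both requirements listed above.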

Example

The following command line takes the analysis-ready BAM files stored in the alignment_files directory and the reference genome files (version b37). It breaks the input data into smaller parts (--scatter_count 16) and submits them to the SGE batch system (--job_runner GridEngine). All samples are merged into a single VCF file (--unified_vcf), and output files are written to the result_files directory.

hpexome \
    --bam alignment_files \
    --genome references/b37/human_g1k_v37_decoy.fasta \
    --dbsnp references/b37/dbsnp_138.b37.vcf \
    --indels references/b37/Mills_and_1000G_gold_standard.indels.b37.vcf \
    --indels references/b37/1000G_phase1.indels.b37.vcf \
    --sites references/b37/1000G_phase1.snps.high_confidence.b37.vcf \
    --sites references/b37/1000G_omni2.5.b37.vcf \
    --unified_vcf \
    --scatter_count 16 \
    --job_runner GridEngine \
    result_files
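
When the same command has to be launched for several cohorts, it can be convenient to assemble it from a script. The following Python sketch builds the argument list using only the options shown in the example above. The helper name build_hpexome_command is hypothetical, and the sketch assumes the hpexome executable is on the PATH when the command is eventually run.

```python
import shlex


def build_hpexome_command(bam_dir, genome, dbsnp, indels, sites, out_dir,
                          scatter_count=16, job_runner="GridEngine",
                          unified_vcf=True):
    """Assemble the hpexome argument list from the options shown above."""
    cmd = ["hpexome", "--bam", bam_dir, "--genome", genome, "--dbsnp", dbsnp]
    for path in indels:   # --indels may be given more than once
        cmd += ["--indels", path]
    for path in sites:    # --sites may be given more than once
        cmd += ["--sites", path]
    if unified_vcf:
        cmd.append("--unified_vcf")
    cmd += ["--scatter_count", str(scatter_count), "--job_runner", job_runner]
    cmd.append(out_dir)
    return cmd


cmd = build_hpexome_command(
    bam_dir="alignment_files",
    genome="references/b37/human_g1k_v37_decoy.fasta",
    dbsnp="references/b37/dbsnp_138.b37.vcf",
    indels=[
        "references/b37/Mills_and_1000G_gold_standard.indels.b37.vcf",
        "references/b37/1000G_phase1.indels.b37.vcf",
    ],
    sites=[
        "references/b37/1000G_phase1.snps.high_confidence.b37.vcf",
        "references/b37/1000G_omni2.5.b37.vcf",
    ],
    out_dir="result_files",
)
# Print the command for inspection; to run it, pass `cmd` to
# subprocess.run(cmd, check=True).
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the arguments in a list rather than a single string avoids shell quoting problems if a path contains spaces.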

For more information see http://bcblab.org/hpexome.
