
An automated tool for processing whole-exome sequencing data


Whole-exome sequencing is widely used in clinical applications to identify the genetic causes of disease. HPexome automates many of the data processing tasks involved in exome-sequencing analysis of large-scale cohorts. Given analysis-ready alignment files, it breaks the input data into small genomic regions so that they can be processed efficiently in parallel in cluster-computing environments. It relies on the Queue workflow execution engine and on the GATK variant calling tool and its best practices to output a high-confidence unified variant call file. The workflow is shipped as a Python command-line tool, making it easy to install and use.
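To make the idea of breaking input data into small genomic regions concrete, here is a minimal sketch in Python. It is only an illustration of the general scatter idea, not the partitioning that Queue or GATK actually perform; the function name scatter_intervals and the contig lengths in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

# (contig, start, end), 1-based and inclusive
Interval = Tuple[str, int, int]


def scatter_intervals(contig_lengths: Dict[str, int],
                      scatter_count: int) -> List[List[Interval]]:
    """Split contigs into at most `scatter_count` groups of similar total length.

    Illustrative only: real workflow engines may split on different
    boundaries (for example, on exome capture targets).
    """
    if scatter_count < 1:
        raise ValueError("scatter_count must be at least 1")

    total = sum(contig_lengths.values())
    # Ceiling division, so the groups together cover every base.
    chunk_size = -(-total // scatter_count)

    groups: List[List[Interval]] = [[]]
    room = chunk_size  # bases that still fit in the current group
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            if room == 0:
                # Current group is full, open a new one.
                groups.append([])
                room = chunk_size
            end = min(length, start + room - 1)
            groups[-1].append((contig, start, end))
            room -= end - start + 1
            start = end + 1
    return groups
```

For example, scatter_intervals({"1": 100, "2": 60}, 4) yields four groups of 40 bases each, one of which spans the end of contig "1" and the start of contig "2". Each group could then be handed to a separate cluster job.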

Requirements

  • BAM files must be sorted in coordinate mode. See sort bam files script.
  • BAM files must have @RG tags with ID, SM, LB, PL and PU information. See fix rg tag script.
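Both requirements can be checked from the BAM header before a run is submitted. The sketch below is a hedged example, not part of HPexome: it assumes the header has already been extracted as plain text (for instance with samtools view -H), and the function name check_sam_header is hypothetical.

```python
REQUIRED_RG_TAGS = ("ID", "SM", "LB", "PL", "PU")


def check_sam_header(header_text):
    """Return a list of human-readable problems found in SAM/BAM header text.

    An empty list means the header declares coordinate sorting and every
    @RG line carries the ID, SM, LB, PL and PU tags.
    """
    sort_order = None
    read_groups = []

    for line in header_text.splitlines():
        fields = line.split("\t")
        # Header fields after the record type look like TAG:VALUE.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if fields[0] == "@HD":
            sort_order = tags.get("SO")
        elif fields[0] == "@RG":
            read_groups.append(tags)

    problems = []
    if sort_order != "coordinate":
        problems.append("sort order is %r, expected 'coordinate'" % sort_order)
    if not read_groups:
        problems.append("no @RG line found")
    for rg in read_groups:
        missing = [tag for tag in REQUIRED_RG_TAGS if tag not in rg]
        if missing:
            problems.append(
                "read group %r is missing tags: %s"
                % (rg.get("ID", "?"), ", ".join(missing))
            )
    return problems
```

Note that the SO field in the header only records what the file claims; it does not prove that the records are actually in coordinate order.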

Example

The following command line takes a list of analysis-ready BAM files stored in the alignment_files directory, together with the reference genome files (version b37). It breaks the input data into smaller parts (--scatter_count 16) and submits them to the SGE batch system (--job_runner GridEngine). All samples are merged into a single VCF file (--unified_vcf), and output files are written to the result_files directory.

hpexome \
    --bam alignment_files \
    --genome references/b37/human_g1k_v37_decoy.fasta \
    --dbsnp references/b37/dbsnp_138.b37.vcf \
    --indels references/b37/Mills_and_1000G_gold_standard.indels.b37.vcf \
    --indels references/b37/1000G_phase1.indels.b37.vcf \
    --sites references/b37/1000G_phase1.snps.high_confidence.b37.vcf \
    --sites references/b37/1000G_omni2.5.b37.vcf \
    --unified_vcf \
    --scatter_count 16 \
    --job_runner GridEngine \
    result_files
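When the same run has to be launched for several cohorts, it can help to assemble the command from a script. The sketch below builds the argument list for the command above using only the options shown there; the helper name build_hpexome_command is hypothetical and is not part of HPexome.

```python
import shlex


def build_hpexome_command(bam_dir, genome, dbsnp, indels, sites, output_dir,
                          scatter_count=16, job_runner="GridEngine",
                          unified_vcf=True):
    """Return the hpexome command as a list of arguments.

    `indels` and `sites` are lists of paths; each path is passed with its
    own --indels or --sites option, as in the example command.
    """
    cmd = ["hpexome", "--bam", bam_dir, "--genome", genome, "--dbsnp", dbsnp]
    for path in indels:
        cmd += ["--indels", path]
    for path in sites:
        cmd += ["--sites", path]
    if unified_vcf:
        cmd.append("--unified_vcf")
    cmd += ["--scatter_count", str(scatter_count),
            "--job_runner", job_runner,
            output_dir]
    return cmd


def format_command(cmd):
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting problems with unusual paths.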

For more information see http://bcblab.org/hpexome.
