Probabilistic VCF genotype error simulation

vcferr

The vcferr module is a lightweight error simulation framework. The tool operates on an input VCF and can probabilistically simulate the following error models for biallelic SNPs:

  • rarr = Heterozygous drop out: (0,1) or (1,0) to (0,0)
  • aara = Homozygous alt drop out: (1,1) to (0,1)
  • rrra = Heterozygous drop in: (0,0) to (0,1)
  • raaa = Homozygous alt drop in: (0,1) or (1,0) to (1,1)
  • aarr = Double homozygous alt drop out: (1,1) to (0,0)
  • rraa = Double homozygous alt drop in: (0,0) to (1,1)
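The six transitions above can be sketched in Python as one weighted draw per genotype. This is a minimal illustration, not the vcferr source: the helper name `simulate_error` and the `probs` dictionary are hypothetical, and the sketch assumes that the probabilities applying to a given starting genotype sum to at most 1, with the remainder being the chance that the genotype is left unchanged.

```python
import random

# Transition table built from the six error models listed above.
# Keys are model names; values are (source genotypes, resulting genotype).
ERROR_MODELS = {
    "rarr": ({(0, 1), (1, 0)}, (0, 0)),
    "aara": ({(1, 1)}, (0, 1)),
    "rrra": ({(0, 0)}, (0, 1)),
    "raaa": ({(0, 1), (1, 0)}, (1, 1)),
    "aarr": ({(1, 1)}, (0, 0)),
    "rraa": ({(0, 0)}, (1, 1)),
}


def simulate_error(gt, probs, rng=random):
    """Return a possibly altered genotype (hypothetical helper).

    gt    -- genotype tuple such as (0, 1)
    probs -- dict mapping model name to probability, e.g. {"rarr": 0.2}
    rng   -- object with a choices() method (random module or random.Random)
    """
    outcomes, weights = [], []
    for name, (sources, target) in ERROR_MODELS.items():
        p = probs.get(name, 0.0)
        if gt in sources and p > 0:
            outcomes.append(target)
            weights.append(p)
    total = sum(weights)
    if total > 1 + 1e-9:
        raise ValueError("probabilities for one genotype exceed 1")
    # Assumption: the remaining probability mass keeps the genotype unchanged.
    outcomes.append(gt)
    weights.append(max(0.0, 1.0 - total))
    return rng.choices(outcomes, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing simulated error rates across settings.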

In addition to the error models, the tool can set genotypes to missing with a given probability:

  • ramm = Heterozygous to missing: (0,1) or (1,0) to (.,.)
  • rrmm = Homozygous ref to missing: (0,0) to (.,.)
  • aamm = Homozygous alt to missing: (1,1) to (.,.)
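The missingness models can be sketched the same way. The helper below is hypothetical and only illustrates the mapping in the list above; it assumes a missing genotype is represented as `(None, None)`, which is how pysam reports `./.`.

```python
import random

# Assumption: a missing genotype (.,.) is represented as (None, None).
MISSING = (None, None)

# Model name -> starting genotypes that the model applies to.
MISSING_MODELS = {
    "ramm": {(0, 1), (1, 0)},
    "rrmm": {(0, 0)},
    "aamm": {(1, 1)},
}


def simulate_missing(gt, probs, rng=random):
    """Return MISSING with the configured probability, else gt unchanged.

    probs maps a model name to a probability, e.g. {"rrmm": 0.5}.
    """
    for name, sources in MISSING_MODELS.items():
        # rng.random() is uniform on [0, 1), so p = 0 never fires and p = 1 always does.
        if gt in sources and rng.random() < probs.get(name, 0.0):
            return MISSING
    return gt
```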

Installation

The vcferr tool is delivered as a Python module.

To install from PyPI:

pip install vcferr

Alternatively, clone the vcferr GitHub repository and use pip from the root of the directory:

pip install .

vcferr uses the following dependencies:

  • Python >= 3.6
  • pysam
  • random (part of the Python standard library)
  • click

Usage

The examples below demonstrate basic usage with the example.vcf.gz in the data/ directory of the vcferr GitHub repository.

The following is a basic example that simulates 20% heterozygous dropout:

vcferr data/example.vcf.gz --sample='sample1' --p_rarr=0.2

By default, vcferr streams the VCF with simulated errors to standard output. If the --output_vcf option is given, the VCF is written to that path on disk instead:

vcferr data/example.vcf.gz --sample='sample1' --output_vcf=data/sample1sim.example.vcf.gz --p_rarr=0.2
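Once the simulated VCF is on disk, one way to check the result is to compare genotypes against the input. The following is a rough sketch, not part of vcferr: the function names are hypothetical, and `genotype_pairs` assumes both files contain the same records in the same order.

```python
from collections import Counter


def tally_changes(pairs):
    """Count (original, simulated) genotype pairs that differ."""
    changes = Counter()
    for before, after in pairs:
        if before != after:
            changes[(before, after)] += 1
    return changes


def genotype_pairs(original_vcf, simulated_vcf, sample):
    """Yield (original GT, simulated GT) for each record, read with pysam.

    Assumes both files hold the same records in the same order.
    """
    import pysam  # imported here so tally_changes can be used without pysam

    with pysam.VariantFile(original_vcf) as orig, \
            pysam.VariantFile(simulated_vcf) as sim:
        for rec_o, rec_s in zip(orig, sim):
            yield rec_o.samples[sample]["GT"], rec_s.samples[sample]["GT"]


# Example call with the paths used above (requires pysam and the files):
# changes = tally_changes(genotype_pairs(
#     "data/example.vcf.gz", "data/sample1sim.example.vcf.gz", "sample1"))
# for (before, after), n in changes.most_common():
#     print(before, "->", after, n)
```

With only --p_rarr set, the tally would be expected to contain only heterozygous-to-(0,0) changes.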

Note that multiple kinds of error can be simulated simultaneously:

vcferr data/example.vcf.gz --sample='sample1' --p_rarr=0.2 --p_raaa=0.1 --p_rrra=0.2 --p_aara=0

The tool can also simulate missingness:

vcferr data/example.vcf.gz --sample='sample1' --p_rarr=0.2 --p_raaa=0.1 --p_rrra=0.2 --p_aara=0 --p_rrmm=0.5
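To see what these settings imply per starting genotype, the short calculation below works out the probability that a genotype is left unchanged. It assumes the events configured for one starting genotype are mutually exclusive, so their probabilities add; the source does not state this explicitly, and the helper name is hypothetical.

```python
def unchanged_probability(*event_probs):
    """Probability that none of the mutually exclusive events occurs."""
    remaining = 1.0 - sum(event_probs)
    if remaining < -1e-9:
        raise ValueError("event probabilities sum to more than 1")
    return max(0.0, remaining)


# Settings from the command above, grouped by starting genotype.
het = unchanged_probability(0.2, 0.1)      # p_rarr, p_raaa
hom_ref = unchanged_probability(0.2, 0.5)  # p_rrra, p_rrmm
hom_alt = unchanged_probability(0.0)       # p_aara

print(f"heterozygous unchanged:   {het:.2f}")
print(f"homozygous ref unchanged: {hom_ref:.2f}")
print(f"homozygous alt unchanged: {hom_alt:.2f}")
```

Under that assumption, about 70% of heterozygous calls, 30% of homozygous reference calls, and all homozygous alt calls would pass through unmodified.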
