Probabilistic VCF genotype error simulation

vcferr

The vcferr module is a lightweight error simulation framework. The tool operates on an input VCF and can probabilistically simulate the following error models for biallelic SNPs:

  • rarr = Heterozygous drop out: (0,1) or (1,0) to (0,0)
  • aara = Homozygous alt drop out: (1,1) to (0,1)
  • rrra = Heterozygous drop in: (0,0) to (0,1)
  • raaa = Homozygous alt drop in: (0,1) or (1,0) to (1,1)
  • aarr = Double homozygous alt drop out: (1,1) to (0,0)
  • rraa = Double homozygous alt drop in: (0,0) to (1,1)
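The transitions listed above can be sketched in a few lines of Python. This is a minimal illustration, not vcferr's actual implementation: the `ERROR_MODELS` table and the `simulate_error` function are hypothetical names, and the assumption that competing models for the same genotype are resolved with a single random draw is ours.

```python
import random

# Hypothetical lookup table: model name -> (source genotype, resulting genotype).
# Genotypes are treated as unphased, so (1, 0) is handled like (0, 1).
ERROR_MODELS = {
    "rarr": ((0, 1), (0, 0)),
    "raaa": ((0, 1), (1, 1)),
    "aara": ((1, 1), (0, 1)),
    "aarr": ((1, 1), (0, 0)),
    "rrra": ((0, 0), (0, 1)),
    "rraa": ((0, 0), (1, 1)),
}


def simulate_error(genotype, probs, rng=random):
    """Return a possibly altered genotype for one biallelic SNP call.

    probs maps model names (e.g. "rarr") to probabilities. Assumption:
    the models that share a source genotype are mutually exclusive and
    are resolved with one uniform draw.
    """
    if None in genotype:
        return genotype  # already missing, nothing to simulate
    source = tuple(sorted(genotype))
    candidates = [
        (name, result)
        for name, (src, result) in ERROR_MODELS.items()
        if src == source and probs.get(name, 0.0) > 0.0
    ]
    if sum(probs[name] for name, _ in candidates) > 1.0:
        raise ValueError("probabilities for one genotype class exceed 1")
    draw = rng.random()
    cumulative = 0.0
    for name, result in candidates:
        cumulative += probs[name]
        if draw < cumulative:
            return result
    return genotype  # no error simulated for this call


if __name__ == "__main__":
    rng = random.Random(42)
    calls = [(0, 1), (1, 0), (1, 1), (0, 0)]
    print([simulate_error(gt, {"rarr": 0.2, "rrra": 0.2}, rng) for gt in calls])
```

Passing a seeded `random.Random` instance keeps a simulation reproducible, which is convenient when comparing downstream tools on the same perturbed genotypes.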

In addition to the error models, the tool can probabilistically set genotypes to missing:

  • ramm = Heterozygous to missing: (0,1) or (1,0) to (.,.)
  • rrmm = Homozygous ref to missing: (0,0) to (.,.)
  • aamm = Homozygous alt to missing: (1,1) to (.,.)
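As a rough sketch of the missingness models above, the function below replaces a genotype with a missing call at a given probability. The names `MISSING_MODELS` and `simulate_missing` are hypothetical, `None` stands in for the `.` allele, and nothing here is taken from vcferr's source.

```python
import random

MISSING = (None, None)  # stand-in for the (.,.) genotype

# Hypothetical lookup table: model name -> source genotype (unphased).
MISSING_MODELS = {
    "rrmm": (0, 0),
    "ramm": (0, 1),
    "aamm": (1, 1),
}


def simulate_missing(genotype, probs, rng=random):
    """Return MISSING with the probability configured for this genotype class."""
    if None in genotype:
        return genotype  # already missing
    source = tuple(sorted(genotype))  # treat (1, 0) like (0, 1)
    for name, src in MISSING_MODELS.items():
        if src == source and rng.random() < probs.get(name, 0.0):
            return MISSING
    return genotype


if __name__ == "__main__":
    rng = random.Random(1)
    calls = [(0, 0)] * 5 + [(0, 1), (1, 1)]
    print([simulate_missing(gt, {"rrmm": 0.5}, rng) for gt in calls])
```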

Installation

The vcferr tool is delivered as a Python module.

To install from PyPI:

pip install vcferr

Alternatively, clone the vcferr GitHub repository and use pip from the root of the directory:

pip install .

vcferr uses the following dependencies:

  • Python >= 3.6
  • pysam
  • random (part of the Python standard library)
  • click

Usage

The examples below demonstrate basic usage with the example.vcf.gz in the data/ directory of the vcferr GitHub repository.

The following is a basic example that simulates 20% heterozygous dropout:

vcferr data/example.vcf.gz --sample='sample1' --p_rarr=0.2

By default, vcferr streams the VCF with simulated errors as output rather than saving it. If the --output_vcf argument is given, the VCF is written to disk at that path instead:

vcferr data/example.vcf.gz --sample='sample1' --output_vcf=data/sample1sim.example.vcf.gz --p_rarr=0.2

Multiple kinds of error can be simulated simultaneously:

vcferr data/example.vcf.gz --sample='sample1' --p_rarr=0.2 --p_raaa=0.1 --p_rrra=0.2 --p_aara=0

The tool can also simulate missingness:

vcferr data/example.vcf.gz --sample='sample1' --p_rarr=0.2 --p_raaa=0.1 --p_rrra=0.2 --p_aara=0 --p_rrmm=0.5
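When running many simulations with different error rates, it can help to assemble the command line programmatically. The helper below is a hypothetical convenience wrapper, not part of vcferr. It only uses the flags shown in the examples above (`--sample`, `--output_vcf`, and the `--p_*` options) and assumes the `vcferr` executable is on the PATH when the command is eventually run.

```python
import shlex


def build_vcferr_command(input_vcf, sample, probabilities, output_vcf=None):
    """Build an argument list for the vcferr CLI.

    probabilities maps model names such as "rarr" or "rrmm" to values
    between 0 and 1. This function is an illustrative helper only.
    """
    args = ["vcferr", input_vcf, "--sample={}".format(sample)]
    if output_vcf is not None:
        args.append("--output_vcf={}".format(output_vcf))
    for name, p in probabilities.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError("probability for {} must be in [0, 1]".format(name))
        args.append("--p_{}={}".format(name, p))
    return args


if __name__ == "__main__":
    cmd = build_vcferr_command(
        "data/example.vcf.gz",
        "sample1",
        {"rarr": 0.2, "rrmm": 0.5},
    )
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` on a machine where vcferr is installed.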
