Probabilistic VCF genotype error simulation
Project description
vcferr
The vcferr module is a lightweight error simulation framework. The tool operates on an input VCF and can probabilistically simulate the following error models for biallelic SNPs:
- rarr = Heterozygous drop out: (0,1) or (1,0) to (0,0)
- aara = Homozygous alt drop out: (1,1) to (0,1)
- rrra = Heterozygous drop in: (0,0) to (0,1)
- raaa = Homozygous alt drop in: (0,1) or (1,0) to (1,1)
- aarr = Double homozygous alt drop out: (1,1) to (0,0)
- rraa = Double homozygous alt drop in: (0,0) to (1,1)
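Each error model above maps a source genotype class to a target genotype. The sketch below encodes that table in Python and applies one model to a single genotype with probability p. The names ERROR_MODELS and apply_error are hypothetical illustrations, not part of the vcferr API. Genotypes are assumed to be allele-index tuples, as written in the list.

```python
import random

# Hypothetical encoding of the six error models listed above:
# model code -> (genotypes the model applies to, genotype it produces)
ERROR_MODELS = {
    "rarr": ({(0, 1), (1, 0)}, (0, 0)),  # heterozygous drop out
    "aara": ({(1, 1)}, (0, 1)),          # homozygous alt drop out
    "rrra": ({(0, 0)}, (0, 1)),          # heterozygous drop in
    "raaa": ({(0, 1), (1, 0)}, (1, 1)),  # homozygous alt drop in
    "aarr": ({(1, 1)}, (0, 0)),          # double homozygous alt drop out
    "rraa": ({(0, 0)}, (1, 1)),          # double homozygous alt drop in
}


def apply_error(gt, model, p, rng):
    """Return the model's target genotype with probability p if the model
    applies to gt; otherwise return gt unchanged."""
    sources, target = ERROR_MODELS[model]
    if gt in sources and rng.random() < p:
        return target
    return gt


rng = random.Random(42)
print(apply_error((1, 0), "rarr", 1.0, rng))  # (0, 0)
print(apply_error((1, 1), "rarr", 1.0, rng))  # (1, 1): rarr only affects hets
```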
In addition to the error models, the tool can inject missingness, setting a called genotype to uncalled with a given probability:
- ramm = Heterozygous to missing: (0,1) or (1,0) to (.,.)
- rrmm = Homozygous ref to missing: (0,0) to (.,.)
- aamm = Homozygous alt to missing: (1,1) to (.,.)
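Missingness follows the same pattern, except that the target is an uncalled genotype. In this hypothetical sketch a missing call is represented as (None, None), the convention pysam uses for missing alleles. format_gt renders it as the ./. string found in a VCF GT field. MISSING_MODELS, to_missing and format_gt are illustrative names, not vcferr functions.

```python
import random

# Hypothetical encoding: model code -> genotypes that may be set to missing
MISSING_MODELS = {
    "ramm": {(0, 1), (1, 0)},  # heterozygous to missing
    "rrmm": {(0, 0)},          # homozygous ref to missing
    "aamm": {(1, 1)},          # homozygous alt to missing
}

MISSING = (None, None)  # uncalled genotype, written as ./. in VCF text


def to_missing(gt, model, p, rng):
    """Replace gt with a missing call with probability p if the model applies."""
    if gt in MISSING_MODELS[model] and rng.random() < p:
        return MISSING
    return gt


def format_gt(gt):
    """Render an unphased genotype tuple as a VCF GT string."""
    return "/".join("." if allele is None else str(allele) for allele in gt)


rng = random.Random(1)
print(format_gt(to_missing((0, 0), "rrmm", 1.0, rng)))  # ./.
print(format_gt(to_missing((0, 1), "rrmm", 1.0, rng)))  # 0/1
```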
Installation
The vcferr tool is delivered as a Python module.
To install from PyPI:
pip install vcferr
Alternatively, clone the vcferr GitHub repository and use pip from the root of the directory:
pip install .
vcferr uses the following dependencies:
- Python >=3.6
- pysam
- random (Python standard library)
- click
Usage
The examples below demonstrate basic usage with the example.vcf.gz file in the data/ directory of the vcferr GitHub repository.
The following basic example simulates heterozygous dropout in sample1 with probability 0.2:
vcferr data/example.vcf.gz --sample='sample1' --p_rarr=0.2
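If --p_rarr acts as an independent per-genotype probability, the command above does not drop exactly 20% of heterozygous calls. That reading is an assumption: the description only says errors are simulated probabilistically. Under it, each heterozygous genotype of sample1 is dropped with probability 0.2, and the realized fraction varies around 0.2 from run to run. The stdlib simulation below illustrates this. simulate_het_dropout is a hypothetical helper, not part of vcferr.

```python
import random


def simulate_het_dropout(n_hets, p_rarr, seed=None):
    """Count how many of n_hets heterozygous calls drop out to (0,0) when each
    is converted independently with probability p_rarr (hypothetical model)."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_hets) if rng.random() < p_rarr)


n = 10_000
dropped = simulate_het_dropout(n, 0.2, seed=7)
print(f"{dropped} of {n} hets dropped ({dropped / n:.3f})")  # close to, not exactly, 0.200
```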
By default, vcferr streams the VCF with simulated errors as output. If a path is given for --output_vcf, the VCF is written to disk instead:
vcferr data/example.vcf.gz --sample='sample1' --output_vcf=data/sample1sim.example.vcf.gz --p_rarr=0.2
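The streaming-versus-file behaviour can be pictured as a filter over VCF lines. The sketch below is a hypothetical, stdlib-only illustration. vcferr itself depends on pysam and reads bgzipped input such as example.vcf.gz, but this sketch assumes uncompressed VCF text. It also assumes GT is the first FORMAT key, which the VCF specification requires when GT is present. Only heterozygous dropout is applied, to one named sample. Output goes to sys.stdout unless a path is given. simulate_rarr and run are illustrative names.

```python
import io
import random
import sys

HET_GTS = {"0/1", "1/0", "0|1", "1|0"}


def simulate_rarr(lines, sample, p_rarr, out, rng):
    """Copy VCF text lines to out, turning the sample's heterozygous GT into
    homozygous reference with probability p_rarr (hypothetical sketch)."""
    col = None
    for line in lines:
        if line.startswith("##"):  # meta-information lines pass through
            out.write(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            col = fields.index(sample)  # locate the sample's column
        elif fields[8].split(":")[0] == "GT":  # GT must be the first FORMAT key
            values = fields[col].split(":")
            if values[0] in HET_GTS and rng.random() < p_rarr:
                sep = "|" if "|" in values[0] else "/"
                values[0] = "0" + sep + "0"  # keep the original phasing separator
                fields[col] = ":".join(values)
        out.write("\t".join(fields) + "\n")


def run(lines, sample, p_rarr, output_vcf=None, seed=None):
    """Stream to stdout by default; write to output_vcf if a path is given."""
    rng = random.Random(seed)
    if output_vcf is None:
        simulate_rarr(lines, sample, p_rarr, sys.stdout, rng)
    else:
        with open(output_vcf, "w") as out:
            simulate_rarr(lines, sample, p_rarr, out, rng)


example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n"
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:12\n"
)
run(example, "sample1", p_rarr=1.0)  # the record is printed with GT:DP 0/0:12
```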
Multiple error types can be simulated simultaneously:
vcferr data/example.vcf.gz --sample='sample1' --p_rarr=0.2 --p_raaa=0.1 --p_rrra=0.2 --p_aara=0
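When several probabilities are given, more than one model can apply to the same genotype. In the command above, a heterozygous call is exposed to both rarr (0.2) and raaa (0.1), and how vcferr resolves this is not described here. One simple approach is to draw a single uniform number per genotype and partition [0, 1) among the applicable models. This keeps the outcomes mutually exclusive as long as their probabilities sum to at most 1. The sketch below is a hypothetical illustration of that technique, not vcferr's implementation. MODELS, genotype_class and simulate are illustrative names, and only called (non-missing) biallelic genotypes are handled.

```python
import random
from collections import Counter

# Hypothetical: model code -> (source genotype class, target genotype)
MODELS = {
    "rarr": ("ra", (0, 0)),
    "raaa": ("ra", (1, 1)),
    "rrra": ("rr", (0, 1)),
    "aara": ("aa", (0, 1)),
}


def genotype_class(gt):
    """Classify a called biallelic genotype as rr, ra or aa."""
    return {0: "rr", 1: "ra", 2: "aa"}[sum(gt)]


def simulate(gt, probs, rng):
    """Apply at most one error to gt by partitioning a single uniform draw
    among the models whose source class matches gt."""
    u = rng.random()
    cumulative = 0.0
    for model, p in probs.items():
        source, target = MODELS[model]
        if source != genotype_class(gt):
            continue
        cumulative += p
        if u < cumulative:
            return target
    return gt


probs = {"rarr": 0.2, "raaa": 0.1, "rrra": 0.2, "aara": 0.0}
rng = random.Random(3)
counts = Counter(simulate((0, 1), probs, rng) for _ in range(10_000))
print(counts)  # roughly 20% (0, 0), 10% (1, 1), 70% (0, 1)
```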
The tool can also simulate missingness:
vcferr data/example.vcf.gz --sample='sample1' --p_rarr=0.2 --p_raaa=0.1 --p_rrra=0.2 --p_aara=0 --p_rrmm=0.5
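Adding missingness increases the total probability assigned to some genotype classes. In the command above, homozygous reference calls are subject to both --p_rrra=0.2 and --p_rrmm=0.5. If errors and missingness are treated as mutually exclusive outcomes for a genotype, the probabilities for any one class must not sum past 1. That treatment is an assumption, not documented vcferr behaviour. The hypothetical pre-flight check below illustrates it. SOURCE_CLASS and totals_by_class are illustrative names.

```python
import math

# Source genotype class affected by each error and missingness model above
SOURCE_CLASS = {
    "rarr": "ra", "raaa": "ra", "ramm": "ra",
    "rrra": "rr", "rraa": "rr", "rrmm": "rr",
    "aara": "aa", "aarr": "aa", "aamm": "aa",
}


def totals_by_class(probs):
    """Sum the requested probabilities per source genotype class and reject
    any class whose total exceeds 1 (hypothetical validation)."""
    totals = {"rr": 0.0, "ra": 0.0, "aa": 0.0}
    for model, p in probs.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p_{model}={p} is not a probability")
        totals[SOURCE_CLASS[model]] += p
    for cls, total in totals.items():
        # tolerate floating-point rounding just above 1.0
        if total > 1.0 and not math.isclose(total, 1.0):
            raise ValueError(f"probabilities for {cls} genotypes sum to {total:.3f} > 1")
    return totals


# Values from the missingness example command
print(totals_by_class({"rarr": 0.2, "raaa": 0.1, "rrra": 0.2, "aara": 0.0, "rrmm": 0.5}))
```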
Project details
File details
Details for the file vcferr-1.0.2.tar.gz.
File metadata
- Download URL: vcferr-1.0.2.tar.gz
- Upload date:
- Size: 4.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.1 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.4 CPython/3.6.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 79d51ca9f1826e86d51cd660829fc117569c5349f868499daacc8d694030dc22
MD5 | 361dac50594acbc7d4c8bdd3a7335a24
BLAKE2b-256 | 1b5ab2c1a5f067b2c9f65dd8ae3414d1c0db3a099b850df276edf136ed9b3365
File details
Details for the file vcferr-1.0.2-py3-none-any.whl.
File metadata
- Download URL: vcferr-1.0.2-py3-none-any.whl
- Upload date:
- Size: 5.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.1 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.4 CPython/3.6.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 550de84864177113ddd1cf38371981831fcf5327e61807d8f4b8945589b7f443
MD5 | 8fe914d3e5f054c6a506617f0b9b357e
BLAKE2b-256 | 172066e14023dc3a36477f1d2b5ceccb2f4f92a198227241274a8fdcd42b79cd