Probabilistic VCF genotype error simulation
vcferr
The vcferr module is a lightweight error simulation framework. The tool operates on an input VCF and can probabilistically simulate the following error models for biallelic SNPs:
- rarr = Heterozygous drop out: (0,1) or (1,0) to (0,0)
- aara = Homozygous alt drop out: (1,1) to (0,1)
- rrra = Heterozygous drop in: (0,0) to (0,1)
- raaa = Homozygous alt drop in: (0,1) or (1,0) to (1,1)
- aarr = Double homozygous alt drop out: (1,1) to (0,0)
- rraa = Double homozygous alt drop in: (0,0) to (1,1)
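The transitions listed above can be sketched as a small lookup table plus a per-call random draw. This is an illustrative sketch only, not vcferr's actual source: the names `ERROR_MODELS` and `apply_error` are hypothetical, and genotypes are assumed to be plain `(allele, allele)` tuples.

```python
import random

# Source genotypes and target genotype for each model, copied from the list above.
# Hypothetical structure for illustration; vcferr itself may be organized differently.
ERROR_MODELS = {
    "rarr": ({(0, 1), (1, 0)}, (0, 0)),  # heterozygous drop out
    "aara": ({(1, 1)}, (0, 1)),          # homozygous alt drop out
    "rrra": ({(0, 0)}, (0, 1)),          # heterozygous drop in
    "raaa": ({(0, 1), (1, 0)}, (1, 1)),  # homozygous alt drop in
    "aarr": ({(1, 1)}, (0, 0)),          # double homozygous alt drop out
    "rraa": ({(0, 0)}, (1, 1)),          # double homozygous alt drop in
}


def apply_error(genotype, model, p, rng=random):
    """Return the model's target genotype with probability p when the model
    applies to this genotype; otherwise return the genotype unchanged."""
    sources, target = ERROR_MODELS[model]
    if genotype in sources and rng.random() < p:
        return target
    return genotype


# Simulate 20% heterozygous drop out over 1000 heterozygous calls.
rng = random.Random(42)
calls = [(0, 1)] * 1000
simulated = [apply_error(g, "rarr", 0.2, rng) for g in calls]
dropped = sum(g == (0, 0) for g in simulated)
print(f"{dropped} of {len(calls)} heterozygous calls dropped out")
```

With `p=0.2` the number of dropped calls should land near 20% of the input, varying from run to run unless the random generator is seeded.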
In addition to the error models, the tool can set genotypes to missing with a given probability:
- ramm = Heterozygous to missing: (0,1) or (1,0) to (.,.)
- rrmm = Homozygous ref to missing: (0,0) to (.,.)
- aamm = Homozygous alt to missing: (1,1) to (.,.)
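Several models share the same starting genotype; a heterozygous call, for example, can be affected by rarr, raaa, or ramm. One way to handle this, sketched below, is to treat them as mutually exclusive outcomes and give the leftover probability to "unchanged". This is an assumption made for illustration and may not match how vcferr combines probabilities internally. The function `simulate_het` and the `MISSING` constant are hypothetical names; `(None, None)` stands in for the `(.,.)` genotype.

```python
import random

MISSING = (None, None)  # stands in for the VCF genotype (.,.)


def simulate_het(genotype, p_rarr=0.0, p_raaa=0.0, p_ramm=0.0, rng=random):
    """Sample one outcome for a heterozygous call.

    Non-heterozygous genotypes are returned unchanged.
    """
    if genotype not in {(0, 1), (1, 0)}:
        return genotype
    p_keep = 1.0 - (p_rarr + p_raaa + p_ramm)
    if p_keep < -1e-9:
        raise ValueError("probabilities for heterozygous calls sum to more than 1")
    p_keep = max(p_keep, 0.0)  # absorb floating-point rounding
    outcomes = [(0, 0), (1, 1), MISSING, genotype]
    weights = [p_rarr, p_raaa, p_ramm, p_keep]
    return rng.choices(outcomes, weights=weights, k=1)[0]


rng = random.Random(7)
print(simulate_het((0, 1), p_rarr=0.2, p_raaa=0.1, p_ramm=0.5, rng=rng))
```

Under this scheme the probabilities for one starting genotype cannot exceed 1 in total, which is why the sketch raises an error in that case.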
Installation
The vcferr tool is delivered as a Python module.
To install from PyPI:
pip install vcferr
Alternatively, clone the vcferr GitHub repository and use pip from the root of the directory:
pip install .
Note that the following dependencies are used by vcferr:
- Python >=3.6.x
- pysam
- random
- click
Usage
The examples below demonstrate basic usage with the example.vcf.gz in the data/ directory of the vcferr GitHub repository.
The following is a basic example that simulates 20% heterozygous dropout:
vcferr data/example.vcf.gz --sample='sample1' --p_rarr=0.2
By default, vcferr streams the VCF with simulated errors as output. If the --output_vcf argument is given, the VCF is written to disk instead:
vcferr data/example.vcf.gz --sample='sample1' --output_vcf=data/sample1sim.example.vcf.gz --p_rarr=0.2
Note that multiple kinds of error can be simulated simultaneously:
vcferr data/example.vcf.gz --sample='sample1' --p_rarr=0.2 --p_raaa=0.1 --p_rrra=0.2 --p_aara=0
The tool can also simulate missingness:
vcferr data/example.vcf.gz --sample='sample1' --p_rarr=0.2 --p_raaa=0.1 --p_rrra=0.2 --p_aara=0 --p_rrmm=0.5
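To check that a simulated VCF reflects the requested probabilities, one option is to tally genotype transitions between the input and the output for the chosen sample. The sketch below works on plain lists of genotype tuples and assumes both lists cover the same sites in the same order; `tally_transitions` and `observed_rate` are hypothetical helper names, and the toy genotypes are invented for illustration. With pysam, such tuples could be collected from each record via `record.samples[name]["GT"]`.

```python
from collections import Counter


def tally_transitions(before, after):
    """Count (original, simulated) genotype pairs across matched sites."""
    if len(before) != len(after):
        raise ValueError("genotype lists must have the same length")
    return Counter(zip(before, after))


def observed_rate(counts, sources, target):
    """Fraction of calls that started in `sources` and ended as `target`."""
    total = sum(n for (src, _), n in counts.items() if src in sources)
    changed = sum(
        n for (src, dst), n in counts.items() if src in sources and dst == target
    )
    return changed / total if total else 0.0


# Toy data, invented for illustration (not taken from example.vcf.gz).
original = [(0, 1), (0, 1), (1, 0), (0, 0), (1, 1)]
simulated = [(0, 0), (0, 1), (1, 0), (0, 0), (1, 1)]

counts = tally_transitions(original, simulated)
het_dropout = observed_rate(counts, {(0, 1), (1, 0)}, (0, 0))
print(f"observed heterozygous drop out rate: {het_dropout:.2f}")
```

On a real file with many sites, the observed rate for each model should approach the probability passed on the command line.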
File details
Details for the file vcferr-1.0.2.tar.gz.
File metadata
- Download URL: vcferr-1.0.2.tar.gz
- Upload date:
- Size: 4.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.1 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.4 CPython/3.6.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 79d51ca9f1826e86d51cd660829fc117569c5349f868499daacc8d694030dc22 |
| MD5 | 361dac50594acbc7d4c8bdd3a7335a24 |
| BLAKE2b-256 | 1b5ab2c1a5f067b2c9f65dd8ae3414d1c0db3a099b850df276edf136ed9b3365 |
File details
Details for the file vcferr-1.0.2-py3-none-any.whl.
File metadata
- Download URL: vcferr-1.0.2-py3-none-any.whl
- Upload date:
- Size: 5.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.1 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.4 CPython/3.6.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 550de84864177113ddd1cf38371981831fcf5327e61807d8f4b8945589b7f443 |
| MD5 | 8fe914d3e5f054c6a506617f0b9b357e |
| BLAKE2b-256 | 172066e14023dc3a36477f1d2b5ceccb2f4f92a198227241274a8fdcd42b79cd |