Python (version <=3.12) package for parsing genomics and transcriptomics VCF data.
vcfparser
Python (version >=3.6) package for parsing genomics and transcriptomics VCF data.
- Free software: MIT license
- Documentation: https://vcfparser.readthedocs.io.
Features
- No external dependencies other than Python (version >=3.6).
- Minimalistic in nature.
- Exposes VCF metadata and records to API users through simple attributes and methods.
- Cython compiling is provided to optimize performance.
Installation
Method A:
`VCFsimplify <https://github.com/everestial/VCF-Simplify>`_ uses the vcfparser API, so the package is readily available if VCFsimplify is already installed.
This method is preferred only when developing or optimizing VCFsimplify alongside vcfparser.
Navigate to the VCFsimplify directory, start Python, and import the 'vcfparser' package.
$ C:\Users\>cd VCF-Simplify
$ C:\Users\VCF-Simplify>dir
Volume in drive C is StorageDrive
Volume Serial Number is .........
Directory of C:\Users\VCF-Simplify
07/12/2020 10:14 AM <DIR> .
07/12/2020 10:14 AM <DIR> ..
07/12/2020 08:55 AM <DIR> .github
............................
............................
07/12/2020 10:42 AM <DIR> vcfparser
07/12/2020 08:55 AM 1,494 VcfSimplify.py
11 File(s) 20,873,992 bytes
13 Dir(s) 241,211,793,408 bytes free
$ C:\Users\VCF-Simplify>python
Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 22:39:24) [MSC v.1916 (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from vcfparser import VcfParser
>>>
Method B (preferred method):
Pip is the preferred way to install and use the vcfparser API when developing custom Python scripts or apps.
$ pip install vcfparser
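After installing, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch that relies only on the standard library; the helper name `is_importable` is hypothetical and not part of vcfparser.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found on the import path.

    Hypothetical helper for illustration; it does not import the module.
    """
    return importlib.util.find_spec(module_name) is not None


if is_importable("vcfparser"):
    print("vcfparser is available")
else:
    print("vcfparser not found; try: pip install vcfparser")
```

Because the check does not actually import the package, it is cheap to run at the top of a script before any parsing starts.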
Method C:
For an offline install, or to build from the source code, follow :ref:`advanced install <advanced-install>`.
Cythonize (optional but helpful)
The installed "vcfparser" package can be cythonized to optimize performance. Cythonizing the package can increase the speed of the parser by about x.x - y.y (?) times.
TODO: Bhuwan - add required cython method in here
Usage
from vcfparser import VcfParser
vcf_obj = VcfParser('input_test.vcf')
Get metadata information from the VCF file
metainfo = vcf_obj.parse_metadata()
metainfo.fileformat
# Output: 'VCFv4.2'
metainfo.filters
# Output: [{'ID': 'LowQual', 'Description': 'Low quality'}, {'ID': 'my_indel_filter', 'Description': 'QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0'}, {'ID': 'my_snp_filter', 'Description': 'QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0'}]
metainfo.alt_
# Output: [{'ID': 'NON_REF', 'Description': 'Represents any possible alternative allele at this location'}]
metainfo.sample_names
# Output: ['ms01e', 'ms02g', 'ms03g', 'ms04h', 'MA611', 'MA605', 'MA622']
metainfo.record_keys
# Output: ['CHROM', 'POS', 'ID', 'REF', 'ALT', 'QUAL', 'FILTER', 'INFO', 'FORMAT', 'ms01e', 'ms02g', 'ms03g', 'ms04h', 'MA611', 'MA605', 'MA622']
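The metadata attributes shown above are plain Python lists and dictionaries, so they can be reshaped with ordinary code. The following is a minimal sketch, assuming `metainfo.filters` has the list-of-dicts shape shown in the output above. The helper name `index_by_id` is hypothetical and not part of vcfparser, and the filter list is hard-coded so the snippet runs without an input file.

```python
def index_by_id(entries):
    """Map each entry's 'ID' to its 'Description' (hypothetical helper)."""
    return {entry["ID"]: entry.get("Description", "") for entry in entries}


# Copied from the `metainfo.filters` output shown above.
filters = [
    {"ID": "LowQual", "Description": "Low quality"},
    {"ID": "my_indel_filter",
     "Description": "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0"},
    {"ID": "my_snp_filter",
     "Description": "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"},
]

filter_descriptions = index_by_id(filters)
print(filter_descriptions["LowQual"])
# Output: Low quality
```

A dictionary keyed by ID makes it easy to look up what a FILTER value in a record means.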
Get records from the VCF file
records = vcf_obj.parse_records()
# Note: Records are returned as a generator.
first_record = next(records)
first_record.CHROM
# Output: '2'
first_record.POS
# Output: '15881018'
first_record.REF
# Output: 'G'
first_record.ALT
# Output: 'A,C'
first_record.QUAL
# Output: '5082.45'
first_record.FILTER
# Output: ['PASS']
first_record.get_mapped_samples()
# Output: {'ms01e': {'GT': './.', 'PI': '.', 'GQ': '.', 'PG': './.', 'PM': '.', 'PW': './.', 'AD': '0,0', 'PL': '0,0,0,.,.,.', 'DP': '0', 'PB': '.', 'PC': '.'},
# 'ms02g': {'GT': './.', 'PI': '.', 'GQ': '.', 'PG': './.', 'PM': '.', 'PW': './.', 'AD': '0,0', 'PL': '0,0,0,.,.,.', 'DP': '0', 'PB': '.', 'PC': '.'},
# 'ms03g': {'GT': './.', 'PI': '.', 'GQ': '.', 'PG': './.', 'PM': '.', 'PW': './.', 'AD': '0,0', 'PL': '0,0,0,.,.,.', 'DP': '0', 'PB': '.', 'PC': '.'},
# 'ms04h': {'GT': '1/1', 'PI': '.', 'GQ': '6', 'PG': '1/1', 'PM': '.', 'PW': '1/1', 'AD': '0,2', 'PL': '49,6,0,.,.,.', 'DP': '2', 'PB': '.', 'PC': '.'},
# 'MA611': {'GT': '0/0', 'PI': '.', 'GQ': '78', 'PG': '0/0', 'PM': '.', 'PW': '0/0', 'AD': '29,0,0', 'PL': '0,78,1170,78,1170,1170', 'DP': '29', 'PB': '.', 'PC': '.'},
# 'MA605': {'GT': '0/0', 'PI': '.', 'GQ': '9', 'PG': '0/0', 'PM': '.', 'PW': '0/0', 'AD': '3,0,0', 'PL': '0,9,112,9,112,112', 'DP': '3', 'PB': '.', 'PC': '.'},
# 'MA622': {'GT': '0/0', 'PI': '.', 'GQ': '99', 'PG': '0/0', 'PM': '.', 'PW': '0/0', 'AD': '40,0,0', 'PL': '0,105,1575,105,1575,1575', 'DP': '40', 'PB': '.', 'PC': '.\n'}}
TODO: Bhuwan (priority - high): In the last example, first_record.get_mapped_samples() returns the final value of the last sample with a trailing "\n" (i.e. 'PC': '.\n'). Please fix this by applying strip('\n') to the line before parsing.
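Until the parser itself strips the line, callers can clean the returned mapping on their side. The following is a minimal sketch, assuming the mapping has the nested sample -> tag -> string shape shown in the output above. The helper name `strip_sample_values` is hypothetical and not part of vcfparser, and the input is a shortened, hard-coded copy of that output.

```python
def strip_sample_values(mapped_samples):
    """Return a copy with surrounding whitespace removed from every value.

    Hypothetical helper; assumes every value is a string, as in the
    get_mapped_samples() output shown above.
    """
    return {
        sample: {tag: value.strip() for tag, value in fields.items()}
        for sample, fields in mapped_samples.items()
    }


# Shortened copy of the output above, including the trailing newline.
mapped = {
    "MA605": {"GT": "0/0", "DP": "3", "PC": "."},
    "MA622": {"GT": "0/0", "DP": "40", "PC": ".\n"},
}

cleaned = strip_sample_values(mapped)
print(repr(cleaned["MA622"]["PC"]))
# Output: '.'
```

The helper builds a new dictionary rather than editing in place, so the original mapping stays available for comparison.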
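Alternatively, we can loop over each record using a for-loop:
for record in records:
    chrom = record.CHROM
    pos = record.POS
    id_ = record.ID  # trailing underscore avoids shadowing the built-in id()
    ref = record.REF
    alt = record.ALT
    qual = record.QUAL
    filter_ = record.FILTER  # trailing underscore avoids shadowing filter()
    format_ = record.format_
    infos = record.get_info_dict()  # INFO field as a dictionary
    mapped_sample = record.get_mapped_samples()  # per-sample FORMAT values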
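As the outputs above show, fields such as POS, QUAL and ALT come back as strings. The following is a minimal sketch of converting them to native Python types inside such a loop. The helper name `convert_record_fields` is hypothetical and not part of vcfparser, and the sample values are copied from the first record shown above. It treats "." as a missing QUAL, which is the missing-value marker defined by the VCF format.

```python
def convert_record_fields(chrom, pos, alt, qual):
    """Convert string fields of one record to native types.

    Hypothetical helper; assumes the string formats shown in the
    first_record output above.
    """
    return {
        "chrom": chrom,                               # kept as text, e.g. '2' or 'X'
        "pos": int(pos),                              # 1-based position
        "alt": alt.split(","),                        # one entry per alternate allele
        "qual": None if qual == "." else float(qual),  # '.' means missing
    }


converted = convert_record_fields("2", "15881018", "A,C", "5082.45")
print(converted)
# Output: {'chrom': '2', 'pos': 15881018, 'alt': ['A', 'C'], 'qual': 5082.45}
```

Inside the loop, the same call could be written as `convert_record_fields(record.CHROM, record.POS, record.ALT, record.QUAL)`.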
For more specific use cases, please check the examples in the following sections:
- For a tutorial on metadata, please follow :ref:`Metadata Tutorial <metadata-tutorial>`.
- For a tutorial on the record parser, please follow :ref:`Record Parser Tutorial <record-parser-tutorial>`.