genomvar

Sequence variant analysis in Python

Project description

The genomvar package works with genomic variants and implements set-like operations on them. It supports import from VCF files and export to NumPy.

For documentation see here.

Installation

Requirements:

Python >=3.6
rbi-tree
jinja2

To install:

pip install genomvar

Sample usage

Case 1

A common task in genomic variant analysis: there are two VCF files (for example, one from variant caller #1 and one from caller #2) and the differences between them should be investigated.

First we read the VCF files into genomvar.varset.VariantSet objects, which hold the variants together with the underlying data contained in the INFO fields:

>>> from genomvar.varset import VariantSet
>>> vs1 = VariantSet.from_vcf('caller1.out.vcf.gz',parse_info=True)
>>> vs2 = VariantSet.from_vcf('caller2.out.vcf.gz',parse_info=True)

To find variants detected by caller #1 but not by caller #2, the diff method is used. The differences are then exported to NumPy for further analysis:

>>> diff = vs1.diff(vs2)
>>> recs = diff.to_records() # recs is a NumPy structured array
>>> recs[['chrom','start','end','ref','alt','vartype']]
[('chr1',  1046755,  1046756, 'T', 'G', 'SNP')
 ('chr1',  1057987,  1057988, 'T', 'C', 'SNP')
  ...,
 ('chr19', 56434340, 56434341, 'A', 'G', 'SNP')
 ('chrY', 56839067, 56839068, 'A', 'G', 'SNP')]
>>> recs['INFO']['DP'].mean() # recs['INFO']['DP'] is a numpy ndarray
232.18819746028257
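Because the exported records form a NumPy structured array, ordinary NumPy indexing and aggregation apply to them. The sketch below illustrates this on a small hand-built array rather than on genomvar output: the rows are made up, and the dtype (including the nested INFO field with a DP subfield) is an assumption chosen to mirror the field names shown above, not genomvar's actual dtype.

```python
import numpy as np

# Toy stand-in for the array returned by to_records().
# The dtype and all rows are illustrative assumptions, not real genomvar output.
dtype = [
    ('chrom', 'U10'), ('start', 'i8'), ('end', 'i8'),
    ('ref', 'U10'), ('alt', 'U10'), ('vartype', 'U10'),
    ('INFO', [('DP', 'i8')]),
]
recs = np.array([
    ('chr1', 100, 101, 'T', 'G', 'SNP', (30,)),
    ('chr1', 250, 251, 'T', 'C', 'SNP', (50,)),
    ('chr2', 300, 302, 'AT', 'A', 'Del', (40,)),
], dtype=dtype)

# A boolean mask keeps only the rows labelled as SNPs
snps = recs[recs['vartype'] == 'SNP']

# Count variants per chromosome
chroms, counts = np.unique(recs['chrom'], return_counts=True)
per_chrom = dict(zip(chroms.tolist(), counts.tolist()))

# Nested fields are reached by chained indexing, as with recs['INFO']['DP'] above
mean_dp = float(recs['INFO']['DP'].mean())
```

The same masking and grouping should carry over to a real records array, provided its field names match those used here.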

Case 2

There is a smaller variant file obtained from the data and a larger one, usually obtained from a database. Variants in the former should be “annotated” with data associated with the corresponding variants in the latter.

This case differs from the previous one in that the database file might not fit comfortably into memory. The class genomvar.varset.VariantSetFromFile can be used for this purpose:

>>> from genomvar import varset
>>> vs = varset.VariantSet.from_vcf('vcf_of_interest.vcf')
>>> dbSNP = varset.VariantSetFromFile('DBSNP.vcf.gz', index=True)
>>> annots = []
>>> for vrt in vs.iter_vrt():
...     m = dbSNP.match(vrt)  # list of matching variants, empty if none
...     annots.append(m[0].attrib['id'] if m else None)
...
>>> annots
[None, None, 'rs540057607', 'rs367710686', 'rs940651103', ...]

Here the genomvar.varset.VariantSet.match method is used. It searches for variants with the same genomic alteration as the argument variant and returns a list of them. The VCF ID field of each matching variant can then be read from attrib['id'] (dbSNP rs numbers in this particular case).
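To make the idea of matching "by genomic alteration" concrete, here is a minimal conceptual sketch in plain Python. It is not genomvar's implementation: the tuple layout, the toy records, the placeholder IDs and the helper name match_ids are all hypothetical and chosen only for illustration. The point is that the lookup key is the alteration itself (chromosome, position, reference and alternative alleles), not the record ID.

```python
from collections import defaultdict

# Hypothetical toy "database" records: (chrom, start, ref, alt, id).
# These are placeholders, not real dbSNP entries.
db_records = [
    ('chr1', 100, 'T', 'G', 'rs_toy_1'),
    ('chr1', 250, 'T', 'C', 'rs_toy_2'),
]

# Index the records by the alteration itself, keeping the ID as the payload
index = defaultdict(list)
for chrom, start, ref, alt, rec_id in db_records:
    index[(chrom, start, ref, alt)].append(rec_id)

def match_ids(variant):
    """Return IDs of records with the same alteration (empty list if none)."""
    return index.get(variant, [])

# Variants of interest; the second one differs in the alternative allele only
queries = [
    ('chr1', 100, 'T', 'G'),
    ('chr1', 100, 'T', 'A'),
    ('chr1', 250, 'T', 'C'),
]

annots = []
for q in queries:
    m = match_ids(q)
    annots.append(m[0] if m else None)
```

As in the loop above, a variant without a counterpart in the database is annotated with None.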

Release history

0.3.0 (Sep 22, 2021), this version
0.3.dev2 (Sep 21, 2021), pre-release
0.3.dev1 (Sep 3, 2021), pre-release
0.2.14 (Apr 8, 2021)
0.2.12 (Sep 4, 2020)
0.2.11 (Jul 31, 2020)
0.2.1 (Jul 27, 2020)
0.2.0 (Jul 24, 2020)
0.1.16 (Jul 24, 2020)
0.1.15 (Jul 24, 2020)
0.1.14 (Jul 24, 2020)
0.1.13 (Jul 10, 2020)
0.1.12 (Jul 10, 2020)
0.1.11 (Jul 9, 2020)
0.1.1 (Jul 9, 2020)
0.1 (Jun 26, 2020)

Download files

Source Distribution

genomvar-0.3.0.tar.gz (45.8 kB)

Uploaded Sep 22, 2021 Source

File details

Details for the file genomvar-0.3.0.tar.gz.

File metadata

Download URL: genomvar-0.3.0.tar.gz
Upload date: Sep 22, 2021
Size: 45.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.7

File hashes

Hashes for genomvar-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`639a265e272897362e716d5efae478fb7952663b9d7db3fe002be30bb9468105`
MD5	`3a4b5b4f1040d3c9726b878226bf70e0`
BLAKE2b-256	`fa4042d18fbeb6e3a1bff1b973f183f82c517ff818b294555b2b07828fabffbb`
