fast vcf parsing with cython + htslib

cyvcf2
======

[![Build Status](https://travis-ci.org/brentp/cyvcf2.svg?branch=master)](https://travis-ci.org/brentp/cyvcf2)

cyvcf2 is a cython wrapper around [htslib](https://github.com/samtools/htslib) built for fast parsing of [Variant Call Format](https://en.m.wikipedia.org/wiki/Variant_Call_Format) (VCF) files.
It is targeted toward our use case in [gemini](http://gemini.rtfd.org) but should also be of general utility.

On a file with 189 samples, [cyvcf](https://github.com/arq5x/cyvcf) takes **21 seconds** to parse and extract all sample information; `cyvcf2` takes **1.4 seconds** on the same file.

Attributes like `variant.gt_ref_depths` return a numpy array directly so they are immediately ready for downstream use.
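
As an illustration of that downstream use, here is a minimal sketch that turns per-sample depth arrays into alt-allele fractions with plain numpy operations. The helper names `alt_allele_fraction` and `fractions_for_file` are hypothetical and not part of cyvcf2. The sketch assumes that missing depths are reported as negative values, which may not hold for every version.

```python
import numpy as np


def alt_allele_fraction(ref_depths, alt_depths):
    """Return alt / (ref + alt) per sample, NaN where depth is unusable.

    Hypothetical helper, not part of cyvcf2. Assumes negative values
    mark missing depths.
    """
    ref = np.asarray(ref_depths, dtype=float)
    alt = np.asarray(alt_depths, dtype=float)
    total = ref + alt
    # Only divide where both depths are non-negative and the total is positive.
    valid = (ref >= 0) & (alt >= 0) & (total > 0)
    frac = np.full(total.shape, np.nan)
    np.divide(alt, total, out=frac, where=valid)
    return frac


def fractions_for_file(path):
    """Yield (CHROM, start, fractions) for each variant in a VCF file."""
    from cyvcf2 import VCF  # imported lazily so the helper above works without it

    for variant in VCF(path):
        yield (variant.CHROM, variant.start,
               alt_allele_fraction(variant.gt_ref_depths, variant.gt_alt_depths))
```

Because the attributes are already arrays, no per-sample Python loop is needed; the division runs over all samples at once.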

Example
=======

```Python
from cyvcf2 import VCF

for variant in VCF('some.vcf.gz'):

    variant.gt_types # numpy array
    variant.gt_ref_depths, variant.gt_alt_depths # numpy arrays
    variant.gt_phases, variant.gt_quals # numpy arrays
    variant.gt_bases # numpy array
    variant.CHROM, variant.start, variant.end, variant.ID, \
        variant.REF, variant.ALT, variant.FILTER, variant.QUAL
    variant.INFO.get('DP') # int
    variant.INFO.get('FS') # float
    variant.INFO.get('AC') # float
    a = variant.gt_phred_ll_homref # numpy array
    b = variant.gt_phred_ll_het # numpy array
    c = variant.gt_phred_ll_homalt # numpy array

    str(variant)
```
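
The attributes above can be combined into simple per-variant filters. The following is a rough sketch, not part of cyvcf2: `passes_filters` and its thresholds are hypothetical, and it assumes that `variant.QUAL` and `variant.INFO.get('DP')` yield `None` when the value is absent. A `SimpleNamespace` stands in for a variant so the snippet runs without a VCF file.

```python
from types import SimpleNamespace


def passes_filters(variant, min_qual=30.0, min_depth=10):
    """Return True if the variant meets both thresholds.

    Hypothetical helper; thresholds are illustrative defaults.
    Assumes missing QUAL or DP values show up as None.
    """
    qual = variant.QUAL
    depth = variant.INFO.get('DP')
    if qual is None or depth is None:
        return False
    return qual >= min_qual and depth >= min_depth


# Stand-in object exposing only the attributes the filter reads.
example = SimpleNamespace(QUAL=50.0, INFO={'DP': 25})
keep = passes_filters(example)

# With a real file, the same function would be applied inside the loop:
#     for variant in VCF('some.vcf.gz'):
#         if passes_filters(variant):
#             print(str(variant), end='')
```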

Installation
============

```
pip install cyvcf2
```

Testing
=======

Tests can be run with:

```
python setup.py test
```

See Also
========

Pysam also [has a cython wrapper to htslib](https://github.com/pysam-developers/pysam/blob/master/pysam/cbcf.pyx), and one block of code here is taken directly from that library. However, the optimizations that we want for gemini are very specific, so we have chosen to create a separate project.
