fast vcf parsing with cython + htslib

cyvcf2
======

[![Build Status](https://travis-ci.org/brentp/cyvcf2.svg?branch=master)](https://travis-ci.org/brentp/cyvcf2)

cyvcf2 is a cython wrapper around [htslib](https://github.com/samtools/htslib) built for fast parsing of [Variant Call Format](https://en.m.wikipedia.org/wiki/Variant_Call_Format) (VCF) files.
It is targeted toward our use-case in [gemini](http://gemini.rtfd.org) but should also be of general utility.

On a file with 189 samples, [cyvcf](https://github.com/arq5x/cyvcf) takes **21 seconds** to parse and extract all sample information; `cyvcf2` takes **1.4 seconds**.

Attributes like `variant.gt_ref_depths` return a numpy array directly so they are immediately ready for downstream use.

Example
=======

```Python
from cyvcf2 import VCF

for variant in VCF('some.vcf.gz'):

    variant.gt_types # numpy array
    variant.gt_ref_depths, variant.gt_alt_depths # numpy arrays
    variant.gt_phases, variant.gt_quals # numpy arrays
    variant.gt_bases # numpy array
    variant.CHROM, variant.start, variant.end, variant.ID, \
        variant.REF, variant.ALT, variant.FILTER, variant.QUAL
    variant.INFO.get('DP') # int
    variant.INFO.get('FS') # float
    variant.INFO.get('AC') # float
    a = variant.gt_phred_ll_homref # numpy array
    b = variant.gt_phred_ll_het # numpy array
    c = variant.gt_phred_ll_homalt # numpy array

    str(variant)
```
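Because the per-sample attributes come back as arrays, they can be summarized with very little code. The sketch below tallies genotype classes from `variant.gt_types`. It assumes the integer encoding 0 = HOM_REF, 1 = HET, 2 = UNKNOWN, 3 = HOM_ALT, which this README does not state and which should be checked against the installed version. `count_genotypes` and `summarize_vcf` are hypothetical helper names, not part of cyvcf2.

```python
from collections import Counter

# Assumed encoding of the codes in variant.gt_types (not stated in this README).
GT_LABELS = {0: "HOM_REF", 1: "HET", 2: "UNKNOWN", 3: "HOM_ALT"}


def count_genotypes(gt_types):
    """Tally genotype classes from an iterable of integer genotype codes.

    Accepts a plain list or a numpy array; unrecognized codes are ignored.
    """
    counts = Counter(GT_LABELS.get(int(code)) for code in gt_types)
    return {label: counts.get(label, 0) for label in GT_LABELS.values()}


def summarize_vcf(path):
    """Sum genotype-class counts over every variant in a VCF file."""
    # Imported lazily so count_genotypes can be used without cyvcf2 installed.
    from cyvcf2 import VCF

    total = Counter()
    for variant in VCF(path):
        total.update(count_genotypes(variant.gt_types))
    return dict(total)
```

Calling `summarize_vcf('some.vcf.gz')` would return one dictionary of counts for the whole file. Keeping the tallying in `count_genotypes` separate from the file iteration makes it easy to test on plain lists.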

Installation
============

```
pip install cyvcf2
```

Testing
=======

Tests can be run with:

```
python setup.py test
```

See Also
========

Pysam also [has a cython wrapper to htslib](https://github.com/pysam-developers/pysam/blob/master/pysam/cbcf.pyx), and one block of code here is taken directly from that library. However, the optimizations that we want for gemini are very specific, so we have chosen to create a separate project.
