
fast vcf parsing with cython + htslib


cyvcf2

Note: cyvcf2 versions < 0.20.0 require htslib < 1.10. cyvcf2 versions >= 0.20.0 require htslib >= 1.10
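The version rule above can be written down as a small check. This is only a sketch: `required_htslib` is a hypothetical helper name, not part of cyvcf2, and it assumes plain numeric version strings such as "0.30.17".

```python
def required_htslib(cyvcf2_version: str) -> str:
    """Return the htslib constraint implied by the note above (hypothetical helper)."""
    # Keep at most three numeric components and pad short versions like "0.20".
    parts = tuple(int(p) for p in cyvcf2_version.split(".")[:3])
    parts = (parts + (0, 0, 0))[:3]
    # cyvcf2 >= 0.20.0 needs htslib >= 1.10; older releases need htslib < 1.10.
    return "htslib >= 1.10" if parts >= (0, 20, 0) else "htslib < 1.10"


print(required_htslib("0.30.17"))
print(required_htslib("0.19.9"))
```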

The latest documentation for cyvcf2 can be found here:

Docs

If you use cyvcf2, please cite the paper

Fast python (2 and 3) parsing of VCF and BCF including region-queries.


cyvcf2 is a cython wrapper around htslib built for fast parsing of Variant Call Format (VCF) files.

Attributes like variant.gt_ref_depths return a numpy array directly, so they are immediately ready for downstream use. Note that the array is backed by the underlying C data, so once variant goes out of scope, the array will contain nonsense. To persist a copy, use cpy = np.array(variant.gt_ref_depths) instead of just arr = variant.gt_ref_depths.
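The difference between holding a view and holding a copy can be illustrated with NumPy alone. In this sketch a plain array named `buffer` stands in for the C memory behind a variant; that stand-in and the sample values are assumptions for illustration, not cyvcf2 internals.

```python
import numpy as np

# Stand-in for the C buffer that a variant's arrays point into
# (assumption: a plain NumPy array plays that role here).
buffer = np.array([10, 12, 7], dtype=np.int32)

arr = buffer[:]       # a view: shares memory with `buffer`
cpy = np.array(arr)   # an independent copy of the current values

# Simulate the underlying memory being reused for the next record.
buffer[:] = -1

print(arr)  # the view reflects the overwritten buffer
print(cpy)  # the copy still holds the original values
```

With real cyvcf2 variants the same pattern applies: take `np.array(variant.gt_ref_depths)` inside the loop if the values are needed after the loop moves on.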

Example

The example below shows most of the common uses of cyvcf2.

from cyvcf2 import VCF

for variant in VCF('some.vcf.gz'): # or VCF('some.bcf')
    variant.REF, variant.ALT # e.g. REF='A', ALT=['C', 'T']

    variant.CHROM, variant.start, variant.end, variant.ID, \
                variant.FILTER, variant.QUAL

    # numpy arrays of specific things we pull from the sample fields.
    # gt_types is an array of 0, 1, 2, 3 == HOM_REF, HET, UNKNOWN, HOM_ALT
    variant.gt_types, variant.gt_ref_depths, variant.gt_alt_depths # numpy arrays
    variant.gt_phases, variant.gt_quals, variant.gt_bases # numpy arrays

    ## INFO Field.
    ## extract from the info field by its name:
    variant.INFO.get('DP') # int
    variant.INFO.get('FS') # float
    variant.INFO.get('AC') # int (allele count)

    # convert back to a string.
    str(variant)


    ## sample info...

    # Get a numpy array of the depth per sample:
    dp = variant.format('DP')
    # or of any other format field:
    sb = variant.format('SB')
    assert sb.shape == (len(variant.gt_types), 4) # one row per sample, 4 values each

# to do a region-query:

vcf = VCF('some.vcf.gz')
for v in vcf('11:435345-556565'):
    if v.INFO["AF"] > 0.1: continue
    print(str(v))
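As a follow-up to the example, the sketch below shows one way to work with the genotype codes and the AF filter. The names `GT_NAMES`, `tally_genotypes` and `af_at_most` are hypothetical helpers, not cyvcf2 API; the code mapping is taken from the comment in the example, and AF is assumed to be a single number (as it would be at a biallelic site). Plain lists and dicts stand in for `variant.gt_types` and `variant.INFO` so the snippet runs without a VCF file.

```python
from collections import Counter

# Encoding taken from the comment in the example above.
GT_NAMES = {0: "HOM_REF", 1: "HET", 2: "UNKNOWN", 3: "HOM_ALT"}


def tally_genotypes(gt_types):
    """Count samples per genotype class for one variant."""
    counts = Counter(GT_NAMES[int(code)] for code in gt_types)
    return {name: counts.get(name, 0) for name in GT_NAMES.values()}


def af_at_most(info, max_af=0.1):
    """True when AF is present and <= max_af; a missing AF counts as a failure."""
    af = info.get("AF")  # .get avoids a KeyError when the field is absent
    return af is not None and af <= max_af


# Stand-ins for variant.gt_types and variant.INFO (assumed values).
print(tally_genotypes([0, 1, 1, 3, 2]))
print(af_at_most({"AF": 0.05}))
```

Inside a real loop these would be called as `tally_genotypes(variant.gt_types)` and `af_at_most(variant.INFO)`.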

Installation

pip (assuming you have a compatible htslib installed; see the version note above)

pip install cyvcf2

github (building htslib and cyvcf2 from source)

git clone --recursive https://github.com/brentp/cyvcf2
cd cyvcf2/htslib
autoheader
autoconf
./configure --enable-libcurl
make

cd ..
pip install -r requirements.txt
CYTHONIZE=1 pip install -e .

On OSX, using brew, you may have to set the following as indicated by the brew install:

For compilers to find openssl you may need to set:
  export LDFLAGS="-L/usr/local/opt/openssl/lib"
  export CPPFLAGS="-I/usr/local/opt/openssl/include"

For pkg-config to find openssl you may need to set:
  export PKG_CONFIG_PATH="/usr/local/opt/openssl/lib/pkgconfig"

Testing

Install pytest, then tests can be run with:

pytest

CLI

Run with cyvcf2 path_to_vcf

$ cyvcf2 --help
Usage: cyvcf2 [OPTIONS] <vcf_file> or -

  fast vcf parsing with cython + htslib

Options:
  -c, --chrom TEXT                Specify what chromosome to include.
  -s, --start INTEGER             Specify the start of region.
  -e, --end INTEGER               Specify the end of the region.
  --include TEXT                  Specify what info field to include.
  --exclude TEXT                  Specify what info field to exclude.
  --loglevel [DEBUG|INFO|WARNING|ERROR|CRITICAL]
                                  Set the level of log output.  [default:
                                  INFO]
  --silent                        Skip printing of vcf.
  --help                          Show this message and exit.
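When the CLI is driven from a script, the options listed above can be assembled into an argument list. This is a minimal sketch: `build_cyvcf2_command` is a hypothetical helper, and it uses only flags that appear in the help text.

```python
import shlex


def build_cyvcf2_command(vcf_path, chrom=None, start=None, end=None, silent=False):
    """Assemble an argv list for the cyvcf2 CLI (hypothetical helper)."""
    argv = ["cyvcf2"]
    if chrom is not None:
        argv += ["--chrom", str(chrom)]
    if start is not None:
        argv += ["--start", str(start)]
    if end is not None:
        argv += ["--end", str(end)]
    if silent:
        argv.append("--silent")
    argv.append(vcf_path)  # positional <vcf_file>, or "-"
    return argv


cmd = build_cyvcf2_command("some.vcf.gz", chrom="11", start=435345, end=556565)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the cyvcf2 command is installed.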

See Also

Pysam also has a cython wrapper to htslib, and one block of code here is taken directly from that library. However, the optimizations that we want for gemini are very specific, so we have chosen to create a separate project.

Performance

For the performance comparison in the paper, we used Thousand Genomes chromosome 22, with the full comparison runner here.
