
fast vcf parsing with cython + htslib


cyvcf2

Note: cyvcf2 versions < 0.20.0 require htslib < 1.10. cyvcf2 versions >= 0.20.0 require htslib >= 1.10
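The version rule in the note above can be written as a small check. This is only a sketch: the helper names `parse_version` and `htslib_is_compatible` are hypothetical, not part of cyvcf2, and it assumes plain dotted numeric version strings such as "0.30.12" or "1.10.2".

```python
def parse_version(text):
    """Turn a dotted version string such as '1.10.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def htslib_is_compatible(cyvcf2_version, htslib_version):
    """Apply the rule from the note: cyvcf2 < 0.20.0 needs htslib < 1.10,
    cyvcf2 >= 0.20.0 needs htslib >= 1.10."""
    cy = parse_version(cyvcf2_version)
    hts = parse_version(htslib_version)
    if cy < (0, 20, 0):
        return hts < (1, 10)
    return hts >= (1, 10)


print(htslib_is_compatible("0.30.12", "1.10.2"))  # True
print(htslib_is_compatible("0.11.5", "1.10.2"))   # False
```

Comparing tuples of integers avoids the string-comparison trap where "1.9" sorts after "1.10".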

The latest documentation for cyvcf2 can be found here:

Docs

If you use cyvcf2, please cite the paper

Fast python (2 and 3) parsing of VCF and BCF including region-queries.


cyvcf2 is a cython wrapper around htslib built for fast parsing of Variant Call Format (VCF) files.

Attributes like variant.gt_ref_depths return a numpy array directly, so they are immediately ready for downstream use. Note that the array is backed by the underlying C data, so once variant goes out of scope, the array will contain nonsense. To persist a copy, use cpy = np.array(variant.gt_ref_depths) instead of just arr = variant.gt_ref_depths.
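The difference between keeping a view and keeping a copy can be illustrated with numpy alone. The sketch below is an analogy, not cyvcf2's actual memory handling: `backing` is a hypothetical stand-in for storage that gets reused, and overwriting it plays the role of variant going out of scope.

```python
import numpy as np

# Hypothetical stand-in for memory that is reused for each record;
# this is NOT cyvcf2's real buffer, only an analogy.
backing = np.array([10, 12, 7], dtype=np.int32)

view = backing[:]       # shares memory with `backing`
copy = np.array(view)   # np.array copies by default, so this owns its data

# Simulate the underlying storage being overwritten.
backing[:] = 0

print(view.tolist())  # [0, 0, 0]
print(copy.tolist())  # [10, 12, 7]
```

The view silently follows the overwritten storage, while the copy keeps the original values. This is why np.array(...) is the safer choice when values must outlive the variant.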

Example

The example below shows most of the common uses of cyvcf2.

from cyvcf2 import VCF

vcf = VCF('some.vcf.gz') # or VCF('some.bcf')
n_samples = len(vcf.samples)

for variant in vcf:
    variant.REF, variant.ALT # e.g. REF='A', ALT=['C', 'T']

    variant.CHROM, variant.start, variant.end, variant.ID, \
        variant.FILTER, variant.QUAL

    # numpy arrays of specific things we pull from the sample fields.
    # gt_types is array of 0,1,2,3==HOM_REF, HET, UNKNOWN, HOM_ALT
    variant.gt_types, variant.gt_ref_depths, variant.gt_alt_depths # numpy arrays
    variant.gt_phases, variant.gt_quals, variant.gt_bases # numpy arrays

    ## INFO Field.
    ## extract from the info field by its name:
    variant.INFO.get('DP') # int
    variant.INFO.get('FS') # float
    variant.INFO.get('AC') # float

    # convert back to a string.
    str(variant)

    ## sample info...

    # Get a numpy array of the depth per sample:
    dp = variant.format('DP')
    # or of any other format field:
    sb = variant.format('SB')
    assert sb.shape == (n_samples, 4) # 4 values per sample

# to do a region-query:

vcf = VCF('some.vcf.gz')
for v in vcf('11:435345-556565'):
    if v.INFO["AF"] > 0.1: continue
    print(str(v))
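As a small illustration of the gt_types encoding mentioned in the example (0, 1, 2, 3 == HOM_REF, HET, UNKNOWN, HOM_ALT), the sketch below tallies genotype classes from a sequence of codes. The `tally_genotypes` helper and the `example_codes` values are hypothetical; in practice the codes would come from variant.gt_types.

```python
from collections import Counter

# Encoding as stated in the example's comment.
GT_NAMES = {0: "HOM_REF", 1: "HET", 2: "UNKNOWN", 3: "HOM_ALT"}


def tally_genotypes(gt_types):
    """Count genotype classes in an iterable of gt_types codes."""
    counts = Counter(GT_NAMES[int(code)] for code in gt_types)
    # Report every class, including those with zero samples.
    return {name: counts.get(name, 0) for name in GT_NAMES.values()}


# Made-up codes for six samples; a numpy array would work the same way.
example_codes = [0, 1, 1, 3, 2, 0]
print(tally_genotypes(example_codes))
# {'HOM_REF': 2, 'HET': 2, 'UNKNOWN': 1, 'HOM_ALT': 1}
```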

Installation

pip (assuming you have a compatible htslib installed; see the version note above)

pip install cyvcf2

github (building htslib and cyvcf2 from source)

git clone --recursive https://github.com/brentp/cyvcf2
cd cyvcf2/htslib
autoheader
autoconf
./configure --enable-libcurl
make

cd ..
pip install -r requirements.txt
CYTHONIZE=1 pip install -e .

On OSX, using brew, you may have to set the following as indicated by the brew install:

For compilers to find openssl you may need to set:
  export LDFLAGS="-L/usr/local/opt/openssl/lib"
  export CPPFLAGS="-I/usr/local/opt/openssl/include"

For pkg-config to find openssl you may need to set:
  export PKG_CONFIG_PATH="/usr/local/opt/openssl/lib/pkgconfig"

Testing

Tests can be run with:

python setup.py test

CLI

Run with cyvcf2 path_to_vcf

$ cyvcf2 --help
Usage: cyvcf2 [OPTIONS] <vcf_file> or -

  fast vcf parsing with cython + htslib

Options:
  -c, --chrom TEXT                Specify what chromosome to include.
  -s, --start INTEGER             Specify the start of region.
  -e, --end INTEGER               Specify the end of the region.
  --include TEXT                  Specify what info field to include.
  --exclude TEXT                  Specify what info field to exclude.
  --loglevel [DEBUG|INFO|WARNING|ERROR|CRITICAL]
                                  Set the level of log output.  [default:
                                  INFO]
  --silent                        Skip printing of vcf.
  --help                          Show this message and exit.
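For scripted use, the options listed in the help text can be assembled into an argument list. This is a minimal sketch: the `build_cyvcf2_command` helper is hypothetical, it uses only flags shown in the help output above, and it assumes the options may be given before the file argument.

```python
def build_cyvcf2_command(vcf_path, chrom=None, start=None, end=None, silent=False):
    """Build an argv list for the cyvcf2 CLI from the options in --help."""
    argv = ["cyvcf2"]
    if chrom is not None:
        argv += ["--chrom", str(chrom)]
    if start is not None:
        argv += ["--start", str(start)]
    if end is not None:
        argv += ["--end", str(end)]
    if silent:
        argv.append("--silent")
    argv.append(vcf_path)
    return argv


cmd = build_cyvcf2_command("some.vcf.gz", chrom="11", start=435345, end=556565)
print(" ".join(cmd))
# cyvcf2 --chrom 11 --start 435345 --end 556565 some.vcf.gz
```

The resulting list can be passed to subprocess.run(cmd, check=True) once cyvcf2 is installed and on the PATH.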

See Also

Pysam also has a cython wrapper to htslib, and one block of code here is taken directly from that library. However, the optimizations that we want for gemini are very specific, so we have chosen to create a separate project.

Performance

For the performance comparison in the paper, we used thousand genomes chromosome 22, with the full comparison runner here.
