Python 3 VCF library with good support for both reading and writing

VCFPy

Badges: PyPI version, Travis CI build status, Documentation Status, Codacy Analysis, Codacy Coverage, Landscape Health, Publication in The Journal of Open Source Software

Features

  • Support for reading and writing VCF v4.3

  • The interface to INFO and FORMAT fields is based on OrderedDict, which allows for easier modification than in PyVCF (I also find this more pythonic)

  • Read (and jump in) and write BGZF files just using vcfpy
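
To illustrate why an OrderedDict-based interface makes INFO fields easy to modify, here is a minimal standard-library sketch. It is not VCFPy's own code: the helpers parse_info and format_info are hypothetical, and all values are kept as strings instead of being typed according to the header.

```python
from collections import OrderedDict


def parse_info(info_str):
    """Parse a raw VCF INFO column into an OrderedDict (values kept as strings)."""
    info = OrderedDict()
    if info_str == ".":  # "." marks an empty INFO column
        return info
    for entry in info_str.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            info[key] = value.split(",")  # comma-separated values become a list
        else:
            info[entry] = True  # flag entries carry no value
    return info


def format_info(info):
    """Serialize the mapping back into INFO column text, preserving key order."""
    if not info:
        return "."
    parts = []
    for key, value in info.items():
        if value is True:
            parts.append(key)
        else:
            parts.append("{}={}".format(key, ",".join(value)))
    return ";".join(parts)


info = parse_info("DP=14;AF=0.5,0.25;DB")
info["DP"] = ["20"]       # modify an existing field in place
info["SOMATIC"] = True    # add a new flag; it is appended at the end
del info["DB"]            # remove a field
print(format_info(info))  # DP=20;AF=0.5,0.25;SOMATIC
```

Because the mapping remembers insertion order, the fields are written back in the order they were read, with new keys appended at the end.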

Why another VCF parser for Python?

I’ve used PyVCF with quite some success in the past. However, its main limitation shows up when you want to modify the per-sample genotype information. There are some issues about this in the PyVCF tracker, but none of them can really be considered solved. I spent several hours trying to solve these problems within PyVCF, but the attempts either did not get far or headed towards a complete rewrite…

For this reason, VCFPy was born and here it is!

What’s the State?

VCFPy is the result of two full days of development plus some later maintenance work. I’m using it in several projects but it is not as battle-tested as PyVCF.

Why Python 3 Only?

As I only write Python 3 code, I see no advantage in carrying and maintaining support for legacy Python 2. At a later point, when VCFPy is known to be stable, Python 2 support might be added if someone contributes a pull request.

History

0.11.0 (2017-11-22)

  • The field FORMAT/FT is now expected to be a semicolon-separated string. Internally, we will handle it as a list.

  • Switching from warning helper utility code to Python warnings module.

  • Return str in case of problems with parsing value.
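
The three changes above can be sketched with the standard library alone. This is an illustration under assumptions, not VCFPy's implementation: parse_ft, format_ft and convert_value are hypothetical names, and the real library may treat edge cases differently.

```python
import warnings


def parse_ft(raw):
    """Split a semicolon-separated FORMAT/FT string into a list of filter names."""
    if raw in (".", ""):
        return []
    return raw.split(";")


def format_ft(filters):
    """Join the list back into the semicolon-separated form used in the file."""
    return ";".join(filters) if filters else "."


def convert_value(raw, converter):
    """Convert raw text with converter; on failure, warn and return the str."""
    try:
        return converter(raw)
    except ValueError:
        warnings.warn(
            "cannot parse {!r} with {}; keeping str".format(raw, converter.__name__)
        )
        return raw


print(parse_ft("q10;s50"))        # ['q10', 's50']
print(format_ft(["q10", "s50"]))  # q10;s50

# Record warnings instead of printing them, to show that one was issued.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    values = [convert_value("12", int), convert_value("1e", int)]
print(values, len(caught))        # [12, '1e'] 1
```

Going through the warnings module means callers can silence, record, or escalate parse problems with the usual warning filters.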

0.10.0 (2017-02-27)

  • Extending API to allow for reading subsets of records. (Writing sample subsets or reordered samples is possible by using the appropriate names list in the SamplesInfos for the Writer.)

  • Deep-copying header lines and samples infos on Writer construction

  • Using samples attribute from Header in Reader and Writer instead of passing explicitly

0.9.0 (2017-02-26)

  • Restructuring of requirements.txt files

  • Fixing parsing of no-call GT fields

0.8.1 (2017-02-08)

  • PEP8 style adjustments

  • Using versioneer for versioning

  • Using requirements*.txt files now from setup.py

  • Fixing dependency on cyordereddict to be for Python <3.6 instead of <3.5

  • Jumping by samtools coordinate string now also allowed
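
A samtools coordinate string has the form "chrom:begin-end" with 1-based, inclusive positions. The following sketch shows one way such a string could be parsed; parse_region and REGION_RE are hypothetical names, the conversion to 0-based half-open coordinates is a choice made for this example, and contig names that themselves contain a colon are not handled.

```python
import re

# Hypothetical pattern: contig name, optionally followed by ":begin-end".
# Thousands separators ("1,000") are accepted, as samtools allows them.
REGION_RE = re.compile(
    r"^(?P<chrom>[^:]+)(?::(?P<begin>[\d,]+)-(?P<end>[\d,]+))?$"
)


def parse_region(region):
    """Return (chrom, begin0, end) with a 0-based, half-open interval.

    begin0 and end are None when only a contig name is given.
    """
    match = REGION_RE.match(region)
    if match is None:
        raise ValueError("invalid region string: {!r}".format(region))
    chrom = match.group("chrom")
    if match.group("begin") is None:
        return chrom, None, None
    begin = int(match.group("begin").replace(",", ""))
    end = int(match.group("end").replace(",", ""))
    if begin < 1 or end < begin:
        raise ValueError("invalid interval in region: {!r}".format(region))
    return chrom, begin - 1, end


print(parse_region("chr1:1,000-2,000"))  # ('chr1', 999, 2000)
print(parse_region("chrX"))              # ('chrX', None, None)
```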

0.8.0 (2016-10-31)

  • Adding Header.has_header_line for querying existence of header line

  • Header.add_*_line now return a bool indicating any conflicts

  • Construction of Writer uses samples within header and no extra parameter (breaks API)

0.7.0 (2016-09-25)

  • Smaller improvements and fixes to documentation

  • Adding Codacy coverage and static code analysis results to README

  • Various smaller code cleanups triggered by Codacy results

  • Adding __eq__, __ne__ and __hash__ to data types (where applicable)
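
For readers unfamiliar with these special methods, the sketch below shows the usual pattern for a small value type. The class Substitution and its fields are a hypothetical stand-in for this example only and do not reproduce VCFPy's actual data types.

```python
class Substitution:
    """Hypothetical value type holding a variant type and an alternative allele."""

    def __init__(self, type_, value):
        self.type = type_
        self.value = value

    def _key(self):
        # Single source of truth for both equality and hashing.
        return (self.type, self.value)

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self._key() == other._key()
        return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects must hash equally, so hash the same key.
        return hash(self._key())


a = Substitution("SNV", "T")
b = Substitution("SNV", "T")
c = Substitution("SNV", "G")
print(a == b, a != c, len({a, b, c}))  # True True 2
```

Defining __hash__ consistently with __eq__ lets such objects be used as dictionary keys and set members; this is only safe as long as the fields are not changed after insertion.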

0.6.0 (2016-09-25)

  • Refining implementation for breakend and symbolic allele class

  • Removing record.SV_CODES

  • Refactoring parser module a bit to make the code cleaner

  • Fixing small typos and problems in documentation

0.5.0 (2016-09-24)

  • Deactivating warnings on record parsing by default because of performance

  • Adding validation for INFO and FORMAT fields on reading (#8)

  • Adding predefined INFO and FORMAT fields to vcfpy.header (#32)

0.4.1 (2016-09-22)

  • Initially enabling codeclimate

0.4.0 (2016-09-22)

  • Exporting constants for encoding variant types

  • Exporting genotype constants HOM_REF, HOM_ALT, HET

  • Implementing Call.is_phased, Call.is_het, Call.is_variant, Call.is_hom_ref, Call.is_hom_alt

  • Removing Call.phased (breaks API, next release is 0.4.0)

  • Adding tests, fixing bugs for methods of Call
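
As a rough illustration of what such genotype predicates compute, the following standalone sketch classifies a GT string. The numeric values of the constants, the helper names split_gt and gt_type, and the treatment of any no-call allele as "no type" are assumptions made for this example, not a description of the Call class.

```python
import re

# Assumed numeric values, for illustration only.
HOM_REF, HET, HOM_ALT = 0, 1, 2


def split_gt(gt):
    """Split a GT string such as "0/1" or "1|1" into allele indices.

    A "." (no-call) allele becomes None.
    """
    return [None if a == "." else int(a) for a in re.split(r"[/|]", gt)]


def is_phased(gt):
    """A genotype is phased when its alleles are separated by "|"."""
    return "|" in gt


def gt_type(gt):
    """Return HOM_REF, HET or HOM_ALT, or None if any allele is a no-call."""
    alleles = split_gt(gt)
    if None in alleles:
        return None
    if all(a == 0 for a in alleles):
        return HOM_REF
    if len(set(alleles)) == 1:
        return HOM_ALT
    return HET


for gt in ("0/0", "0|1", "1/1", "1/2", "./."):
    print(gt, gt_type(gt), is_phased(gt))
```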

0.3.1 (2016-09-21)

  • Work around FORMAT/FT being a string, as is the case in Delly output

0.3.0 (2016-09-21)

  • Reader and Writer can now be used as context managers (via the with statement)

  • Including license in documentation, including Biopython license

  • Adding support for writing bgzf files (taken from Biopython)

  • Adding support for parsing arrays in header lines

  • Removing example-4.1-bnd.vcf example file because v4.1 tumor derival lacks ID field

  • Adding AltAlleleHeaderLine, MetaHeaderLine, PedigreeHeaderLine, and SampleHeaderLine

  • Renaming SimpleHeaderFile to SimpleHeaderLine

  • Warn on missing FILTER entries on parsing

  • Reordered parameters in from_stream and from_file (#18)

  • Renamed from_file to from_stream (#18)

  • Renamed Reader.jump_to to Reader.fetch

  • Adding header_without_lines function

  • Generally extending API to make it easier to use

  • Upgrading dependencies, enabling pyup-bot

  • Greatly extending documentation

0.2.1 (2016-09-19)

  • First release on PyPI
