VCF-to-CSV handler tailored to a specific set of annotation fields

Project description

vcf-handler

This repo is an installable Python package and command-line tool for creating .csv files of annotated variants from VCF files. Currently the main process annotates each variant with the following information, either found within the VCF or pulled from external sources:

  1. Depth of sequence coverage at the site of variation.
  2. Number of reads supporting the variant.
  3. Percentage of reads supporting the variant versus those supporting the reference.
  4. Gene ID of the variant, the type of variation (substitution, insertion, CNV, etc.) and its effect (missense, silent, intergenic, etc.), retrieved through the VEP HGVS API.
  5. The minor allele frequency of the variant if available.

This process supports handling of multi-allelic sites. No pre-decomposition needed.
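As a rough illustration of items 1 to 3 at a multi-allelic site, the sketch below splits a comma-separated ALT field and derives per-allele read support. It is not the package's actual code: the function name `allele_support` is hypothetical, it assumes the per-allele depths come from a VCF `AD` value (reference first, then one value per ALT allele), it approximates site depth as the sum of those values, and it interprets the percentage as variant reads over variant plus reference reads.

```python
from typing import Dict, List


def allele_support(alts: str, ad: str) -> List[Dict[str, object]]:
    """Return one record per ALT allele at a (possibly multi-allelic) site.

    Assumption: `ad` is a VCF-style AD value, i.e. the reference depth
    followed by one depth per ALT allele, comma-separated.
    """
    alt_alleles = alts.split(",")
    depths = [int(value) for value in ad.split(",")]
    if len(depths) != len(alt_alleles) + 1:
        raise ValueError("AD must list the reference plus every ALT allele")

    ref_reads, alt_reads = depths[0], depths[1:]
    site_depth = sum(depths)  # approximation; a real DP field may differ

    records = []
    for allele, reads in zip(alt_alleles, alt_reads):
        denominator = reads + ref_reads
        pct = round(100 * reads / denominator, 2) if denominator else 0.0
        records.append(
            {
                "alt": allele,
                "depth": site_depth,
                "alt_reads": reads,
                "pct_alt_vs_ref": pct,
            }
        )
    return records
```

Each ALT allele gets its own record, which is why a multi-allelic line does not need to be decomposed beforehand in this sketch.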

This package is publicly installable from PyPI. Once installed, the vcf-handler can be run by importing the installed package:

>>> from vcf_handler.process import process_vcf
>>> process_vcf('test_vcf_data.txt')
INFO:vcf-handling:Checking VCF file
INFO:vcf-handling:Writing annotated variants to output.csv

Or, if the repo is cloned to your local environment and you have PDM installed, running pdm install from the root directory will install all dependencies, allowing you to run the tooling from the command line:

pdm run vcf-handler -i "{path_to_vcf}" -o "{desired_path_out}"

Code Walkthrough

The main VCF-to-CSV runner in this package is process.py. It runs a light format check on the VCF file before reading in any variants. Variants are read and written through generators so that the process scales easily to VCF files that are multiple GBs in size. This low-memory streaming avoids exceeding resource caps, unlike methods that read the entire file into memory as bytes, strings, or a dataframe. All reading and writing is managed in utils/read_write.py.
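The generator-based approach can be sketched as follows. This is a minimal illustration of the streaming pattern, not the contents of utils/read_write.py: the names `read_variants`, `write_csv`, and `VCF_COLUMNS` are hypothetical, and only the eight fixed VCF columns are carried through.

```python
import csv
from typing import Dict, Iterable, Iterator

# The eight fixed columns defined by the VCF format.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def read_variants(path: str) -> Iterator[Dict[str, str]]:
    """Yield one record per data line, holding only that line in memory."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            # Skip meta-information lines, the header line, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            yield dict(zip(VCF_COLUMNS, fields[:8]))


def write_csv(rows: Iterable[Dict[str, str]], path: str) -> int:
    """Consume rows lazily, write them as CSV, and return the row count."""
    written = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=VCF_COLUMNS)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            written += 1
    return written
```

Calling `write_csv(read_variants("in.vcf"), "out.csv")` pulls one line at a time through the pipeline, so memory use stays roughly constant regardless of file size.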

Once read in, each variant line is cast to a custom Variant class (utils/Variant.py), which has a handful of operations performed on it to gather the necessary annotations. These are implemented as methods of the class and occasionally rely on outside helper functions (utils/vep_helpers.py).
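A simplified picture of this design is sketched below. It is an assumption-laden stand-in, not the real utils/Variant.py: the fields, the `from_line`, `hgvs`, and `vep_url` methods, and the choice of the Ensembl REST `vep/human/hgvs` endpoint are all illustrative, and only single-base substitutions are handled. No network request is made; the sketch only builds the URL that a helper could query.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import quote

# Assumed endpoint: the Ensembl REST VEP lookup by HGVS notation.
VEP_HGVS_URL = "https://rest.ensembl.org/vep/human/hgvs/"


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str

    @classmethod
    def from_line(cls, line: str) -> List["Variant"]:
        """Build one Variant per ALT allele from a tab-separated VCF line."""
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # Ensembl-style names carry no "chr" prefix
        return [cls(chrom, int(pos), ref, alt) for alt in alts.split(",")]

    def hgvs(self) -> str:
        """Genomic HGVS notation; only substitutions are sketched here."""
        if len(self.ref) != 1 or len(self.alt) != 1:
            raise NotImplementedError("only single-base substitutions")
        return f"{self.chrom}:g.{self.pos}{self.ref}>{self.alt}"

    def vep_url(self) -> str:
        """URL a helper could request to fetch consequence annotations."""
        encoded = quote(self.hgvs(), safe=":")
        return VEP_HGVS_URL + encoded + "?content-type=application/json"
```

Keeping the notation and URL logic on the class, and leaving the HTTP call to a helper, mirrors the split between the Variant class and the VEP helper functions described above.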

The command-line interface is built with the click and argparse modules and is handled entirely in cli.py.
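For orientation, a minimal argparse-only sketch of an interface with the same `-i` and `-o` flags is shown below. It is not the actual cli.py: the long option names `--input` and `--output`, the `build_parser` and `main` functions, and the `output.csv` default are assumptions made for illustration, and the call into the processing code is left out.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the -i / -o flags used on the command line."""
    parser = argparse.ArgumentParser(
        prog="vcf-handler",
        description="Write annotated variants from a VCF file to a CSV file.",
    )
    # Long option names are hypothetical; only -i and -o appear in the docs.
    parser.add_argument("-i", "--input", required=True, help="path to the VCF file")
    parser.add_argument(
        "-o",
        "--output",
        default="output.csv",  # assumed default
        help="path of the CSV file to write",
    )
    return parser


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments; a real entry point would then start processing."""
    return build_parser().parse_args(argv)
```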

Developing

This repo uses PDM. Install PDM and then install dependencies with pdm install.

Running test suite: pdm run test

Running auto-linter: pdm run lint-fix

Releases

This package is published on PyPI. In order to create a new release, bump the version in the pyproject.toml file, create a PR, and merge that change into main. When that change is merged into main, the new version will be automatically recognized and published.
