Abstract PyHGVS and Biocommons HGVS libraries

hgvs_shim

hgvs_shim is a small compatibility layer that simplifies migrating code from the Counsyl pyhgvs library to Biocommons HGVS.

Background

There are two Python HGVS libraries:

  • pyhgvs — simpler, but appears abandoned (no activity for years)
  • Biocommons HGVS — actively developed, more features (alignment gaps, inversions, uncertain coordinates)

Motivation

The VariantGrid project started out on pyhgvs. We eventually decided Biocommons HGVS was worth the added complexity, and wrote cdot to provide transcript data to both libraries.

Migrating was risky: pyhgvs classes were spread throughout our codebase, and we needed to validate that Biocommons would produce identical results before switching. The transition steps were:

  1. Abstract away pyhgvs classes and methods behind our own interfaces
  2. Write implementations for both pyhgvs and Biocommons
  3. Run both in parallel during testing, raising errors if they disagreed
  4. Switch to Biocommons-only once confident

hgvs_shim is the cleaned-up library that resulted from this work.
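
The transition steps above can be sketched in plain Python. This is a minimal illustration of the pattern only, not the hgvs_shim source: the class names Converter, FunctionBackedConverter and AgreementChecker are hypothetical, and the toy backends return made-up coordinates instead of calling pyhgvs or Biocommons.

```python
from abc import ABC, abstractmethod
from typing import Callable, Sequence, Tuple

Coordinate = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


class Converter(ABC):
    """Step 1: a project-owned interface that hides the backend (hypothetical name)."""

    @abstractmethod
    def hgvs_to_variant_coordinate(self, hgvs_string: str) -> Coordinate:
        ...


class FunctionBackedConverter(Converter):
    """Step 2: one implementation per backend. A plain callable stands in
    for the real pyhgvs or Biocommons call."""

    def __init__(self, name: str, func: Callable[[str], Coordinate]):
        self.name = name
        self._func = func

    def hgvs_to_variant_coordinate(self, hgvs_string: str) -> Coordinate:
        return self._func(hgvs_string)


class AgreementChecker(Converter):
    """Step 3: run every backend and raise if the results differ."""

    def __init__(self, converters: Sequence[FunctionBackedConverter]):
        if not converters:
            raise ValueError("need at least one converter")
        self._converters = list(converters)

    def hgvs_to_variant_coordinate(self, hgvs_string: str) -> Coordinate:
        results = {
            c.name: c.hgvs_to_variant_coordinate(hgvs_string)
            for c in self._converters
        }
        if len(set(results.values())) > 1:
            raise ValueError(f"backends disagree for {hgvs_string!r}: {results}")
        return next(iter(results.values()))


# Toy backends returning made-up coordinates, purely for illustration.
backend_a = FunctionBackedConverter("a", lambda s: ("chr1", 100, "A", "G"))
backend_b = FunctionBackedConverter("b", lambda s: ("chr1", 100, "A", "G"))
backend_c = FunctionBackedConverter("c", lambda s: ("chr1", 101, "A", "G"))

checker = AgreementChecker([backend_a, backend_b])
agreed = checker.hgvs_to_variant_coordinate("NM_TOY.1:c.1A>G")
```

Step 4 then amounts to constructing only the Biocommons-backed implementation; callers keep using the same interface.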

Installation

# Biocommons only (pair with any hdp: UTA, cdot, etc.)
pip install hgvs_shim

# With pyhgvs support (for running both backends side-by-side during migration)
pip install 'hgvs_shim[pyhgvs]'
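
To check which backends ended up in the current environment, a small standard-library probe can help. This is a sketch and not part of hgvs_shim; it assumes the usual import names, hgvs for Biocommons HGVS and pyhgvs for the Counsyl library, and available_backends is a hypothetical helper.

```python
from importlib.util import find_spec


def available_backends() -> dict:
    """Report which backend packages can be imported in this environment."""
    # Assumed import names: 'hgvs' (Biocommons HGVS) and 'pyhgvs' (Counsyl).
    return {name: find_spec(name) is not None for name in ("hgvs", "pyhgvs")}


print(available_backends())
```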

Usage

Setting up the converter

BioCommonsHGVSConverter takes any biocommons data provider (hgvs.dataproviders.interface.Interface). The simplest option is the default biocommons UTA database connection:

import hgvs.dataproviders.uta as uta
from hgvs_shim import BioCommonsHGVSConverter

hdp = uta.connect()  # connects to biocommons' public UTA PostgreSQL instance
converter = BioCommonsHGVSConverter('GRCh38', hdp)

For faster or offline transcript data, cdot provides JSON.gz files and a REST API (pip install 'hgvs_shim[cdot]'):

from cdot.hgvs.dataproviders import JSONDataProvider, RESTDataProvider
from hgvs_shim import BioCommonsHGVSConverter

# Local JSON.gz file (fast: 500-1000 tx/sec)
hdp = JSONDataProvider(['/path/to/cdot-0.2.x.refseq.grch37.json.gz'])
converter = BioCommonsHGVSConverter('GRCh37', hdp)

# cdot.cc REST API (no local file needed, ~10 tx/sec)
hdp = RESTDataProvider()
converter = BioCommonsHGVSConverter('GRCh37', hdp)

HGVS string → variant coordinate

chrom, pos, ref, alt = converter.hgvs_to_variant_coordinate('NM_000492.3:c.1521_1523delCTT')
# ('chr7', 117548628, 'ACTT', 'A')
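
The returned 4-tuple lines up with the CHROM, POS, REF and ALT columns of a VCF record. As a sketch, assuming the tuple order shown above, a hypothetical helper (to_vcf_line is not part of hgvs_shim) could format it like this:

```python
from typing import Tuple


def to_vcf_line(coordinate: Tuple[str, int, str, str], variant_id: str = ".") -> str:
    """Format (chrom, pos, ref, alt) as the eight fixed VCF columns."""
    chrom, pos, ref, alt = coordinate
    # QUAL, FILTER and INFO are left as missing values (".").
    return "\t".join([chrom, str(pos), variant_id, ref, alt, ".", ".", "."])


line = to_vcf_line(("chr7", 117548628, "ACTT", "A"))
```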

Variant coordinate → c.HGVS

from hgvs_shim import TranscriptInfo

transcript_info = TranscriptInfo(
    accession='NM_000492.3',
    strand='+',
    is_coding=True,
    gene_symbol='CFTR',  # optional
)

variant = converter.variant_coordinate_to_c_hgvs('chr7', 117548628, 'ACTT', 'A', transcript_info)
print(variant.format())
# NM_000492.3:c.1521_1523del

Normalize a variant

variant = converter.create_hgvs_variant('NM_000492.3:c.1521_1523delCTT')
normalized = converter.normalize(variant)
print(normalized.format())

Running both backends in parallel (migration validation)

from cdot.hgvs.dataproviders import JSONDataProvider
from cdot.pyhgvs.pyhgvs_transcript import JSONPyHGVSTranscriptFactory
from pysam import FastaFile
from hgvs_shim import BioCommonsHGVSConverter, PyHGVSConverter, ComboCheckerHGVSConverter

hdp = JSONDataProvider(['/path/to/cdot.json.gz'])
biocommons = BioCommonsHGVSConverter('GRCh37', hdp)

factory = JSONPyHGVSTranscriptFactory(['/path/to/cdot.json.gz'])
pyhgvs_conv = PyHGVSConverter(FastaFile('/path/to/GRCh37.fa'), factory.get_transcript_grch37)

# Raises ValueError if the two converters return different results
combo = ComboCheckerHGVSConverter([biocommons, pyhgvs_conv], die_on_error=True)
result = combo.hgvs_to_variant_coordinate('NM_000352.3:c.215A>G')

Exception handling

from hgvs_shim import HGVSNomenclatureException, HGVSImplementationException

hgvs_string = 'NM_000492.3:c.1521_1523delCTT'  # example input

try:
    result = converter.hgvs_to_variant_coordinate(hgvs_string)
except HGVSNomenclatureException:
    # Bad HGVS string — user error, fixable
    ...
except HGVSImplementationException:
    # Library failure — not user-fixable
    ...
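
When converting many strings, the same two exception types can be used to triage failures. The following is a sketch under assumptions: convert_batch is a hypothetical helper, it expects any object with a hgvs_to_variant_coordinate method, and the fallback exception classes exist only so the snippet runs without hgvs_shim installed.

```python
try:
    from hgvs_shim import HGVSNomenclatureException, HGVSImplementationException
except ImportError:
    # Stand-ins so the sketch runs without hgvs_shim installed.
    class HGVSNomenclatureException(Exception):
        pass

    class HGVSImplementationException(Exception):
        pass


def convert_batch(converter, hgvs_strings):
    """Convert many HGVS strings, separating user-fixable from library errors."""
    converted, user_errors, library_errors = {}, {}, {}
    for hgvs_string in hgvs_strings:
        try:
            converted[hgvs_string] = converter.hgvs_to_variant_coordinate(hgvs_string)
        except HGVSNomenclatureException as exc:
            # Bad input: report back to whoever supplied the string.
            user_errors[hgvs_string] = str(exc)
        except HGVSImplementationException as exc:
            # Library failure: log it for the maintainers instead.
            library_errors[hgvs_string] = str(exc)
    return converted, user_errors, library_errors
```

Keeping the two error buckets separate means bad input can be shown to the user while library failures go to a log.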

Format differences vs pyhgvs

Biocommons follows the modern HGVS specification, which differs from pyhgvs in two ways:

del/dup notation — biocommons omits the deleted/duplicated sequence:

pyhgvs:     NM_000492.3:c.442delA
biocommons: NM_000492.3:c.442del
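
When comparing strings from the two libraries, stored pyhgvs-style strings can be brought into the shorter form by dropping the trailing bases. This is a regex sketch, not something hgvs_shim is known to provide; strip_del_dup_sequence is a hypothetical helper and only handles a plain del or dup at the end of the string.

```python
import re

# Matches 'del' or 'dup' followed by bases at the very end of the string.
# 'delins...' is left alone because 'i' is not a base character.
_TRAILING_BASES = re.compile(r"(del|dup)[ACGTN]+$")


def strip_del_dup_sequence(hgvs_string: str) -> str:
    """Drop the explicit deleted/duplicated sequence from an HGVS string."""
    return _TRAILING_BASES.sub(r"\1", hgvs_string)


print(strip_del_dup_sequence("NM_000492.3:c.442delA"))
# NM_000492.3:c.442del
```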

delins VCF representation — biocommons uses minimal representation (no anchor base):

pyhgvs:     ('chr7', 117171119, 'CA', 'C')   # anchor base included
biocommons: ('chr7', 117171120, 'A',  'C')   # no anchor base
