An ANTLR-based parser for colloquial protein variant nomenclature

Protein Variant Nomenclature Parser

This repository contains a Python library for parsing and validating colloquial protein variant nomenclature strings like BRAF V600E that commonly appear in manuscripts.

Features

  • Parse protein variant nomenclature strings in the following formats:
    • Single amino acid substitution, e.g.: BRAF V600E, BRAFV600E, BRAFᵛ⁶⁰⁰ᵉ
    • Range of amino acid substitutions: BRAFVK600_601>E
  • Extract the components of the nomenclature string, such as gene name, prefix amino acid, position or range, and suffix amino acid
  • Validate whether a given string conforms to the expected format
  • Pure Python, with zero dependencies and no need for an internet connection
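
To make the formats listed above concrete, here is a minimal stand-alone sketch of how such strings break down into gene, reference residue(s), position or range, and alternate residue(s). It is not the library's ANTLR grammar and does not call the library: the names Variant, KNOWN_GENES, and sketch_parse are hypothetical, the three-gene list stands in for a full gene list, the amino acid alphabet is assumed to be the 20 standard single-letter codes plus *, and upper-casing the whole string is a simplification.

```python
import re
import unicodedata
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    """Hypothetical container; the library's own result type is ProteinVariant."""
    gene: str
    ref: str
    start: int
    end: Optional[int]
    alt: str


# Assumption: a tiny illustrative gene list, not the full set of gene names.
KNOWN_GENES = ("BRAF", "KRAS", "EGFR")

# Assumption: the 20 standard single-letter amino acid codes plus the stop codon.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY*"

PATTERN = re.compile(
    r"(?P<gene>" + "|".join(KNOWN_GENES) + r")\s*"  # gene, optional space
    r"(?P<ref>[" + AMINO_ACIDS + r"]+)"              # reference residue(s)
    r"(?P<start>\d+)(?:_(?P<end>\d+))?"              # position or range
    r">?(?P<alt>[" + AMINO_ACIDS + r"]+)"            # alternate residue(s)
)


def sketch_parse(text: str) -> Optional[Variant]:
    """Return the components of a variant string, or None if it does not match."""
    # NFKC folds superscript letters and digits to their plain forms,
    # so the superscript notation becomes "BRAFv600e" before upper-casing.
    normalized = unicodedata.normalize("NFKC", text).strip().upper()
    match = PATTERN.fullmatch(normalized)
    if match is None:
        return None
    end = match.group("end")
    return Variant(
        gene=match.group("gene"),
        ref=match.group("ref"),
        start=int(match.group("start")),
        end=int(end) if end is not None else None,
        alt=match.group("alt"),
    )


if __name__ == "__main__":
    for s in ("BRAF V600E", "BRAFV600E", "BRAF\u1d5b\u2076\u2070\u2070\u1d49", "BRAFVK600_601>E"):
        print(s, "->", sketch_parse(s))
```

Anchoring the match on a list of known genes is what lets the unspaced form be split unambiguously in this sketch; without it, a pattern could not tell where the gene name ends and the reference residue begins.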

Usage

For parsing:

from protein_variant_nomenclature_parser.parser import parse

mutation_string = "BRAF V600E"
parsed_components = parse(mutation_string)

print(parsed_components)
# Output:
# ProteinVariant(gene='BRAF', ref='V', position=NumberOrRange(start=600, end=None), alt='E')

For validation:

from protein_variant_nomenclature_parser.parser import (
    InvalidProteinVariantError,
    parse,
)

mutation_string = "INVALID V600E"

# parse() raises InvalidProteinVariantError when the string does not conform
# to the expected format, so validation is a try/except around the call.
try:
    parse(mutation_string)
except InvalidProteinVariantError:
    print(f"{mutation_string} is not valid")

Supported Nomenclature

The parser supports all HUGO gene names.

The parser supports single-letter amino acid codes and the stop codon (*).

The parser also handles variants written with no space between the gene name and the substitution (e.g. BRAFV600E), which unfortunately comes up sometimes.

Installation

From PyPI

pip install protein-variant-nomenclature-parser

From Source

To install the library from source, clone the repository and run make install:

git clone https://github.com/yourusername/protein-variant-nomenclature-parser.git
cd protein-variant-nomenclature-parser
make install

Docker container

A docker container is available:

docker pull jeffquinnmsk/protein-variant-nomenclature-parser:latest

License

This project is licensed under the MIT License. See the LICENSE file for more information.
