An ANTLR-based parser for colloquial protein variant nomenclature

Protein Variant Nomenclature Parser

This repository contains a Python library for parsing and validating colloquial protein variant nomenclature strings, such as BRAF V600E, which commonly appear in manuscripts.

Features

  • Parse protein variant nomenclature strings in the following formats:
    • Single amino acid substitution: "BRAF V600E", "BRAFV600E", "PTEN R130G", "TP53 R175H"
    • Range of amino acid substitutions: "BRAF V600_601E", "PTEN R130_131A", "TP53 R175_176N"
  • Extract the components of the nomenclature string, such as gene name, prefix amino acid, position or range, and suffix amino acid
  • Validate whether a given string conforms to the expected format
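
To make the formats above concrete, here is a minimal regex-based sketch of the same extraction and validation steps. It is not the library's ANTLR grammar or its API: the names `AMINO_ACIDS`, `Variant`, `parse_variant`, and `is_valid` are hypothetical, the gene-name pattern is a simplification, and the amino acid set is assumed to be the 20 standard single-letter codes plus the stop codon (*).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: the 20 standard amino acid single-letter codes plus stop (*).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY*"

# Hypothetical pattern: gene name, optional whitespace, prefix amino acid,
# position or position range (start_end), suffix amino acid.
_VARIANT_RE = re.compile(
    r"(?P<gene>[A-Z][A-Z0-9-]*?)\s*"
    rf"(?P<prefix>[{AMINO_ACIDS}])"
    r"(?P<start>\d+)(?:_(?P<end>\d+))?"
    rf"(?P<suffix>[{AMINO_ACIDS}])"
)


@dataclass(frozen=True)
class Variant:
    """Components of a colloquial protein variant string (illustrative only)."""
    gene: str
    prefix: str
    start: int
    end: Optional[int]  # None for a single position
    suffix: str


def parse_variant(text: str) -> Optional[Variant]:
    """Return the parsed components, or None if the string does not match."""
    match = _VARIANT_RE.fullmatch(text.strip())
    if match is None:
        return None
    end = match.group("end")
    return Variant(
        gene=match.group("gene"),
        prefix=match.group("prefix"),
        start=int(match.group("start")),
        end=int(end) if end is not None else None,
        suffix=match.group("suffix"),
    )


def is_valid(text: str) -> bool:
    """Check whether the string conforms to the sketched format."""
    return parse_variant(text) is not None
```

In this sketch, `parse_variant("BRAF V600E")` and `parse_variant("BRAFV600E")` yield the same components, and a range such as "BRAF V600_601E" fills in the `end` field. A regular expression cannot check that the gene is a real HUGO symbol, so that part of validation is out of scope here.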

Supported Nomenclature

The parser supports all HUGO gene names.

The parser supports amino acid single-letter codes and the stop codon (*).

Installation

From PyPI

pip install protein-variant-nomenclature-parser

From Source

To install the library from source, clone the repository and run make install:

git clone https://github.com/yourusername/protein-variant-nomenclature-parser.git
cd protein-variant-nomenclature-parser
make install

Docker container

A Docker image is available:

docker pull jeffquinnmsk/protein-variant-nomenclature-parser:latest

License

This project is licensed under the MIT License. See the LICENSE file for more information.
