Python implementation of LVG Norm (https://lhncbc.nlm.nih.gov/LSG/Projects/lvg/current/docs/userDoc/tools/norm.html)

Project description

lvg_norm

This package focuses on the norm flow from the NLM Lexical Tools. It bundles the LVG-derived resources needed by the normalizer and exposes both a Python API and a small CLI.

What It Does

Given an input string, lvg_norm produces one or more normalized forms by applying an LVG-inspired pipeline. Two presets are available:

pipeline="medical" (default)

The full LVG-inspired flow, best for free-text English (MeSH/UMLS-style content):

q0 -> g -> rs -> o -> t -> l -> B -> Ct -> q7 -> q8 -> w

In practice, that means it handles things like:

  • Unicode folding
  • Possessive stripping
  • Parenthetic plural cleanup
  • Stopword removal
  • Lexicon/rule-based uninflection
  • Citation-form mapping
  • Final token sorting

The implementation targets the behavior of the norm tool, not the full LVG suite.
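
To make the ordering concrete, here is a toy sketch of four of the stages listed above: possessive stripping, punctuation-to-space, stopword removal, and the final token sort. It is not lvg_norm's implementation. The function names, the regexes, and the tiny stopword set are assumptions made for illustration, and the stages that depend on the bundled LVG data (Unicode folding, uninflection, citation-form mapping) are left out.

```python
import re

# Hypothetical, tiny stopword set; the real package bundles the LVG list.
TOY_STOPWORDS = {"of", "the", "and", "with", "for"}


def strip_possessives(text: str) -> str:
    # Rough stand-in for the genitive step: "Alzheimer's" -> "Alzheimer".
    text = re.sub(r"'s\b", "", text)
    # Plural possessive: "patients'" -> "patients".
    return re.sub(r"s'(?=\s|$)", "s", text)


def punctuation_to_space(text: str) -> str:
    # Replace anything that is neither a word character nor whitespace.
    return re.sub(r"[^\w\s]", " ", text)


def toy_norm(text: str) -> str:
    text = punctuation_to_space(strip_possessives(text))
    # Stopword check is case-insensitive; lowercasing happens afterwards.
    tokens = [t for t in text.split() if t.lower() not in TOY_STOPWORDS]
    tokens = [t.lower() for t in tokens]
    # Final step: sort tokens so word order no longer matters.
    return " ".join(sorted(tokens))


print(toy_norm("Alzheimer's disease of the brain"))
# alzheimer brain disease
```

Because of the final sort, inputs that differ only in word order collapse to the same string, which is what makes the medical preset useful for matching free-text variants.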

pipeline="chemical"

For IUPAC names and small-molecule nomenclature, where punctuation is structural and word order is meaningful. The flow is reduced to:

q0 -> q7 -> q8 -> casefold + whitespace collapse

Genitive stripping, parenthetic-plural removal, punctuation-to-space replacement, stopword stripping, English uninflection and citation lookup, and the final token sort are all skipped. This preserves locants, hyphens, parentheses, brackets, primes, stereo descriptors, and substitution-position prefixes (N-, 1H-, D-, …). Greek letters are still expanded via the LVG nonStripMap (α → alpha), but they stay attached to their parent name because the punctuation step is skipped. ± is likewise mapped to +/-.
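
As a rough illustration of why this reduced flow keeps chemical names intact, the sketch below expands a few symbols, casefolds, and collapses whitespace without touching punctuation. It is a toy, not the package's code. TOY_SYMBOL_MAP is a hypothetical four-entry stand-in for the LVG nonStripMap, and toy_chemical_norm is an illustrative name.

```python
import re

# Hypothetical subset of a symbol map; lvg_norm uses the LVG nonStripMap.
TOY_SYMBOL_MAP = {"α": "alpha", "β": "beta", "γ": "gamma", "±": "+/-"}


def toy_chemical_norm(name: str) -> str:
    # Expand mapped symbols in place; every other character is kept,
    # so hyphens, commas, and parentheses survive untouched.
    expanded = "".join(TOY_SYMBOL_MAP.get(ch, ch) for ch in name)
    # Casefold, then collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", expanded.casefold()).strip()


print(toy_chemical_norm("(2S,3R)-2,3-dihydro-1H-indole"))
# (2s,3r)-2,3-dihydro-1h-indole
print(toy_chemical_norm("β-lactam   antibiotics"))
# beta-lactam antibiotics
```

Note that no token sort is applied: word order is part of the name, so the output keeps it.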

Use pipeline="medical" for prose and pipeline="chemical" for chemistry names; the two are not designed to be mixed within a single call. If you can run names through OPSIN first, do that; reach for pipeline="chemical" when you need fuzzy matching on messy strings that OPSIN can't parse.

Install

From PyPI:

pip install lvg-norm

From the repository:

pip install .

For local development with uv:

uv sync --group dev

Python API

The distribution name is lvg-norm, while the Python import package is lvg_norm.

from lvg_norm import NormNormalizer, lvg_normalize

lvg_normalize("β-lactam antibiotics")
# ['antibiotic beta lactam']

normer = NormNormalizer(max_combinations=5)
normer.normalize("HNF1A p.Q125*")
# ['hnf1a p q125', 'hnf1on p q125', 'hnf1um p q125']

# Chemistry preset for IUPAC / small-molecule names
lvg_normalize("(2S,3R)-2,3-dihydro-1H-indole", pipeline="chemical")
# ['(2s,3r)-2,3-dihydro-1h-indole']
lvg_normalize("β-lactam antibiotics", pipeline="chemical")
# ['beta-lactam antibiotics']
lvg_normalize("(±)-tartaric acid", pipeline="chemical")
# ['(+/-)-tartaric acid']
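
One common use of the returned forms is grouping surface variants of the same term. The sketch below indexes input strings by every normalized form they produce. group_by_norm and toy_normalize are hypothetical names. toy_normalize is only a stand-in so the snippet runs without the package installed; in practice you would pass lvg_normalize, which, judging by the examples above, returns a list of strings.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable


def group_by_norm(
    terms: Iterable[str],
    normalize: Callable[[str], list[str]],
) -> dict[str, list[str]]:
    """Map each normalized form to the input terms that produce it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for term in terms:
        # A term may yield several forms; index it under each one.
        for form in normalize(term):
            groups[form].append(term)
    return dict(groups)


def toy_normalize(text: str) -> list[str]:
    # Stand-in only: lowercase and sort tokens. For real use, swap in
    # lvg_normalize (from lvg_norm import lvg_normalize).
    return [" ".join(sorted(text.lower().split()))]


groups = group_by_norm(["Lung Cancer", "cancer lung", "Heart failure"], toy_normalize)
print(groups)
# {'cancer lung': ['Lung Cancer', 'cancer lung'], 'failure heart': ['Heart failure']}
```

Indexing under every returned form matters when the normalizer expands to several variants, as in the HNF1A example above.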

CLI

The package installs an lvg-norm command:

lvg-norm "β-lactam antibiotics"
lvg-norm --file inputs.txt
echo "HNF1A p.Q125*" | lvg-norm

Useful flags:

  • --pipeline {medical,chemical} to pick the preset (default: medical)
  • --stopwords PATH to provide an extra stopword list
  • --no-lvg-stopwords to disable the bundled LVG stopword list
  • --max-combinations N to cap variant expansion
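
If you call the CLI from a script, the flags above can be assembled programmatically. This sketch only builds the argument list from the documented flags and runs the command when it is found on PATH. build_lvg_norm_argv is a hypothetical helper, placing the flags before the input text is an assumption about the argument parser, and the output format is not described here, so it is printed rather than parsed.

```python
from __future__ import annotations

import shutil
import subprocess


def build_lvg_norm_argv(
    text: str,
    pipeline: str = "medical",
    max_combinations: int | None = None,
    stopwords: str | None = None,
    use_lvg_stopwords: bool = True,
) -> list[str]:
    """Build an argv list for the lvg-norm command from the documented flags."""
    if pipeline not in {"medical", "chemical"}:
        raise ValueError(f"unknown pipeline: {pipeline!r}")
    argv = ["lvg-norm", "--pipeline", pipeline]
    if max_combinations is not None:
        argv += ["--max-combinations", str(max_combinations)]
    if stopwords is not None:
        argv += ["--stopwords", stopwords]
    if not use_lvg_stopwords:
        argv.append("--no-lvg-stopwords")
    argv.append(text)
    return argv


argv = build_lvg_norm_argv("β-lactam antibiotics", pipeline="chemical")
print(argv)
# ['lvg-norm', '--pipeline', 'chemical', 'β-lactam antibiotics']

# Only invoke the command if it is actually installed.
if shutil.which("lvg-norm"):
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    print(result.stdout)
```

Passing a list to subprocess.run avoids shell quoting problems with characters such as *, parentheses, and ± in the input.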

Development

uv sync --group dev
pytest
ruff check .
ruff format --check .

Download files

Source Distribution

lvg_norm-1.3.0.tar.gz (13.3 MB)

Uploaded Source

Built Distribution

lvg_norm-1.3.0-py3-none-any.whl (13.3 MB)

Uploaded Python 3

File details

Details for the file lvg_norm-1.3.0.tar.gz.

File metadata

  • Download URL: lvg_norm-1.3.0.tar.gz
  • Upload date:
  • Size: 13.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lvg_norm-1.3.0.tar.gz
Algorithm Hash digest
SHA256 e9341e8516f9472f22d9371a1c5fdac129fb009beb489c5d13b89ca6283ccbde
MD5 848496ad65ab109df28f2067a46e31a2
BLAKE2b-256 6fa29ca4520561a13ff6721f976efee585a56d1f8f1780cb1dbf8f95d438a7aa
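
To check a downloaded file against the SHA256 digest listed above, a short standard-library sketch is enough. sha256_of is a hypothetical helper name, and the file path is an assumption about where you saved the source distribution.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for lvg_norm-1.3.0.tar.gz.
EXPECTED_SHA256 = "e9341e8516f9472f22d9371a1c5fdac129fb009beb489c5d13b89ca6283ccbde"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Chunked reads keep memory use flat for large files.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed location of the downloaded sdist; adjust as needed.
sdist = Path("lvg_norm-1.3.0.tar.gz")
if sdist.exists():
    print("match" if sha256_of(sdist) == EXPECTED_SHA256 else "MISMATCH")
```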

Provenance

The following attestation bundles were made for lvg_norm-1.3.0.tar.gz:

Publisher: python-publish.yml on haydn-jones/lvg_norm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lvg_norm-1.3.0-py3-none-any.whl.

File metadata

  • Download URL: lvg_norm-1.3.0-py3-none-any.whl
  • Upload date:
  • Size: 13.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lvg_norm-1.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a7e0334bcae9cea86b642f1d00427ce09ceb77d99356fdbfc27ef37448704b8a
MD5 b9830af624e52de2037596377a9c02b6
BLAKE2b-256 cb3161921deed2c238234bb5402ca3e2fb8f5c447977168b9df56028725bc7dd

Provenance

The following attestation bundles were made for lvg_norm-1.3.0-py3-none-any.whl:

Publisher: python-publish.yml on haydn-jones/lvg_norm
