Python implementation of LVG Norm (https://lhncbc.nlm.nih.gov/LSG/Projects/lvg/current/docs/userDoc/tools/norm.html)
lvg_norm
This package focuses on the norm flow from the NLM Lexical Tools. It bundles
the LVG-derived resources needed by the normalizer and exposes both a Python API
and a small CLI.
What It Does
Given an input string, lvg_norm produces one or more normalized forms by
applying an LVG-inspired pipeline. Two presets are available:
pipeline="medical" (default)
The full LVG-inspired flow, best for free-text English (MeSH/UMLS-style content):
q0 -> g -> rs -> o -> t -> l -> B -> Ct -> q7 -> q8 -> w
In practice, that means it handles things like:
- Unicode folding
- Possessive stripping
- Parenthetic plural cleanup
- Stopword removal
- Lexicon/rule-based uninflection
- Citation-form mapping
- Final token sorting
The implementation targets the behavior of the norm tool, not the full LVG
suite.
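To make the steps listed above concrete, here is a minimal sketch of a few of them in plain Python. It is a toy, not the code in lvg_norm: the function name `toy_norm` and the stopword set `TOY_STOPWORDS` are made up for illustration, and the lexicon-based uninflection and citation-form steps are left out entirely because they depend on the bundled LVG resources.

```python
import re
import unicodedata

# Hypothetical stopword set for illustration only; the package bundles
# its own LVG-derived list.
TOY_STOPWORDS = {"of", "and", "with", "for", "to", "in", "by", "on", "the"}


def toy_norm(text: str) -> str:
    """Toy approximation of a norm-style flow (no uninflection, no citation lookup)."""
    # Unicode folding: decompose characters and drop combining marks (é -> e).
    folded = unicodedata.normalize("NFKD", text)
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    # Possessive stripping: "Crohn's" -> "Crohn", "patients'" -> "patients".
    folded = re.sub(r"(?<=\w)'s\b|(?<=s)'(?!\w)", "", folded)
    # Parenthetic plural cleanup: "tumor(s)" -> "tumor".
    folded = re.sub(r"\((?:s|es|ies)\)", "", folded)
    # Replace remaining punctuation with spaces, then lowercase.
    folded = re.sub(r"[^\w\s]|_", " ", folded).lower()
    # Stopword removal, followed by the final token sort.
    tokens = [tok for tok in folded.split() if tok not in TOY_STOPWORDS]
    return " ".join(sorted(tokens))


print(toy_norm("Crohn's disease of the colon"))  # colon crohn disease
print(toy_norm("Tumor(s), malignant"))           # malignant tumor
```

The final sort is what makes differently ordered inputs collapse to the same key. The real normalizer can also return several forms for one input, which this single-string sketch does not model.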
pipeline="chemical"
For IUPAC names and small-molecule nomenclature, where punctuation is structural and word order is meaningful. The flow is reduced to:
q0 -> q7 -> q8 -> casefold + whitespace collapse
Genitives, parenthetic-plural removal, punctuation-to-space, stopword
stripping, English uninflection/citation lookup, and the final token sort
are all skipped. This preserves locants, hyphens, parens, brackets, primes,
stereo descriptors, and substitution-position prefixes (N-, 1H-, D-,
…). Greek letters are still expanded via the LVG nonStripMap (α →
alpha), but they remain glued to their parent name because the
punctuation step is skipped. ± is mapped to +/- for the same reason.
Use pipeline="medical" for prose and pipeline="chemical" for chemistry
names; the two are not designed to be mixed within a single call. If you
can run names through OPSIN first, do that; reach for pipeline="chemical"
when you need fuzzy matching on messy strings that OPSIN can't parse.
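As a rough illustration of why the reduced flow keeps chemistry names intact, the sketch below mimics its visible effects: symbol expansion, casefolding, and whitespace collapse, with no punctuation handling and no token sort. `toy_chemical_norm` and `TOY_SYMBOL_MAP` are hypothetical names, and the four-entry map is only a stand-in for the LVG nonStripMap that the package actually uses.

```python
import re

# Tiny illustrative symbol map (an assumption for this sketch); the package
# relies on the LVG nonStripMap instead.
TOY_SYMBOL_MAP = {"α": "alpha", "β": "beta", "γ": "gamma", "±": "+/-"}


def toy_chemical_norm(name: str) -> str:
    """Toy approximation of the chemical preset: map symbols, casefold, tidy spaces."""
    # Expand mapped symbols in place; adjacent hyphens and brackets are untouched,
    # so "β-lactam" becomes "beta-lactam" rather than "beta lactam".
    expanded = "".join(TOY_SYMBOL_MAP.get(ch, ch) for ch in name)
    # Casefold, then collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", expanded.casefold()).strip()


print(toy_chemical_norm("(2S,3R)-2,3-dihydro-1H-indole"))  # (2s,3r)-2,3-dihydro-1h-indole
print(toy_chemical_norm("(±)-tartaric acid"))              # (+/-)-tartaric acid
```

Because nothing is sorted or split on punctuation, locants and stereo descriptors stay attached to the name they qualify.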
Install
From PyPI:
pip install lvg-norm
From the repository:
pip install .
For local development with uv:
uv sync --group dev
Python API
The distribution name is lvg-norm, while the Python import package is
lvg_norm.
from lvg_norm import NormNormalizer, lvg_normalize
lvg_normalize("β-lactam antibiotics")
# ['antibiotic beta lactam']
normer = NormNormalizer(max_combinations=5)
normer.normalize("HNF1A p.Q125*")
# ['hnf1a p q125', 'hnf1on p q125', 'hnf1um p q125']
# Chemistry preset for IUPAC / small-molecule names
lvg_normalize("(2S,3R)-2,3-dihydro-1H-indole", pipeline="chemical")
# ['(2s,3r)-2,3-dihydro-1h-indole']
lvg_normalize("β-lactam antibiotics", pipeline="chemical")
# ['beta-lactam antibiotics']
lvg_normalize("(±)-tartaric acid", pipeline="chemical")
# ['(+/-)-tartaric acid']
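Since the normalizer returns a list of forms, one common way to use it is as a key function for fuzzy term matching. The sketch below shows that pattern under a few assumptions: `build_norm_index` and `lookup` are hypothetical helpers, not part of lvg_norm, and the demo uses a stand-in normalizer so it runs without the package installed. In real use you would pass `lvg_normalize` (or `functools.partial(lvg_normalize, pipeline="chemical")`) as the `normalize` argument.

```python
from collections import defaultdict
from typing import Callable, Iterable

Normalizer = Callable[[str], list[str]]


def build_norm_index(terms: Iterable[str], normalize: Normalizer) -> dict[str, set[str]]:
    """Map every normalized form to the original terms that produce it."""
    index: dict[str, set[str]] = defaultdict(set)
    for term in terms:
        for form in normalize(term):
            index[form].add(term)
    return dict(index)


def lookup(query: str, index: dict[str, set[str]], normalize: Normalizer) -> set[str]:
    """Return all indexed terms sharing at least one normalized form with the query."""
    matches: set[str] = set()
    for form in normalize(query):
        matches |= index.get(form, set())
    return matches


def stand_in_normalize(text: str) -> list[str]:
    # Stand-in for lvg_normalize: lowercase and sort tokens, return a one-item list.
    return [" ".join(sorted(text.lower().split()))]


index = build_norm_index(["Lactam Beta", "beta lactam", "aspirin"], stand_in_normalize)
print(sorted(lookup("BETA lactam", index, stand_in_normalize)))
# ['Lactam Beta', 'beta lactam']
```

Indexing every returned form, rather than only the first, means a query matches whenever any of its variants overlaps with any variant of a stored term.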
CLI
The package installs an lvg-norm command:
lvg-norm "β-lactam antibiotics"
lvg-norm --file inputs.txt
echo "HNF1A p.Q125*" | lvg-norm
Useful flags:
- --pipeline {medical,chemical} to pick the preset (default: medical)
- --stopwords PATH to provide an extra stopword list
- --no-lvg-stopwords to disable the bundled LVG stopword list
- --max-combinations N to cap variant expansion
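If you need to drive the command from a script, the flags above can be assembled into an argument list and passed to `subprocess.run`. The sketch below is one way to do that. The helper names `build_lvg_norm_argv` and `run_lvg_norm` are hypothetical, it assumes options may precede the input string, and it makes no assumption about the output format beyond returning whatever the command writes to stdout.

```python
import subprocess
from typing import Optional


def build_lvg_norm_argv(
    text: Optional[str] = None,
    *,
    file: Optional[str] = None,
    pipeline: str = "medical",
    stopwords: Optional[str] = None,
    use_lvg_stopwords: bool = True,
    max_combinations: Optional[int] = None,
) -> list[str]:
    """Assemble an lvg-norm command line from the documented flags."""
    argv = ["lvg-norm", "--pipeline", pipeline]
    if stopwords is not None:
        argv += ["--stopwords", stopwords]
    if not use_lvg_stopwords:
        argv.append("--no-lvg-stopwords")
    if max_combinations is not None:
        argv += ["--max-combinations", str(max_combinations)]
    if file is not None:
        argv += ["--file", file]
    elif text is not None:
        argv.append(text)
    # With neither text nor file, the caller is expected to supply stdin.
    return argv


def run_lvg_norm(argv: list[str], stdin_text: Optional[str] = None) -> str:
    """Run the command and return its raw stdout; raises if the exit code is non-zero."""
    result = subprocess.run(
        argv, input=stdin_text, capture_output=True, text=True, check=True
    )
    return result.stdout


print(build_lvg_norm_argv("β-lactam antibiotics", pipeline="chemical", max_combinations=5))
# ['lvg-norm', '--pipeline', 'chemical', '--max-combinations', '5', 'β-lactam antibiotics']
```

Passing the arguments as a list avoids shell quoting problems with names that contain parentheses, primes, or brackets.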
Development
uv sync --group dev
pytest
ruff check .
ruff format --check .