
node-synonymizer


a standalone Python client for the SRI Node Normalizer (https://nodenormalization-sri.renci.org) and Name Resolver (https://name-resolution-sri.renci.org) services. it provides CURIE canonicalization, equivalent-node lookup, and free-text name-to-CURIE mapping for Biolink-typed entities.

originally extracted from RTX-ARAX (https://github.com/RTXteam/RTX), branch issue-2585, file code/ARAX/NodeSynonymizer/node_synonymizer.py.

what it does

  • node normalization: map any CURIE to its preferred/canonical CURIE (e.g. DOID:14330 -> MONDO:0005180).
  • name resolution: resolve a free-text name or label to a canonical CURIE (e.g. "Parkinson's disease" -> MONDO:0005180).
  • equivalent-node lookup: expand a CURIE into the full cluster of equivalent identifiers across ontologies (DOID, MONDO, UMLS, MESH, CHEBI, NCBIGene, DRUGBANK, CHEMBL, ...).
  • Biolink category lookup: get the most specific Biolink category for a CURIE (e.g. biolink:Disease, biolink:SmallMolecule, biolink:Gene), or the full distribution of categories across the cluster.
  • full normalizer metadata: retrieve the complete SRI normalizer result — canonical id, all categories with counts, equivalent nodes, and a TRAPI-shaped knowledge_graph per cluster.
  • batching + caching: every method accepts a list of inputs and batches them into a single API call; Node Normalizer responses are cached in-memory per instance.
  • optional async: concurrent batching via aiohttp for bulk workloads (use_async=True); a short sketch follows this list.
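
a minimal sketch of the batching, caching, and async options above, assuming the public methods keep the same synchronous signatures when use_async=True (the README only documents the constructor flag):

from node_synonymizer import NodeSynonymizer

# batching: a list of CURIEs goes out as a single API call
syn = NodeSynonymizer()
batch = syn.get_canonical_curies(curies=["DOID:14330", "CHEMBL.COMPOUND:CHEMBL112"])

# caching: repeat lookups are served from the per-instance in-memory cache
syn.get_canonical_curies(curies=["DOID:14330"])
print(syn.get_cache_stats())  # hits / misses / hit rate

# async: use_async=True switches to concurrent batching via aiohttp
bulk_syn = NodeSynonymizer(use_async=True)
bulk = bulk_syn.get_canonical_curies(curies=["DOID:14330", "CHEMBL.COMPOUND:CHEMBL112"])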

typical use cases

  • mapping biomedical identifiers across ontologies for downstream analysis.
  • resolving natural-language disease / drug / gene names to canonical CURIEs.
  • building TRAPI-shaped cluster graphs from a CURIE or a name.
  • preprocessing input for Translator tools (Pathfinder, ARA/KP services, NER pipelines).

install

pip install node-synonymizer

requires Python 3.12.

pinned versions

runtime deps are pinned to the exact versions used by RTX at extraction time, so the install is reproducible and consumers get the same tree ARAX runs today:

package    version
python     3.12
aiohttp    3.9.4
bmt        1.4.6
pandas     2.3.1
requests   2.32.3

dev extras:

package    version
pytest     8.1.1

quick usage

canonicalize a CURIE

from node_synonymizer import NodeSynonymizer

syn = NodeSynonymizer()
print(syn.get_canonical_curies("DOID:14330"))
# {'DOID:14330': {'preferred_curie': 'MONDO:0005180',
#                 'preferred_name': 'Parkinson disease',
#                 'preferred_category': 'biolink:Disease'}}

resolve a free-text name

syn = NodeSynonymizer()
print(syn.get_canonical_curies(names="Parkinson's disease"))
# {"Parkinson's disease": {'preferred_curie': 'MONDO:0005180',
#                          'preferred_name': 'Parkinson disease',
#                          'preferred_category': 'biolink:Disease'}}

batch both at once

syn = NodeSynonymizer()
print(syn.get_canonical_curies(
    curies=["DOID:14330", "CHEMBL.COMPOUND:CHEMBL112"],
    names=["Warfarin"],
))

get all equivalent CURIEs in the cluster

syn = NodeSynonymizer()
equivalents = syn.get_equivalent_nodes("DOID:14330")
print(equivalents["DOID:14330"])
# ['MONDO:0005180', 'DOID:14330', 'OMIM.PS:168600', 'UMLS:C0030567', 'MESH:D010300', ...]

look up Biolink category

syn = NodeSynonymizer()
print(syn.get_curie_category("CHEMBL.COMPOUND:CHEMBL112"))
# {'CHEMBL.COMPOUND:CHEMBL112': 'biolink:SmallMolecule'}

full normalizer result

syn = NodeSynonymizer()
result = syn.get_normalizer_results("PTGS1")
# result["PTGS1"] has: id, categories, nodes, total_synonyms, knowledge_graph

public API

  • NodeSynonymizer(sqlite_file_name=None, autocomplete=True, use_async=False)
    construct a client. sqlite_file_name is accepted for backwards compat and ignored. autocomplete=False switches Name Resolver to exact-phrase matching. use_async=True enables concurrent batching via aiohttp.

  • get_canonical_curies(curies=None, names=None, return_all_categories=False, debug=False) -> dict
    canonical CURIE + preferred name + preferred category for each input (CURIE or name). with return_all_categories=True, also returns a {category: count} map across the cluster.

  • get_equivalent_nodes(curies=None, names=None, include_unrecognized_entities=True, debug=False) -> dict
    list of equivalent CURIEs in the same SRI cluster for each input.

  • get_preferred_names(curies, debug=False) -> dict
    preferred (canonical) name for each input CURIE.

  • get_curie_names(curies, debug=False) -> dict
    the direct (non-preferred) name attached to the input CURIE itself, not its canonical form.

  • get_curie_category(curies, debug=False) -> dict
    most specific Biolink category for each input.

  • get_distinct_category_list(debug=False) -> list
    all Biolink category names the class knows about (derived from bmt).

  • get_normalizer_results(entities, max_synonyms=1000000, debug=False) -> dict
    full SRI normalizer result per input: canonical id, categories with counts, equivalent nodes with metadata, and a TRAPI-shaped knowledge_graph.

  • get_cache_stats() -> dict
    in-memory cache hits / misses / hit rate for debugging.

all methods accept a single string, a list, or a set of inputs. returns are keyed by the original input string. missing / unresolved inputs map to None.
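
a short sketch of those conventions; the unknown CURIE is made up, and the all_categories key name is an assumption (the list above only promises a {category: count} map when return_all_categories=True):

from node_synonymizer import NodeSynonymizer

syn = NodeSynonymizer()

# single string, list, or set all work; results are keyed by the input string
out = syn.get_canonical_curies(curies={"DOID:14330", "FAKE:0000001"})
print(out["FAKE:0000001"])  # None: unresolved inputs map to None

# return_all_categories=True adds a {category: count} map across the cluster
out = syn.get_canonical_curies("DOID:14330", return_all_categories=True)
print(out["DOID:14330"]["all_categories"])  # key name assumed for illustration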

endpoints

URLs are hardcoded to the RENCI-hosted SRI services:

  • Node Normalizer: https://nodenormalization-sri.renci.org/1.5
  • Name Resolver: https://name-resolution-sri.renci.org

the transltr.io production mirror of Name Resolver does NOT expose /bulk-lookup, which is one reason we pin to RENCI. see the NodeSynonymizer class docstring for the full rationale.

to override (e.g. to point at CI/dev), subclass and set the two class constants:

class MySynonymizer(NodeSynonymizer):
    NODE_NORMALIZER_URL = "https://nodenorm.ci.transltr.io/1.5"
    NAME_RESOLVER_URL = "https://name-lookup.ci.transltr.io"

changes from RTX-ARAX

the source file on issue-2585 was written to run inside the ARAX tree. to make it standalone without changing behavior, the following edits were applied:

  1. dropped the sys.path.append block at the top of the file. it existed because the ARAX repo doesn't use a standard Python package layout. the new src/node_synonymizer/ layout makes it unnecessary.

  2. dropped the three imports from openapi_server.models (KnowledgeGraph, Node, Attribute). those are ARAX-generated TRAPI pydantic classes that don't exist outside the ARAX tree, so they were a hard blocker for a standalone install.

  3. replaced the TRAPI model constructors in _get_cluster_graph and _convert_to_trapi_node with plain dict literals. the dicts preserve the full field set and default values that the original openapi_server.models.*.to_dict() emitted:

    • Node dicts keep all 4 fields (name, categories, attributes, is_set=False).
    • Attribute dicts keep all 8 fields, with None for unset optionals (original_attribute_name, value_url, description when not passed, attributes).
    • the helper _attribute_dict(...) centralizes the Attribute shape (a sketch follows this list).

    output is byte-for-byte identical to what ARAX produces today, verified by the smoke run below.

  4. dropped unused os / sys imports that only existed for the sys.path block.

  5. no RTXConfiguration reference. the two SRI URLs live as hardcoded class constants (NODE_NORMALIZER_URL, NAME_RESOLVER_URL). override by subclassing, see above.
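
to make item 3 concrete, a hedged sketch of what _attribute_dict plausibly looks like; the parameter names and defaults are assumptions, but the 8-key shape matches the sample attribute keys in the smoke run below:

def _attribute_dict(attribute_type_id, value, attribute_source=None,
                    value_type_id=None, original_attribute_name=None,
                    value_url=None, description=None):
    # illustrative sketch, not the verbatim helper: the signature is assumed.
    # the 8-key shape mirrors openapi_server.models.Attribute.to_dict(),
    # with None for unset optionals and a never-populated nested 'attributes'.
    return {
        "attribute_type_id": attribute_type_id,
        "value": value,
        "attribute_source": attribute_source,
        "value_type_id": value_type_id,
        "original_attribute_name": original_attribute_name,
        "value_url": value_url,
        "description": description,
        "attributes": None,
    }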

nothing else in the class body was touched. the public API (get_canonical_curies, get_equivalent_nodes, get_curie_names, get_preferred_names, get_normalizer_results) is identical.

why not reasoner-pydantic, why not copy ARAX's openapi_server files

two other options we considered for item 3:

  1. reasoner-pydantic (https://github.com/TranslatorSRI/reasoner-pydantic), the Translator TRAPI pydantic lib. pydantic v2's model_dump() doesn't produce the same dict shape as openapi-generator's .to_dict() — aliases, enums, and None-handling all differ — so switching would change the output vs ARAX.

  2. copy the ~5 openapi_server model files from RTX directly into this package. RTX is MIT, so that's fine, and the output would be identical because it's the same code. downside: 5 auto-generated files to carry, a dependency on six, and a resync every time ARAX regenerates them.

we went with plain dicts. this package is a thin client: given a CURIE or a name, return synonyms. whether the output is a validated TRAPI instance is the caller's problem. Pathfinder, DrugBankNER, and ARAX each revalidate at their own boundary anyway, so adding pydantic here just duplicates what they already do.
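
if a caller does want validated TRAPI objects, revalidation at its own boundary might look like this sketch, assuming a current reasoner-pydantic (on pydantic v2) exposing a KnowledgeGraph model:

from node_synonymizer import NodeSynonymizer
from reasoner_pydantic import KnowledgeGraph  # assumed import path

syn = NodeSynonymizer()
res = syn.get_normalizer_results("PTGS1")

# validate the plain-dict cluster graph into a typed TRAPI KnowledgeGraph
kg = KnowledgeGraph.model_validate(res["PTGS1"]["knowledge_graph"])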

tests

25 live tests adapted from RTX's code/ARAX/test/test_ARAX_synonymizer.py. the only change was dropping the ARAX sys.path hack; the test bodies are identical. they hit the real SRI endpoints at RENCI, so they need network access. a handful will fail if RENCI is down.

pip install -e '.[dev]'
pytest -v

coverage: 5 legacy tests (timing-style walks over a set of CURIEs / names) + 20 assertion-driven tests covering canonical-curie lookup, equivalent-node lookup by curie and by name, mixed-input lookup, unknown-CURIE handling, improper prefix capitalization, approximate name matching, normalizer-result shape, cluster graph shape, cluster truncation, entity-controller input formats.

last run on this machine (python 3.12.11, pytest 8.1.1):

============================= test session starts ==============================
platform darwin -- Python 3.12.11, pytest-8.1.1, pluggy-1.6.0
rootdir: /Users/bazarkua/Work/node-synonymizer
configfile: pyproject.toml
collected 25 items

tests/test_node_synonymizer.py::test_example_6b PASSED                   [  4%]
tests/test_node_synonymizer.py::test_example_9 PASSED                    [  8%]
tests/test_node_synonymizer.py::test_example_10 PASSED                   [ 12%]
tests/test_node_synonymizer.py::test_example_11 PASSED                   [ 16%]
tests/test_node_synonymizer.py::test_example_12 PASSED                   [ 20%]
tests/test_node_synonymizer.py::test_get_canonical_curies_simple PASSED  [ 24%]
tests/test_node_synonymizer.py::test_get_canonical_curies_single_curie PASSED [ 28%]
tests/test_node_synonymizer.py::test_get_canonical_curies_unrecognized PASSED [ 32%]
tests/test_node_synonymizer.py::test_get_canonical_curies_by_names PASSED [ 36%]
tests/test_node_synonymizer.py::test_get_canonical_curies_single_name PASSED [ 40%]
tests/test_node_synonymizer.py::test_get_canonical_curies_by_names_and_curies PASSED [ 44%]
tests/test_node_synonymizer.py::test_get_canonical_curies_return_all_categories PASSED [ 48%]
tests/test_node_synonymizer.py::test_get_equivalent_nodes PASSED         [ 52%]
tests/test_node_synonymizer.py::test_get_equivalent_nodes_by_name PASSED [ 56%]
tests/test_node_synonymizer.py::test_bad_name PASSED                     [ 60%]
tests/test_node_synonymizer.py::test_get_equivalent_nodes_by_curies_and_names PASSED [ 64%]
tests/test_node_synonymizer.py::test_get_curie_names PASSED              [ 68%]
tests/test_node_synonymizer.py::test_get_preferred_names PASSED          [ 72%]
tests/test_node_synonymizer.py::test_get_normalizer_results PASSED       [ 76%]
tests/test_node_synonymizer.py::test_improper_curie_prefix_capitalization PASSED [ 80%]
tests/test_node_synonymizer.py::test_approximate_name_based_matching PASSED [ 84%]
tests/test_node_synonymizer.py::test_entity_controller_input_no_format PASSED [ 88%]
tests/test_node_synonymizer.py::test_entity_controller_input_minimal_format PASSED [ 92%]
tests/test_node_synonymizer.py::test_cluster_graphs PASSED               [ 96%]
tests/test_node_synonymizer.py::test_truncate_cluster PASSED             [100%]

============================= 25 passed in 32.52s ==============================

smoke run

same python, hitting the live RENCI endpoints:

from node_synonymizer import NodeSynonymizer
import json

syn = NodeSynonymizer()

print(json.dumps(syn.get_canonical_curies("DOID:14330"), indent=2))

print(json.dumps(syn.get_canonical_curies(names="Parkinson's disease"), indent=2))

eq = syn.get_equivalent_nodes("DOID:14330")
print(f"equivalents for DOID:14330: {len(eq['DOID:14330'])} nodes")
print(f"first 5: {eq['DOID:14330'][:5]}")

res = syn.get_normalizer_results("PTGS1")
kg = res["PTGS1"]["knowledge_graph"]
print(f"nodes in cluster: {len(kg['nodes'])}")
print(f"edges: {kg['edges']}")
sample_node = next(iter(kg['nodes'].values()))
print(f"sample node keys: {sorted(sample_node.keys())}")
print(f"sample attribute keys: {sorted(sample_node['attributes'][0].keys())}")

output:

{
  "DOID:14330": {
    "preferred_curie": "MONDO:0005180",
    "preferred_name": "Parkinson disease",
    "preferred_category": "biolink:Disease"
  }
}
{
  "Parkinson's disease": {
    "preferred_curie": "MONDO:0005180",
    "preferred_name": "Parkinson disease",
    "preferred_category": "biolink:Disease"
  }
}
equivalents for DOID:14330: 16 nodes
first 5: ['MONDO:0005180', 'DOID:14330', 'OMIM.PS:168600', 'UMLS:C0030567', 'MESH:D010300']
nodes in cluster: 27
edges: {}
sample node keys: ['attributes', 'categories', 'is_set', 'name']
sample attribute keys: ['attribute_source', 'attribute_type_id', 'attributes', 'description', 'original_attribute_name', 'value', 'value_type_id', 'value_url']

the sample node keys (attributes, categories, is_set, name) and sample attribute keys (all 8 fields incl. the None-defaulted original_attribute_name, value_url, attributes) confirm the output keeps the same shape as ARAX's original openapi_server.models.*.to_dict(). cluster counts (len(equivalents)=16, nodes in cluster=27) can drift when RENCI updates.

development

reproducible setup with pinned deps:

git clone https://github.com/Translator-CATRAX/node-synonymizer
cd node-synonymizer
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e '.[dev]'
pytest -v

issues: https://github.com/Translator-CATRAX/node-synonymizer/issues

license

MIT. see LICENSE.
