Python SDK for OMOPHub - Medical Vocabulary API with semantic search

OMOPHub Python SDK

Query millions of standardized medical concepts via a simple Python API

Access SNOMED CT, ICD-10, RxNorm, LOINC, and 90+ OHDSI ATHENA vocabularies without downloading, installing, or maintaining local databases.


Why OMOPHub?

Working with OHDSI ATHENA vocabularies traditionally requires downloading multi-gigabyte files, setting up a database instance, and writing complex SQL queries. OMOPHub eliminates this friction.

| Traditional Approach | With OMOPHub |
| --- | --- |
| Download 5GB+ ATHENA vocabulary files | pip install omophub |
| Set up and maintain database | One API call |
| Write complex SQL with multiple JOINs | Simple Python methods |
| Manually update vocabularies quarterly | Always current data |
| Local infrastructure required | Works anywhere Python runs |

Installation

pip install omophub

# Optional extras for FHIR client interop
pip install "omophub[fhirpy]"          # Pre-wired fhirpy client
pip install "omophub[fhir-resources]"  # Install marker for fhir.resources

Quick Start

from omophub import OMOPHub

# Initialize client (uses OMOPHUB_API_KEY env variable, or pass api_key="...")
client = OMOPHub()

# Get a concept by ID
concept = client.concepts.get(201826)
print(concept["concept_name"])  # "Type 2 diabetes mellitus"

# Search for concepts across vocabularies
results = client.search.basic("metformin", vocabulary_ids=["RxNorm"], domain_ids=["Drug"])
for c in results["concepts"]:
    print(f"{c['concept_id']}: {c['concept_name']}")

# Map ICD-10 code to SNOMED
mappings = client.mappings.get_by_code("ICD10CM", "E11.9", target_vocabulary="SNOMED")

# Navigate concept hierarchy
ancestors = client.hierarchy.ancestors(201826, max_levels=3)

FHIR-to-OMOP Resolution

Resolve FHIR coded values to OMOP standard concepts in one call:

# Single FHIR Coding → OMOP concept + CDM target table
result = client.fhir.resolve(
    system="http://snomed.info/sct",
    code="44054006",
    resource_type="Condition",
)
print(result["resolution"]["target_table"])  # "condition_occurrence"
print(result["resolution"]["mapping_type"])  # "direct"

# ICD-10-CM → traverses "Maps to" automatically
result = client.fhir.resolve(
    system="http://hl7.org/fhir/sid/icd-10-cm",
    code="E11.9",
)
print(result["resolution"]["standard_concept"]["vocabulary_id"])  # "SNOMED"

# Batch resolve up to 100 codings
batch = client.fhir.resolve_batch([
    {"system": "http://snomed.info/sct", "code": "44054006"},
    {"system": "http://loinc.org", "code": "2339-0"},
    {"system": "http://www.nlm.nih.gov/research/umls/rxnorm", "code": "197696"},
])
print(f"Resolved {batch['summary']['resolved']}/{batch['summary']['total']}")

# CodeableConcept with vocabulary preference (SNOMED wins over ICD-10)
result = client.fhir.resolve_codeable_concept(
    coding=[
        {"system": "http://snomed.info/sct", "code": "44054006"},
        {"system": "http://hl7.org/fhir/sid/icd-10-cm", "code": "E11.9"},
    ],
    resource_type="Condition",
)
print(result["best_match"]["resolution"]["source_concept"]["vocabulary_id"])  # "SNOMED"
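The batch call above is described as accepting up to 100 codings, so longer lists need to be split first. Here is a minimal sketch of one way to do that. `chunked` is a hypothetical helper, not part of the SDK, and the commented-out call assumes a `client` created as in Quick Start.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# 250 identical codings, just to show how the split falls out
codings = [{"system": "http://loinc.org", "code": "2339-0"}] * 250
batch_sizes = [len(chunk) for chunk in chunked(codings)]
print(batch_sizes)  # [100, 100, 50]

# With a real client, each chunk would go to one call:
# for chunk in chunked(codings):
#     batch = client.fhir.resolve_batch(chunk)
```

The per-chunk summaries would then need to be added up by the caller.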

Type Interoperability

The resolver accepts any Coding-like input via duck typing - a plain dict, omophub's lightweight Coding TypedDict, or any object with .system / .code attributes (e.g. fhir.resources.Coding, fhirpy codings).

from omophub.types.fhir import Coding

# omophub's TypedDict - IDE autocomplete, no extra deps
coding: Coding = {"system": "http://snomed.info/sct", "code": "44054006"}
result = client.fhir.resolve(coding=coding)

# fhir.resources objects work via duck typing - no conversion needed
from fhir.resources.R4B.coding import Coding as FhirCoding
fhir_coding = FhirCoding(system="http://snomed.info/sct", code="44054006")
result = client.fhir.resolve(coding=fhir_coding)

# Mixed shapes in a single batch call
result = client.fhir.resolve_batch([
    {"system": "http://snomed.info/sct", "code": "44054006"},   # dict
    FhirCoding(system="http://loinc.org", code="2339-0"),        # fhir.resources
])

fhir.resources is never a required dependency. See examples/fhir_interop.py for the full set of supported input shapes.
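To illustrate what "Coding-like via duck typing" means in practice, the sketch below reads `system` and `code` from either a mapping or an attribute-style object. It only illustrates the idea and is not the SDK's actual implementation. `system_and_code` is a hypothetical name, and `SimpleNamespace` stands in for a class such as fhir.resources' Coding.

```python
from collections.abc import Mapping
from types import SimpleNamespace
from typing import Any, Tuple


def system_and_code(coding: Any) -> Tuple[str, str]:
    """Extract (system, code) from a mapping or an attribute-style object."""
    if isinstance(coding, Mapping):
        system, code = coding.get("system"), coding.get("code")
    else:
        system = getattr(coding, "system", None)
        code = getattr(coding, "code", None)
    if not system or not code:
        raise ValueError("coding needs both a system and a code")
    return str(system), str(code)


as_dict = {"system": "http://snomed.info/sct", "code": "44054006"}
as_object = SimpleNamespace(system="http://loinc.org", code="2339-0")  # stand-in object

print(system_and_code(as_dict))    # ('http://snomed.info/sct', '44054006')
print(system_and_code(as_object))  # ('http://loinc.org', '2339-0')
```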

FHIR Client Interop

Point external FHIR client libraries at OMOPHub's FHIR Terminology Service directly - useful when you need raw FHIR Parameters / Bundle responses instead of the Concept Resolver envelope.

from omophub import OMOPHub, get_fhir_server_url

client = OMOPHub(api_key="oh_xxx")

# Property on the client returns the R4 base URL
print(client.fhir_server_url)
# "https://fhir.omophub.com/fhir/r4"

# Helper for other FHIR versions
print(get_fhir_server_url("r5"))
# "https://fhir.omophub.com/fhir/r5"

For fhirpy, install the optional extra and use the pre-wired client:

pip install "omophub[fhirpy]"

from omophub import get_fhirpy_client

fhir = get_fhirpy_client("oh_xxx")

# Call CodeSystem/$lookup directly via fhirpy
params = fhir.execute(
    "CodeSystem/$lookup",
    method="GET",
    params={"system": "http://snomed.info/sct", "code": "44054006"},
)

When to use which: the Concept Resolver (client.fhir.resolve) gives you OMOP-enriched answers - standard concept ID, CDM target table, mapping quality. Use fhirpy via get_fhirpy_client() when you need raw FHIR responses for FHIR-native tooling.
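For tooling that does not use fhirpy, the same `$lookup` call is an ordinary FHIR GET request against the base URL shown above. The sketch below only builds that URL with the standard library. `lookup_url` is a hypothetical helper, and how the API key must be sent to the FHIR endpoint is not covered here, so no request is made.

```python
from urllib.parse import urlencode


def lookup_url(base_url: str, system: str, code: str) -> str:
    """Build a CodeSystem/$lookup GET URL under a FHIR base URL."""
    query = urlencode({"system": system, "code": code})
    return f"{base_url.rstrip('/')}/CodeSystem/$lookup?{query}"


url = lookup_url(
    "https://fhir.omophub.com/fhir/r4",  # R4 base URL from the example above
    "http://snomed.info/sct",
    "44054006",
)
print(url)
```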

Semantic Search

Find concepts with natural language queries, matched by neural embeddings:

# Natural language search - understands clinical intent
results = client.search.semantic("high blood sugar levels")
for r in results["results"]:
    print(f"{r['concept_name']} (similarity: {r['similarity_score']:.2f})")

# Filter by vocabulary and set minimum similarity threshold
results = client.search.semantic(
    "heart attack",
    vocabulary_ids=["SNOMED"],
    domain_ids=["Condition"],
    threshold=0.5
)

# Iterate through all results with auto-pagination
for result in client.search.semantic_iter("chronic kidney disease", page_size=50):
    print(f"{result['concept_id']}: {result['concept_name']}")
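Once a semantic response is in hand, ranking and trimming it is plain Python. The sketch below assumes the response shape used in the examples above (a `results` list whose items carry `similarity_score`). `top_matches` is a hypothetical helper, and the sample data is hand-written placeholder content, not real API output.

```python
from typing import Any, Dict, List


def top_matches(
    response: Dict[str, Any], min_score: float = 0.0, limit: int = 5
) -> List[Dict[str, Any]]:
    """Keep hits at or above `min_score`, best first, at most `limit` of them."""
    hits = [
        r for r in response.get("results", [])
        if r.get("similarity_score", 0.0) >= min_score
    ]
    hits.sort(key=lambda r: r.get("similarity_score", 0.0), reverse=True)
    return hits[:limit]


# Placeholder response, shaped like the examples above
sample = {
    "results": [
        {"concept_id": 1, "concept_name": "example A", "similarity_score": 0.42},
        {"concept_id": 2, "concept_name": "example B", "similarity_score": 0.91},
        {"concept_id": 3, "concept_name": "example C", "similarity_score": 0.67},
    ]
}
best = top_matches(sample, min_score=0.5, limit=2)
print([r["concept_id"] for r in best])  # [2, 3]
```

Where possible, the `threshold` parameter is likely the better filter, since it avoids transferring hits that would be discarded anyway.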

Bulk Search

Search for multiple terms in a single API call instead of sending one request per term:

# Bulk lexical search (up to 50 queries)
results = client.search.bulk_basic([
    {"search_id": "q1", "query": "diabetes mellitus"},
    {"search_id": "q2", "query": "hypertension"},
    {"search_id": "q3", "query": "aspirin"},
], defaults={"vocabulary_ids": ["SNOMED"], "page_size": 5})

for item in results["results"]:
    print(f"{item['search_id']}: {len(item['results'])} results")

# Bulk semantic search (up to 25 queries)
results = client.search.bulk_semantic([
    {"search_id": "s1", "query": "heart failure treatment options"},
    {"search_id": "s2", "query": "type 2 diabetes medication"},
], defaults={"threshold": 0.5, "page_size": 10})
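The `search_id` field is what ties each answer back to its query. The sketch below builds the query list from plain terms and indexes a response by `search_id`. Both helpers are hypothetical, and the sample response is hand-written to match the shape printed in the loop above, not real API output.

```python
from typing import Any, Dict, List, Sequence


def build_queries(terms: Sequence[str], prefix: str = "q") -> List[Dict[str, str]]:
    """Turn plain terms into bulk-search items with sequential search_ids."""
    return [
        {"search_id": f"{prefix}{i}", "query": term}
        for i, term in enumerate(terms, start=1)
    ]


def index_by_search_id(response: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Map each search_id to its list of results."""
    return {item["search_id"]: item["results"] for item in response["results"]}


queries = build_queries(["diabetes mellitus", "hypertension"])
print(queries[0])  # {'search_id': 'q1', 'query': 'diabetes mellitus'}

# Placeholder response; a real one would come from client.search.bulk_basic(queries)
sample_response = {
    "results": [
        {"search_id": "q1", "results": [{"concept_id": 1}]},
        {"search_id": "q2", "results": []},
    ]
}
by_id = index_by_search_id(sample_response)
print(len(by_id["q1"]), len(by_id["q2"]))  # 1 0
```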

Similarity Search

Find concepts similar to a known concept or natural language query:

# Find concepts similar to a known concept
results = client.search.similar(concept_id=201826, algorithm="hybrid")
for r in results["results"]:
    print(f"{r['concept_name']} (score: {r['similarity_score']:.2f})")

# Find similar concepts using a natural language query
results = client.search.similar(
    query="medications for high blood pressure",
    algorithm="semantic",
    similarity_threshold=0.6,
    vocabulary_ids=["RxNorm"],
    include_scores=True,
)

Async Support

import asyncio
from omophub import AsyncOMOPHub

async def main():
    async with AsyncOMOPHub() as client:
        concept = await client.concepts.get(201826)
        print(concept["concept_name"])

asyncio.run(main())
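The async client is most useful when several lookups run concurrently. The sketch below bounds concurrency with a semaphore and gathers results in input order. `fetch_many` is a hypothetical helper, and `fake_get` is a stand-in coroutine so the example runs without network access; with the SDK you would pass `client.concepts.get` instead. A sensible concurrency limit depends on your plan's rate limits, which are not stated here.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def fetch_many(
    get: Callable[[int], Awaitable[Any]],
    concept_ids: Iterable[int],
    max_concurrency: int = 5,
) -> List[Any]:
    """Call `get` for every ID, with at most `max_concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(concept_id: int) -> Any:
        async with semaphore:
            return await get(concept_id)

    # gather() returns results in the same order as the inputs
    return await asyncio.gather(*(fetch_one(cid) for cid in concept_ids))


async def fake_get(concept_id: int) -> dict:
    """Stand-in for client.concepts.get; echoes the ID back."""
    await asyncio.sleep(0)
    return {"concept_id": concept_id}


concepts = asyncio.run(fetch_many(fake_get, [201826, 1, 2]))
print([c["concept_id"] for c in concepts])  # [201826, 1, 2]

# With the SDK (inside `async with AsyncOMOPHub() as client:`):
#     concepts = await fetch_many(client.concepts.get, ids)
```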

Use Cases

ETL & Data Pipelines

Validate and map clinical codes during OMOP CDM transformations:

# Validate that a source code exists and find its standard equivalent
def validate_and_map(source_vocab, source_code):
    concept = client.concepts.get_by_code(source_vocab, source_code)
    if concept["standard_concept"] == "S":
        return concept["concept_id"]  # already a standard concept
    mappings = client.mappings.get(concept["concept_id"],
                                    target_vocabulary="SNOMED")
    if not mappings["mappings"]:
        return None  # no SNOMED mapping found for this code
    return mappings["mappings"][0]["target_concept_id"]
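ETL source tables usually repeat the same codes many times, so it can pay to resolve each distinct (vocabulary, code) pair only once. The sketch below shows that pattern with a hypothetical `map_unique` helper. `fake_mapper` is a stand-in that returns placeholder IDs so the example runs offline; in a pipeline you would pass `validate_and_map`.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

SourceKey = Tuple[str, str]  # (vocabulary_id, source_code)


def map_unique(
    pairs: Iterable[SourceKey],
    mapper: Callable[[str, str], Optional[int]],
) -> Dict[SourceKey, Optional[int]]:
    """Call `mapper` once per distinct (vocabulary, code) pair."""
    resolved: Dict[SourceKey, Optional[int]] = {}
    for vocab, code in pairs:
        key = (vocab, code)
        if key not in resolved:
            resolved[key] = mapper(vocab, code)
    return resolved


calls: List[SourceKey] = []


def fake_mapper(vocab: str, code: str) -> int:
    """Stand-in mapper: records the call and returns a placeholder ID."""
    calls.append((vocab, code))
    return len(calls)


rows = [("ICD10CM", "E11.9"), ("ICD10CM", "I10"), ("ICD10CM", "E11.9")]
lookup = map_unique(rows, fake_mapper)
print(len(rows), len(calls))  # 3 2
```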

Data Quality Checks

Verify codes exist and are valid standard concepts:

import omophub

# Check if all your condition codes are valid
condition_codes = ["E11.9", "I10", "J44.9"]  # ICD-10 codes
for code in condition_codes:
    try:
        concept = client.concepts.get_by_code("ICD10CM", code)
        print(f"OK {code}: {concept['concept_name']}")
    except omophub.NotFoundError:
        print(f"ERROR {code}: Invalid code!")

Phenotype Development

Explore hierarchies to build comprehensive concept sets:

# Get all descendants of "Type 2 diabetes mellitus" for phenotype
descendants = client.hierarchy.descendants(201826, max_levels=5)
concept_set = [d["concept_id"] for d in descendants["concepts"]]
print(f"Found {len(concept_set)} concepts for T2DM phenotype")
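Whether the descendants response includes the starting concept itself is not stated here, so a concept set built from it may need the root added explicitly. The sketch below does that and removes duplicates. `build_concept_set` is a hypothetical helper, and the sample response uses placeholder IDs.

```python
from typing import Any, Dict, List


def build_concept_set(root_id: int, descendants_response: Dict[str, Any]) -> List[int]:
    """Return the root plus all descendant IDs, de-duplicated and sorted."""
    ids = {root_id}
    ids.update(d["concept_id"] for d in descendants_response.get("concepts", []))
    return sorted(ids)


# Placeholder response, shaped like the example above
sample = {"concepts": [{"concept_id": 3}, {"concept_id": 2}, {"concept_id": 3}]}
print(build_concept_set(1, sample))  # [1, 2, 3]
```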

Clinical Applications

Build terminology lookups into healthcare applications:

# Autocomplete for clinical coding interface
suggestions = client.concepts.suggest("diab", vocabulary_ids=["SNOMED"], page_size=10)
# Returns: ["Diabetes mellitus", "Diabetic nephropathy", "Diabetic retinopathy", ...]

API Resources

| Resource | Description | Key Methods |
| --- | --- | --- |
| concepts | Concept lookup and batch operations | get(), get_by_code(), batch(), suggest() |
| search | Full-text and semantic search | basic(), advanced(), semantic(), similar(), bulk_basic(), bulk_semantic() |
| hierarchy | Navigate concept relationships | ancestors(), descendants() |
| mappings | Cross-vocabulary mappings | get(), map() |
| vocabularies | Vocabulary metadata | list(), get(), stats() |
| domains | Domain information | list(), get(), concepts() |
| fhir | FHIR-to-OMOP resolution | resolve(), resolve_batch(), resolve_codeable_concept() |

Configuration

client = OMOPHub(
    api_key="oh_xxx",                        # Or set OMOPHUB_API_KEY env var
    base_url="https://api.omophub.com/v1",   # API endpoint
    timeout=30.0,                             # Request timeout (seconds)
    max_retries=3,                            # Retry attempts
    vocab_version="2025.2",                   # Specific vocabulary version
)

Error Handling

import omophub

try:
    concept = client.concepts.get(999999999)
except omophub.NotFoundError as e:
    print(f"Concept not found: {e.message}")
except omophub.AuthenticationError as e:
    print(f"Check your API key: {e.message}")
except omophub.RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after} seconds")
except omophub.APIError as e:
    print(f"API error {e.status_code}: {e.message}")

Type Safety

The SDK is fully typed with TypedDict definitions for IDE autocomplete:

from omophub import OMOPHub, Concept

client = OMOPHub()
concept: Concept = client.concepts.get(201826)

# IDE autocomplete works for all fields
concept["concept_id"]      # int
concept["concept_name"]    # str
concept["vocabulary_id"]   # str
concept["domain_id"]       # str
concept["concept_class_id"] # str

Integration Examples

With Pandas

import pandas as pd

# Search and load into DataFrame
results = client.search.basic("hypertension", page_size=100)
df = pd.DataFrame(results["concepts"])
print(df[["concept_id", "concept_name", "vocabulary_id"]].head())

In Jupyter Notebooks

# Iterate through all results with auto-pagination
for concept in client.search.basic_iter("diabetes", page_size=100):
    process_concept(concept)  # process_concept is your own handler function

Compared to Alternatives

| Feature | OMOPHub SDK | ATHENA Download | OHDSI WebAPI |
| --- | --- | --- | --- |
| Setup time | 1 minute | Hours | Hours |
| Infrastructure | None | Database required | Full OHDSI stack |
| Updates | Automatic | Manual download | Manual |
| Programmatic access | Native Python | SQL queries | REST API |

Best for: Teams who need quick, programmatic access to OMOP vocabularies without infrastructure overhead.

Contributing

We welcome contributions! Please see our Contributing Guide for details.

# Clone and install for development
git clone https://github.com/omopHub/omophub-python.git
cd omophub-python
pip install -e ".[dev]"

# Run tests
pytest

License

MIT License - see LICENSE for details.


Built for the OHDSI community
