
OHDSI OMOP CDM data fetching and cohort management for healthcare AI

smart-omop

Python client for OHDSI OMOP Common Data Model cohort management via WebAPI.

Installation

pip install smart-omop

Optional dependencies:

pip install "smart-omop[viz]"         # Plotly visualizations
pip install "smart-omop[medsynth]"    # MedSynth CSV integration
pip install "smart-omop[all]"         # All features

Quick Start

from smart_omop import OMOPClient

client = OMOPClient("http://your-server:8080/WebAPI")

# List data sources
sources = client.get_sources()

# Generate cohort
client.generate_cohort(cohort_id=1, source_key="MY_CDM")

# Get results
results = client.get_cohort_results(cohort_id=1, source_key="MY_CDM")
print(f"Persons: {results['personCount']}, Status: {results['status']}")

Creating Cohorts

Basic Cohort Builder

from smart_omop import CohortBuilder, Gender, OMOPClient

builder = CohortBuilder("COPD Patients", "COPD diagnosis cohort")
builder.with_condition("COPD", [255573])  # COPD concept ID
builder.with_age_range(min_age=40)
builder.with_gender(Gender.FEMALE)

cohort_def = builder.build()

with OMOPClient("http://your-server:8080/WebAPI") as client:
    created = client.create_cohort(cohort_def.to_dict())
    print(f"Created cohort ID: {created['id']}")

Multi-Criteria Cohort Builder

from smart_omop import CohortBuilderFull, Gender, AgeOperator

builder = CohortBuilderFull("COPD with Hypertension", "Multi-condition cohort")

# Add concept sets
copd = builder.add_concept_set("COPD")
copd.add_concept(255573, "Chronic obstructive lung disease", include_descendants=True)

htn = builder.add_concept_set("Hypertension")
htn.add_concept(316866, "Hypertensive disorder", include_descendants=True)

# Set primary criterion
builder.add_primary_condition(concept_set_id=0)

# Add inclusion rule for demographics
rule = builder.add_inclusion_rule("Age and Gender")
rule.add_age_criterion(AgeOperator.GTE, 60)
rule.add_age_criterion(AgeOperator.LTE, 85)
rule.add_gender_criterion(Gender.FEMALE)

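# Build and create (needs a WebAPI client)
from smart_omop import OMOPClient

cohort_def = builder.build()
with OMOPClient("http://your-server:8080/WebAPI") as client:
    created = client.create_cohort(cohort_def.to_dict())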

Supported criteria types: ConditionOccurrence, ProcedureOccurrence, DrugExposure, Measurement, Observation, VisitOccurrence, DeviceExposure, Death.

Heracles Characterization

Running Analysis

from smart_omop import HeraclesJobManager, OMOPClient

with OMOPClient("http://your-server:8080/WebAPI") as client:
    manager = HeraclesJobManager(client)

    # Create job
    job = manager.create_job(
        cohort_ids=[167],
        source_key="KAGGLECOPD",
        job_name="COPD_analysis",
        analysis_ids=[1, 2, 3, 4, 5, 400, 401, 402, 403, 404],
        small_cell_count=5
    )

    # Submit (no polling - check status separately)
    result = manager.submit_job(job)
    print(f"Job submitted: {result.get('executionId')}")
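
The job above is submitted without polling, so completion has to be checked separately. The sketch below shows one way to poll. It accepts any callable that returns a status string, so it could wrap manager.get_job_status(execution_id). The helper name wait_for_status and the status values "COMPLETED" and "FAILED" are assumptions for illustration; the README does not state which values WebAPI returns.

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    get_status: Callable[[], str],
    done_states: Iterable[str] = ("COMPLETED", "FAILED"),  # assumed status names
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal state or the timeout expires."""
    done = set(done_states)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout} seconds")
        sleep(interval)


# Stand-in for a real status call, e.g. a function that wraps
# manager.get_job_status(execution_id) and extracts a status string.
_fake_states = iter(["STARTED", "RUNNING", "COMPLETED"])
final = wait_for_status(lambda: next(_fake_states), interval=0.0)
print(final)  # COMPLETED
```

Passing the status call in as a parameter keeps the loop independent of the response shape, which may differ between WebAPI versions.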

Fetching Reports

After the Heracles analysis completes, fetch the characterization reports. The calls below assume client is an open OMOPClient, as in the previous example:

# Get person demographics
person_report = client.get_heracles_person_report(167, "KAGGLECOPD", refresh=True)

# Get condition occurrences
condition_report = client.get_heracles_condition_report(167, "KAGGLECOPD", refresh=True)

# Get drug exposures
drug_report = client.get_heracles_drug_report(167, "KAGGLECOPD", refresh=True)

# Get procedures
procedure_report = client.get_heracles_procedure_report(167, "KAGGLECOPD", refresh=True)

# Get measurements
measurement_report = client.get_heracles_measurement_report(167, "KAGGLECOPD", refresh=True)

# Get dashboard summary
dashboard = client.get_heracles_dashboard_report(167, "KAGGLECOPD", refresh=True)

Available report types:

  • get_heracles_person_report() - Demographics and year of birth
  • get_heracles_condition_report() - Condition occurrences
  • get_heracles_drug_report() - Drug exposures
  • get_heracles_procedure_report() - Procedures
  • get_heracles_measurement_report() - Measurements
  • get_heracles_observation_report() - Observations
  • get_heracles_death_report() - Death records
  • get_heracles_dashboard_report() - Summary statistics

Example: Person Report Data

person_report = client.get_heracles_person_report(167, "KAGGLECOPD", refresh=True)

# Gender distribution
for gender in person_report['gender']:
    print(f"{gender['conceptName']}: {gender['countValue']} persons")
# Output:
# MALE: 61 persons
# FEMALE: 34 persons

# Year of birth distribution
birth_years = person_report['yearOfBirth']
print(f"Birth year entries: {len(birth_years)}")
# Output: Birth year entries: 33

# Birth year statistics
stats = person_report['yearOfBirthStats'][0]
print(f"Year range: {stats['minValue']} to {stats['maxValue']}")
# Output: Year range: 1933 to 1977
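
The gender counts above can be turned into shares of the cohort. This is a minimal sketch that relies only on the conceptName and countValue fields shown in the example; the helper name gender_shares is hypothetical, and the sample rows simply repeat the counts from the output above.

```python
from typing import Dict, List


def gender_shares(gender_rows: List[dict]) -> Dict[str, float]:
    """Map each conceptName to its fraction of the summed countValue."""
    total = sum(row["countValue"] for row in gender_rows)
    if total == 0:
        return {}
    return {row["conceptName"]: row["countValue"] / total for row in gender_rows}


# Sample rows mirroring the output shown above
sample = [
    {"conceptName": "MALE", "countValue": 61},
    {"conceptName": "FEMALE", "countValue": 34},
]
shares = gender_shares(sample)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
# MALE: 64.2%
# FEMALE: 35.8%
```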

Example: Condition Report Data

condition_report = client.get_heracles_condition_report(167, "KAGGLECOPD", refresh=True)

# Top conditions by prevalence
for condition in condition_report[:5]:
    concept_id = condition['conceptId']
    name = condition['conceptPath'].split('||')[-1]
    num_persons = condition['numPersons']
    percent = condition['percentPersons']

    print(f"{name} ({concept_id}): {num_persons} persons ({percent:.1%})")

# Output:
# Chronic obstructive pulmonary disease (255573): 95 persons (100.0%)
# Moderate chronic obstructive pulmonary disease (4193588): 39 persons (41.1%)
# Severe chronic obstructive pulmonary disease (4209097): 27 persons (28.4%)
# Mild chronic obstructive pulmonary disease (4196712): 21 persons (22.1%)
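
The loop above takes the first five entries, which are the most prevalent conditions only if the report arrives sorted. The README does not say whether it does, so the sketch below sorts explicitly by numPersons. The helper names top_conditions and leaf_name are hypothetical, and the conceptPath prefixes in the sample are placeholders rather than real report values.

```python
from typing import List


def top_conditions(report: List[dict], n: int = 5) -> List[dict]:
    """Return the n entries with the highest numPersons, largest first."""
    return sorted(report, key=lambda row: row["numPersons"], reverse=True)[:n]


def leaf_name(concept_path: str) -> str:
    """Take the last '||'-separated segment of a conceptPath."""
    return concept_path.split("||")[-1]


# Counts taken from the output above, deliberately out of order.
sample = [
    {"conceptId": 4196712, "conceptPath": "x||Mild chronic obstructive pulmonary disease", "numPersons": 21},
    {"conceptId": 255573, "conceptPath": "x||Chronic obstructive pulmonary disease", "numPersons": 95},
    {"conceptId": 4209097, "conceptPath": "x||Severe chronic obstructive pulmonary disease", "numPersons": 27},
    {"conceptId": 4193588, "conceptPath": "x||Moderate chronic obstructive pulmonary disease", "numPersons": 39},
]

top_two = top_conditions(sample, n=2)
for row in top_two:
    print(f"{leaf_name(row['conceptPath'])}: {row['numPersons']} persons")
# Chronic obstructive pulmonary disease: 95 persons
# Moderate chronic obstructive pulmonary disease: 39 persons
```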

Visualizations

Create visualizations from Heracles reports:

from smart_omop import CohortVisualizer
import json

# Load report data
with open('cohort167_KAGGLECOPD_person.json') as f:
    person_data = json.load(f)

with open('cohort167_KAGGLECOPD_condition.json') as f:
    condition_data = json.load(f)

visualizer = CohortVisualizer(output_dir="./visualizations")

# Age distribution from Heracles person report
age_by_gender = visualizer.create_age_distribution(person_data)

# Gender distribution
gender_chart = visualizer.create_gender_distribution(person_data)

# Condition prevalence treemap
condition_treemap = visualizer.create_condition_prevalence(condition_data)

# Dashboard with multiple charts
dashboard = visualizer.create_dashboard_from_reports({
    'person': person_data,
    'condition': condition_data
})

Outputs interactive HTML files using Plotly.
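
The example above reads reports from files such as cohort167_KAGGLECOPD_person.json. The sketch below shows one way to write a fetched report to a file with that name. The helper name save_report is hypothetical and the naming pattern is inferred from the file names above; the README does not say how those files were produced.

```python
import json
import tempfile
from pathlib import Path


def save_report(report, cohort_id: int, source_key: str, report_type: str, out_dir: Path) -> Path:
    """Write a report as JSON to cohort<ID>_<SOURCE>_<type>.json and return the path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"cohort{cohort_id}_{source_key}_{report_type}.json"
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path


# Demo with a stand-in dict; in practice the dict would come from a call
# such as client.get_heracles_person_report(167, "KAGGLECOPD", refresh=True).
with tempfile.TemporaryDirectory() as tmp:
    saved = save_report({"gender": []}, 167, "KAGGLECOPD", "person", Path(tmp))
    saved_name = saved.name
    round_trip = json.loads(saved.read_text(encoding="utf-8"))

print(saved_name)  # cohort167_KAGGLECOPD_person.json
```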

CLI Usage

Creating and Generating Cohorts

# Create cohort
smart-omop --base-url http://your-server:8080/WebAPI create-cohort \
  --name "COPD Patients" \
  --concept-ids 255573 \
  --age-gte 40 \
  --output cohort.json

# Generate cohort
smart-omop --base-url http://your-server:8080/WebAPI generate \
  --cohort-id 167 \
  --source-key KAGGLECOPD

# Check results
smart-omop --base-url http://your-server:8080/WebAPI results \
  --cohort-id 167 \
  --source-key KAGGLECOPD

Running Heracles and Fetching Reports

# Run Heracles analysis
smart-omop --base-url http://your-server:8080/WebAPI heracles \
  --cohort-id 167 \
  --source-key KAGGLECOPD \
  --job-name "COPD_analysis" \
  --analysis-ids "1,2,3,4,5,400,401,402,403,404"

# Get individual reports (wait ~60 seconds after Heracles starts)
# Person demographics
smart-omop --base-url http://your-server:8080/WebAPI get-report \
  --cohort-id 167 \
  --source-key KAGGLECOPD \
  --type person \
  --output person.json \
  --refresh

# Dashboard
smart-omop --base-url http://your-server:8080/WebAPI get-report \
  --cohort-id 167 \
  --source-key KAGGLECOPD \
  --type dashboard \
  --refresh

# Condition occurrences
smart-omop --base-url http://your-server:8080/WebAPI get-report \
  --cohort-id 167 \
  --source-key KAGGLECOPD \
  --type condition \
  --output condition.json \
  --refresh

# Available types: dashboard, person, condition, drug, procedure,
#                  measurement, observation, death, components_summary

# Or export all reports at once
smart-omop --base-url http://your-server:8080/WebAPI export-reports \
  --cohort-id 167 \
  --source-key KAGGLECOPD \
  --output-dir ./reports \
  --refresh

Analysis IDs

Common Heracles analysis categories:

Category       IDs         Description
Demographics   1-5, 7-9    Person count, age, gender, race
Conditions     400-413     Condition occurrences and prevalence
Drugs          700-713     Drug exposures and durations
Procedures     600-613     Procedure occurrences
Measurements   1800-1831   Measurement values and distributions
Observations   800-813     Observation records
Visits         200-213     Visit occurrences and types

Example analysis selection:

# Demographics and conditions
analysis_ids = [1, 2, 3, 4, 5, 400, 401, 402, 403, 404]

# Demographics, conditions, and measurements
analysis_ids = [1, 2, 3, 4, 5, 400, 401, 402, 1800, 1801, 1802, 1803]

Or use predefined sets:

from smart_omop import DEMO_ANALYSES, CONDITION_ANALYSES, MEASUREMENT_ANALYSES

analysis_ids = DEMO_ANALYSES + CONDITION_ANALYSES + MEASUREMENT_ANALYSES
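
The ID ranges in the category table can also be expanded into explicit lists by hand. The sketch below parses a range specification such as "1-5, 7-9"; the helper name expand_ids is hypothetical and not part of smart-omop. It assumes every ID inside a range is valid, which may not hold for every WebAPI installation.

```python
from typing import List


def expand_ids(spec: str) -> List[int]:
    """Expand a spec such as "1-5, 7-9" into a sorted list of unique IDs."""
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            ids.update(range(start, end + 1))  # ranges are inclusive
        else:
            ids.add(int(part))
    return sorted(ids)


demographics = expand_ids("1-5, 7-9")
conditions = expand_ids("400-413")
print(demographics)     # [1, 2, 3, 4, 5, 7, 8, 9]
print(len(conditions))  # 14

# Comma-separated form, as used by the CLI --analysis-ids option
cli_value = ",".join(str(i) for i in demographics)
print(cli_value)        # 1,2,3,4,5,7,8,9
```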

MedSynth Integration

MedSynth generates synthetic OMOP CDM data for privacy-preserving research.

Installation:

pip install medsynth

Generate synthetic data:

medsynth --generate-omop --num-subjects 100 --output-dir ./omop_data/

Load and analyze:

from smart_omop import MedSynthOMOPSource

source = MedSynthOMOPSource("./omop_data")

# Filter by condition
persons = source.filter_by_condition([255573])  # COPD

# Apply demographics
filtered = source.filter_by_age_gender(
    persons,
    min_age=60,
    gender_concept_ids=[8532]  # Female
)

# Create summary
summary = source.create_cohort_summary(
    concept_ids=[255573],
    min_age=60,
    gender_concept_ids=[8532]
)

print(f"Matching persons: {summary['person_count']}")
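
Conceptually, the filters above select persons with a matching condition record and then narrow them by age and gender. The sketch below shows that logic on in-memory rows that use standard OMOP CDM column names (person_id, gender_concept_id, year_of_birth, condition_concept_id). It is not the MedSynthOMOPSource implementation; the function names and the reference year used to compute age are assumptions.

```python
from typing import Iterable, List, Set


def persons_with_condition(condition_rows: Iterable[dict], concept_ids: Set[int]) -> Set[int]:
    """Collect person_ids that have at least one matching condition record."""
    return {
        row["person_id"]
        for row in condition_rows
        if row["condition_concept_id"] in concept_ids
    }


def filter_age_gender(
    person_rows: Iterable[dict],
    person_ids: Set[int],
    min_age: int,
    gender_concept_ids: Set[int],
    reference_year: int,  # assumption: age = reference_year - year_of_birth
) -> List[int]:
    """Keep persons in person_ids who meet the age and gender requirements."""
    keep = []
    for row in person_rows:
        age = reference_year - row["year_of_birth"]
        if (
            row["person_id"] in person_ids
            and age >= min_age
            and row["gender_concept_id"] in gender_concept_ids
        ):
            keep.append(row["person_id"])
    return sorted(keep)


# Made-up rows for illustration (8532 = female, 8507 = male)
person = [
    {"person_id": 1, "gender_concept_id": 8532, "year_of_birth": 1950},
    {"person_id": 2, "gender_concept_id": 8507, "year_of_birth": 1948},
    {"person_id": 3, "gender_concept_id": 8532, "year_of_birth": 1990},
    {"person_id": 4, "gender_concept_id": 8532, "year_of_birth": 1955},
]
condition_occurrence = [
    {"person_id": 1, "condition_concept_id": 255573},
    {"person_id": 2, "condition_concept_id": 255573},
    {"person_id": 3, "condition_concept_id": 255573},
    {"person_id": 4, "condition_concept_id": 316866},
]

matched = persons_with_condition(condition_occurrence, {255573})
result = filter_age_gender(person, matched, 60, {8532}, reference_year=2025)
print(result)  # [1]
```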

For more information: https://github.com/ankurlohachab/medsynth

API Reference

OMOPClient

Core methods:

  • get_sources() - List available data sources
  • get_cohort(cohort_id) - Retrieve cohort definition
  • create_cohort(definition) - Create new cohort
  • generate_cohort(cohort_id, source_key) - Generate cohort
  • get_generation_status(cohort_id, source_key) - Check generation status
  • get_cohort_results(cohort_id, source_key) - Get cohort summary
  • get_heracles_report(cohort_id, source_key, report_type, refresh) - Get specific report
  • get_heracles_person_report(cohort_id, source_key, refresh) - Get demographics
  • get_heracles_condition_report(cohort_id, source_key, refresh) - Get conditions
  • get_heracles_drug_report(cohort_id, source_key, refresh) - Get drug exposures
  • get_heracles_procedure_report(cohort_id, source_key, refresh) - Get procedures
  • get_heracles_measurement_report(cohort_id, source_key, refresh) - Get measurements
  • get_heracles_dashboard_report(cohort_id, source_key, refresh) - Get dashboard

HeraclesJobManager

Methods:

  • create_job(cohort_ids, source_key, job_name, analysis_ids, small_cell_count) - Create job
  • submit_job(job_config, poll, timeout) - Submit job
  • get_job_status(execution_id) - Check job status

CohortBuilder

Methods:

  • with_condition(name, concept_ids) - Add condition criterion
  • with_age_range(min_age, max_age) - Set age requirements
  • with_gender(gender) - Set gender requirement
  • build() - Generate cohort definition

CohortVisualizer

Methods:

  • create_age_distribution(person_report) - Age distribution from Heracles
  • create_gender_distribution(person_report) - Gender breakdown
  • create_condition_prevalence(condition_report) - Condition treemap
  • create_dashboard_from_reports(reports) - Multi-panel dashboard

Configuration

Custom timeout and retries:

client = OMOPClient(
    "http://your-server:8080/WebAPI",
    timeout=60,
    max_retries=5,
    verify_ssl=True
)

Environment variable:

export OMOP_BASE_URL="http://your-server:8080/WebAPI"
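
The README does not say whether OMOPClient reads OMOP_BASE_URL on its own. If it has to be passed explicitly, a small helper can resolve it. This is a sketch: the helper name resolve_base_url and the fallback order are assumptions, and the default is only the placeholder URL used in the examples above.

```python
import os
from typing import Optional

DEFAULT_BASE_URL = "http://your-server:8080/WebAPI"  # placeholder, not a real server


def resolve_base_url(explicit: Optional[str] = None) -> str:
    """Prefer an explicit URL, then OMOP_BASE_URL, then the placeholder default."""
    url = explicit or os.environ.get("OMOP_BASE_URL") or DEFAULT_BASE_URL
    return url.rstrip("/")  # drop any trailing slash for consistency


# Demo: set the variable in-process instead of via the shell
os.environ["OMOP_BASE_URL"] = "http://example.org:8080/WebAPI/"
print(resolve_base_url())  # http://example.org:8080/WebAPI
```

The resolved value can then be passed as the first argument to OMOPClient.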

Testing

Run tests:

pytest tests/ -v

Example test output:

tests/test_client.py::test_client_initialization PASSED
tests/test_cohort.py::test_simple_cohort_builder PASSED
tests/test_heracles.py::test_heracles_job_config PASSED
tests/test_error_handling.py::test_invalid_cohort_id PASSED

29 passed in 6.91s

Tested against OHDSI WebAPI 2.14.0.

Requirements

  • Python 3.9+
  • OHDSI WebAPI instance (v2.7+) or MedSynth CSV data

Examples

See examples/ directory:

  • example_quickstart.py - Basic operations
  • example_simple_cohort.py - Cohort building
  • example_heracles.py - Characterization
  • example_heracles_reports.py - Report fetching
  • example_visualizations.py - Visualizations
  • example_medsynth.py - Synthetic data

Development

git clone https://github.com/ankurlohachab/smart-omop.git
cd smart-omop

pip install -e ".[dev]"

pytest
mypy src/smart_omop
black src/smart_omop

Author

Ankur Lohachab
Department of Advanced Computing Sciences
Maastricht University

License

MIT License - see LICENSE file.

Citation

@software{lohachab2025smartomop,
  author = {Lohachab, Ankur},
  title = {smart-omop: OHDSI OMOP CDM Client for Python},
  year = {2025},
  url = {https://github.com/ankurlohachab/smart-omop}
}

Support

Issues: https://github.com/ankurlohachab/smart-omop/issues
Email: ankur.lohachab@maastrichtuniversity.nl

