OHDSI OMOP CDM data fetching and cohort management for healthcare AI

smart-omop

Python client for OHDSI OMOP Common Data Model cohort management via WebAPI.

Installation

pip install smart-omop

Optional dependencies:

pip install "smart-omop[viz]"         # Plotly visualizations
pip install "smart-omop[medsynth]"    # MedSynth CSV integration
pip install "smart-omop[all]"         # All features

Quick Start

from smart_omop import OMOPClient

client = OMOPClient("http://your-server:8080/WebAPI")

# List data sources
sources = client.get_sources()

# Generate cohort
client.generate_cohort(cohort_id=1, source_key="MY_CDM")

# Get results
results = client.get_cohort_results(cohort_id=1, source_key="MY_CDM")
print(f"Persons: {results['personCount']}, Status: {results['status']}")
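The source_key passed to generate_cohort() comes from the list returned by get_sources(). A minimal sketch of extracting the available keys, assuming each entry is a dict with sourceKey and sourceName fields as in the OHDSI WebAPI source listing. The exact shape smart-omop returns is not documented here, and the sample data and the list_source_keys helper are hypothetical:

```python
from typing import Any


def list_source_keys(sources: list[dict[str, Any]]) -> list[str]:
    """Return the sourceKey of every entry that has one (field name is an assumption)."""
    return [s["sourceKey"] for s in sources if "sourceKey" in s]


# Hypothetical sample shaped like a WebAPI source listing
sample_sources = [
    {"sourceId": 1, "sourceName": "My CDM", "sourceKey": "MY_CDM"},
    {"sourceId": 2, "sourceName": "Demo COPD", "sourceKey": "KAGGLECOPD"},
]

source_keys = list_source_keys(sample_sources)
print(source_keys)
```

With a live server, sample_sources would be replaced by the value of client.get_sources().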

Creating Cohorts

Basic Cohort Builder

from smart_omop import CohortBuilder, Gender, OMOPClient

builder = CohortBuilder("COPD Patients", "COPD diagnosis cohort")
builder.with_condition("COPD", [255573])  # COPD concept ID
builder.with_age_range(min_age=40)
builder.with_gender(Gender.FEMALE)

cohort_def = builder.build()

with OMOPClient("http://your-server:8080/WebAPI") as client:
    created = client.create_cohort(cohort_def.to_dict())
    print(f"Created cohort ID: {created['id']}")

Multi-Criteria Cohort Builder

from smart_omop import CohortBuilderFull, Gender, AgeOperator, OMOPClient

builder = CohortBuilderFull("COPD with Hypertension", "Multi-condition cohort")

# Add concept sets
copd = builder.add_concept_set("COPD")
copd.add_concept(255573, "Chronic obstructive lung disease", include_descendants=True)

htn = builder.add_concept_set("Hypertension")
htn.add_concept(316866, "Hypertensive disorder", include_descendants=True)

# Set primary criterion
builder.add_primary_condition(concept_set_id=0)

# Add inclusion rule for demographics
rule = builder.add_inclusion_rule("Age and Gender")
rule.add_age_criterion(AgeOperator.GTE, 60)
rule.add_age_criterion(AgeOperator.LTE, 85)
rule.add_gender_criterion(Gender.FEMALE)

# Build and create
cohort_def = builder.build()

with OMOPClient("http://your-server:8080/WebAPI") as client:
    created = client.create_cohort(cohort_def.to_dict())
    print(f"Created cohort ID: {created['id']}")

Supported criteria types: ConditionOccurrence, ProcedureOccurrence, DrugExposure, Measurement, Observation, VisitOccurrence, DeviceExposure, Death.

Heracles Characterization

Examples below use the demo cohort ID 167 and source key KAGGLECOPD. Replace both with your own values from your WebAPI/Atlas instance.

Running Analysis

from smart_omop import HeraclesJobManager, OMOPClient

with OMOPClient("http://your-server:8080/WebAPI") as client:
    manager = HeraclesJobManager(client)

    # Create job
    job = manager.create_job(
        cohort_ids=[167],
        source_key="KAGGLECOPD",
        job_name="COPD_analysis",
        analysis_ids=[1, 2, 3, 4, 5, 400, 401, 402, 403, 404],
        small_cell_count=5
    )

    # Submit (no polling - check status separately)
    result = manager.submit_job(job)
    print(f"Job submitted: {result.get('executionId')}")
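Because submit_job() is called here without polling, completion has to be checked separately. A minimal sketch of a generic wait loop follows. get_job_status(execution_id) is listed in the API reference, but its return value is not documented there, so the status strings below are assumptions and a stand-in callable takes the place of a live server. The wait_for helper is hypothetical and not part of smart-omop:

```python
import time
from typing import Any, Callable


def wait_for(check: Callable[[], Any],
             is_done: Callable[[Any], bool],
             timeout: float = 300.0,
             interval: float = 5.0) -> Any:
    """Call check() until is_done(result) is true or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not finished after {timeout} seconds")
        time.sleep(interval)


# Stand-in for a status call; with a live server this could be
#   lambda: manager.get_job_status(execution_id)
_fake_statuses = iter(["STARTED", "RUNNING", "COMPLETED"])

final_status = wait_for(
    check=lambda: next(_fake_statuses),
    is_done=lambda status: status == "COMPLETED",  # status values are assumed
    timeout=10,
    interval=0.01,
)
print(final_status)
```

Passing the check and the completion test as callables keeps the loop independent of whatever structure the status call actually returns.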

Fetching Reports

After the Heracles analysis completes, fetch characterization reports (client is an open OMOPClient, as in the previous example):

# Get person demographics
person_report = client.get_heracles_person_report(167, "KAGGLECOPD", refresh=True)

# Get condition occurrences
condition_report = client.get_heracles_condition_report(167, "KAGGLECOPD", refresh=True)

# Get drug exposures
drug_report = client.get_heracles_drug_report(167, "KAGGLECOPD", refresh=True)

# Get procedures
procedure_report = client.get_heracles_procedure_report(167, "KAGGLECOPD", refresh=True)

# Get measurements
measurement_report = client.get_heracles_measurement_report(167, "KAGGLECOPD", refresh=True)

# Get dashboard summary
dashboard = client.get_heracles_dashboard_report(167, "KAGGLECOPD", refresh=True)

Available report types:

  • get_heracles_person_report() - Demographics and year of birth
  • get_heracles_condition_report() - Condition occurrences
  • get_heracles_drug_report() - Drug exposures
  • get_heracles_procedure_report() - Procedures
  • get_heracles_measurement_report() - Measurements
  • get_heracles_observation_report() - Observations
  • get_heracles_death_report() - Death records
  • get_heracles_dashboard_report() - Summary statistics

Example: Person Report Data

person_report = client.get_heracles_person_report(167, "KAGGLECOPD", refresh=True)

# Gender distribution
for gender in person_report['gender']:
    print(f"{gender['conceptName']}: {gender['countValue']} persons")
# Output:
# MALE: 61 persons
# FEMALE: 34 persons

# Year of birth distribution
birth_years = person_report['yearOfBirth']
print(f"Birth year entries: {len(birth_years)}")
# Output: Birth year entries: 33

# Birth year statistics
stats = person_report['yearOfBirthStats'][0]
print(f"Year range: {stats['minValue']} to {stats['maxValue']}")
# Output: Year range: 1933 to 1977
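The gender entries can be turned into percentage shares with a few lines of plain Python. This is a sketch that relies only on the conceptName and countValue fields shown above; the gender_shares helper is hypothetical and the sample rows reuse the counts from the example output:

```python
def gender_shares(gender_rows):
    """Map each conceptName to its percentage of the total countValue."""
    total = sum(row["countValue"] for row in gender_rows)
    if total == 0:
        return {}
    return {
        row["conceptName"]: round(100 * row["countValue"] / total, 1)
        for row in gender_rows
    }


# Counts taken from the example output above
sample_gender = [
    {"conceptName": "MALE", "countValue": 61},
    {"conceptName": "FEMALE", "countValue": 34},
]

shares = gender_shares(sample_gender)
print(shares)  # {'MALE': 64.2, 'FEMALE': 35.8}
```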

Example: Condition Report Data

condition_report = client.get_heracles_condition_report(167, "KAGGLECOPD", refresh=True)

# Top conditions by prevalence
for condition in condition_report[:5]:
    concept_id = condition['conceptId']
    name = condition['conceptPath'].split('||')[-1]
    num_persons = condition['numPersons']
    percent = condition['percentPersons']

    print(f"{name} ({concept_id}): {num_persons} persons ({percent:.1%})")

# Output:
# Chronic obstructive pulmonary disease (255573): 95 persons (100.0%)
# Moderate chronic obstructive pulmonary disease (4193588): 39 persons (41.1%)
# Severe chronic obstructive pulmonary disease (4209097): 27 persons (28.4%)
# Mild chronic obstructive pulmonary disease (4196712): 21 persons (22.1%)
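The loop above takes the first five entries as returned. If the report order is not guaranteed, sorting explicitly by numPersons is safer. A minimal sketch, assuming the entry fields shown above; the top_conditions helper is hypothetical, and the conceptPath prefixes in the sample data are made up (only the final segment appears in the example output):

```python
def top_conditions(condition_report, n=5):
    """Return (name, concept_id, num_persons, percent) for the n most common conditions."""
    ranked = sorted(condition_report, key=lambda c: c["numPersons"], reverse=True)
    return [
        (c["conceptPath"].split("||")[-1], c["conceptId"],
         c["numPersons"], c["percentPersons"])
        for c in ranked[:n]
    ]


# Sample entries in arbitrary order; path prefixes are placeholders
sample_report = [
    {"conceptId": 4209097, "numPersons": 27, "percentPersons": 0.284,
     "conceptPath": "Group||Severe chronic obstructive pulmonary disease"},
    {"conceptId": 255573, "numPersons": 95, "percentPersons": 1.0,
     "conceptPath": "Group||Chronic obstructive pulmonary disease"},
    {"conceptId": 4193588, "numPersons": 39, "percentPersons": 0.411,
     "conceptPath": "Group||Moderate chronic obstructive pulmonary disease"},
]

rows = top_conditions(sample_report, n=3)
for name, concept_id, num_persons, percent in rows:
    print(f"{name} ({concept_id}): {num_persons} persons ({percent:.1%})")
```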

Visualizations

Visualization examples use the demo cohort ID 167 and source key KAGGLECOPD; swap these for your own values.

Create visualizations from Heracles reports:

from smart_omop import CohortVisualizer
import json

# Load report data
with open('cohort167_KAGGLECOPD_person.json') as f:
    person_data = json.load(f)

with open('cohort167_KAGGLECOPD_condition.json') as f:
    condition_data = json.load(f)

visualizer = CohortVisualizer(output_dir="./visualizations")

# Age distribution from Heracles person report
age_by_gender = visualizer.create_age_distribution(person_data)

# Gender distribution
gender_chart = visualizer.create_gender_distribution(person_data)

# Condition prevalence treemap
condition_treemap = visualizer.create_condition_prevalence(condition_data)

# Dashboard with multiple charts
dashboard = visualizer.create_dashboard_from_reports({
    'person': person_data,
    'condition': condition_data
})

Outputs interactive HTML files using Plotly.
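The visualization snippet above loads reports from JSON files on disk. One way to produce files with matching names from report objects fetched in Python is sketched below. The save_report helper is hypothetical and not part of smart-omop, and the demo writes a placeholder report to a temporary directory:

```python
import json
import tempfile
from pathlib import Path


def save_report(report, cohort_id, source_key, report_type, out_dir="."):
    """Write a report to cohort<ID>_<SOURCE>_<type>.json and return the path."""
    out_path = Path(out_dir) / f"cohort{cohort_id}_{source_key}_{report_type}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(report, f, indent=2)
    return out_path


# Demo with a placeholder report; a real one would come from
# client.get_heracles_person_report(167, "KAGGLECOPD", refresh=True)
with tempfile.TemporaryDirectory() as tmp:
    path = save_report({"gender": []}, 167, "KAGGLECOPD", "person", out_dir=tmp)
    saved_name = path.name
    with open(path, encoding="utf-8") as f:
        reloaded = json.load(f)

print(saved_name)  # cohort167_KAGGLECOPD_person.json
```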

CLI Usage

Creating and Generating Cohorts

# Create cohort
smart-omop --base-url http://your-server:8080/WebAPI create-cohort \
  --name "COPD Patients" \
  --concept-ids 255573 \
  --age-gte 40 \
  --output cohort.json

# Generate cohort
smart-omop --base-url http://your-server:8080/WebAPI generate \
  --cohort-id 167 \
  --source-key KAGGLECOPD

# Check results
smart-omop --base-url http://your-server:8080/WebAPI results \
  --cohort-id 167 \
  --source-key KAGGLECOPD

Running Heracles and Fetching Reports

# Run Heracles analysis
smart-omop --base-url http://your-server:8080/WebAPI heracles \
  --cohort-id 167 \
  --source-key KAGGLECOPD \
  --job-name "COPD_analysis" \
  --analysis-ids "1,2,3,4,5,400,401,402,403,404"

# Get individual reports (wait ~60 seconds after Heracles starts)
# Person demographics
smart-omop --base-url http://your-server:8080/WebAPI get-report \
  --cohort-id 167 \
  --source-key KAGGLECOPD \
  --type person \
  --output person.json \
  --refresh

# Dashboard
smart-omop --base-url http://your-server:8080/WebAPI get-report \
  --cohort-id 167 \
  --source-key KAGGLECOPD \
  --type dashboard \
  --refresh

# Condition occurrences
smart-omop --base-url http://your-server:8080/WebAPI get-report \
  --cohort-id 167 \
  --source-key KAGGLECOPD \
  --type condition \
  --output condition.json \
  --refresh

# Available types: dashboard, person, condition, drug, procedure,
#                  measurement, observation, death, components_summary

# Or export all reports at once
smart-omop --base-url http://your-server:8080/WebAPI export-reports \
  --cohort-id 167 \
  --source-key KAGGLECOPD \
  --output-dir ./reports \
  --refresh

Notes:

  • 167 and KAGGLECOPD are example values from the bundled COPD demo. Replace them with your own cohort ID and source_key returned by your WebAPI/Atlas instance.
  • The --refresh flag forces a fresh pull from WebAPI; drop it if you are reusing cached reports.
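To fetch several report types in a script, the get-report command can be assembled in Python and passed to subprocess. The sketch below only builds and prints the argument lists, using the flags shown above; the build_get_report_cmd helper is hypothetical:

```python
import shlex


def build_get_report_cmd(base_url, cohort_id, source_key, report_type,
                         output=None, refresh=False):
    """Assemble the argument list for `smart-omop ... get-report`."""
    cmd = [
        "smart-omop", "--base-url", base_url, "get-report",
        "--cohort-id", str(cohort_id),
        "--source-key", source_key,
        "--type", report_type,
    ]
    if output:
        cmd += ["--output", output]
    if refresh:
        cmd.append("--refresh")
    return cmd


commands = [
    build_get_report_cmd("http://your-server:8080/WebAPI", 167, "KAGGLECOPD",
                         report_type, output=f"{report_type}.json", refresh=True)
    for report_type in ("person", "condition", "drug")
]

for cmd in commands:
    # To execute instead of printing: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when values contain spaces.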

Analysis IDs

Common Heracles analysis categories:

Category       IDs          Description
Demographics   1-5, 7-9     Person count, age, gender, race
Conditions     400-413      Condition occurrences and prevalence
Drugs          700-713      Drug exposures and durations
Procedures     600-613      Procedure occurrences
Measurements   1800-1831    Measurement values and distributions
Observations   800-813      Observation records
Visits         200-213      Visit occurrences and types

Example analysis selection:

# Demographics and conditions
analysis_ids = [1, 2, 3, 4, 5, 400, 401, 402, 403, 404]

# Demographics, conditions, and measurements
analysis_ids = [1, 2, 3, 4, 5, 400, 401, 402, 1800, 1801, 1802, 1803]

Or use predefined sets:

from smart_omop import DEMO_ANALYSES, CONDITION_ANALYSES, MEASUREMENT_ANALYSES

analysis_ids = DEMO_ANALYSES + CONDITION_ANALYSES + MEASUREMENT_ANALYSES
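The ID ranges in the category table above can also be expanded programmatically. A minimal sketch; the expand_id_ranges helper is hypothetical, and whether every ID inside a range corresponds to an analysis on your server is not verified here:

```python
def expand_id_ranges(spec):
    """Expand a string such as "1-5, 7-9" into a sorted list of integer IDs."""
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            ids.update(range(start, end + 1))  # ranges are inclusive
        else:
            ids.add(int(part))
    return sorted(ids)


demographics = expand_id_ranges("1-5, 7-9")
conditions = expand_id_ranges("400-413")
analysis_ids = demographics + conditions

print(demographics)  # [1, 2, 3, 4, 5, 7, 8, 9]
```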

MedSynth Integration

MedSynth generates synthetic OMOP CDM data for privacy-preserving research.

Installation:

pip install medsynth

Generate synthetic data:

medsynth --generate-omop --num-subjects 100 --output-dir ./omop_data/

Load and analyze:

from smart_omop import MedSynthOMOPSource

source = MedSynthOMOPSource("./omop_data")

# Filter by condition
persons = source.filter_by_condition([255573])  # COPD

# Apply demographics
filtered = source.filter_by_age_gender(
    persons,
    min_age=60,
    gender_concept_ids=[8532]  # Female
)

# Create summary
summary = source.create_cohort_summary(
    concept_ids=[255573],
    min_age=60,
    gender_concept_ids=[8532]
)

print(f"Matching persons: {summary['person_count']}")
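To illustrate what an age and gender filter means on OMOP person rows, here is a plain-Python sketch. It is not MedSynthOMOPSource's implementation. It assumes rows carry the standard OMOP person columns person_id, gender_concept_id, and year_of_birth, approximates age as the reference year minus the birth year, and uses made-up sample rows; filter_persons is a hypothetical helper:

```python
from datetime import date

FEMALE, MALE = 8532, 8507  # standard OMOP gender concept IDs


def filter_persons(person_rows, min_age=None, max_age=None,
                   gender_concept_ids=None, reference_year=None):
    """Return person_ids whose approximate age and gender match the filters."""
    year = reference_year or date.today().year
    matched = []
    for row in person_rows:
        age = year - int(row["year_of_birth"])
        if min_age is not None and age < min_age:
            continue
        if max_age is not None and age > max_age:
            continue
        if gender_concept_ids and int(row["gender_concept_id"]) not in gender_concept_ids:
            continue
        matched.append(int(row["person_id"]))
    return matched


# Made-up rows using OMOP person table column names
rows = [
    {"person_id": 1, "gender_concept_id": 8532, "year_of_birth": 1950},
    {"person_id": 2, "gender_concept_id": 8507, "year_of_birth": 1948},
    {"person_id": 3, "gender_concept_id": 8532, "year_of_birth": 1990},
]

matching_ids = filter_persons(rows, min_age=60, gender_concept_ids=[FEMALE],
                              reference_year=2025)
print(matching_ids)  # [1]
```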

For more information: https://github.com/ankurlohachab/medsynth

API Reference

OMOPClient

Core methods:

  • get_sources() - List available data sources
  • get_cohort(cohort_id) - Retrieve cohort definition
  • create_cohort(definition) - Create new cohort
  • generate_cohort(cohort_id, source_key) - Generate cohort
  • get_generation_status(cohort_id, source_key) - Check generation status
  • get_cohort_results(cohort_id, source_key) - Get cohort summary
  • get_heracles_report(cohort_id, source_key, report_type, refresh) - Get specific report
  • get_heracles_person_report(cohort_id, source_key, refresh) - Get demographics
  • get_heracles_condition_report(cohort_id, source_key, refresh) - Get conditions
  • get_heracles_drug_report(cohort_id, source_key, refresh) - Get drug exposures
  • get_heracles_procedure_report(cohort_id, source_key, refresh) - Get procedures
  • get_heracles_measurement_report(cohort_id, source_key, refresh) - Get measurements
  • get_heracles_dashboard_report(cohort_id, source_key, refresh) - Get dashboard

HeraclesJobManager

Methods:

  • create_job(cohort_ids, source_key, job_name, analysis_ids, small_cell_count) - Create job
  • submit_job(job_config, poll, timeout) - Submit job
  • get_job_status(execution_id) - Check job status

CohortBuilder

Methods:

  • with_condition(name, concept_ids) - Add condition criterion
  • with_age_range(min_age, max_age) - Set age requirements
  • with_gender(gender) - Set gender requirement
  • build() - Generate cohort definition

CohortVisualizer

Methods:

  • create_age_distribution(person_report) - Age distribution from Heracles
  • create_gender_distribution(person_report) - Gender breakdown
  • create_condition_prevalence(condition_report) - Condition treemap
  • create_dashboard_from_reports(reports) - Multi-panel dashboard

Configuration

Custom timeout and retries:

client = OMOPClient(
    "http://your-server:8080/WebAPI",
    timeout=60,
    max_retries=5,
    verify_ssl=True
)

Environment variable for the WebAPI base URL:

export OMOP_BASE_URL="http://your-server:8080/WebAPI"
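How the package itself consumes OMOP_BASE_URL is not described here. A script can read the variable explicitly and pass the result to OMOPClient, as in the sketch below. The resolve_base_url helper and its fallback order are assumptions, not library behavior:

```python
import os

DEFAULT_BASE_URL = "http://your-server:8080/WebAPI"  # placeholder from the examples


def resolve_base_url(explicit=None):
    """Prefer an explicit URL, then OMOP_BASE_URL, then the placeholder default."""
    url = explicit or os.environ.get("OMOP_BASE_URL") or DEFAULT_BASE_URL
    return url.rstrip("/")


# Set here for demonstration only; normally exported in the shell
os.environ["OMOP_BASE_URL"] = "http://localhost:8080/WebAPI/"

base_url = resolve_base_url()
print(base_url)  # http://localhost:8080/WebAPI
```

The resolved value can then be used as OMOPClient(base_url).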

Testing

Run tests:

pytest tests/ -v

Example test output:

tests/test_client.py::test_client_initialization PASSED
tests/test_cohort.py::test_simple_cohort_builder PASSED
tests/test_heracles.py::test_heracles_job_config PASSED
tests/test_error_handling.py::test_invalid_cohort_id PASSED
...

29 passed in 6.91s

Tested against OHDSI WebAPI 2.14.0.

Requirements

  • Python 3.9+
  • OHDSI WebAPI instance (v2.7+) or MedSynth CSV data

Examples

See the examples/ directory:

  • example_quickstart.py - Basic operations
  • example_simple_cohort.py - Cohort building
  • example_heracles.py - Characterization
  • example_heracles_reports.py - Report fetching
  • example_visualizations.py - Visualizations
  • example_medsynth.py - Synthetic data

Development

git clone https://github.com/ankurlohachab/smart-omop.git
cd smart-omop

pip install -e ".[dev]"

pytest
mypy src/smart_omop
black src/smart_omop

Author

Ankur Lohachab, Department of Advanced Computing Sciences, Maastricht University

License

MIT License - see LICENSE file.

Citation

@software{lohachab2025smartomop,
  author = {Lohachab, Ankur},
  title = {smart-omop: OHDSI OMOP CDM Client for Python},
  year = {2025},
  url = {https://github.com/ankurlohachab/smart-omop}
}

Support

Issues: https://github.com/ankurlohachab/smart-omop/issues
Email: ankur.lohachab@maastrichtuniversity.nl
