OHDSI OMOP CDM data fetching and cohort management for healthcare AI

smart-omop

Python client for OHDSI OMOP Common Data Model cohort management via WebAPI.

Features

  • Cohort definition creation with CIRCE expression syntax
  • Cohort generation and results retrieval
  • Heracles characterization with configurable analysis sets
  • Concept set management and resolution
  • MedSynth synthetic data integration (CSV-based)
  • Interactive visualizations (Plotly/matplotlib)
  • CLI and Python API

Installation

pip install smart-omop

Optional dependencies:

pip install "smart-omop[viz]"         # Plotly visualizations
pip install "smart-omop[medsynth]"    # MedSynth integration
pip install "smart-omop[all]"         # All features

Examples

The examples/ directory contains standalone example scripts demonstrating key features:

  • example_quickstart.py - Basic client operations
  • example_simple_cohort.py - Simple cohort building
  • example_circe_syntax.py - Full CIRCE syntax
  • example_heracles.py - Heracles characterization
  • example_medsynth.py - MedSynth CSV data integration
  • example_visualizations.py - Interactive visualizations

See examples/README.md for details on running examples.

Quick Start

from smart_omop import OMOPClient

client = OMOPClient("http://your-webapi:8080/WebAPI")

# List data sources
sources = client.get_sources()

# Fetch cohort definition
cohort = client.get_cohort(cohort_id=1)

# Generate cohort
client.generate_cohort(cohort_id=1, source_key="MY_CDM")

# Get results
results = client.get_cohort_results(cohort_id=1, source_key="MY_CDM")
print(f"Persons: {results['personCount']}, Status: {results['status']}")

Cohort Building

Example 1: Simple cohort with CohortBuilder

from smart_omop import CohortBuilder, Gender, OMOPClient

builder = CohortBuilder("COPD Patients", "COPD diagnosis cohort")
builder.with_condition("COPD", [255573, 40481087])
builder.with_age_range(min_age=40)
builder.with_gender(Gender.FEMALE)

cohort_def = builder.build()

with OMOPClient("http://your-webapi:8080/WebAPI") as client:
    created = client.create_cohort(cohort_def.to_dict())

Example 2: Full CIRCE syntax with CohortBuilderFull

from smart_omop import CohortBuilderFull, Gender, AgeOperator

builder = CohortBuilderFull("Complex Cohort", "Multiple criteria")

# Concept sets
copd = builder.add_concept_set("COPD")
copd.add_concept(255573, "Chronic obstructive lung disease", include_descendants=True)

htn = builder.add_concept_set("Hypertension")
htn.add_concept(316866, "Hypertensive disorder", include_descendants=True)

# Primary criterion
builder.add_primary_condition(concept_set_id=0)

# Observation window
builder.set_observation_window(prior_days=365, post_days=0)

# Inclusion rules
rule = builder.add_inclusion_rule("Demographics")
rule.add_age_criterion(AgeOperator.GTE, 60)
rule.add_age_criterion(AgeOperator.LTE, 85)
rule.add_gender_criterion(Gender.FEMALE)

cohort_def = builder.build()

Supported primary criteria types: ConditionOccurrence, ProcedureOccurrence, DrugExposure, Measurement, Observation, VisitOccurrence, DeviceExposure, Death.
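
For orientation, the sketch below shows roughly what a CIRCE-style cohort expression with one concept set and one ConditionOccurrence primary criterion looks like as plain JSON. It does not use smart-omop. The helper name circe_expression is hypothetical, and the field names follow the OHDSI CIRCE layout as commonly exported by ATLAS, so treat them as an assumption and compare them against the output of builder.build() before relying on them.

```python
import json


def circe_expression(concept_id, concept_name, prior_days=365, post_days=0):
    """Build a minimal CIRCE-style expression (hypothetical helper, assumed field names)."""
    concept_set = {
        "id": 0,  # referenced below through CodesetId
        "name": concept_name,
        "expression": {
            "items": [
                {
                    "concept": {"CONCEPT_ID": concept_id, "CONCEPT_NAME": concept_name},
                    "includeDescendants": True,
                }
            ]
        },
    }
    return {
        "ConceptSets": [concept_set],
        "PrimaryCriteria": {
            # One primary criterion: a condition occurrence drawn from concept set 0.
            "CriteriaList": [{"ConditionOccurrence": {"CodesetId": 0}}],
            "ObservationWindow": {"PriorDays": prior_days, "PostDays": post_days},
            "PrimaryCriteriaLimit": {"Type": "First"},
        },
        "InclusionRules": [],
    }


expr = circe_expression(255573, "Chronic obstructive lung disease")
print(json.dumps(expr, indent=2))
```

Swapping the ConditionOccurrence key for another of the primary criteria types listed above would, under the same assumption, select a different OMOP domain.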

Heracles Characterization

from smart_omop import HeraclesAnalysisBuilder, HeraclesJobManager, OMOPClient

with OMOPClient("http://your-webapi:8080/WebAPI") as client:
    mgr = HeraclesJobManager(client)

    # Build analysis set
    analyses = HeraclesAnalysisBuilder()
    analyses.add_demographics()
    analyses.add_conditions()
    analyses.add_drugs()

    # Create job
    job = mgr.create_job(
        cohort_ids=[1],
        source_key="MY_CDM",
        job_name="COPD_Characterization",
        analysis_ids=analyses.build(),
        small_cell_count=5
    )

    # Submit
    result = mgr.submit_job(job, poll=True, timeout=1800)

Job configuration format:

{
  "jobName": "COPD_Characterization",
  "sourceKey": "MY_CDM",
  "smallCellCount": 5,
  "cohortDefinitionIds": [1],
  "analysisIds": [1, 2, 3, 400, 401, ...],
  "runHeraclesHeel": false,
  "cohortPeriodOnly": false
}

Analysis categories: DEMO_ANALYSES, CONDITION_ANALYSES, DRUG_ANALYSES, PROCEDURE_ANALYSES, MEASUREMENT_ANALYSES, VISIT_ANALYSES, OBSERVATION_ANALYSES.
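
If you need to assemble the job configuration by hand (for example to inspect or archive it), the sketch below builds a dictionary with the same keys as the format shown above. The function build_heracles_job is a hypothetical stand-in, not part of smart-omop, and the de-duplication and sorting of analysis IDs is an illustrative choice rather than documented library behaviour.

```python
import json


def build_heracles_job(cohort_ids, source_key, job_name, analysis_ids,
                       small_cell_count=5, run_heel=False, cohort_period_only=False):
    """Assemble a payload with the keys from the job configuration format (hypothetical helper)."""
    if not cohort_ids:
        raise ValueError("at least one cohort ID is required")
    return {
        "jobName": job_name,
        "sourceKey": source_key,
        "smallCellCount": small_cell_count,
        "cohortDefinitionIds": list(cohort_ids),
        # Drop duplicate analysis IDs and keep a stable order.
        "analysisIds": sorted(set(analysis_ids)),
        "runHeraclesHeel": run_heel,
        "cohortPeriodOnly": cohort_period_only,
    }


job = build_heracles_job([1], "MY_CDM", "COPD_Characterization", [400, 1, 2, 3, 401, 1])
print(json.dumps(job, indent=2))
```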

Data Sources

WebAPI Instance

from smart_omop import OMOPClient, fetch_cohort_data

# Any OHDSI WebAPI instance
data = fetch_cohort_data(
    "http://your-webapi:8080/WebAPI",
    cohort_id=1,
    source_key="MY_CDM",
    include_results=True
)

MedSynth CSV Data

MedSynth is a medical synthetic data generator that creates privacy-preserving OMOP CDM datasets. It generates CT scans and OMOP-formatted CSV files using statistical methods.

Installation:

pip install medsynth

Generate synthetic OMOP data:

medsynth --generate-omop --num-subjects 100 --output-dir ./omop_data/

Load and filter MedSynth-generated data:

from smart_omop import MedSynthOMOPSource, Gender

source = MedSynthOMOPSource("/path/to/medsynth/output")

# Filter by condition
persons = source.filter_by_condition([255573])

# Apply demographics
filtered = source.filter_by_age_gender(
    persons,
    min_age=60,
    gender_concept_ids=[Gender.FEMALE.value]
)

# Create summary
summary = source.create_cohort_summary(
    concept_ids=[255573],
    min_age=60,
    gender_concept_ids=[Gender.FEMALE.value]
)

For more information: https://github.com/ankurlohachab/medsynth

Supported OMOP tables: person, condition_occurrence, drug_exposure, procedure_occurrence, measurement, observation, visit_occurrence, death.
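
To make the condition filter concrete, here is a minimal standard-library sketch of the same idea applied to a condition_occurrence CSV. It is not the implementation of MedSynthOMOPSource.filter_by_condition. The function persons_with_condition and the sample rows are made up for illustration, and only the column names person_id and condition_concept_id come from the OMOP CDM.

```python
import csv
import io

# Tiny in-memory stand-in for a condition_occurrence.csv file (illustrative rows only).
CONDITION_CSV = """person_id,condition_concept_id
1,255573
2,316866
3,255573
3,316866
"""


def persons_with_condition(csv_file, concept_ids):
    """Return the sorted person IDs that have any of the given condition concepts."""
    wanted = {int(c) for c in concept_ids}
    matched = set()
    for row in csv.DictReader(csv_file):
        if int(row["condition_concept_id"]) in wanted:
            matched.add(int(row["person_id"]))
    return sorted(matched)


copd_persons = persons_with_condition(io.StringIO(CONDITION_CSV), [255573])
print(copd_persons)
```

With a real export, the io.StringIO object would be replaced by an open file handle on the CSV in the MedSynth output directory.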

Visualizations

from smart_omop import CohortVisualizer

visualizer = CohortVisualizer(output_dir="./viz")

# Age distribution
age_data = {
    'male': [45, 52, 61, 67, 72, ...],
    'female': [48, 55, 59, 64, 70, ...]
}
age_path = visualizer.create_age_pyramid(age_data)

# Condition prevalence
condition_counts = {'255573': 100, '316866': 45}
condition_names = {'255573': 'COPD', '316866': 'Hypertension'}
treemap_path = visualizer.create_condition_treemap(condition_counts, condition_names)

# Dashboard (cohort_data must be defined beforehand; its structure is not documented here)
dashboard_path = visualizer.create_dashboard(cohort_data)

Outputs interactive HTML files using Plotly. Falls back to matplotlib if Plotly is unavailable.
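
An age pyramid is built from per-gender counts in age bands. This README does not say how create_age_pyramid bins its input, so the sketch below only illustrates one plausible pre-processing step with the standard library. The function bin_ages and the 10-year band width are assumptions, and the ages are the sample values from the example above.

```python
from collections import Counter


def bin_ages(ages, width=10):
    """Count ages per band, keyed by labels such as '40-49' (hypothetical helper)."""
    counts = Counter()
    for age in ages:
        low = (age // width) * width
        counts[f"{low}-{low + width - 1}"] += 1
    # Order the bands by their lower bound.
    return dict(sorted(counts.items(), key=lambda kv: int(kv[0].split("-")[0])))


age_data = {"male": [45, 52, 61, 67, 72], "female": [48, 55, 59, 64, 70]}
pyramid = {sex: bin_ages(ages) for sex, ages in age_data.items()}
print(pyramid)
```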

CLI

# Create cohort
smart-omop --base-url http://your-webapi:8080/WebAPI create-cohort \
  --name "COPD Cohort" \
  --concept-ids 255573,40481087 \
  --age-gte 40 \
  --gender female

# Generate cohort
smart-omop --base-url http://your-webapi:8080/WebAPI generate \
  --cohort-id 1 \
  --source-key MY_CDM

# Fetch results
smart-omop --base-url http://your-webapi:8080/WebAPI results \
  --cohort-id 1 \
  --source-key MY_CDM \
  --output results.json

Configuration

Environment variables:

export OMOP_BASE_URL="http://your-webapi:8080/WebAPI"
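
How the package itself consumes OMOP_BASE_URL is not spelled out here. In your own scripts, one way to honour the variable is a small resolver like the sketch below, where an explicit argument wins over the environment and a placeholder default is used last. The function resolve_base_url and the trailing-slash stripping are hypothetical choices, not smart-omop behaviour.

```python
import os

DEFAULT_BASE_URL = "http://your-webapi:8080/WebAPI"  # placeholder URL used throughout this README


def resolve_base_url(explicit=None, environ=None):
    """Pick the WebAPI base URL: explicit argument, then OMOP_BASE_URL, then the default."""
    env = os.environ if environ is None else environ
    url = explicit or env.get("OMOP_BASE_URL") or DEFAULT_BASE_URL
    # Strip a trailing slash so later path joins do not produce '//'.
    return url.rstrip("/")


print(resolve_base_url())
```

The result can then be passed as the first argument to OMOPClient.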

Custom timeout and retries:

client = OMOPClient(
    "http://your-webapi:8080/WebAPI",
    timeout=60,
    max_retries=5,
    verify_ssl=True
)

API Reference

OMOPClient

Core client for WebAPI interactions.

Methods:

  • get_sources() - List available data sources
  • get_cohort(cohort_id) - Fetch cohort definition
  • create_cohort(definition) - Create new cohort
  • generate_cohort(cohort_id, source_key) - Generate cohort on source
  • get_generation_status(cohort_id, source_key) - Check generation status
  • get_cohort_results(cohort_id, source_key) - Fetch cohort summary
  • get_heracles_analyses(cohort_id, source_key) - Fetch Heracles analyses
  • run_heracles(cohort_id, source_key) - Run characterization
  • get_concept_set(concept_set_id) - Fetch concept set
  • resolve_concept_set(expression, source_key) - Resolve to concept IDs

CohortBuilder

Fluent interface for cohort definitions.

Methods:

  • with_condition(name, concept_ids) - Add condition criterion
  • with_age_range(min_age, max_age) - Set age requirements
  • with_gender(gender) - Set gender requirement
  • with_observation_window(prior_days, post_days) - Set observation window
  • build() - Generate cohort definition

CohortBuilderFull

Full CIRCE expression syntax support.

Methods:

  • add_concept_set(name) - Create concept set
  • add_primary_condition(concept_set_id) - Add condition criterion
  • add_primary_procedure(concept_set_id) - Add procedure criterion
  • add_primary_drug(concept_set_id) - Add drug criterion
  • add_primary_measurement(concept_set_id) - Add measurement criterion
  • set_observation_window(prior_days, post_days) - Set observation window
  • set_primary_criteria_limit(limit_type) - Set limit type (All, First)
  • add_inclusion_rule(name, description) - Add inclusion rule
  • build() - Generate cohort definition

HeraclesJobManager

Heracles job management.

Methods:

  • create_job(cohort_ids, source_key, ...) - Create job configuration
  • submit_job(job_config, poll, timeout) - Submit and optionally poll
  • get_job_status(execution_id) - Get job status

MedSynthOMOPSource

CSV-based OMOP data source.

Methods:

  • load_table(table_name) - Load OMOP table from CSV
  • get_person_count() - Get total persons
  • get_condition_counts() - Get condition counts by concept ID
  • filter_by_condition(concept_ids) - Filter persons by condition
  • filter_by_age_gender(person_ids, min_age, max_age, gender_concept_ids) - Apply demographics
  • create_cohort_summary(concept_ids, min_age, max_age, gender_concept_ids) - Create summary

CohortVisualizer

Visualization generator.

Methods:

  • create_age_pyramid(age_data, save_path) - Age distribution by gender
  • create_condition_treemap(condition_counts, condition_names, save_path) - Condition prevalence
  • create_temporal_pattern(dates, save_path) - Cohort entry over time
  • create_dashboard(cohort_data, save_path) - Comprehensive dashboard

High-Level Functions

  • fetch_cohort_data(base_url, cohort_id, source_key) - Complete cohort data
  • fetch_concept_sets(base_url, concept_set_ids) - Multiple concept sets
  • create_and_generate_cohort(base_url, cohort_name, concept_ids, source_key, age_min) - Create and generate
  • poll_generation_status(base_url, cohort_id, source_key, max_wait) - Poll until complete
  • create_simple_cohort(name, description, concept_ids, include_descendants, age_min, age_max, genders) - Simple cohort
  • create_standard_job(cohort_ids, source_key, job_name, ...) - Standard Heracles job
  • load_from_medsynth_directory(data_directory, concept_ids, min_age, max_age, gender_concept_ids) - Load from CSV
  • create_cohort_visualizations(cohort_data, output_dir) - All visualizations
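
The polling pattern behind a helper such as poll_generation_status can be sketched generically as below. This is not the library's implementation: poll_until_done is a hypothetical function, the status strings are placeholders rather than confirmed WebAPI values, and the status source is simulated so the example runs without a server.

```python
import time


def poll_until_done(get_status, max_wait=60.0, interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a terminal state or max_wait seconds elapse."""
    terminal = {"COMPLETE", "FAILED"}  # placeholder status names (assumption)
    deadline = clock() + max_wait
    while True:
        status = get_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {max_wait} seconds")
        sleep(interval)


# Simulated status source standing in for a real status request.
_states = iter(["PENDING", "RUNNING", "COMPLETE"])
final_status = poll_until_done(lambda: next(_states), max_wait=5, interval=0)
print(final_status)
```

In real use, get_status would wrap a call such as client.get_generation_status(cohort_id, source_key) and extract the status field from its response.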

Testing

Run test suite:

pytest tests/ -v

Test results:

tests/test_client.py::test_client_initialization ......... PASSED
tests/test_client.py::test_context_manager ............... PASSED
tests/test_client.py::test_get_sources ................... PASSED
tests/test_client.py::test_get_cohort .................... PASSED
tests/test_client.py::test_get_generation_status ......... PASSED
tests/test_client.py::test_get_cohort_results ............ PASSED
tests/test_cohort.py::test_simple_cohort_builder ......... PASSED
tests/test_cohort.py::test_full_cohort_builder ........... PASSED
tests/test_cohort.py::test_create_simple_cohort .......... PASSED
tests/test_cohort.py::test_multiple_concept_sets ......... PASSED
tests/test_cohort.py::test_age_operators ................. PASSED
tests/test_error_handling.py::test_invalid_cohort_id ..... PASSED
tests/test_error_handling.py::test_nonexistent_cohort .... PASSED
tests/test_error_handling.py::test_invalid_source_key .... PASSED
tests/test_error_handling.py::test_nonexistent_source .... PASSED
tests/test_error_handling.py::test_cohort_not_generated .. PASSED
tests/test_error_handling.py::test_invalid_cohort_definition PASSED
tests/test_error_handling.py::test_empty_concept_set ..... PASSED
tests/test_error_handling.py::test_no_primary_criteria ... PASSED
tests/test_error_handling.py::test_invalid_concept_id .... PASSED
tests/test_error_handling.py::test_medsynth_invalid_directory PASSED
tests/test_error_handling.py::test_medsynth_invalid_table . PASSED
tests/test_error_handling.py::test_medsynth_missing_table_file PASSED
tests/test_heracles.py::test_heracles_job_config ......... PASSED
tests/test_heracles.py::test_analysis_builder ............ PASSED
tests/test_heracles.py::test_analysis_categories ......... PASSED
tests/test_heracles.py::test_create_standard_job ......... PASSED
tests/test_heracles.py::test_custom_analyses ............. PASSED
tests/test_heracles.py::test_comprehensive_analyses ...... PASSED

29 passed in 6.91s

Tested against OHDSI WebAPI 2.14.0 with KAGGLECOPD and SYNPUF1K data sources.

Requirements

  • Python 3.9+
  • OHDSI WebAPI instance (v2.7+) or MedSynth CSV data
  • Network access to WebAPI endpoint (if using WebAPI)

Development

git clone https://github.com/ankurlohachab/smart-omop.git
cd smart-omop

pip install -e ".[dev]"

pytest
mypy src/smart_omop
black src/smart_omop

Author

Ankur Lohachab
Department of Advanced Computing Sciences
Maastricht University

License

MIT License - see LICENSE file.

Citation

@software{lohachab2025smartomop,
  author = {Lohachab, Ankur},
  title = {smart-omop: OHDSI OMOP CDM Data Fetching for Healthcare AI},
  year = {2025},
  url = {https://github.com/ankurlohachab/smart-omop}
}

Support

Issues: https://github.com/ankurlohachab/smart-omop/issues
Email: ankur.lohachab@maastrichtuniversity.nl
