OHDSI OMOP CDM data fetching and cohort management for healthcare AI
smart-omop
Python client for OHDSI OMOP Common Data Model cohort management via WebAPI.
Features
- Cohort definition creation with CIRCE expression syntax
- Cohort generation and results retrieval
- Heracles characterization with configurable analysis sets
- Concept set management and resolution
- MedSynth synthetic data integration (CSV-based)
- Interactive visualizations (Plotly/matplotlib)
- CLI and Python API
Installation
pip install smart-omop
Optional dependencies:
pip install "smart-omop[viz]"       # Plotly visualizations
pip install "smart-omop[medsynth]"  # MedSynth integration
pip install "smart-omop[all]"       # All features
Examples
The examples/ directory contains standalone example scripts demonstrating key features:
- example_quickstart.py - Basic client operations
- example_simple_cohort.py - Simple cohort building
- example_circe_syntax.py - Full CIRCE syntax
- example_heracles.py - Heracles characterization
- example_medsynth.py - MedSynth CSV data integration
- example_visualizations.py - Interactive visualizations
See examples/README.md for details on running examples.
Quick Start
from smart_omop import OMOPClient
client = OMOPClient("http://your-webapi:8080/WebAPI")
# List data sources
sources = client.get_sources()
# Fetch cohort definition
cohort = client.get_cohort(cohort_id=1)
# Generate cohort
client.generate_cohort(cohort_id=1, source_key="MY_CDM")
# Get results
results = client.get_cohort_results(cohort_id=1, source_key="MY_CDM")
print(f"Persons: {results['personCount']}, Status: {results['status']}")
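Because the results carry a status field, they may not be final immediately after generate_cohort is called. The following is a minimal polling sketch under stated assumptions: the wait_until_complete helper, the fake_status stub, and the status strings "PENDING", "RUNNING", "COMPLETE", "FAILED" and "ERROR" are illustrative and are not taken from smart-omop's documented API.

```python
import time
from typing import Callable, Iterable


def wait_until_complete(
    get_status: Callable[[], str],
    done: Iterable[str] = ("COMPLETE",),
    failed: Iterable[str] = ("FAILED", "ERROR"),
    interval: float = 0.01,
    max_wait: float = 1.0,
) -> str:
    """Call get_status() until it reports a terminal state or max_wait elapses."""
    done_states, failed_states = set(done), set(failed)
    deadline = time.monotonic() + max_wait
    while True:
        status = get_status()
        if status in done_states:
            return status
        if status in failed_states:
            raise RuntimeError(f"generation ended with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {max_wait} s")
        time.sleep(interval)


# Hypothetical stub standing in for a real status call.
_states = iter(["PENDING", "RUNNING", "COMPLETE"])


def fake_status() -> str:
    return next(_states)


final_status = wait_until_complete(fake_status)
print(final_status)
```

With a real client, get_status could wrap client.get_generation_status(cohort_id, source_key) once its return shape has been checked, and interval and max_wait would normally be seconds to minutes rather than the tiny demo values used here.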
Cohort Building
Example 1
from smart_omop import CohortBuilder, Gender, OMOPClient
builder = CohortBuilder("COPD Patients", "COPD diagnosis cohort")
builder.with_condition("COPD", [255573, 40481087])
builder.with_age_range(min_age=40)
builder.with_gender(Gender.FEMALE)
cohort_def = builder.build()
with OMOPClient("http://your-webapi:8080/WebAPI") as client:
    created = client.create_cohort(cohort_def.to_dict())
Example 2
from smart_omop import CohortBuilderFull, Gender, AgeOperator
builder = CohortBuilderFull("Complex Cohort", "Multiple criteria")
# Concept sets
copd = builder.add_concept_set("COPD")
copd.add_concept(255573, "Chronic obstructive lung disease", include_descendants=True)
htn = builder.add_concept_set("Hypertension")
htn.add_concept(316866, "Hypertensive disorder", include_descendants=True)
# Primary criterion
builder.add_primary_condition(concept_set_id=0)
# Observation window
builder.set_observation_window(prior_days=365, post_days=0)
# Inclusion rules
rule = builder.add_inclusion_rule("Demographics")
rule.add_age_criterion(AgeOperator.GTE, 60)
rule.add_age_criterion(AgeOperator.LTE, 85)
rule.add_gender_criterion(Gender.FEMALE)
cohort_def = builder.build()
Supported primary criteria types: ConditionOccurrence, ProcedureOccurrence, DrugExposure, Measurement, Observation, VisitOccurrence, DeviceExposure, Death.
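For orientation, the sketch below builds by hand the general shape of a CIRCE expression for the two concept sets and the observation window used in Example 2. The key names follow the JSON that ATLAS exports, reproduced from memory as an assumption; the dictionary that builder.build() actually produces may differ, and the concept_set helper is hypothetical. Inclusion rules are left out to keep the example short.

```python
import json


def concept_set(set_id, name, concept_id, concept_name, include_descendants=True):
    """Build one concept set entry (hypothetical helper, ATLAS-style keys assumed)."""
    return {
        "id": set_id,
        "name": name,
        "expression": {
            "items": [
                {
                    "concept": {
                        "CONCEPT_ID": concept_id,
                        "CONCEPT_NAME": concept_name,
                    },
                    "includeDescendants": include_descendants,
                    "isExcluded": False,
                    "includeMapped": False,
                }
            ]
        },
    }


expression = {
    "ConceptSets": [
        concept_set(0, "COPD", 255573, "Chronic obstructive lung disease"),
        concept_set(1, "Hypertension", 316866, "Hypertensive disorder"),
    ],
    "PrimaryCriteria": {
        # The primary criterion points at concept set 0 through CodesetId.
        "CriteriaList": [{"ConditionOccurrence": {"CodesetId": 0}}],
        "ObservationWindow": {"PriorDays": 365, "PostDays": 0},
        "PrimaryCriteriaLimit": {"Type": "First"},
    },
}

print(json.dumps(expression, indent=2))
```

Comparing this shape with the output of cohort_def.to_dict() is a quick way to see which fields the builder fills in.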
Heracles Characterization
from smart_omop import OMOPClient, HeraclesJobManager, HeraclesAnalysisBuilder
with OMOPClient("http://your-webapi:8080/WebAPI") as client:
    mgr = HeraclesJobManager(client)
    # Build analysis set
    analyses = HeraclesAnalysisBuilder()
    analyses.add_demographics()
    analyses.add_conditions()
    analyses.add_drugs()
    # Create job
    job = mgr.create_job(
        cohort_ids=[1],
        source_key="MY_CDM",
        job_name="COPD_Characterization",
        analysis_ids=analyses.build(),
        small_cell_count=5
    )
    # Submit and poll until the job finishes or the timeout is reached
    result = mgr.submit_job(job, poll=True, timeout=1800)
Job configuration format:
{
  "jobName": "COPD_Characterization",
  "sourceKey": "MY_CDM",
  "smallCellCount": 5,
  "cohortDefinitionIds": [1],
  "analysisIds": [1, 2, 3, 400, 401, ...],
  "runHeraclesHeel": false,
  "cohortPeriodOnly": false
}
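The job configuration above is plain JSON, so it can also be assembled and validated without the library. A minimal sketch, assuming only the field names shown in the format above; build_job_config is a hypothetical helper rather than a smart-omop function, and the analysis IDs are just the ones visible in the example, not a complete set.

```python
import json


def build_job_config(
    job_name,
    source_key,
    cohort_ids,
    analysis_ids,
    small_cell_count=5,
    run_heel=False,
    cohort_period_only=False,
):
    """Assemble a Heracles job configuration dict (hypothetical helper)."""
    if not cohort_ids:
        raise ValueError("at least one cohort ID is required")
    if not analysis_ids:
        raise ValueError("at least one analysis ID is required")
    return {
        "jobName": job_name,
        "sourceKey": source_key,
        "smallCellCount": small_cell_count,
        "cohortDefinitionIds": list(cohort_ids),
        # Deduplicate and sort so repeated builder calls give a stable payload.
        "analysisIds": sorted(set(analysis_ids)),
        "runHeraclesHeel": run_heel,
        "cohortPeriodOnly": cohort_period_only,
    }


config = build_job_config(
    "COPD_Characterization", "MY_CDM", [1], [401, 1, 2, 3, 400, 1]
)
payload = json.dumps(config, indent=2)
print(payload)
```

Serializing with json.dumps turns the Python booleans into the lowercase false values shown in the format.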
Analysis categories: DEMO_ANALYSES, CONDITION_ANALYSES, DRUG_ANALYSES, PROCEDURE_ANALYSES, MEASUREMENT_ANALYSES, VISIT_ANALYSES, OBSERVATION_ANALYSES.
Data Sources
WebAPI Instance
from smart_omop import OMOPClient, fetch_cohort_data
# Any OHDSI WebAPI instance
data = fetch_cohort_data(
    "http://your-webapi:8080/WebAPI",
    cohort_id=1,
    source_key="MY_CDM",
    include_results=True
)
MedSynth CSV Data
MedSynth is a medical synthetic data generator that creates privacy-preserving OMOP CDM datasets. It generates CT scans and OMOP-formatted CSV files using statistical methods.
Installation:
pip install medsynth
Generate synthetic OMOP data:
medsynth --generate-omop --num-subjects 100 --output-dir ./omop_data/
Load and filter MedSynth-generated data:
from smart_omop import MedSynthOMOPSource, Gender
source = MedSynthOMOPSource("/path/to/medsynth/output")
# Filter by condition
persons = source.filter_by_condition([255573])
# Apply demographics
filtered = source.filter_by_age_gender(
    persons,
    min_age=60,
    gender_concept_ids=[Gender.FEMALE.value]
)
# Create summary
summary = source.create_cohort_summary(
    concept_ids=[255573],
    min_age=60,
    gender_concept_ids=[Gender.FEMALE.value]
)
For more information: https://github.com/ankurlohachab/medsynth
Supported OMOP tables: person, condition_occurrence, drug_exposure, procedure_occurrence, measurement, observation, visit_occurrence, death.
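To make the idea of filtering persons by condition concrete, here is a small standard-library sketch over an in-memory condition_occurrence table. It is not how MedSynthOMOPSource is implemented; persons_with_condition and the sample rows are illustrative, and only the OMOP CDM column names person_id and condition_concept_id are assumed.

```python
import csv
import io

# Tiny made-up sample in the layout of an OMOP condition_occurrence CSV.
CONDITION_CSV = """person_id,condition_concept_id,condition_start_date
1,255573,2020-01-05
2,316866,2019-03-11
3,255573,2021-07-30
3,316866,2021-08-02
"""


def persons_with_condition(csv_file, concept_ids):
    """Return the set of person_id values with at least one matching condition."""
    wanted = {int(c) for c in concept_ids}
    persons = set()
    for row in csv.DictReader(csv_file):
        if int(row["condition_concept_id"]) in wanted:
            persons.add(int(row["person_id"]))
    return persons


copd_persons = persons_with_condition(io.StringIO(CONDITION_CSV), [255573])
print(sorted(copd_persons))  # [1, 3]
```

The sketch matches concept IDs exactly and does not expand descendant concepts.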
Visualizations
from smart_omop import CohortVisualizer
visualizer = CohortVisualizer(output_dir="./viz")
# Age distribution
age_data = {
    'male': [45, 52, 61, 67, 72, ...],
    'female': [48, 55, 59, 64, 70, ...]
}
age_path = visualizer.create_age_pyramid(age_data)
# Condition prevalence
condition_counts = {'255573': 100, '316866': 45}
condition_names = {'255573': 'COPD', '316866': 'Hypertension'}
treemap_path = visualizer.create_condition_treemap(condition_counts, condition_names)
# Dashboard
dashboard_path = visualizer.create_dashboard(cohort_data)
Outputs interactive HTML files using Plotly and falls back to matplotlib if Plotly is unavailable.
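One common way to implement this kind of fallback is to probe for the optional package before importing it. The sketch below shows that general pattern with the standard library only; pick_backend is a hypothetical helper and does not describe CohortVisualizer's actual internals.

```python
from importlib.util import find_spec


def pick_backend(preferred=("plotly", "matplotlib")):
    """Return the first importable package name from preferred, or None."""
    for name in preferred:
        if find_spec(name) is not None:
            return name
    return None


backend = pick_backend()
print(backend)
```

The result depends on what is installed in the current environment: "plotly" when the viz extra is present, "matplotlib" when only matplotlib is available, and None when neither can be imported.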
CLI
# Create cohort
smart-omop --base-url http://your-webapi:8080/WebAPI create-cohort \
    --name "COPD Cohort" \
    --concept-ids 255573,40481087 \
    --age-gte 40 \
    --gender female
# Generate cohort
smart-omop --base-url http://your-webapi:8080/WebAPI generate \
    --cohort-id 1 \
    --source-key MY_CDM
# Fetch results
smart-omop --base-url http://your-webapi:8080/WebAPI results \
    --cohort-id 1 \
    --source-key MY_CDM \
    --output results.json
Configuration
Environment variables:
export OMOP_BASE_URL="http://your-webapi:8080/WebAPI"
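This README does not state whether OMOPClient reads OMOP_BASE_URL on its own, so a script can resolve the value explicitly and pass it in. A minimal sketch, in which resolve_base_url and the fallback default are assumptions made for illustration:

```python
import os

# Placeholder URL used throughout this README, not a real endpoint.
DEFAULT_BASE_URL = "http://your-webapi:8080/WebAPI"


def resolve_base_url(explicit=None, env=None):
    """Prefer an explicit URL, then OMOP_BASE_URL, then the placeholder default."""
    env = os.environ if env is None else env
    url = explicit or env.get("OMOP_BASE_URL") or DEFAULT_BASE_URL
    return url.rstrip("/")


print(resolve_base_url(env={"OMOP_BASE_URL": "http://localhost:8080/WebAPI/"}))
# http://localhost:8080/WebAPI
```

The resolved string can then be passed as the first argument to OMOPClient.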
Custom timeout and retries:
client = OMOPClient(
    "http://your-webapi:8080/WebAPI",
    timeout=60,
    max_retries=5,
    verify_ssl=True
)
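How the client applies max_retries internally is not documented here. As a generic illustration of what a retry limit usually means, the sketch below retries a callable with exponential backoff; call_with_retries, the flaky stub, and the choice of retryable exceptions are all assumptions.

```python
import time


def call_with_retries(
    func,
    max_retries=5,
    base_delay=0.01,
    retry_on=(ConnectionError, TimeoutError),
):
    """Call func(), retrying up to max_retries times with exponential backoff."""
    attempt = 0
    while True:
        try:
            return func()
        except retry_on:
            if attempt >= max_retries:
                raise  # retries exhausted, surface the last error
            time.sleep(base_delay * 2 ** attempt)
            attempt += 1


# Hypothetical stub that fails twice before succeeding.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_retries(flaky)
print(result, calls["n"])
```

In this sketch max_retries counts the attempts made after the first call, so max_retries=5 allows up to six calls in total.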
API Reference
OMOPClient
Core client for WebAPI interactions.
Methods:
- get_sources() - List available data sources
- get_cohort(cohort_id) - Fetch cohort definition
- create_cohort(definition) - Create new cohort
- generate_cohort(cohort_id, source_key) - Generate cohort on source
- get_generation_status(cohort_id, source_key) - Check generation status
- get_cohort_results(cohort_id, source_key) - Fetch cohort summary
- get_heracles_analyses(cohort_id, source_key) - Fetch Heracles analyses
- run_heracles(cohort_id, source_key) - Run characterization
- get_concept_set(concept_set_id) - Fetch concept set
- resolve_concept_set(expression, source_key) - Resolve to concept IDs
CohortBuilder
Fluent interface for cohort definitions.
Methods:
- with_condition(name, concept_ids) - Add condition criterion
- with_age_range(min_age, max_age) - Set age requirements
- with_gender(gender) - Set gender requirement
- with_observation_window(prior_days, post_days) - Set observation window
- build() - Generate cohort definition
CohortBuilderFull
Full CIRCE expression syntax support.
Methods:
- add_concept_set(name) - Create concept set
- add_primary_condition(concept_set_id) - Add condition criterion
- add_primary_procedure(concept_set_id) - Add procedure criterion
- add_primary_drug(concept_set_id) - Add drug criterion
- add_primary_measurement(concept_set_id) - Add measurement criterion
- set_observation_window(prior_days, post_days) - Set observation window
- set_primary_criteria_limit(limit_type) - Set limit type (All, First)
- add_inclusion_rule(name, description) - Add inclusion rule
- build() - Generate cohort definition
HeraclesJobManager
Heracles job management.
Methods:
- create_job(cohort_ids, source_key, ...) - Create job configuration
- submit_job(job_config, poll, timeout) - Submit and optionally poll
- get_job_status(execution_id) - Get job status
MedSynthOMOPSource
CSV-based OMOP data source.
Methods:
- load_table(table_name) - Load OMOP table from CSV
- get_person_count() - Get total persons
- get_condition_counts() - Get condition counts by concept ID
- filter_by_condition(concept_ids) - Filter persons by condition
- filter_by_age_gender(person_ids, min_age, max_age, gender_concept_ids) - Apply demographics
- create_cohort_summary(concept_ids, min_age, max_age, gender_concept_ids) - Create summary
CohortVisualizer
Visualization generator.
Methods:
- create_age_pyramid(age_data, save_path) - Age distribution by gender
- create_condition_treemap(condition_counts, condition_names, save_path) - Condition prevalence
- create_temporal_pattern(dates, save_path) - Cohort entry over time
- create_dashboard(cohort_data, save_path) - Comprehensive dashboard
High-Level Functions
- fetch_cohort_data(base_url, cohort_id, source_key) - Complete cohort data
- fetch_concept_sets(base_url, concept_set_ids) - Multiple concept sets
- create_and_generate_cohort(base_url, cohort_name, concept_ids, source_key, age_min) - Create and generate
- poll_generation_status(base_url, cohort_id, source_key, max_wait) - Poll until complete
- create_simple_cohort(name, description, concept_ids, include_descendants, age_min, age_max, genders) - Simple cohort
- create_standard_job(cohort_ids, source_key, job_name, ...) - Standard Heracles job
- load_from_medsynth_directory(data_directory, concept_ids, min_age, max_age, gender_concept_ids) - Load from CSV
- create_cohort_visualizations(cohort_data, output_dir) - All visualizations
Testing
Run test suite:
pytest tests/ -v
Test results:
tests/test_client.py::test_client_initialization ......... PASSED
tests/test_client.py::test_context_manager ............... PASSED
tests/test_client.py::test_get_sources ................... PASSED
tests/test_client.py::test_get_cohort .................... PASSED
tests/test_client.py::test_get_generation_status ......... PASSED
tests/test_client.py::test_get_cohort_results ............ PASSED
tests/test_cohort.py::test_simple_cohort_builder ......... PASSED
tests/test_cohort.py::test_full_cohort_builder ........... PASSED
tests/test_cohort.py::test_create_simple_cohort .......... PASSED
tests/test_cohort.py::test_multiple_concept_sets ......... PASSED
tests/test_cohort.py::test_age_operators ................. PASSED
tests/test_error_handling.py::test_invalid_cohort_id ..... PASSED
tests/test_error_handling.py::test_nonexistent_cohort .... PASSED
tests/test_error_handling.py::test_invalid_source_key .... PASSED
tests/test_error_handling.py::test_nonexistent_source .... PASSED
tests/test_error_handling.py::test_cohort_not_generated .. PASSED
tests/test_error_handling.py::test_invalid_cohort_definition PASSED
tests/test_error_handling.py::test_empty_concept_set ..... PASSED
tests/test_error_handling.py::test_no_primary_criteria ... PASSED
tests/test_error_handling.py::test_invalid_concept_id .... PASSED
tests/test_error_handling.py::test_medsynth_invalid_directory PASSED
tests/test_error_handling.py::test_medsynth_invalid_table . PASSED
tests/test_error_handling.py::test_medsynth_missing_table_file PASSED
tests/test_heracles.py::test_heracles_job_config ......... PASSED
tests/test_heracles.py::test_analysis_builder ............ PASSED
tests/test_heracles.py::test_analysis_categories ......... PASSED
tests/test_heracles.py::test_create_standard_job ......... PASSED
tests/test_heracles.py::test_custom_analyses ............. PASSED
tests/test_heracles.py::test_comprehensive_analyses ...... PASSED
29 passed in 6.91s
Tested against OHDSI WebAPI 2.14.0 with KAGGLECOPD and SYNPUF1K data sources.
Requirements
- Python 3.9+
- OHDSI WebAPI instance (v2.7+) or MedSynth CSV data
- Network access to WebAPI endpoint (if using WebAPI)
Development
git clone https://github.com/ankurlohachab/smart-omop.git
cd smart-omop
pip install -e ".[dev]"
pytest
mypy src/smart_omop
black src/smart_omop
Author
Ankur Lohachab
Department of Advanced Computing Sciences
Maastricht University
License
MIT License - see LICENSE file.
Citation
@software{lohachab2025smartomop,
  author = {Lohachab, Ankur},
  title = {smart-omop: OHDSI OMOP CDM Data Fetching for Healthcare AI},
  year = {2025},
  url = {https://github.com/ankurlohachab/smart-omop}
}
Support
- Issues: https://github.com/ankurlohachab/smart-omop/issues
- Email: ankur.lohachab@maastrichtuniversity.nl