OHDSI OMOP CDM data fetching and cohort management for healthcare AI
smart-omop
Python client for OHDSI OMOP Common Data Model cohort management via WebAPI.
Installation
pip install smart-omop
Optional dependencies:
pip install smart-omop[viz] # Plotly visualizations
pip install smart-omop[medsynth] # MedSynth CSV integration
pip install smart-omop[all] # All features
Quick Start
from smart_omop import OMOPClient
client = OMOPClient("http://your-server:8080/WebAPI")
# List data sources
sources = client.get_sources()
# Generate cohort
client.generate_cohort(cohort_id=1, source_key="MY_CDM")
# Get results
results = client.get_cohort_results(cohort_id=1, source_key="MY_CDM")
print(f"Persons: {results['personCount']}, Status: {results['status']}")
Creating Cohorts
Basic Cohort Builder
from smart_omop import CohortBuilder, Gender, OMOPClient
builder = CohortBuilder("COPD Patients", "COPD diagnosis cohort")
builder.with_condition("COPD", [255573]) # COPD concept ID
builder.with_age_range(min_age=40)
builder.with_gender(Gender.FEMALE)
cohort_def = builder.build()
with OMOPClient("http://your-server:8080/WebAPI") as client:
    created = client.create_cohort(cohort_def.to_dict())
    print(f"Created cohort ID: {created['id']}")
Multi-Criteria Cohort Builder
from smart_omop import CohortBuilderFull, Gender, AgeOperator
builder = CohortBuilderFull("COPD with Hypertension", "Multi-condition cohort")
# Add concept sets
copd = builder.add_concept_set("COPD")
copd.add_concept(255573, "Chronic obstructive lung disease", include_descendants=True)
htn = builder.add_concept_set("Hypertension")
htn.add_concept(316866, "Hypertensive disorder", include_descendants=True)
# Set primary criterion
builder.add_primary_condition(concept_set_id=0)
# Add inclusion rule for demographics
rule = builder.add_inclusion_rule("Age and Gender")
rule.add_age_criterion(AgeOperator.GTE, 60)
rule.add_age_criterion(AgeOperator.LTE, 85)
rule.add_gender_criterion(Gender.FEMALE)
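# Build and create (the client must be opened first)
from smart_omop import OMOPClient
cohort_def = builder.build()
with OMOPClient("http://your-server:8080/WebAPI") as client:
    created = client.create_cohort(cohort_def.to_dict())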
Supported criteria types: ConditionOccurrence, ProcedureOccurrence, DrugExposure, Measurement, Observation, VisitOccurrence, DeviceExposure, Death.
Heracles Characterization
Examples below use the demo cohort ID 167 and source key KAGGLECOPD. Replace both with your own values from your WebAPI/Atlas instance.
Running Analysis
from smart_omop import HeraclesJobManager, OMOPClient
with OMOPClient("http://your-server:8080/WebAPI") as client:
    manager = HeraclesJobManager(client)
    # Create job
    job = manager.create_job(
        cohort_ids=[167],
        source_key="KAGGLECOPD",
        job_name="COPD_analysis",
        analysis_ids=[1, 2, 3, 4, 5, 400, 401, 402, 403, 404],
        small_cell_count=5
    )
    # Submit (no polling - check status separately)
    result = manager.submit_job(job)
    print(f"Job submitted: {result.get('executionId')}")
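Because the submit call above does not poll, the status check has to be driven separately. The following is a minimal sketch of such a loop, not part of smart-omop: `wait_for_job` is a hypothetical helper, and the `status` key and the `COMPLETED`/`FAILED` values are assumptions about what `get_job_status(execution_id)` returns. A fake status source stands in for the real call so the example runs without a server.

```python
import time
from typing import Callable, Dict, Iterable


def wait_for_job(
    get_status: Callable[[], Dict],
    done_states: Iterable[str] = ("COMPLETED", "FAILED"),  # assumed status values
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call get_status() until its 'status' field is terminal or the timeout passes."""
    done = set(done_states)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("status") in done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status.get('status')!r} after {timeout}s")
        sleep(interval)


# Stand-in for `lambda: manager.get_job_status(execution_id)`:
# a fake source that reports completion on the third call.
_responses = iter([{"status": "STARTED"}, {"status": "RUNNING"}, {"status": "COMPLETED"}])
final = wait_for_job(lambda: next(_responses), interval=0.0, sleep=lambda _: None)
print(final["status"])  # COMPLETED
```

With a real server, pass `lambda: manager.get_job_status(result["executionId"])` as `get_status`, after checking the actual response shape of your WebAPI version.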
Fetching Reports
After Heracles analysis completes, fetch characterization reports:
# Get person demographics
person_report = client.get_heracles_person_report(167, "KAGGLECOPD", refresh=True)
# Get condition occurrences
condition_report = client.get_heracles_condition_report(167, "KAGGLECOPD", refresh=True)
# Get drug exposures
drug_report = client.get_heracles_drug_report(167, "KAGGLECOPD", refresh=True)
# Get procedures
procedure_report = client.get_heracles_procedure_report(167, "KAGGLECOPD", refresh=True)
# Get measurements
measurement_report = client.get_heracles_measurement_report(167, "KAGGLECOPD", refresh=True)
# Get dashboard summary
dashboard = client.get_heracles_dashboard_report(167, "KAGGLECOPD", refresh=True)
Available report types:
- get_heracles_person_report() - Demographics and year of birth
- get_heracles_condition_report() - Condition occurrences
- get_heracles_drug_report() - Drug exposures
- get_heracles_procedure_report() - Procedures
- get_heracles_measurement_report() - Measurements
- get_heracles_observation_report() - Observations
- get_heracles_death_report() - Death records
- get_heracles_dashboard_report() - Summary statistics
Example: Person Report Data
person_report = client.get_heracles_person_report(167, "KAGGLECOPD", refresh=True)
# Gender distribution
for gender in person_report['gender']:
    print(f"{gender['conceptName']}: {gender['countValue']} persons")
# Output:
# MALE: 61 persons
# FEMALE: 34 persons
# Year of birth distribution
birth_years = person_report['yearOfBirth']
print(f"Birth year entries: {len(birth_years)}")
# Output: Birth year entries: 33
# Birth year statistics
stats = person_report['yearOfBirthStats'][0]
print(f"Year range: {stats['minValue']} to {stats['maxValue']}")
# Output: Year range: 1933 to 1977
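The gender counts in the person report can be turned into proportions with a few lines of plain Python. This is a sketch that assumes only the `gender`, `conceptName`, and `countValue` keys used in the example above; `gender_shares` is a hypothetical helper, and the sample dictionary reuses the demo counts rather than a live response.

```python
from typing import Dict, List


def gender_shares(person_report: Dict[str, List[dict]]) -> Dict[str, float]:
    """Return each gender's share of the cohort as a fraction between 0 and 1."""
    rows = person_report.get("gender", [])
    total = sum(row["countValue"] for row in rows)
    if total == 0:
        return {}
    return {row["conceptName"]: row["countValue"] / total for row in rows}


# Sample shaped like the report above, using the demo counts.
sample_report = {
    "gender": [
        {"conceptName": "MALE", "countValue": 61},
        {"conceptName": "FEMALE", "countValue": 34},
    ]
}
shares = gender_shares(sample_report)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
# MALE: 64.2%
# FEMALE: 35.8%
```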
Example: Condition Report Data
condition_report = client.get_heracles_condition_report(167, "KAGGLECOPD", refresh=True)
# Top conditions by prevalence
for condition in condition_report[:5]:
    concept_id = condition['conceptId']
    name = condition['conceptPath'].split('||')[-1]
    num_persons = condition['numPersons']
    percent = condition['percentPersons']
    print(f"{name} ({concept_id}): {num_persons} persons ({percent:.1%})")
# Output:
# Chronic obstructive pulmonary disease (255573): 95 persons (100.0%)
# Moderate chronic obstructive pulmonary disease (4193588): 39 persons (41.1%)
# Severe chronic obstructive pulmonary disease (4209097): 27 persons (28.4%)
# Mild chronic obstructive pulmonary disease (4196712): 21 persons (22.1%)
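Slicing the first five entries gives the most prevalent conditions only if the report already arrives sorted, which the source does not state. A small sketch that sorts explicitly is shown here; `top_conditions` is a hypothetical helper, and the `conceptPath` prefixes in the sample are invented for illustration (only the final path segment matches the output above).

```python
from typing import Dict, List, Tuple


def top_conditions(report: List[Dict], n: int = 5) -> List[Tuple[str, int, int, float]]:
    """Return (name, concept_id, num_persons, percent) for the n most prevalent rows."""
    ranked = sorted(report, key=lambda row: row["numPersons"], reverse=True)
    return [
        (
            row["conceptPath"].split("||")[-1],  # last path segment is the concept name
            row["conceptId"],
            row["numPersons"],
            row["percentPersons"],
        )
        for row in ranked[:n]
    ]


# Deliberately unsorted sample; the path prefixes are hypothetical.
sample_report = [
    {"conceptId": 4196712, "conceptPath": "Respiratory||Mild chronic obstructive pulmonary disease",
     "numPersons": 21, "percentPersons": 21 / 95},
    {"conceptId": 255573, "conceptPath": "Respiratory||Chronic obstructive pulmonary disease",
     "numPersons": 95, "percentPersons": 95 / 95},
    {"conceptId": 4209097, "conceptPath": "Respiratory||Severe chronic obstructive pulmonary disease",
     "numPersons": 27, "percentPersons": 27 / 95},
    {"conceptId": 4193588, "conceptPath": "Respiratory||Moderate chronic obstructive pulmonary disease",
     "numPersons": 39, "percentPersons": 39 / 95},
]
top = top_conditions(sample_report)
for name, concept_id, num_persons, percent in top:
    print(f"{name} ({concept_id}): {num_persons} persons ({percent:.1%})")
```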
Visualizations
Visualization examples use the demo cohort ID 167 and source key KAGGLECOPD; swap these for your own values.
Create visualizations from Heracles reports:
from smart_omop import CohortVisualizer
import json
# Load report data
with open('cohort167_KAGGLECOPD_person.json') as f:
    person_data = json.load(f)
with open('cohort167_KAGGLECOPD_condition.json') as f:
    condition_data = json.load(f)
visualizer = CohortVisualizer(output_dir="./visualizations")
# Age distribution from Heracles person report
age_by_gender = visualizer.create_age_distribution(person_data)
# Gender distribution
gender_chart = visualizer.create_gender_distribution(person_data)
# Condition prevalence treemap
condition_treemap = visualizer.create_condition_prevalence(condition_data)
# Dashboard with multiple charts
dashboard = visualizer.create_dashboard_from_reports({
    'person': person_data,
    'condition': condition_data
})
Outputs interactive HTML files using Plotly.
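The visualization example loads reports from files named like cohort167_KAGGLECOPD_person.json. One way to produce files with that naming from report dictionaries is sketched here; `report_path` and `save_report` are hypothetical helpers, not smart-omop functions, and the demo writes to a temporary directory with a placeholder report.

```python
import json
import tempfile
from pathlib import Path


def report_path(directory, cohort_id: int, source_key: str, report_type: str) -> Path:
    """Build a file path following the cohort<ID>_<SOURCE>_<type>.json pattern."""
    return Path(directory) / f"cohort{cohort_id}_{source_key}_{report_type}.json"


def save_report(report, directory, cohort_id: int, source_key: str, report_type: str) -> Path:
    """Write a report (any JSON-serializable object) to disk and return its path."""
    path = report_path(directory, cohort_id, source_key, report_type)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path


# Round trip with a placeholder report in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_report({"gender": []}, tmp, 167, "KAGGLECOPD", "person")
    saved_name = saved.name
    loaded = json.loads(saved.read_text(encoding="utf-8"))

print(saved_name)  # cohort167_KAGGLECOPD_person.json
```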
CLI Usage
Creating and Generating Cohorts
# Create cohort
smart-omop --base-url http://your-server:8080/WebAPI create-cohort \
--name "COPD Patients" \
--concept-ids 255573 \
--age-gte 40 \
--output cohort.json
# Generate cohort
smart-omop --base-url http://your-server:8080/WebAPI generate \
--cohort-id 167 \
--source-key KAGGLECOPD
# Check results
smart-omop --base-url http://your-server:8080/WebAPI results \
--cohort-id 167 \
--source-key KAGGLECOPD
Running Heracles and Fetching Reports
# Run Heracles analysis
smart-omop --base-url http://your-server:8080/WebAPI heracles \
--cohort-id 167 \
--source-key KAGGLECOPD \
--job-name "COPD_analysis" \
--analysis-ids "1,2,3,4,5,400,401,402,403,404"
# Get individual reports (wait ~60 seconds after Heracles starts)
# Person demographics
smart-omop --base-url http://your-server:8080/WebAPI get-report \
--cohort-id 167 \
--source-key KAGGLECOPD \
--type person \
--output person.json \
--refresh
# Dashboard
smart-omop --base-url http://your-server:8080/WebAPI get-report \
--cohort-id 167 \
--source-key KAGGLECOPD \
--type dashboard \
--refresh
# Condition occurrences
smart-omop --base-url http://your-server:8080/WebAPI get-report \
--cohort-id 167 \
--source-key KAGGLECOPD \
--type condition \
--output condition.json \
--refresh
# Available types: dashboard, person, condition, drug, procedure,
# measurement, observation, death, components_summary
# Or export all reports at once
smart-omop --base-url http://your-server:8080/WebAPI export-reports \
--cohort-id 167 \
--source-key KAGGLECOPD \
--output-dir ./reports \
--refresh
Notes:
- 167 and KAGGLECOPD are example values from the bundled COPD demo. Replace them with your own cohort ID and source_key returned by your WebAPI/Atlas instance.
- The --refresh flag forces a fresh pull from WebAPI; drop it if you are reusing cached reports.
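When several report types are fetched from a script, the get-report invocation can be assembled as an argument list instead of a shell string. The sketch below uses only the flags shown in the commands above; `get_report_cmd` is a hypothetical helper, and the command is printed rather than executed.

```python
import shlex
from typing import List, Optional


def get_report_cmd(
    base_url: str,
    cohort_id: int,
    source_key: str,
    report_type: str,
    output: Optional[str] = None,
    refresh: bool = False,
) -> List[str]:
    """Assemble the argv for `smart-omop ... get-report` using the flags shown above."""
    cmd = [
        "smart-omop", "--base-url", base_url, "get-report",
        "--cohort-id", str(cohort_id),
        "--source-key", source_key,
        "--type", report_type,
    ]
    if output:
        cmd += ["--output", output]
    if refresh:
        cmd.append("--refresh")
    return cmd


cmd = get_report_cmd(
    "http://your-server:8080/WebAPI", 167, "KAGGLECOPD", "person",
    output="person.json", refresh=True,
)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues with source keys or file names that contain spaces.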
Analysis IDs
Common Heracles analysis categories:
| Category | IDs | Description |
|---|---|---|
| Demographics | 1-5, 7-9 | Person count, age, gender, race |
| Conditions | 400-413 | Condition occurrences and prevalence |
| Drugs | 700-713 | Drug exposures and durations |
| Procedures | 600-613 | Procedure occurrences |
| Measurements | 1800-1831 | Measurement values and distributions |
| Observations | 800-813 | Observation records |
| Visits | 200-213 | Visit occurrences and types |
Example analysis selection:
# Demographics and conditions
analysis_ids = [1, 2, 3, 4, 5, 400, 401, 402, 403, 404]
# Demographics, conditions, and measurements
analysis_ids = [1, 2, 3, 4, 5, 400, 401, 402, 1800, 1801, 1802, 1803]
Or use predefined sets:
from smart_omop import DEMO_ANALYSES, CONDITION_ANALYSES, MEASUREMENT_ANALYSES
analysis_ids = DEMO_ANALYSES + CONDITION_ANALYSES + MEASUREMENT_ANALYSES
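The ID ranges in the table can also be expanded programmatically. This is a small sketch with a hypothetical helper, `expand_ids`, that parses range strings written as in the table; it assumes every integer in a range is wanted, which may include IDs your WebAPI version does not define.

```python
from typing import List


def expand_ids(spec: str) -> List[int]:
    """Expand a spec such as '1-5, 7-9' into [1, 2, 3, 4, 5, 7, 8, 9]."""
    ids: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(piece) for piece in part.split("-", 1))
            ids.extend(range(start, end + 1))  # ranges are inclusive
        else:
            ids.append(int(part))
    return ids


demographics = expand_ids("1-5, 7-9")
conditions = expand_ids("400-413")
print(demographics)     # [1, 2, 3, 4, 5, 7, 8, 9]
print(len(conditions))  # 14
analysis_ids = demographics + conditions
```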
MedSynth Integration
MedSynth generates synthetic OMOP CDM data for privacy-preserving research.
Installation:
pip install medsynth
Generate synthetic data:
medsynth --generate-omop --num-subjects 100 --output-dir ./omop_data/
Load and analyze:
from smart_omop import MedSynthOMOPSource
source = MedSynthOMOPSource("./omop_data")
# Filter by condition
persons = source.filter_by_condition([255573]) # COPD
# Apply demographics
filtered = source.filter_by_age_gender(
persons,
min_age=60,
gender_concept_ids=[8532] # Female
)
# Create summary
summary = source.create_cohort_summary(
concept_ids=[255573],
min_age=60,
gender_concept_ids=[8532]
)
print(f"Matching persons: {summary['person_count']}")
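For intuition about what an age and gender filter does, here is a minimal sketch over in-memory rows. It is not MedSynthOMOPSource's implementation: `filter_persons` is a hypothetical function, the rows use the OMOP person columns `person_id`, `year_of_birth`, and `gender_concept_id`, and age is approximated as the difference between a reference year and the year of birth.

```python
from typing import Dict, Iterable, List, Optional

FEMALE = 8532  # OMOP standard gender concept IDs
MALE = 8507


def filter_persons(
    persons: Iterable[Dict],
    as_of_year: int,
    min_age: Optional[int] = None,
    max_age: Optional[int] = None,
    gender_concept_ids: Optional[Iterable[int]] = None,
) -> List[Dict]:
    """Keep rows whose approximate age and gender satisfy the given limits."""
    allowed = set(gender_concept_ids) if gender_concept_ids is not None else None
    kept = []
    for person in persons:
        age = as_of_year - person["year_of_birth"]  # approximation: ignores month and day
        if min_age is not None and age < min_age:
            continue
        if max_age is not None and age > max_age:
            continue
        if allowed is not None and person["gender_concept_id"] not in allowed:
            continue
        kept.append(person)
    return kept


# Hypothetical rows for illustration.
persons = [
    {"person_id": 1, "year_of_birth": 1950, "gender_concept_id": FEMALE},
    {"person_id": 2, "year_of_birth": 1990, "gender_concept_id": FEMALE},
    {"person_id": 3, "year_of_birth": 1945, "gender_concept_id": MALE},
]
matches = filter_persons(persons, as_of_year=2025, min_age=60, gender_concept_ids=[FEMALE])
print([p["person_id"] for p in matches])  # [1]
```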
For more information: https://github.com/ankurlohachab/medsynth
API Reference
OMOPClient
Core methods:
- get_sources() - List available data sources
- get_cohort(cohort_id) - Retrieve cohort definition
- create_cohort(definition) - Create new cohort
- generate_cohort(cohort_id, source_key) - Generate cohort
- get_generation_status(cohort_id, source_key) - Check generation status
- get_cohort_results(cohort_id, source_key) - Get cohort summary
- get_heracles_report(cohort_id, source_key, report_type, refresh) - Get specific report
- get_heracles_person_report(cohort_id, source_key, refresh) - Get demographics
- get_heracles_condition_report(cohort_id, source_key, refresh) - Get conditions
- get_heracles_drug_report(cohort_id, source_key, refresh) - Get drug exposures
- get_heracles_procedure_report(cohort_id, source_key, refresh) - Get procedures
- get_heracles_measurement_report(cohort_id, source_key, refresh) - Get measurements
- get_heracles_dashboard_report(cohort_id, source_key, refresh) - Get dashboard
HeraclesJobManager
Methods:
- create_job(cohort_ids, source_key, job_name, analysis_ids, small_cell_count) - Create job
- submit_job(job_config, poll, timeout) - Submit job
- get_job_status(execution_id) - Check job status
CohortBuilder
Methods:
- with_condition(name, concept_ids) - Add condition criterion
- with_age_range(min_age, max_age) - Set age requirements
- with_gender(gender) - Set gender requirement
- build() - Generate cohort definition
CohortVisualizer
Methods:
- create_age_distribution(person_report) - Age distribution from Heracles
- create_gender_distribution(person_report) - Gender breakdown
- create_condition_prevalence(condition_report) - Condition treemap
- create_dashboard_from_reports(reports) - Multi-panel dashboard
Configuration
Custom timeout and retries:
client = OMOPClient(
"http://your-server:8080/WebAPI",
timeout=60,
max_retries=5,
verify_ssl=True
)
Environment variable:
export OMOP_BASE_URL="http://your-server:8080/WebAPI"
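The source does not say how the package consumes OMOP_BASE_URL, so the sketch below reads the variable explicitly and passes the result to the client yourself. `resolve_base_url` and `DEFAULT_URL` are hypothetical names introduced for illustration.

```python
import os
from typing import Mapping, Optional

DEFAULT_URL = "http://your-server:8080/WebAPI"  # placeholder fallback


def resolve_base_url(explicit: Optional[str] = None, env: Mapping[str, str] = os.environ) -> str:
    """Pick the base URL: explicit argument first, then OMOP_BASE_URL, then the default."""
    url = explicit or env.get("OMOP_BASE_URL") or DEFAULT_URL
    return url.rstrip("/")  # normalize away a trailing slash


# Demonstrated with plain dicts instead of the real environment.
from_env = resolve_base_url(env={"OMOP_BASE_URL": "http://omop.example.org/WebAPI/"})
fallback = resolve_base_url(env={})
print(from_env)  # http://omop.example.org/WebAPI
print(fallback)  # http://your-server:8080/WebAPI
# Usage with the client: OMOPClient(resolve_base_url(), timeout=60)
```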
Testing
Run tests:
pytest tests/ -v
Example test output:
tests/test_client.py::test_client_initialization PASSED
tests/test_cohort.py::test_simple_cohort_builder PASSED
tests/test_heracles.py::test_heracles_job_config PASSED
tests/test_error_handling.py::test_invalid_cohort_id PASSED
29 passed in 6.91s
Tested against OHDSI WebAPI 2.14.0.
Requirements
- Python 3.9+
- OHDSI WebAPI instance (v2.7+) or MedSynth CSV data
Examples
See examples/ directory:
- example_quickstart.py - Basic operations
- example_simple_cohort.py - Cohort building
- example_heracles.py - Characterization
- example_heracles_reports.py - Report fetching
- example_visualizations.py - Visualizations
- example_medsynth.py - Synthetic data
Development
git clone https://github.com/ankurlohachab/smart-omop.git
cd smart-omop
pip install -e ".[dev]"
pytest
mypy src/smart_omop
black src/smart_omop
Author
Ankur Lohachab, Department of Advanced Computing Sciences, Maastricht University
License
MIT License - see LICENSE file.
Citation
@software{lohachab2025smartomop,
author = {Lohachab, Ankur},
title = {smart-omop: OHDSI OMOP CDM Client for Python},
year = {2025},
url = {https://github.com/ankurlohachab/smart-omop}
}
Support
Issues: https://github.com/ankurlohachab/smart-omop/issues
Email: ankur.lohachab@maastrichtuniversity.nl