OMOPy
Pythonic, type-safe interface for OMOP CDM databases.
OMOPy is a single Python package that reimplements the DARWIN EU R package ecosystem for working with OMOP Common Data Model databases. It provides lazy database access via Ibis, type-safe data structures via Pydantic and Polars, and a clean Pythonic API with full type hints.
Background
The DARWIN EU Coordination Centre develops and maintains an ecosystem of R packages for observational health research using the OMOP CDM. These include CDMConnector, PatientProfiles, CohortCharacteristics, IncidencePrevalence, DrugUtilisation, CohortSurvival, TreatmentPatterns, and others.
OMOPy consolidates these ~17 R packages into a single Python library, bringing the DARWIN EU analytical toolkit to the Python data science ecosystem. The package preserves the conceptual model and analytical capabilities of the R packages while providing a Pythonic API that follows Python conventions and leverages modern Python tooling.
Features
- Single package: one `pip install omopy` replaces 17 R packages
- Lazy by default: Ibis constructs SQL queries; nothing executes until you call `.collect()`
- Type-safe: Pydantic models with frozen immutability; full type annotations throughout
- Pythonic: snake_case, context managers, keyword arguments, no R idioms
- Database-agnostic: DuckDB, PostgreSQL, SQL Server, Snowflake, BigQuery, and more via Ibis backends
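To make the "lazy by default" idea concrete, here is a minimal sketch of deferred execution using only the standard library. It is not OMOPy's or Ibis's API: the `LazyQuery` class, its `filter` and `sql` methods, and the in-memory SQLite table are all hypothetical stand-ins chosen so the example runs anywhere. The point is only that building the query touches no data, and the database is queried when `collect()` is called.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyQuery:
    """Hypothetical lazy query: holds SQL pieces and runs nothing until collect()."""

    con: sqlite3.Connection
    table: str
    conditions: tuple[str, ...] = ()
    params: tuple[object, ...] = ()

    def filter(self, condition: str, *params: object) -> "LazyQuery":
        # Return a new query object; the database is not touched here.
        return LazyQuery(
            self.con,
            self.table,
            self.conditions + (condition,),
            self.params + params,
        )

    def sql(self) -> str:
        # Render the SQL text that would be sent to the database.
        where = f" WHERE {' AND '.join(self.conditions)}" if self.conditions else ""
        return f"SELECT * FROM {self.table}{where}"

    def collect(self) -> list[tuple]:
        # Execution happens only at this point.
        return self.con.execute(self.sql(), self.params).fetchall()


# Tiny in-memory table standing in for an OMOP "person" table (made-up rows).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER)")
con.executemany("INSERT INTO person VALUES (?, ?)", [(1, 1950), (2, 1985), (3, 1990)])

query = LazyQuery(con, "person").filter("year_of_birth >= ?", 1980)
generated_sql = query.sql()  # SQL text only, no rows fetched yet
rows = query.collect()       # the query runs here
```

In this sketch the query object is immutable and each `filter` call returns a new one, which keeps intermediate queries safe to reuse. A real expression library does far more (type checking, backend-specific SQL generation), so treat this only as an illustration of the execution model.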
Modules
| Module | R Equivalent | Description |
|---|---|---|
| `omopy.generics` | omopgenerics | Core type system: CDM references, tables, codelists, summarised results |
| `omopy.connector` | CDMConnector | Database connections, CDM loading, cohort generation, CIRCE engine |
| `omopy.profiles` | PatientProfiles | Patient-level enrichment: demographics, intersections, death |
| `omopy.codelist` | CodelistGenerator | Vocabulary search, hierarchy traversal, codelist operations |
| `omopy.vis` | visOmopResults | Format, tabulate, and plot summarised results |
| `omopy.characteristics` | CohortCharacteristics | Cohort characterization: summarise, tabulate, plot |
| `omopy.incidence` | IncidencePrevalence | Incidence rates and prevalence proportions |
| `omopy.drug` | DrugUtilisation | Drug cohort generation, utilisation metrics, dose analysis |
| `omopy.survival` | CohortSurvival | Kaplan-Meier survival, competing risks, survival plots |
| `omopy.treatment` | TreatmentPatterns | Treatment pathway analysis, Sankey and sunburst plots |
| `omopy.drug_diagnostics` | DrugExposureDiagnostics | Drug exposure quality checks and diagnostics |
| `omopy.pregnancy` | PregnancyIdentifier | Pregnancy episode identification (HIPPS algorithm) |
| `omopy.testing` | TestGenerator | Test data generation for OMOP CDM studies |
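The core type system is described as frozen and fully typed. As a rough illustration of what frozen immutability means for a codelist-like record, the sketch below uses a standard-library frozen dataclass as a stand-in for a Pydantic model. The `ConceptSet` class and its `with_concepts` method are hypothetical names for this example and are not part of `omopy.generics`; the second concept ID is an arbitrary illustrative value.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ConceptSet:
    """Hypothetical frozen record: a named set of OMOP concept IDs."""

    name: str
    concept_ids: tuple[int, ...]

    def __post_init__(self) -> None:
        # Validate at construction time so invalid data fails early.
        if not self.name:
            raise ValueError("name must not be empty")
        if any(not isinstance(c, int) or isinstance(c, bool) for c in self.concept_ids):
            raise TypeError("concept_ids must be integers")

    def with_concepts(self, *extra: int) -> "ConceptSet":
        # "Modify" by building a new instance; the original stays unchanged.
        merged = tuple(sorted(set(self.concept_ids) | set(extra)))
        return ConceptSet(self.name, merged)


hypertension = ConceptSet("hypertension", (320128,))
extended = hypertension.with_concepts(316866)  # arbitrary illustrative ID

try:
    hypertension.name = "other"  # assignment on a frozen instance raises
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because instances cannot be changed after construction, a value passed into an analysis function is guaranteed to be the same value afterwards. Pydantic's frozen models offer the same guarantee with richer validation.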
Installation
pip install omopy
Or with uv:
uv add omopy
Requirements
- Python >= 3.14
- A database with OMOP CDM v5.3 or v5.4 tables
Optional database backends
pip install "omopy[postgres]" # PostgreSQL via psycopg
pip install "omopy[mssql]" # SQL Server via pyodbc
pip install "omopy[snowflake]" # Snowflake
pip install "omopy[bigquery]" # Google BigQuery
pip install "omopy[all]" # All backends
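After installing an extra, it can be useful to confirm that the driver module is importable in the active environment. The sketch below is one way to check, using only the standard library. The `available_backends` helper is hypothetical and not part of OMOPy, and it covers only the two drivers named above (`psycopg` and `pyodbc`), since the driver modules for the other extras are not stated here.

```python
from importlib.util import find_spec

# Driver module names taken from the extras listed above; the other extras
# are left out because their driver modules are not named there.
DRIVER_MODULES = {"postgres": "psycopg", "mssql": "pyodbc"}


def available_backends(drivers: dict[str, str] = DRIVER_MODULES) -> dict[str, bool]:
    """Report which optional driver modules can be found in this environment."""
    return {extra: find_spec(module) is not None for extra, module in drivers.items()}


status = available_backends()
for extra, installed in status.items():
    print(f"{extra}: {'installed' if installed else 'missing'}")
```

`find_spec` only locates the module without importing it, so the check is cheap and has no side effects.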
Quick Start
from omopy.connector import cdm_from_con, generate_concept_cohort_set
from omopy.generics import Codelist
# Connect to a DuckDB OMOP CDM database
cdm = cdm_from_con("path/to/omop.duckdb", cdm_schema="main")
# Define a concept-based cohort
codelist = Codelist({"hypertension": [320128]})
cdm = generate_concept_cohort_set(cdm, codelist, name="hypertension")
# Enrich with demographics
from omopy.profiles import add_demographics
enriched = add_demographics(cdm["hypertension"], cdm)
# Collect to a Polars DataFrame
df = enriched.collect()
print(df)
# Characterise the cohort
from omopy.characteristics import summarise_characteristics, table_characteristics
result = summarise_characteristics(cdm["hypertension"])
table_characteristics(result, type="gt")
# Estimate incidence
from omopy.incidence import (
    generate_denominator_cohort_set,
    estimate_incidence,
    plot_incidence,
)
cdm = generate_denominator_cohort_set(cdm, name="denominator")
inc = estimate_incidence(cdm, "denominator", "hypertension", interval="years")
plot_incidence(inc)
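For readers new to the incidence step, the underlying arithmetic is a count of new events divided by the total time at risk. The sketch below works through that calculation on made-up follow-up data. It is not how `estimate_incidence` is implemented: the `person_years` and `incidence_rate` helpers, the 365.25-day year, and the per-100,000 scaling are assumptions chosen for illustration.

```python
from datetime import date


def person_years(start: date, end: date) -> float:
    """Time at risk in years, counting both the start and end day."""
    return ((end - start).days + 1) / 365.25


def incidence_rate(events: int, total_person_years: float, per: int = 100_000) -> float:
    """Events per `per` person-years."""
    if total_person_years <= 0:
        raise ValueError("person-time must be positive")
    return events / total_person_years * per


# Made-up follow-up: (start of time at risk, end of time at risk, had event?)
follow_up = [
    (date(2020, 1, 1), date(2020, 12, 31), False),
    (date(2020, 1, 1), date(2020, 6, 30), True),   # time at risk ends at the event
    (date(2020, 7, 1), date(2020, 12, 31), False),
]

total_py = sum(person_years(start, end) for start, end, _ in follow_up)
n_events = sum(1 for *_, had_event in follow_up if had_event)
rate = incidence_rate(n_events, total_py)
print(f"{n_events} event(s) over {total_py:.2f} person-years")
```

A real denominator cohort also has to handle details such as prior-observation requirements and censoring, which this sketch leaves out.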
Notebooks
The notebooks/ directory contains 12 fully executable Jupyter notebooks
demonstrating every major capability:
| Notebook | Topic |
|---|---|
| `01_getting_started` | CDM connection, tables, snapshot, subsetting |
| `02_codelist_generation` | Vocabulary search, hierarchy, codelist operations |
| `03_cohort_generation` | Concept-based and CIRCE/JSON cohort generation |
| `04_cohort_characteristics` | Summarise, tabulate, and plot cohort characteristics |
| `05_patient_profiles` | Demographics, intersects, categories, enrichment |
| `06_incidence_prevalence` | Denominator generation, incidence and prevalence |
| `07_drug_utilisation` | Drug cohorts, utilisation metrics, indication |
| `08_cohort_survival` | Single-event and competing-risk survival analysis |
| `09_treatment_patterns` | Treatment pathways, Sankey and sunburst plots |
| `10_drug_exposure_diagnostics` | Drug exposure quality checks |
| `11_pregnancy_analysis` | HIPPS pregnancy identification algorithm |
| `12_visualisation_styling` | Table and plot formatting, styles |
Documentation
Full API documentation is available at darwin-eu.github.io/omopy.
Development
# Clone and install
git clone https://github.com/fastomop/omopy.git
cd omopy
uv sync --all-extras --dev
# Run tests (1619 tests)
uv run pytest
# Run linting
uv run ruff check src/ tests/
# Build documentation
uv run python docs/_build.py build --strict
# Install pre-commit hooks
uv run pre-commit install
See CONTRIBUTING.md for the full development guide, including the code style guide, type hint requirements, and pull request process.
About DARWIN EU
DARWIN EU (Data Analysis and Real-World Interrogation Network) is a federated network of data, expertise, and services for generating reliable evidence on the real-world safety and effectiveness of medicines. It is coordinated by the DARWIN EU Coordination Centre and supports regulatory decision-making by the European Medicines Agency (EMA).
OMOPy builds on the analytical methods and tooling developed by the DARWIN EU community and the broader OHDSI (Observational Health Data Sciences and Informatics) network. The OMOP Common Data Model provides the standardised data structure that underpins all analyses.
License
This project is licensed under the MIT License. See LICENSE for details.