OMOPy

Pythonic, type-safe interface for OMOP CDM databases.

OMOPy is a single Python package that reimplements the DARWIN EU R package ecosystem for working with OMOP Common Data Model databases. It provides lazy database access via Ibis, type-safe data structures via Pydantic and Polars, and a clean Pythonic API with full type hints.

Background

The DARWIN EU Coordination Centre develops and maintains an ecosystem of R packages for observational health research using the OMOP CDM. These include CDMConnector, PatientProfiles, CohortCharacteristics, IncidencePrevalence, DrugUtilisation, CohortSurvival, TreatmentPatterns, and others.

OMOPy consolidates these ~17 R packages into a single Python library, bringing the DARWIN EU analytical toolkit to the Python data science ecosystem. The package preserves the conceptual model and analytical capabilities of the R packages while providing a Pythonic API that follows Python conventions and leverages modern Python tooling.

Features

  • Single package — one pip install omopy replaces 17 R packages
  • Lazy by default — Ibis constructs SQL queries; nothing executes until you call .collect()
  • Type-safe — Pydantic models with frozen immutability; full type annotations throughout
  • Pythonic — snake_case, context managers, keyword arguments, no R idioms
  • Database-agnostic — DuckDB, PostgreSQL, SQL Server, Snowflake, BigQuery, and more via Ibis backends
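
The "lazy by default" and "frozen" points can be illustrated with a small standard-library sketch. This is not OMOPy or Ibis code: `LazyQuery` is a hypothetical stand-in that only accumulates SQL and touches the database in `collect()`, SQLite stands in for a real CDM backend, and a frozen dataclass stands in for a frozen Pydantic model.

```python
import sqlite3
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class LazyQuery:
    """Hypothetical stand-in for a lazy table expression (not the OMOPy/Ibis API)."""

    table: str
    filters: tuple = ()

    def where(self, condition: str) -> "LazyQuery":
        # Returns a new immutable object; no SQL is executed here.
        return replace(self, filters=self.filters + (condition,))

    def to_sql(self) -> str:
        sql = f"SELECT * FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        return sql

    def collect(self, con: sqlite3.Connection) -> list:
        # The only place where the database is actually queried.
        return con.execute(self.to_sql()).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER)")
con.executemany("INSERT INTO person VALUES (?, ?)", [(1, 1950), (2, 1985), (3, 2001)])

query = LazyQuery("person").where("year_of_birth < 2000")  # builds SQL only
rows = query.collect(con)  # the query runs here

try:
    query.table = "observation_period"  # frozen: mutation is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because every step returns a new immutable object, intermediate queries can be shared and reused without one caller changing what another sees.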

Modules

| Module | R Equivalent | Description |
| --- | --- | --- |
| omopy.generics | omopgenerics | Core type system: CDM references, tables, codelists, summarised results |
| omopy.connector | CDMConnector | Database connections, CDM loading, cohort generation, CIRCE engine |
| omopy.profiles | PatientProfiles | Patient-level enrichment: demographics, intersections, death |
| omopy.codelist | CodelistGenerator | Vocabulary search, hierarchy traversal, codelist operations |
| omopy.vis | visOmopResults | Format, tabulate, and plot summarised results |
| omopy.characteristics | CohortCharacteristics | Cohort characterization: summarise, tabulate, plot |
| omopy.incidence | IncidencePrevalence | Incidence rates and prevalence proportions |
| omopy.drug | DrugUtilisation | Drug cohort generation, utilisation metrics, dose analysis |
| omopy.survival | CohortSurvival | Kaplan-Meier survival, competing risks, survival plots |
| omopy.treatment | TreatmentPatterns | Treatment pathway analysis, Sankey and sunburst plots |
| omopy.drug_diagnostics | DrugExposureDiagnostics | Drug exposure quality checks and diagnostics |
| omopy.pregnancy | PregnancyIdentifier | Pregnancy episode identification (HIPPS algorithm) |
| omopy.testing | TestGenerator | Test data generation for OMOP CDM studies |
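
For readers unfamiliar with the Kaplan-Meier method named in the omopy.survival row, the textbook product-limit estimator can be sketched in a few lines of plain Python. This is an illustration of the statistic only, not the implementation in omopy.survival; the function name `kaplan_meier` and its arguments are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Textbook product-limit estimator.

    times:  follow-up time per subject
    events: 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability), one entry per event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # each subject leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # events and censored subjects both leave
    return curve


curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
```

With these five subjects the estimate steps down to roughly 0.8, 0.6, and 0.3 at times 2, 3, and 5; the censored subjects shrink the risk set without producing a step.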

Installation

pip install omopy

Or with uv:

uv add omopy

Requirements

  • Python >= 3.14
  • A database with OMOP CDM v5.3 or v5.4 tables

Optional database backends

pip install "omopy[postgres]"    # PostgreSQL via psycopg
pip install "omopy[mssql]"       # SQL Server via pyodbc
pip install "omopy[snowflake]"   # Snowflake
pip install "omopy[bigquery]"    # Google BigQuery
pip install "omopy[all]"         # All backends
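
After installing an extra, one quick way to confirm that its driver is importable is to probe for the module with the standard library. This is a sketch and not part of OMOPy: the `available_backends` helper is hypothetical, and only the two driver modules named in the install comments (psycopg and pyodbc) are checked.

```python
from importlib.util import find_spec

# Driver module names are taken from the install comments; the Snowflake and
# BigQuery drivers are left out because their module names are not stated.
DRIVERS = {"postgres": "psycopg", "mssql": "pyodbc"}


def available_backends(drivers=DRIVERS):
    """Map each extra to True/False depending on whether its driver is importable."""
    return {extra: find_spec(module) is not None for extra, module in drivers.items()}


status = available_backends()
print(status)
```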

Quick Start

from omopy.connector import cdm_from_con, generate_concept_cohort_set
from omopy.generics import Codelist

# Connect to a DuckDB OMOP CDM database
cdm = cdm_from_con("path/to/omop.duckdb", cdm_schema="main")

# Define a concept-based cohort
codelist = Codelist({"hypertension": [320128]})
cdm = generate_concept_cohort_set(cdm, codelist, name="hypertension")

# Enrich with demographics
from omopy.profiles import add_demographics
enriched = add_demographics(cdm["hypertension"], cdm)

# Collect to a Polars DataFrame
df = enriched.collect()
print(df)

# Characterise the cohort
from omopy.characteristics import summarise_characteristics, table_characteristics

result = summarise_characteristics(cdm["hypertension"])
table_characteristics(result, type="gt")

# Estimate incidence
from omopy.incidence import (
    generate_denominator_cohort_set,
    estimate_incidence,
    plot_incidence,
)

cdm = generate_denominator_cohort_set(cdm, name="denominator")
inc = estimate_incidence(cdm, "denominator", "hypertension", interval="years")
plot_incidence(inc)
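
To make the last step concrete, an incidence rate is the number of new events divided by the person-time at risk. The plain-Python sketch below shows that arithmetic on three made-up follow-up records. It is not how `estimate_incidence` is implemented; the `incidence_rate` helper, the 365.25-day year, and the per-100,000 person-years scaling are assumptions chosen for illustration.

```python
from datetime import date


def incidence_rate(follow_up, per=100_000):
    """follow_up: list of (start, end, had_event) tuples, one per person.

    Returns events per `per` person-years of observed time at risk.
    """
    # Count both the start and end day as time at risk.
    person_days = sum((end - start).days + 1 for start, end, _ in follow_up)
    events = sum(1 for _, _, had_event in follow_up if had_event)
    person_years = person_days / 365.25
    return events / person_years * per


follow_up = [
    (date(2020, 1, 1), date(2020, 12, 31), False),  # full year, no event
    (date(2020, 1, 1), date(2020, 6, 30), True),    # time at risk ends at the event
    (date(2020, 3, 1), date(2020, 12, 31), False),  # entered the denominator late
]
rate = incidence_rate(follow_up)
print(f"{rate:.1f} events per 100,000 person-years")
```

Here one event over 854 person-days gives a rate of about 42,769 per 100,000 person-years, which shows why person-time, not head count, is the denominator.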

Notebooks

The notebooks/ directory contains 12 fully executable Jupyter notebooks demonstrating every major capability:

| Notebook | Topic |
| --- | --- |
| 01_getting_started | CDM connection, tables, snapshot, subsetting |
| 02_codelist_generation | Vocabulary search, hierarchy, codelist operations |
| 03_cohort_generation | Concept-based and CIRCE/JSON cohort generation |
| 04_cohort_characteristics | Summarise, tabulate, and plot cohort characteristics |
| 05_patient_profiles | Demographics, intersects, categories, enrichment |
| 06_incidence_prevalence | Denominator generation, incidence and prevalence |
| 07_drug_utilisation | Drug cohorts, utilisation metrics, indication |
| 08_cohort_survival | Single-event and competing-risk survival analysis |
| 09_treatment_patterns | Treatment pathways, Sankey and sunburst plots |
| 10_drug_exposure_diagnostics | Drug exposure quality checks |
| 11_pregnancy_analysis | HIPPS pregnancy identification algorithm |
| 12_visualisation_styling | Table and plot formatting, styles |

Documentation

Full API documentation is available at darwin-eu.github.io/omopy.

Development

# Clone and install
git clone https://github.com/fastomop/omopy.git
cd omopy
uv sync --all-extras --dev

# Run tests (1619 tests)
uv run pytest

# Run linting
uv run ruff check src/ tests/

# Build documentation
uv run python docs/_build.py build --strict

# Install pre-commit hooks
uv run pre-commit install

See CONTRIBUTING.md for the full development guide, including the code style guide, type hint requirements, and pull request process.

About DARWIN EU

DARWIN EU (Data Analysis and Real-World Interrogation Network) is a federated network of data, expertise, and services for generating reliable evidence on the real-world safety and effectiveness of medicines. It is coordinated by the DARWIN EU Coordination Centre and supports regulatory decision-making by the European Medicines Agency (EMA).

OMOPy builds on the analytical methods and tooling developed by the DARWIN EU community and the broader OHDSI (Observational Health Data Sciences and Informatics) network. The OMOP Common Data Model provides the standardised data structure that underpins all analyses.

License

This project is licensed under the MIT License. See LICENSE for details.
