
OMOP CDM utils in Python


pyomop: OMOP Swiss Army Knife 🔧



✨ Overview

pyomop is your OMOP Swiss Army Knife 🔧 for working with OHDSI OMOP Common Data Model (CDM) v5.4 or v6 compliant databases, using SQLAlchemy as the ORM. It supports converting query results to pandas DataFrames for machine learning pipelines and provides utilities for working with OMOP vocabularies. Table definitions are based on the omop-cdm library. pyomop is designed as a lightweight, easy-to-use library for researchers and developers experimenting and testing with OMOP CDM databases. It can be used both as a command-line tool and as a library imported into your code.

  • Supports SQLite, PostgreSQL, and MySQL. CDM and Vocab tables are created in the same schema. (See usage below for more details)
  • LLM-based natural language queries via langchain. Usage.
  • 🔥 FHIR to OMOP conversion utilities. (See usage below for more details)
  • Execute QueryLibrary. (See usage below for more details)

Please ⭐️ this project if you find it useful!

Installation

Stable release:

pip install pyomop

Development version:

git clone https://github.com/dermatologist/pyomop.git
cd pyomop
pip install -e .

LLM support:

pip install pyomop[llm]

✨ See this notebook or script for examples. 👇 The MCP server (see below) is recommended for advanced usage.

Docker

  • A docker-compose file is provided to quickly set up an environment with PostgreSQL, WebAPI, and Atlas, along with a SQL script that creates a source in WebAPI. The script can be run using the psql command-line tool or via the WebAPI UI. After running the script, refresh the sources by sending a request to /WebAPI/source/refresh.
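The refresh request above can be sent with a one-liner; the host and port are assumptions based on a typical local docker-compose setup and may differ in yours:

```shell
# Ask WebAPI to refresh its source list after running the SQL script
# (localhost:8080 is a hypothetical host/port for the WebAPI container)
curl http://localhost:8080/WebAPI/source/refresh
```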

🔧 Usage

import asyncio
import datetime

from sqlalchemy import select

from pyomop import CdmEngineFactory, CdmVector, CdmVocabulary
# cdm6 and cdm54 are supported
from pyomop.cdm54 import Base, Cohort, Person, Vocabulary

async def main():
    cdm = CdmEngineFactory() # Creates SQLite database by default for fast testing
    # cdm = CdmEngineFactory(db='pgsql', host='', port=5432,
    #                       user='', pw='',
    #                       name='', schema='')
    # cdm = CdmEngineFactory(db='mysql', host='', port=3306,
    #                       user='', pw='',
    #                       name='')
    engine = cdm.engine
    # Comment the following line if using an existing database. Both cdm6 and cdm54 are supported, see the import statements above
    await cdm.init_models(Base.metadata) # Initializes the database with the OMOP CDM tables
    vocab = CdmVocabulary(cdm, version='cdm54') # or 'cdm6' for v6
    # Uncomment the following line to create a new vocabulary from CSV files
    # vocab.create_vocab('/path/to/csv/files')

    async with cdm.session() as session:  # type: ignore
        # Add Persons
        async with session.begin():
            session.add(
                Person(
                    person_id=100,
                    gender_concept_id=8532,
                    gender_source_concept_id=8512,
                    year_of_birth=1980,
                    month_of_birth=1,
                    day_of_birth=1,
                    birth_datetime=datetime.datetime(1980, 1, 1),
                    race_concept_id=8552,
                    race_source_concept_id=8552,
                    ethnicity_concept_id=38003564,
                    ethnicity_source_concept_id=38003564,
                )
            )
            session.add(
                Person(
                    person_id=101,
                    gender_concept_id=8532,
                    gender_source_concept_id=8512,
                    year_of_birth=1980,
                    month_of_birth=1,
                    day_of_birth=1,
                    birth_datetime=datetime.datetime(1980, 1, 1),
                    race_concept_id=8552,
                    race_source_concept_id=8552,
                    ethnicity_concept_id=38003564,
                    ethnicity_source_concept_id=38003564,
                )
            )

        # Query the Person
        stmt = select(Person).where(Person.person_id == 100)
        result = await session.execute(stmt)
        for row in result.scalars():
            print(row)
            assert row.person_id == 100

        # Query the person pattern 2
        person = await session.get(Person, 100)
        print(person)
        assert person is not None
        assert person.person_id == 100

    # Convert result to a pandas dataframe
    vec = CdmVector()

    # https://github.com/OHDSI/QueryLibrary/blob/master/inst/shinyApps/QueryLibrary/queries/person/PE02.md
    result = await vec.query_library(cdm, resource='person', query_name='PE02')
    df = vec.result_to_df(result)
    print("DataFrame from result:")
    print(df.head())

    result = await vec.execute(cdm, query='SELECT * from person;')
    print("Executing custom query:")
    df = vec.result_to_df(result)
    print("DataFrame from result:")
    print(df.head())

    # Close engine
    await engine.dispose() # type: ignore

# Run the main function
asyncio.run(main())
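Once `result_to_df` returns a DataFrame, it can feed a typical machine learning pipeline. A minimal sketch with pandas, using hand-built rows as a stand-in for a real query result (the column names mirror the Person fields used above; the feature-engineering steps are illustrative, not part of pyomop):

```python
import pandas as pd

# Stand-in for the DataFrame returned by vec.result_to_df(result)
df = pd.DataFrame([
    {"person_id": 100, "year_of_birth": 1980, "gender_concept_id": 8532},
    {"person_id": 101, "year_of_birth": 1975, "gender_concept_id": 8507},
])

# Simple feature engineering: derive an age column and
# one-hot encode the gender concept for downstream models
df["age"] = 2024 - df["year_of_birth"]
features = pd.get_dummies(df, columns=["gender_concept_id"])
print(features.columns.tolist())
```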

🔥 FHIR to OMOP mapping

pyomop can load FHIR Bulk Export (NDJSON) files into an OMOP CDM database.

Run:

pyomop --create --vocab ~/Downloads/omop-vocab/ --input ~/Downloads/fhir/

This creates an OMOP CDM in SQLite, loads the vocabulary files, imports the FHIR data from the input folder, and reconciles the vocabulary, mapping each source_value to a concept_id. The mapping is defined in the mapping.example.json file. The default mapping is here. Mapping happens in 5 steps, as implemented here.

  • Example using postgres (Docker)
pyomop --dbtype pgsql --host localhost --user postgres --pw mypass  --create --vocab ~/Downloads/omop-vocab/ --input ~/Downloads/fhir/
  • FHIR to data frame mapping is done with FHIRy
  • Most of the code for this functionality was written by an LLM agent. The prompts used are here

Command-line

  -c, --create                Create CDM tables (see --version).
  -t, --dbtype TEXT           Database Type for creating CDM (sqlite, mysql or
                              pgsql)
  -h, --host TEXT             Database host
  -p, --port TEXT             Database port
  -u, --user TEXT             Database user
  -w, --pw TEXT               Database password
  -v, --version TEXT          CDM version (cdm54 (default) or cdm6)
  -n, --name TEXT             Database name
  -s, --schema TEXT           Database schema (for pgsql)
  -i, --vocab TEXT            Folder with vocabulary files (csv) to import
  -f, --input DIRECTORY       Input folder with FHIR bundles or ndjson files.
  -e, --eunomia-dataset TEXT  Download and load Eunomia dataset (e.g.,
                              'GiBleed', 'Synthea')
  --eunomia-path TEXT         Path to store/find Eunomia datasets (uses
                              EUNOMIA_DATA_FOLDER env var if not specified)
  --connection-info           Display connection information for the database (For R package compatibility)
  --mcp-server                Start MCP server for stdio interaction
  --pyhealth-path TEXT        Path to export PyHealth compatible CSV files
  --help                      Show this message and exit.

MCP Server

pyomop includes an MCP (Model Context Protocol) server that exposes tools for interacting with OMOP CDM databases. This allows MCP clients to create databases, load data, and execute SQL statements.

Usage with MCP Clients

The MCP server can be used with any MCP-compatible client, such as Claude Desktop. An example configuration for VS Code (shown below) is already provided in the repository, so if you are viewing this in VS Code, you can start the server and enable its tools directly in Copilot.

{
  "servers": {
      "pyomop": {
      "command": "uv",
      "args": ["run", "pyomop", "--mcp-server"]
    }
  }
}
  • If the vocabulary is not installed locally or advanced vocabulary support is required from Athena, it is recommended to combine omop_mcp with PyOMOP.

Available MCP Tools

  • create_cdm: Create an empty CDM database
  • create_eunomia: Add Eunomia sample dataset
  • get_table_columns: Get column names for a specific table
  • get_single_table_info: Get detailed table information, including foreign keys
  • get_usable_table_names: Get a list of all available table names
  • run_sql: Execute SQL statements with error handling
  • example_query: Get example queries for specific OMOP CDM tables from OHDSI QueryLibrary
  • check_sql: Validate SQL query syntax before execution

Note: create_cdm and create_eunomia support only local SQLite databases, to avoid inadvertent data loss in production databases.

HTTP Transport Support

The MCP server now supports both stdio (default) and HTTP transports:

Stdio transport (default):

pyomop --mcp-server
# or
pyomop-mcp-server

HTTP transport:

pyomop-mcp-server-http
# or with custom host/port
pyomop-mcp-server-http --host 0.0.0.0 --port 8000
# or via Python module
python -m pyomop.mcp.server --http --host 0.0.0.0 --port 8000

To use HTTP transport, install additional dependencies:

pip install pyomop[http]
# or for both LLM and HTTP features
pip install pyomop[llm,http]

Available Prompts

  • query_execution_steps: Provides step-by-step guidance for executing database queries based on free text instructions

Eunomia import and cohort creation

pyomop -e Synthea27Nj -v 5.4 --connection-info
pyomop -e GiBleed -v 5.3 --connection-info

PyHealth and PLP Compatibility (For Machine Learning pipelines)

pyomop supports exporting OMOP CDM data (to --pyhealth-path) in a format compatible with PyHealth, a machine learning library for healthcare data analysis (See Notebook and usage below). Additionally, you can export the connection information for use with the various R packages such as PatientLevelPrediction using the --connection-info option.

pyomop -e GiBleed -v 5.3 --connection-info --pyhealth-path ~/pyhealth

Additional Tools

  • Convert FHIR to pandas DataFrame: fhiry
  • .NET and Golang OMOP CDM: .NET, Golang

Supported Databases

  • PostgreSQL
  • MySQL
  • SQLite

Environment Variables for Database Connection

You can configure database connection parameters using environment variables. These will be used as defaults by pyomop and the MCP server:

  • PYOMOP_DB: Database type (sqlite, mysql, pgsql)
  • PYOMOP_HOST: Database host
  • PYOMOP_PORT: Database port
  • PYOMOP_USER: Database user
  • PYOMOP_PW: Database password
  • PYOMOP_SCHEMA: Database schema (for PostgreSQL)

Example usage:

export PYOMOP_DB=pgsql
export PYOMOP_HOST=localhost
export PYOMOP_PORT=5432
export PYOMOP_USER=postgres
export PYOMOP_PW=mypass
export PYOMOP_SCHEMA=omop

These environment variables are checked before default values are assigned for database connections in pyomop and the MCP server tools.
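The precedence described above can be sketched in a few lines of stdlib Python; this is an illustration of the documented behavior, not the actual pyomop implementation, and the built-in defaults shown are assumptions:

```python
import os

def connection_defaults(env=os.environ):
    """Resolve connection settings, preferring PYOMOP_* environment
    variables over built-in defaults (hypothetical fallback values)."""
    return {
        "db": env.get("PYOMOP_DB", "sqlite"),
        "host": env.get("PYOMOP_HOST", "localhost"),
        "port": env.get("PYOMOP_PORT", ""),
        "user": env.get("PYOMOP_USER", ""),
        "pw": env.get("PYOMOP_PW", ""),
        "schema": env.get("PYOMOP_SCHEMA", ""),
    }

# With nothing exported, the fallback applies; an exported variable wins:
print(connection_defaults(env={})["db"])                    # fallback
print(connection_defaults(env={"PYOMOP_DB": "pgsql"})["db"])  # override
```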

🗄️ Agent Assisted ETL - Work in progress

Use --migrate to run the generic loader from the command line. Provide source-database connection details with --src-* options; target-database details use the standard --dbtype / --host / … options.

# SQLite source → SQLite OMOP target
pyomop-migrate --migrate \
  --src-dbtype sqlite --src-name source.sqlite \
  --dbtype sqlite --name omop.sqlite \
  --mapping mapping.json

# PostgreSQL source → PostgreSQL OMOP target
pyomop-migrate --migrate \
  --src-dbtype pgsql --src-host srchost --src-user reader --src-pw secret --src-name ehr \
  --dbtype pgsql --host omophost --user writer --pw secret --name omop \
  --mapping ehr_to_omop.json --batch-size 500

Source connection credentials can also be provided via environment variables (SRC_DB_HOST, SRC_DB_PORT, SRC_DB_USER, SRC_DB_PASSWORD, SRC_DB_NAME) to avoid exposing passwords in the shell history.
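For example, the PostgreSQL source credentials from the migration command above could be supplied via the environment instead (the values are placeholders matching that example):

```shell
export SRC_DB_HOST=srchost
export SRC_DB_PORT=5432
export SRC_DB_USER=reader
export SRC_DB_PASSWORD=secret
export SRC_DB_NAME=ehr
```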

Schema extraction

Use --extract-schema to generate a Markdown document describing the source database schema (tables, columns, types, PK/FK relationships). This is especially useful for feeding to an AI agent to generate the mapping JSON.

pyomop-migrate --extract-schema \
  --src-dbtype sqlite --src-name source.sqlite \
  --schema-output schema.md

The same SRC_DB_* environment variables are supported for credentials.

Plan

  • Use the extracted schema to generate a mapping JSON using an appropriate agentic skill.

See the bundled example mapping and the full documentation for all supported options.

Contributing

Pull requests are welcome! See CONTRIBUTING.md.

Contributors
