Get an OMOP CDM database running quickly.

omop-lite

A small container to get an OMOP CDM database running quickly, with support for both PostgreSQL and SQL Server.

Drop your data into data/, and run the container.

Environment Variables

You can configure the Docker container using the following environment variables:

  • DB_HOST: The hostname of the database. Default is db.
  • DB_PORT: The port number of the database. Default is 5432.
  • DB_USER: The username for the database. Default is postgres.
  • DB_PASSWORD: The password for the database. Default is password.
  • DB_NAME: The name of the database. Default is omop.
  • DIALECT: The type of database to use. Default is postgresql, but can also be mssql.
  • SCHEMA_NAME: The name of the schema to be created/used in the database. Default is public.
  • DATA_DIR: The directory containing the data CSV files. Default is data.
  • SYNTHETIC: Load synthetic data (boolean). Default is false.
  • SYNTHETIC_NUMBER: Size of synthetic data, 100 or 1000. Default is 100.
  • DELIMITER: The delimiter used to separate fields in the data files. Default is a tab; a comma (,) is also supported.
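
The variables and defaults listed above can be summarised as a lookup with environment overrides. The following is a minimal Python sketch, not omop-lite's actual implementation: the Settings class and the load_settings function are hypothetical names introduced for illustration, and the way SYNTHETIC is parsed as a boolean is an assumption. Only the variable names and default values come from the list above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the documented variables and defaults."""

    db_host: str = "db"
    db_port: int = 5432
    db_user: str = "postgres"
    db_password: str = "password"
    db_name: str = "omop"
    dialect: str = "postgresql"
    schema_name: str = "public"
    data_dir: str = "data"
    synthetic: bool = False
    synthetic_number: int = 100
    delimiter: str = "\t"


def load_settings(env=None):
    """Build Settings from a mapping of variables (defaults to os.environ)."""
    env = os.environ if env is None else env
    d = Settings()

    dialect = env.get("DIALECT", d.dialect)
    if dialect not in ("postgresql", "mssql"):
        raise ValueError(f"unsupported DIALECT: {dialect!r}")

    synthetic_number = int(env.get("SYNTHETIC_NUMBER", d.synthetic_number))
    if synthetic_number not in (100, 1000):
        raise ValueError("SYNTHETIC_NUMBER must be 100 or 1000")

    return Settings(
        db_host=env.get("DB_HOST", d.db_host),
        db_port=int(env.get("DB_PORT", d.db_port)),
        db_user=env.get("DB_USER", d.db_user),
        db_password=env.get("DB_PASSWORD", d.db_password),
        db_name=env.get("DB_NAME", d.db_name),
        dialect=dialect,
        schema_name=env.get("SCHEMA_NAME", d.schema_name),
        data_dir=env.get("DATA_DIR", d.data_dir),
        # Assumption: only the literal string "true" (any case) enables it.
        synthetic=env.get("SYNTHETIC", "false").strip().lower() == "true",
        synthetic_number=synthetic_number,
        delimiter=env.get("DELIMITER", d.delimiter),
    )
```

Calling load_settings({}) yields the documented defaults, while load_settings({"DIALECT": "mssql", "DELIMITER": ","}) overrides only those two values.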

Usage

Docker

docker run -v ./data:/data ghcr.io/health-informatics-uon/omop-lite

# docker-compose.yml
services:
  omop-lite:
    image: ghcr.io/health-informatics-uon/omop-lite
    volumes:
      - ./data:/data
    depends_on:
      - db

  db:
    image: postgres:latest
    environment:
      - POSTGRES_DB=omop
      - POSTGRES_PASSWORD=password
    ports:
      - "5432:5432"

Helm

To install using Helm:

# Add the Helm repository
helm repo add omop-lite https://health-informatics-uon.github.io/omop-lite
helm repo update

# Install the chart
helm install omop-lite omop-lite/omop-lite

The Helm chart deploys OMOP Lite as a Kubernetes Job that creates an OMOP CDM in a database. You can customise the installation using a values file:

# values.yaml
env:
  dbHost: postgres
  dbPort: "5432"
  dbUser: postgres
  dbPassword: postgres
  dbName: omop_helm
  dialect: postgresql
  schemaName: public
  synthetic: "false" 

# Data mounting configuration
data:
  persistentVolumeClaim:
    enabled: true
    create: true
    size: 10Gi
    storageClass: standard
    accessModes:
      - ReadOnlyMany
  
  # Optional: Prepare data from a local directory
  prepare:
    enabled: true
    sourcePath: "/path/to/your/data"  # Path on the node where data is stored

Install with custom values:

helm install omop-lite omop-lite/omop-lite -f values.yaml

CLI

uv run omop-lite --help

Using Your Own Data

To use your own data with the Helm chart, choose one of two options:

  1. Use the built-in data preparation:
    • Set data.prepare.enabled: true
    • Set data.prepare.sourcePath to the path on your node where the data is stored
    • The chart will automatically copy your data to the PVC before running the OMOP Lite job
  2. Prepare the data manually:
    • Create a PVC (either through the chart or manually)
    • Copy your data to the PVC using kubectl or another method
    • Set data.persistentVolumeClaim.enabled: true and provide the PVC name

Synthetic Data

If you have no data of your own, synthetic data is provided in the synthetic directory. It is a small dataset that loads quickly. To load it, run the container with the SYNTHETIC environment variable set to true; SYNTHETIC_NUMBER selects the size (100 or 1000).

Bring Your Own Data

You can provide your own data for loading into the tables by placing your files in the data/ directory. The directory should contain .csv files named after the tables they populate (DRUG_STRENGTH.csv, CONCEPT.csv, etc.).

To match the vocabulary files from Athena, the data should be tab-separated but keep the .csv file extension. You can override the delimiter with the DELIMITER environment variable.
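
If you generate your own files, the tab-separated layout with a .csv extension can be produced with Python's standard csv module. This is a minimal sketch: write_tab_separated is a hypothetical helper name, and the assumption that each file starts with a header row follows the Athena export format rather than anything stated by omop-lite.

```python
import csv
from pathlib import Path


def write_tab_separated(path, header, rows, delimiter="\t"):
    """Write rows to `path` using `delimiter`, keeping whatever extension is given.

    Assumption: the first line of the file is a header row.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
    return path
```

For example, write_tab_separated("data/CONCEPT.csv", header, rows) writes a tab-separated CONCEPT.csv into data/; pass delimiter="," and set DELIMITER to , if you prefer comma-separated files.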

Setup Script

The setup.sh script included in the Docker image will:

  1. Create the schema if it does not already exist.
  2. Execute the SQL files to set up the database schema, constraints, and indexes.
  3. Load data from the .csv files located in the DATA_DIR.
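
The order of these steps can be illustrated with a short Python sketch that plans the SQL for steps 1 and 3. This is not a reproduction of setup.sh: plan_setup is a hypothetical function, the sketch covers PostgreSQL only, and both the use of COPY and the mapping of file names to lowercase table names are assumptions. Step 2 is left out because the SQL files ship with the image.

```python
from pathlib import Path


def plan_setup(data_dir, schema_name="public", delimiter="\t"):
    """Return the SQL a PostgreSQL loader might issue, in the documented order."""
    # Step 1: create the schema if it does not already exist.
    statements = [f'CREATE SCHEMA IF NOT EXISTS "{schema_name}";']

    # Step 2 (schema, constraint and index scripts) is intentionally omitted.

    # Step 3: one COPY per .csv file found in the data directory.
    # A tab must be written as an escape string in PostgreSQL.
    delim_sql = "E'\\t'" if delimiter == "\t" else f"'{delimiter}'"
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        table = csv_path.stem.lower()  # assumption: CONCEPT.csv -> concept
        statements.append(
            f'COPY "{schema_name}"."{table}" FROM STDIN '
            f"WITH (FORMAT csv, HEADER true, DELIMITER {delim_sql});"
        )
    return statements
```

Printing the result of plan_setup("data") shows one CREATE SCHEMA statement followed by one COPY statement per file, which makes it easy to check which files would be picked up before running the container.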

Text search OMOP

Full-text search

Adding a tsvector column to the concept table, with an index on that column, makes full-text search queries on the concept table run much faster. To enable this, set the FTS_CREATE environment variable to a non-empty value.
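
To show what a tsvector column and its index look like in PostgreSQL, the sketch below builds the relevant SQL as Python strings. It is an illustration only and may differ from what omop-lite runs when FTS_CREATE is set: the column name concept_name_tsv, the index name, the 'english' text search configuration, and both function names are assumptions. The %s placeholder follows the parameter style of drivers such as psycopg.

```python
def fts_statements(schema_name="public", column="concept_name_tsv"):
    """Return DDL that adds a generated tsvector column and a GIN index."""
    table = f'"{schema_name}".concept'
    return [
        # Generated columns need PostgreSQL 12 or later.
        f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {column} tsvector "
        f"GENERATED ALWAYS AS (to_tsvector('english', concept_name)) STORED;",
        f"CREATE INDEX IF NOT EXISTS idx_concept_{column} "
        f"ON {table} USING GIN ({column});",
    ]


def fts_query(schema_name="public", column="concept_name_tsv"):
    """Return a parameterised search query; pass the search text as %s."""
    return (
        f'SELECT concept_id, concept_name FROM "{schema_name}".concept '
        f"WHERE {column} @@ plainto_tsquery('english', %s);"
    )
```

The query matches against the precomputed column with the @@ operator, so PostgreSQL can use the GIN index instead of computing to_tsvector for every row.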

Vector search

Postgres does vector search too! To enable this in omop-lite, bring up the services defined in compose-omop-ts.yml with:

docker compose -f compose-omop-ts.yml up

This requires the file embeddings/embeddings.parquet, containing concept_ids and their embeddings. pgvector is used to create an embeddings table from this file.
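
Once the embeddings table exists, a nearest-neighbour lookup with pgvector could look like the sketch below. The function names are hypothetical, and the column names concept_id and embedding are assumptions about the table layout; only the table name embeddings comes from the description above. The <=> operator is pgvector's cosine distance, and %s is the placeholder style of drivers such as psycopg.

```python
def to_pgvector_literal(values):
    """Format a sequence of numbers as pgvector's text form, e.g. [1.0,0.25]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def nearest_concepts_query(schema_name="public", table="embeddings", limit=5):
    """Return a parameterised query for the concepts closest to a query vector."""
    return (
        f'SELECT concept_id FROM "{schema_name}"."{table}" '
        f"ORDER BY embedding <=> %s::vector LIMIT {int(limit)};"
    )
```

The literal produced by to_pgvector_literal is passed as the single query parameter, so the vector never has to be interpolated into the SQL text itself.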

omop-lite testing

If you're a developer and want to iterate on omop-lite quickly, synthetic/ contains a small subset of the vocabularies that is sufficient to build the database. If you wish to test the vector search, there are matching embeddings in embeddings/embeddings.parquet.
