Get an OMOP CDM database running quickly.
omop-lite
A small container to get an OMOP CDM database running quickly, with support for both PostgreSQL and SQL Server.
Drop your data into data/, and run the container.
Environment Variables
You can configure the Docker container using the following environment variables:
- DB_HOST: The hostname of the database. Default is db.
- DB_PORT: The port number of the database. Default is 5432.
- DB_USER: The username for the database. Default is postgres.
- DB_PASSWORD: The password for the database. Default is password.
- DB_NAME: The name of the database. Default is omop.
- DIALECT: The type of database to use. Default is postgresql, but can also be mssql.
- SCHEMA_NAME: The name of the schema to be created/used in the database. Default is public.
- DATA_DIR: The directory containing the data CSV files. Default is data.
- SYNTHETIC: Load synthetic data (boolean). Default is false.
- SYNTHETIC_NUMBER: Size of synthetic data, 100 or 1000. Default is 100.
- DELIMITER: The delimiter used to separate data. Default is tab, can also be a comma (,).
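As a rough illustration of how these variables and their defaults fit together, here is a minimal Python sketch. It is not the container's actual code: the Settings class, the load_settings function, the way the boolean is parsed, and the mapping of tab to a tab character are all assumptions made for the example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    db_host: str
    db_port: int
    db_user: str
    db_password: str
    db_name: str
    dialect: str
    schema_name: str
    data_dir: str
    synthetic: bool
    synthetic_number: int
    delimiter: str


def load_settings(env=None) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env

    dialect = env.get("DIALECT", "postgresql")
    if dialect not in ("postgresql", "mssql"):
        raise ValueError(f"unsupported DIALECT: {dialect!r}")

    synthetic_number = int(env.get("SYNTHETIC_NUMBER", "100"))
    if synthetic_number not in (100, 1000):
        raise ValueError("SYNTHETIC_NUMBER must be 100 or 1000")

    # Assumption: the literal word "tab" stands for a tab character.
    raw_delimiter = env.get("DELIMITER", "tab")

    return Settings(
        db_host=env.get("DB_HOST", "db"),
        db_port=int(env.get("DB_PORT", "5432")),
        db_user=env.get("DB_USER", "postgres"),
        db_password=env.get("DB_PASSWORD", "password"),
        db_name=env.get("DB_NAME", "omop"),
        dialect=dialect,
        schema_name=env.get("SCHEMA_NAME", "public"),
        data_dir=env.get("DATA_DIR", "data"),
        # Assumption: only the string "true" (any case) enables synthetic data.
        synthetic=env.get("SYNTHETIC", "false").strip().lower() == "true",
        synthetic_number=synthetic_number,
        delimiter="\t" if raw_delimiter == "tab" else raw_delimiter,
    )
```

Calling load_settings({}) yields the defaults listed above, while passing a dictionary such as {"DIALECT": "mssql", "DELIMITER": ","} overrides only those two values.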
Usage
Docker
docker run -v ./data:/data ghcr.io/health-informatics-uon/omop-lite
# docker-compose.yml
services:
  omop-lite:
    image: ghcr.io/health-informatics-uon/omop-lite
    volumes:
      - ./data:/data
    depends_on:
      - db
  db:
    image: postgres:latest
    environment:
      - POSTGRES_DB=omop
      - POSTGRES_PASSWORD=password
    ports:
      - "5432:5432"
Helm
To install using Helm:
# Add the Helm repository
helm repo add omop-lite https://health-informatics-uon.github.io/omop-lite
helm repo update
# Install the chart
helm install omop-lite omop-lite/omop-lite
The Helm chart deploys OMOP Lite as a Kubernetes Job that creates an OMOP CDM in a database. You can customise the installation using a values file:
# values.yaml
env:
  dbHost: postgres
  dbPort: "5432"
  dbUser: postgres
  dbPassword: postgres
  dbName: omop_helm
  dialect: postgresql
  schemaName: public
  synthetic: "false"
# Data mounting configuration
data:
  persistentVolumeClaim:
    enabled: true
    create: true
    size: 10Gi
    storageClass: standard
    accessModes:
      - ReadOnlyMany
  # Optional: Prepare data from a local directory
  prepare:
    enabled: true
    sourcePath: "/path/to/your/data" # Path on the node where data is stored
Install with custom values:
helm install omop-lite omop-lite/omop-lite -f values.yaml
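The keys under env in the values file look like camelCase versions of the environment variables listed earlier. Assuming the chart maps them one-to-one (this README does not confirm it), the correspondence can be sketched in Python as follows. The camel_to_env helper is hypothetical and exists only for illustration.

```python
import re


def camel_to_env(key: str) -> str:
    """Convert a camelCase key such as 'dbHost' to 'DB_HOST'."""
    # Insert an underscore at each lower-to-upper boundary, then upper-case.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", key).upper()


# The env section of the values file shown above.
values_env = {
    "dbHost": "postgres",
    "dbPort": "5432",
    "dbUser": "postgres",
    "dbPassword": "postgres",
    "dbName": "omop_helm",
    "dialect": "postgresql",
    "schemaName": "public",
    "synthetic": "false",
}

# Assumed container environment resulting from those values.
container_env = {camel_to_env(k): v for k, v in values_env.items()}
```

Under that assumption, container_env contains entries such as DB_HOST, DB_NAME and SCHEMA_NAME, matching the names in the Environment Variables section.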
Using Your Own Data
To use your own data with the Helm chart:
- Option 1: Use the built-in data preparation
  - Set data.prepare.enabled: true
  - Set data.prepare.sourcePath to the path on your node where the data is stored
  - The chart will automatically copy your data to the PVC before running the OMOP Lite job
- Option 2: Manual data preparation
  - Create a PVC (either through the chart or manually)
  - Copy your data to the PVC using kubectl or another method
  - Set data.persistentVolumeClaim.enabled: true and provide the PVC name
Synthetic Data
If you need synthetic data, some is provided in the synthetic directory. It provides a small amount of data to load quickly.
To load the synthetic data, run the container with the SYNTHETIC environment variable set to true. Use SYNTHETIC_NUMBER to choose the dataset:
- 100 is fake data.
- 1000 is Synthea 1k data.
Bring Your Own Data
You can provide your own data for loading into the tables by placing your files in the data/ directory. This should contain .csv files matching the data tables (DRUG_STRENGTH.csv, CONCEPT.csv, etc.).
To match the vocabulary files from Athena, this data should be tab-separated but keep a .csv file extension.
You can override the delimiter with the DELIMITER environment variable.
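Before running the container, it can help to check that your files really are delimited the way you expect. The following Python sketch previews a file with a given delimiter and lists the tables found in a directory. The helper names preview_table and tables_in are hypothetical, and the assumption that a table name is the lower-cased file name is made only for this example.

```python
import csv
from pathlib import Path


def preview_table(path, delimiter="\t", limit=3):
    """Return the header and the first few rows of a delimited file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)
        # zip stops after `limit` rows without reading the rest of the file.
        rows = [row for _, row in zip(range(limit), reader)]
    return header, rows


def tables_in(data_dir):
    """Map table names (lower-cased file stems) to their .csv paths."""
    return {p.stem.lower(): p for p in sorted(Path(data_dir).glob("*.csv"))}
```

If the header comes back as a single long string, the file is probably using a different delimiter from the one passed in.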
Setup Script
The setup.sh script included in the Docker image will:
- Create the schema if it does not already exist.
- Execute the SQL files to set up the database schema, constraints, and indexes.
- Load data from the .csv files located in the DATA_DIR.
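The three steps above can be pictured with a small Python sketch that builds the kind of PostgreSQL statements such a script might issue. This is not the contents of setup.sh: the setup_statements function, the use of COPY ... FROM STDIN, the HEADER option, and the lower-cased table names are all assumptions for illustration.

```python
from pathlib import Path


def setup_statements(schema, data_dir, delimiter="\t"):
    """Build an illustrative list of PostgreSQL statements for the three steps."""
    # Step 1: create the schema only if it is missing.
    statements = [f'CREATE SCHEMA IF NOT EXISTS "{schema}";']

    # Step 2: placeholder for the DDL, constraint and index SQL files.
    statements.append("-- run DDL, constraint and index SQL files here")

    # Step 3: one COPY per .csv file; a tab needs PostgreSQL's E'' escape syntax.
    delim_sql = "E'\\t'" if delimiter == "\t" else f"'{delimiter}'"
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        table = csv_path.stem.lower()
        statements.append(
            f'COPY "{schema}"."{table}" FROM STDIN '
            f"WITH (FORMAT csv, DELIMITER {delim_sql}, HEADER true);"
        )
    return statements
```

For a directory containing CONCEPT.csv and DRUG_STRENGTH.csv, this returns four entries: the schema statement, the placeholder, and one COPY statement per file.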
Text search OMOP
Full-text search
Adding a tsvector column to the concept table and an index on that column makes full-text search queries on the concept table run much faster.
This can be configured by setting FTS_CREATE to be non-empty in the environment.
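To make the idea concrete, here is a hedged Python sketch that builds PostgreSQL statements for a tsvector column, a GIN index on it, and a search query. The column name concept_name_tsv, the index name, the 'english' text search configuration, and the %s placeholder style are assumptions; the statements omop-lite actually runs when FTS_CREATE is set may differ.

```python
def fts_statements(schema="public", column="concept_name_tsv"):
    """Statements that add a generated tsvector column and a GIN index on it."""
    table = f'"{schema}"."concept"'
    return [
        # A stored generated column keeps the tsvector in sync with concept_name.
        f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {column} tsvector "
        "GENERATED ALWAYS AS (to_tsvector('english', concept_name)) STORED;",
        # GIN is the index type PostgreSQL recommends for tsvector columns.
        f"CREATE INDEX IF NOT EXISTS idx_{column} ON {table} USING GIN ({column});",
    ]


def fts_query(schema="public", column="concept_name_tsv"):
    """A parameterised full-text query; %s is filled in by the database driver."""
    table = f'"{schema}"."concept"'
    return (
        f"SELECT concept_id, concept_name FROM {table} "
        f"WHERE {column} @@ plainto_tsquery('english', %s);"
    )
```

With the index in place, a query built by fts_query can use the index instead of recomputing to_tsvector for every row, which is where the speed-up comes from.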
Vector search
Postgres does vector search too!
To enable this in omop-lite, start the services defined in compose-omop-ts.yml:
docker compose -f compose-omop-ts.yml up
To do this, you need to have embeddings/embeddings.parquet, containing concept_ids and embeddings.
This uses pgvector to create an embeddings table.
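Once the embeddings are loaded, a nearest-neighbour lookup could look roughly like the Python sketch below. It formats a query embedding as a pgvector text literal and builds a query ordered by cosine distance using pgvector's <=> operator. The column name embedding, the default table name, and the helper names are assumptions, since this README only states that the file holds concept_ids and embeddings.

```python
def to_vector_literal(values):
    """Format a sequence of numbers as a pgvector text literal, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def nearest_concepts_sql(table="embeddings", k=5):
    """Build a query returning the k concepts closest to a query vector.

    The %s placeholder is meant to receive the output of to_vector_literal.
    """
    return (
        f"SELECT concept_id FROM {table} "
        f"ORDER BY embedding <=> %s::vector LIMIT {int(k)};"
    )
```

The query vector must have the same number of dimensions as the stored embeddings, otherwise pgvector rejects the comparison.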
omop-lite testing
If you're a developer and want to iterate on omop-lite quickly, there's a small subset of the vocabularies sufficient to build in synthetic/.
If you wish to test the vector search, there are matching embeddings in embeddings/embeddings.parquet.