Embedding extension for OMOP CDM. Uses sqlite-vec by default, with an optional pgvector backend and optional FAISS export.
omop-emb
Vector embedding layer for OMOP CDM concepts.
omop-emb generates, stores, and retrieves embeddings for OMOP concepts. It
works out of the box with sqlite-vec (no external database required) and
scales to PostgreSQL/pgvector for larger deployments. The database is the
source of truth — FAISS is an optional read-acceleration sidecar, not a primary
store.
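The "source of truth plus rebuildable sidecar" split can be illustrated with a minimal standard-library sketch. This is not omop-emb's code: the `embedding` table, the `store` and `build_sidecar` helpers, and the dict standing in for a FAISS index are all hypothetical.

```python
import sqlite3
from array import array

# The database is the only durable store (in-memory here for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embedding (concept_id INTEGER PRIMARY KEY, vector BLOB NOT NULL)"
)


def store(concept_id, vector):
    """Write one embedding to the database as packed float32 bytes."""
    conn.execute(
        "INSERT OR REPLACE INTO embedding VALUES (?, ?)",
        (concept_id, array("f", vector).tobytes()),
    )


def build_sidecar():
    """Derive a read-only cache from the database (stand-in for a FAISS index)."""
    cache = {}
    for concept_id, blob in conn.execute("SELECT concept_id, vector FROM embedding"):
        vec = array("f")
        vec.frombytes(blob)
        cache[concept_id] = vec.tolist()
    return cache


store(1, [0.5, 0.25])
store(2, [1.0, 0.0])

sidecar = build_sidecar()
sidecar.clear()            # losing the sidecar loses no data...
sidecar = build_sidecar()  # ...because it can always be rebuilt from the database
```

The point of the design is that writes only ever go to the database, so a stale or missing sidecar is a performance problem rather than a correctness problem.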
Installation
pip install omop-emb # sqlite-vec backend (default, no extras needed)
pip install "omop-emb[pgvector]" # adds PostgreSQL/pgvector support
pip install "omop-emb[faiss-cpu]" # adds FAISS sidecar support
pip install "omop-emb[pgvector,faiss-cpu]" # everything
Quick start
Ingest concepts (sqlite-vec, no external service):
export OMOP_EMB_BACKEND=sqlitevec
export OMOP_EMB_SQLITE_PATH=/data/omop_emb.db
export OMOP_CDM_DB_URL=postgresql+psycopg://user:pass@host:5432/omop_cdm
omop-emb embeddings add-embeddings --api-base http://localhost:11434/v1 --api-key ollama \
--provider ollama --model nomic-embed-text:v1.5
Search:
omop-emb embeddings search --api-base http://localhost:11434/v1 --api-key ollama \
--provider ollama --model nomic-embed-text:v1.5 \
--query "hypertension" --query "type 2 diabetes" \
--standard-only --domain Condition --k 5
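Conceptually, a search like this embeds each query string and ranks the stored concept vectors by similarity, returning the best `k`. The following is a minimal pure-Python sketch of cosine top-k ranking, not omop-emb's implementation; the `top_k` helper, the toy three-dimensional vectors, and the concept IDs are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    stored: Dict[int, Sequence[float]],
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Return the k (concept_id, similarity) pairs most similar to the query."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: hypothetical concept IDs mapped to tiny embedding vectors.
stored = {
    1: [1.0, 0.0, 0.0],
    2: [0.9, 0.1, 0.0],
    3: [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], stored, k=2)
```

This brute-force scan is exact but linear in the number of concepts; approximate indexes such as HNSW trade a little recall for much faster lookups on large tables.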
pgvector with HNSW index:
export OMOP_EMB_BACKEND=pgvector
export OMOP_EMB_DB_HOST=localhost
export OMOP_EMB_DB_USER=omop_emb
export OMOP_EMB_DB_PASSWORD=omop_emb
export OMOP_EMB_DB_NAME=omop_emb
omop-emb embeddings add-embeddings --api-base http://localhost:11434/v1 --api-key ollama \
--provider ollama --model nomic-embed-text:v1.5
omop-emb maintenance rebuild-index --model nomic-embed-text:v1.5 --index-type hnsw --metric-type cosine
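In pgvector itself, an HNSW index is created with `CREATE INDEX ... USING hnsw` plus an operator class that fixes the distance metric. The sketch below builds such a statement in Python. Whether `rebuild-index` issues exactly this DDL is not confirmed by this README; the `hnsw_index_ddl` helper, the table and column names, and the `l2` and `ip` metric names are hypothetical. The operator classes and the `m` and `ef_construction` defaults are pgvector's own.

```python
# pgvector operator classes for each distance metric.
# Only "cosine" appears in the README; "l2" and "ip" are assumed names.
_OPERATOR_CLASS = {
    "cosine": "vector_cosine_ops",
    "l2": "vector_l2_ops",
    "ip": "vector_ip_ops",
}


def hnsw_index_ddl(
    table: str,
    column: str,
    metric: str = "cosine",
    m: int = 16,
    ef_construction: int = 64,
) -> str:
    """Build a pgvector CREATE INDEX statement for an HNSW index."""
    if metric not in _OPERATOR_CLASS:
        raise ValueError(f"unsupported metric: {metric!r}")
    # Identifiers cannot be bound as query parameters, so validate them strictly.
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("table and column must be plain identifiers")
    opclass = _OPERATOR_CLASS[metric]
    return (
        f"CREATE INDEX {table}_{column}_hnsw_idx ON {table} "
        f"USING hnsw ({column} {opclass}) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


# Hypothetical table and column names, for illustration only.
ddl = hnsw_index_ddl("concept_embedding", "embedding", metric="cosine")
```

The operator class must match the distance operator used at query time, otherwise PostgreSQL will not use the index for the search.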
Environment variables
| Variable | Default | Description |
|---|---|---|
| `OMOP_EMB_BACKEND` | `sqlitevec` | Backend: `sqlitevec` or `pgvector`. |
| `OMOP_EMB_SQLITE_PATH` | (none) | sqlite-vec database file path (or `:memory:`). |
| `OMOP_EMB_DB_HOST` | (none) | pgvector: PostgreSQL host. |
| `OMOP_EMB_DB_PORT` | `5432` | pgvector: PostgreSQL port. |
| `OMOP_EMB_DB_USER` | (none) | pgvector: database user. |
| `OMOP_EMB_DB_PASSWORD` | (none) | pgvector: database password. |
| `OMOP_EMB_DB_NAME` | (none) | pgvector: database name. |
| `OMOP_EMB_DB_URL` | (none) | pgvector: full SQLAlchemy URL (overrides individual vars). |
| `OMOP_CDM_DB_URL` | (none) | OMOP CDM connection (required for ingestion commands only). |
| `OMOP_EMB_FAISS_CACHE_DIR` | (none) | Default FAISS cache directory (alternative to `--faiss-cache-dir`). |
See the Configuration Reference for the complete list including asymmetric embedding prefixes and driver overrides.
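The precedence between `OMOP_EMB_DB_URL` and the individual pgvector variables can be sketched as follows. This is an illustration of the rule in the table, not the package's resolver: the `resolve_pgvector_url` helper is hypothetical, and the `postgresql+psycopg` driver prefix is an assumption borrowed from the CDM URL in the quick start.

```python
import os
from typing import Mapping
from urllib.parse import quote

_REQUIRED = (
    "OMOP_EMB_DB_HOST",
    "OMOP_EMB_DB_USER",
    "OMOP_EMB_DB_PASSWORD",
    "OMOP_EMB_DB_NAME",
)


def resolve_pgvector_url(env: Mapping[str, str] = os.environ) -> str:
    """Return the SQLAlchemy URL for the embedding database."""
    # A full URL, when set, overrides the individual variables.
    url = env.get("OMOP_EMB_DB_URL")
    if url:
        return url

    missing = [name for name in _REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")

    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    user = quote(env["OMOP_EMB_DB_USER"], safe="")
    password = quote(env["OMOP_EMB_DB_PASSWORD"], safe="")
    host = env["OMOP_EMB_DB_HOST"]
    port = env.get("OMOP_EMB_DB_PORT", "5432")  # documented default
    name = env["OMOP_EMB_DB_NAME"]
    # Driver prefix is an assumption, not confirmed by the README.
    return f"postgresql+psycopg://{user}:{password}@{host}:{port}/{name}"


example_env = {
    "OMOP_EMB_DB_HOST": "localhost",
    "OMOP_EMB_DB_USER": "omop_emb",
    "OMOP_EMB_DB_PASSWORD": "p@ss",
    "OMOP_EMB_DB_NAME": "omop_emb",
}
example_url = resolve_pgvector_url(example_env)
```

Passing a plain mapping instead of reading `os.environ` directly keeps the function easy to test without mutating the process environment.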
Documentation
Full documentation: https://AustralianCancerDataNetwork.github.io/omop-emb
- Installation & backend setup
- Configuration reference
- Backend selection & index types
- CLI reference
- Interface guide
Roadmap
- sqlite-vec backend (default, zero-config)
- pgvector backend (PostgreSQL)
- HNSW index support for pgvector
- FAISS sidecar (approximate nearest-neighbour read acceleration)
- Embedding bundle export / import CLI (`maintenance export`, `maintenance import`, `maintenance build-faiss-cache`)
- In-DB concept filtering (domain, vocabulary, standard status, active status)
- Transparent FAISS fast path in `EmbeddingReaderInterface`
- Extensive backend and registry testing
- FAISS GPU support
- `pgvectorscale` support
- Vector quantisation for more efficient storage
Configuration via oa-configurator
The database connection can also be configured via
oa-configurator,
which stores settings in ~/.config/omop/config.toml and eliminates the need
for environment variables at runtime:
omop-config init
omop-config configure omop_alchemy # CDM database (required for ingestion)
omop-config configure omop_emb # embedding database
omop-config configure omop_emb is required for local-dev setup before running
the pgvector-backed test suite (CI provisions this automatically). Without it,
those tests skip with "Resource 'test_emb_db' not configured" rather than
failing.
See oa-configurator Setup for details.
Docker Compose
The included docker-compose.yaml provides both a CDM PostgreSQL database and a
pgvector embedding database, plus a Python container with all optional backends
pre-installed ([pgvector,faiss-cpu]). Default credentials work out of the box:
docker compose up
Include Ollama by adding the standalone profile:
docker compose --profile standalone up
The python-emb service runs omop-config configure at startup. To override
credentials:
cp .env.example .env
docker compose up