Embedding extension for OMOP CDM. Utilises sqlite-vec by default and provides an optional pgvector backend and optional FAISS export.
omop-emb
Vector embedding layer for OMOP CDM concepts.
omop-emb generates, stores, and retrieves embeddings for OMOP concepts. It
works out of the box with sqlite-vec (no external database required) and
scales to PostgreSQL/pgvector for larger deployments. The database is the
source of truth — FAISS is an optional read-acceleration sidecar, not a primary
store.
Installation
pip install omop-emb # sqlite-vec backend (default, no extras needed)
pip install "omop-emb[pgvector]" # adds PostgreSQL/pgvector support
pip install "omop-emb[faiss-cpu]" # adds FAISS sidecar support
pip install "omop-emb[pgvector,faiss-cpu]" # everything
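To confirm which optional components ended up installed, a small check like the following can help. It is a sketch that assumes the extras provide the usual import names (`sqlite_vec`, `pgvector`, `faiss`); the helper `available_backends` is hypothetical and not part of omop-emb.

```python
from importlib.util import find_spec

# Assumed import names for each component (not taken from omop-emb itself).
OPTIONAL_MODULES = {
    "sqlite-vec": "sqlite_vec",
    "pgvector": "pgvector",
    "faiss": "faiss",
}


def available_backends(modules=OPTIONAL_MODULES):
    """Map each component name to True if its module can be imported."""
    return {name: find_spec(module) is not None for name, module in modules.items()}


if __name__ == "__main__":
    for name, present in available_backends().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```

A component reported as missing usually means the matching extra was not requested at install time.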
Quick start
Ingest concepts (sqlite-vec, no external service):
export OMOP_EMB_BACKEND=sqlitevec
export OMOP_EMB_SQLITE_PATH=/data/omop_emb.db
export OMOP_CDM_DB_URL=postgresql+psycopg://user:pass@host:5432/omop_cdm
omop-emb embeddings add-embeddings --api-base http://localhost:11434/v1 --api-key ollama \
--model nomic-embed-text:v1.5
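The `--api-base`, `--api-key`, and `--model` flags point at an OpenAI-compatible endpoint (here, a local Ollama server). As a rough illustration of the kind of request such an endpoint accepts, the sketch below builds a `POST /embeddings` call with the standard library. It is assumed, not confirmed by this README, that omop-emb issues requests of this shape; `build_embedding_request` and `fetch_embeddings` are hypothetical helpers.

```python
import json
import urllib.request


def build_embedding_request(api_base, api_key, model, texts):
    """Build (but do not send) an OpenAI-style embeddings request."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=api_base.rstrip("/") + "/embeddings",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_embeddings(request, timeout=30):
    """Send the request and return one vector per input text, in input order."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    rows = sorted(payload["data"], key=lambda row: row["index"])
    return [row["embedding"] for row in rows]
```

Calling `fetch_embeddings(build_embedding_request("http://localhost:11434/v1", "ollama", "nomic-embed-text:v1.5", ["hypertension"]))` against a running server is a quick way to check that the endpoint and model name are reachable before starting a long ingestion run.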
Search:
omop-emb embeddings search --api-base http://localhost:11434/v1 --api-key ollama \
--model nomic-embed-text:v1.5 \
--query "hypertension" --query "type 2 diabetes" \
--standard-only --domain Condition --k 5
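Conceptually, `--k 5` asks for the five stored concept vectors closest to each query vector. The following is a brute-force sketch of that ranking, assuming cosine similarity as the metric. The vectors and concept IDs are invented, the function name `top_k_cosine` is hypothetical, and the real backends delegate this work to sqlite-vec or pgvector rather than looping in Python.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_cosine(query, stored, k=5):
    """Return the k (concept_id, score) pairs most similar to the query, best first."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy data: concept IDs and vectors are invented for illustration.
stored = {
    101: [1.0, 0.0, 0.0],
    102: [0.9, 0.1, 0.0],
    103: [0.0, 1.0, 0.0],
}
print(top_k_cosine([1.0, 0.0, 0.0], stored, k=2))
```

Filters such as `--standard-only` and `--domain` would narrow the candidate set before or during this ranking; how that is done is backend-specific and not shown here.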
pgvector with HNSW index:
export OMOP_EMB_BACKEND=pgvector
export OMOP_EMB_DB_HOST=localhost
export OMOP_EMB_DB_USER=omop_emb
export OMOP_EMB_DB_PASSWORD=omop_emb
export OMOP_EMB_DB_NAME=omop_emb
omop-emb embeddings add-embeddings --api-base http://localhost:11434/v1 --api-key ollama \
--model nomic-embed-text:v1.5
omop-emb maintenance rebuild-index --model nomic-embed-text:v1.5 --index-type hnsw --metric-type cosine
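In pgvector terms, an HNSW index with a cosine metric corresponds to a `CREATE INDEX ... USING hnsw` statement with the `vector_cosine_ops` operator class. The sketch below only assembles such a statement. The table and column names are placeholders, the metric keys `l2` and `ip` are illustrative labels, and it is an assumption that `rebuild-index` issues comparable DDL.

```python
# pgvector operator classes for the `vector` type, keyed by an illustrative metric name.
OPERATOR_CLASSES = {
    "cosine": "vector_cosine_ops",
    "l2": "vector_l2_ops",
    "ip": "vector_ip_ops",
}


def hnsw_index_ddl(table, column, metric="cosine"):
    """Return a CREATE INDEX statement for a pgvector HNSW index."""
    try:
        opclass = OPERATOR_CLASSES[metric]
    except KeyError:
        raise ValueError(f"unsupported metric: {metric!r}") from None
    # Reject anything that is not a plain identifier, since names are interpolated.
    for identifier in (table, column):
        if not identifier.isidentifier():
            raise ValueError(f"unsafe identifier: {identifier!r}")
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_hnsw_idx "
        f"ON {table} USING hnsw ({column} {opclass});"
    )


# Placeholder names; the actual schema used by omop-emb is not described here.
print(hnsw_index_ddl("concept_embedding", "embedding", metric="cosine"))
```

The operator class has to match the distance operator used at query time, which is presumably why the command takes `--metric-type` alongside `--index-type`.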
Environment variables
| Variable | Default | Description |
|---|---|---|
| OMOP_EMB_BACKEND | sqlitevec | Backend: sqlitevec or pgvector. |
| OMOP_EMB_SQLITE_PATH | (none) | sqlite-vec database file path (or :memory:). |
| OMOP_EMB_DB_HOST | (none) | pgvector: PostgreSQL host. |
| OMOP_EMB_DB_PORT | 5432 | pgvector: PostgreSQL port. |
| OMOP_EMB_DB_USER | (none) | pgvector: database user. |
| OMOP_EMB_DB_PASSWORD | (none) | pgvector: database password. |
| OMOP_EMB_DB_NAME | (none) | pgvector: database name. |
| OMOP_EMB_DB_URL | (none) | pgvector: full SQLAlchemy URL (overrides individual vars). |
| OMOP_CDM_DB_URL | (none) | OMOP CDM connection (required for ingestion commands only). |
| OMOP_EMB_FAISS_CACHE_DIR | (none) | Default FAISS cache directory (alternative to --faiss-cache-dir). |
See the Configuration Reference for the complete list including asymmetric embedding prefixes and driver overrides.
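The table implies a simple precedence for the pgvector connection: a full `OMOP_EMB_DB_URL` wins, otherwise the URL is assembled from the individual variables with port 5432 as the fallback. A minimal sketch of that resolution follows. The function name `resolve_pgvector_url` and the `postgresql+psycopg` driver prefix (borrowed from the `OMOP_CDM_DB_URL` example) are assumptions, and the library's own logic may differ.

```python
import os
from urllib.parse import quote

REQUIRED = (
    "OMOP_EMB_DB_HOST",
    "OMOP_EMB_DB_USER",
    "OMOP_EMB_DB_PASSWORD",
    "OMOP_EMB_DB_NAME",
)


def resolve_pgvector_url(env=None):
    """Return a SQLAlchemy-style URL built from OMOP_EMB_DB_* settings."""
    env = os.environ if env is None else env
    # A full URL takes precedence over the individual settings.
    if env.get("OMOP_EMB_DB_URL"):
        return env["OMOP_EMB_DB_URL"]
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing variables: " + ", ".join(missing))
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    user = quote(env["OMOP_EMB_DB_USER"], safe="")
    password = quote(env["OMOP_EMB_DB_PASSWORD"], safe="")
    port = env.get("OMOP_EMB_DB_PORT", "5432")
    return (
        f"postgresql+psycopg://{user}:{password}"
        f"@{env['OMOP_EMB_DB_HOST']}:{port}/{env['OMOP_EMB_DB_NAME']}"
    )
```

Passing a plain dictionary instead of relying on `os.environ` makes the precedence easy to check in isolation, for example when debugging a deployment where both the URL and the individual variables are set.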
Documentation
Full documentation: https://AustralianCancerDataNetwork.github.io/omop-emb
- Installation & backend setup
- Configuration reference
- Backend selection & index types
- CLI reference
- Interface guide
Roadmap
- sqlite-vec backend (default, zero-config)
- pgvector backend (PostgreSQL)
- HNSW index support for pgvector
- FAISS sidecar (approximate nearest-neighbour read acceleration)
- FAISS export / import CLI (export-faiss-cache, import-faiss-cache)
- In-DB concept filtering (domain, vocabulary, standard status, active status)
- Transparent FAISS fast path in EmbeddingReaderInterface
- Extensive backend and registry testing
- FAISS GPU support
- pgvectorscale support
- Vector quantisation for more efficient storage
File details
Details for the file omop_emb-1.0.2.tar.gz.
File metadata
- Download URL: omop_emb-1.0.2.tar.gz
- Upload date:
- Size: 192.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 82eeaf1e2924479b5ebfb74d12c593b18cb16f16ad0785b6557205bd7e49cde0 |
| MD5 | 66462e5bf938af77587299ede62781e3 |
| BLAKE2b-256 | 4fb645261ef4e79b84a033ee963d80ea5a1fc07f76766c23367049e0827ff926 |
File details
Details for the file omop_emb-1.0.2-py3-none-any.whl.
File metadata
- Download URL: omop_emb-1.0.2-py3-none-any.whl
- Upload date:
- Size: 81.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1b3535d62ea6e90950da495fd9a2d3117a73fd2fa96ff4612a0b2e637e2152ad |
| MD5 | 519556aad2d7d55f07abe1d8b1f368ea |
| BLAKE2b-256 | 9d1d603cbdbd90c36c84ea4d91527f372975827004f63964ab076edbc05d3e6d |