Embedding extension for OMOP CDM. Uses sqlite-vec by default, with an optional pgvector backend and optional FAISS export.

omop-emb

Vector embedding layer for OMOP CDM concepts.

omop-emb generates, stores, and retrieves embeddings for OMOP concepts. It works out of the box with sqlite-vec (no external database required) and scales to PostgreSQL/pgvector for larger deployments. The database is the source of truth — FAISS is an optional read-acceleration sidecar, not a primary store.

Installation

pip install omop-emb                         # sqlite-vec backend (default, no extras needed)
pip install "omop-emb[pgvector]"             # adds PostgreSQL/pgvector support
pip install "omop-emb[faiss-cpu]"            # adds FAISS sidecar support
pip install "omop-emb[pgvector,faiss-cpu]"   # everything
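
To check which optional backends are importable in the current environment, a small sketch like the one below may help. It assumes the extras pull in the `pgvector` and `faiss` Python modules and that the default backend relies on `sqlite_vec`; those module names are assumptions about the dependency set, not something the project states, and `available_extras` is a hypothetical helper rather than part of omop-emb.

```python
import importlib.util

# Assumed top-level module names for each backend (not confirmed by omop-emb).
CANDIDATE_MODULES = ("sqlite_vec", "pgvector", "faiss")


def available_extras(modules=CANDIDATE_MODULES):
    """Map each module name to True if it can be imported, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


extras = available_extras()
for name, present in extras.items():
    print(f"{name}: {'installed' if present else 'missing'}")
```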

Quick start

Ingest concepts (sqlite-vec embedding store, no external vector database; the OMOP CDM connection is still required for ingestion):

export OMOP_EMB_BACKEND=sqlitevec
export OMOP_EMB_SQLITE_PATH=/data/omop_emb.db
export OMOP_CDM_DB_URL=postgresql+psycopg://user:pass@host:5432/omop_cdm

omop-emb embeddings add-embeddings --api-base http://localhost:11434/v1 --api-key ollama \
    --model nomic-embed-text:v1.5
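
The `--api-base` and `--api-key` flags suggest an OpenAI-compatible embeddings endpoint. As a rough illustration of what such a call looks like, the sketch below builds the request with the standard library only. It assumes the server exposes `POST {api_base}/embeddings` with a `{"model", "input"}` body and returns vectors under `data[i].embedding`; the helper names are hypothetical and this is not how omop-emb itself is implemented.

```python
import json
import urllib.request


def build_embedding_request(api_base, api_key, model, texts):
    """Build a POST request for an OpenAI-compatible /embeddings endpoint."""
    url = api_base.rstrip("/") + "/embeddings"
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_embeddings(request):
    """Send the request and return one vector per input (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # OpenAI-compatible responses list results under "data".
    return [item["embedding"] for item in payload["data"]]


# Build (but do not send) a request matching the quick-start flags.
request = build_embedding_request(
    "http://localhost:11434/v1",
    "ollama",
    "nomic-embed-text:v1.5",
    ["Essential hypertension"],
)
print(request.full_url)
```

Calling `fetch_embeddings(request)` only works when an embedding server is actually listening at the given base URL.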

Search:

omop-emb embeddings search --api-base http://localhost:11434/v1 --api-key ollama \
    --model nomic-embed-text:v1.5 \
    --query "hypertension" --query "type 2 diabetes" \
    --standard-only --domain Condition --k 5
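
Conceptually, a search like this filters concepts and then ranks the survivors by vector similarity. The sketch below shows a brute-force version in plain Python, assuming cosine similarity and the OMOP `domain_id` and `standard_concept` columns. The concept IDs and two-dimensional vectors are toy values, and the real backends delegate this work to sqlite-vec or pgvector rather than looping in Python.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, concepts, k=5, domain=None, standard_only=False):
    """Filter concepts first, then return the k most similar to query_vec."""
    candidates = [
        c for c in concepts
        if (domain is None or c["domain_id"] == domain)
        and (not standard_only or c["standard_concept"] == "S")
    ]
    candidates.sort(
        key=lambda c: cosine_similarity(query_vec, c["embedding"]),
        reverse=True,
    )
    return candidates[:k]


# Toy data: IDs and vectors are made up for illustration.
concepts = [
    {"concept_id": 1, "domain_id": "Condition", "standard_concept": "S", "embedding": [1.0, 0.0]},
    {"concept_id": 2, "domain_id": "Condition", "standard_concept": "S", "embedding": [0.0, 1.0]},
    {"concept_id": 3, "domain_id": "Measurement", "standard_concept": "S", "embedding": [0.9, 0.1]},
    {"concept_id": 4, "domain_id": "Condition", "standard_concept": None, "embedding": [1.0, 0.05]},
]

query = [1.0, 0.1]
filtered = search(query, concepts, k=5, domain="Condition", standard_only=True)
unfiltered = search(query, concepts, k=2)
print([c["concept_id"] for c in filtered])
print([c["concept_id"] for c in unfiltered])
```

Filtering before ranking means `--k` applies to concepts that already satisfy the domain and standard-status constraints, so the result list is not thinned out afterwards.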

pgvector with HNSW index:

export OMOP_EMB_BACKEND=pgvector
export OMOP_EMB_DB_HOST=localhost
export OMOP_EMB_DB_USER=omop_emb
export OMOP_EMB_DB_PASSWORD=omop_emb
export OMOP_EMB_DB_NAME=omop_emb

omop-emb embeddings add-embeddings --api-base http://localhost:11434/v1 --api-key ollama \
    --model nomic-embed-text:v1.5
omop-emb maintenance rebuild-index --model nomic-embed-text:v1.5 --index-type hnsw --metric-type cosine
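
For orientation, pgvector creates an HNSW index with a `CREATE INDEX ... USING hnsw` statement and a distance-specific operator class. The sketch below generates such a statement; the table and column names are hypothetical, the `l2` and `ip` metric names are illustrative, and the statement that `rebuild-index` actually issues may differ.

```python
# pgvector operator classes for each distance metric.
OPERATOR_CLASSES = {
    "cosine": "vector_cosine_ops",
    "l2": "vector_l2_ops",
    "ip": "vector_ip_ops",
}
INDEX_TYPES = {"hnsw", "ivfflat"}


def index_ddl(table, column, index_type="hnsw", metric="cosine"):
    """Return a pgvector CREATE INDEX statement for the given table and column."""
    if index_type not in INDEX_TYPES:
        raise ValueError(f"unsupported index type: {index_type!r}")
    if metric not in OPERATOR_CLASSES:
        raise ValueError(f"unsupported metric: {metric!r}")
    # Only accept plain identifiers, since they are interpolated into SQL.
    for identifier in (table, column):
        if not identifier.isidentifier():
            raise ValueError(f"invalid identifier: {identifier!r}")
    ops = OPERATOR_CLASSES[metric]
    return f"CREATE INDEX ON {table} USING {index_type} ({column} {ops});"


# Hypothetical table and column names.
statement = index_ddl("concept_embedding", "embedding")
print(statement)
```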

Environment variables

Variable                  Default    Description
OMOP_EMB_BACKEND          sqlitevec  Backend: sqlitevec or pgvector.
OMOP_EMB_SQLITE_PATH      -          sqlite-vec database file path (or :memory:).
OMOP_EMB_DB_HOST          -          pgvector: PostgreSQL host.
OMOP_EMB_DB_PORT          5432       pgvector: PostgreSQL port.
OMOP_EMB_DB_USER          -          pgvector: database user.
OMOP_EMB_DB_PASSWORD      -          pgvector: database password.
OMOP_EMB_DB_NAME          -          pgvector: database name.
OMOP_EMB_DB_URL           -          pgvector: full SQLAlchemy URL (overrides individual vars).
OMOP_CDM_DB_URL           -          OMOP CDM connection (required for ingestion commands only).
OMOP_EMB_FAISS_CACHE_DIR  -          Default FAISS cache directory (alternative to --faiss-cache-dir).

See the Configuration Reference for the complete list, including asymmetric embedding prefixes and driver overrides.
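
The precedence described in the table (a default backend of sqlitevec, a default port of 5432, and `OMOP_EMB_DB_URL` overriding the individual pgvector variables) can be sketched as follows. `resolve_backend` is a hypothetical helper, and the `postgresql+psycopg` driver prefix is an assumption borrowed from the `OMOP_CDM_DB_URL` example; the library's own resolution logic may differ.

```python
import os
from urllib.parse import quote


def resolve_backend(env=None):
    """Resolve backend settings from environment-style variables."""
    env = os.environ if env is None else env
    backend = env.get("OMOP_EMB_BACKEND", "sqlitevec")

    if backend == "sqlitevec":
        return {"backend": backend, "path": env.get("OMOP_EMB_SQLITE_PATH")}

    if backend == "pgvector":
        url = env.get("OMOP_EMB_DB_URL")
        if not url:
            # Fall back to the individual variables; percent-encode credentials.
            user = quote(env["OMOP_EMB_DB_USER"], safe="")
            password = quote(env["OMOP_EMB_DB_PASSWORD"], safe="")
            host = env["OMOP_EMB_DB_HOST"]
            port = env.get("OMOP_EMB_DB_PORT", "5432")
            name = env["OMOP_EMB_DB_NAME"]
            # Assumption: psycopg driver, as in the OMOP_CDM_DB_URL example.
            url = f"postgresql+psycopg://{user}:{password}@{host}:{port}/{name}"
        return {"backend": backend, "url": url}

    raise ValueError(f"unknown backend: {backend!r}")


# Example with an explicit mapping instead of the real environment.
config = resolve_backend({
    "OMOP_EMB_BACKEND": "pgvector",
    "OMOP_EMB_DB_HOST": "localhost",
    "OMOP_EMB_DB_USER": "omop_emb",
    "OMOP_EMB_DB_PASSWORD": "omop_emb",
    "OMOP_EMB_DB_NAME": "omop_emb",
})
print(config["url"])
```

Passing a plain dictionary keeps the sketch testable without touching the process environment.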

Documentation

Full documentation: https://AustralianCancerDataNetwork.github.io/omop-emb

Roadmap

  • sqlite-vec backend (default, zero-config)
  • pgvector backend (PostgreSQL)
  • HNSW index support for pgvector
  • FAISS sidecar (approximate nearest-neighbour read acceleration)
  • FAISS export / import CLI (export-faiss-cache, import-faiss-cache)
  • In-DB concept filtering (domain, vocabulary, standard status, active status)
  • Transparent FAISS fast path in EmbeddingReaderInterface
  • Extensive backend and registry testing
  • FAISS GPU support
  • pgvectorscale support
  • Vector quantisation for more efficient storage

Download files

Source Distribution

omop_emb-1.0.2.tar.gz (192.3 kB)

Built Distribution

omop_emb-1.0.2-py3-none-any.whl (81.8 kB)

File details

Details for the file omop_emb-1.0.2.tar.gz.

File metadata

  • Download URL: omop_emb-1.0.2.tar.gz
  • Size: 192.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.18

File hashes

Hashes for omop_emb-1.0.2.tar.gz
Algorithm    Hash digest
SHA256       82eeaf1e2924479b5ebfb74d12c593b18cb16f16ad0785b6557205bd7e49cde0
MD5          66462e5bf938af77587299ede62781e3
BLAKE2b-256  4fb645261ef4e79b84a033ee963d80ea5a1fc07f76766c23367049e0827ff926

File details

Details for the file omop_emb-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: omop_emb-1.0.2-py3-none-any.whl
  • Size: 81.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.18

File hashes

Hashes for omop_emb-1.0.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       1b3535d62ea6e90950da495fd9a2d3117a73fd2fa96ff4612a0b2e637e2152ad
MD5          519556aad2d7d55f07abe1d8b1f368ea
BLAKE2b-256  9d1d603cbdbd90c36c84ea4d91527f372975827004f63964ab076edbc05d3e6d
