
Embedding extension for OMOP CDM. Utilises sqlite-vec by default and provides an optional pgvector backend and optional FAISS export.


omop-emb

Vector embedding layer for OMOP CDM concepts.

omop-emb generates, stores, and retrieves embeddings for OMOP concepts. It works out of the box with sqlite-vec (no external database required) and scales to PostgreSQL/pgvector for larger deployments. The database is the source of truth; FAISS is an optional read-acceleration sidecar, not a primary store.
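The split between the two stores can be pictured as a read path that prefers a fresh sidecar and otherwise falls back to the database. The sketch below is illustrative only and is not omop-emb's implementation. The class names are hypothetical, plain dictionaries stand in for the database and for a FAISS index so the example runs with the standard library alone, and the version counter used to detect a stale sidecar is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical classes: dicts stand in for the database and for a FAISS index.
@dataclass
class VectorStore:
    """Stand-in for the database: the authoritative copy of every embedding."""
    rows: dict[int, list[float]] = field(default_factory=dict)


@dataclass
class Sidecar:
    """Stand-in for a FAISS index: a read-only snapshot exported from the store."""
    snapshot: dict[int, list[float]]
    version: int


class SidecarReader:
    """Serve reads from the sidecar when it is fresh; otherwise read the store."""

    def __init__(self, store: VectorStore) -> None:
        self.store = store
        self.version = 0  # assumed staleness check: bumped on every write
        self.sidecar: Sidecar | None = None

    def write(self, concept_id: int, vector: list[float]) -> None:
        # Writes only touch the source of truth, which makes the sidecar stale.
        self.store.rows[concept_id] = vector
        self.version += 1

    def export_sidecar(self) -> None:
        # The sidecar is always rebuildable from the store, so losing it loses no data.
        self.sidecar = Sidecar(dict(self.store.rows), self.version)

    def get(self, concept_id: int) -> tuple[str, list[float] | None]:
        if self.sidecar is not None and self.sidecar.version == self.version:
            return "sidecar", self.sidecar.snapshot.get(concept_id)
        return "database", self.store.rows.get(concept_id)


reader = SidecarReader(VectorStore())
reader.write(1, [0.1, 0.2])
reader.export_sidecar()
print(reader.get(1))
```

Because writes never go to the sidecar, deleting it only costs speed: the next export rebuilds it from the database.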

Installation

pip install omop-emb                         # sqlite-vec backend (default, no extras needed)
pip install "omop-emb[pgvector]"             # adds PostgreSQL/pgvector support
pip install "omop-emb[faiss-cpu]"            # adds FAISS sidecar support
pip install "omop-emb[pgvector,faiss-cpu]"   # everything
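To see which optional components are present in an environment, a quick import check can help. This sketch assumes the components expose the import names `sqlite_vec`, `pgvector` and `faiss` (the usual import names of the sqlite-vec, pgvector and faiss-cpu Python packages). It does not call omop-emb itself, and an extra may install more than the one module checked here.

```python
import importlib.util

# Assumed import names for each component; a rough check, not an omop-emb API.
OPTIONAL_MODULES = {
    "sqlite-vec backend": "sqlite_vec",
    "pgvector backend": "pgvector",
    "FAISS sidecar": "faiss",
}


def available_components() -> dict[str, bool]:
    """Report which component modules can be found without importing them."""
    return {
        label: importlib.util.find_spec(module) is not None
        for label, module in OPTIONAL_MODULES.items()
    }


if __name__ == "__main__":
    for label, ok in available_components().items():
        print(f"{label:20} {'installed' if ok else 'missing'}")
```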

Quick start

Ingest concepts (sqlite-vec, no external service):

export OMOP_EMB_BACKEND=sqlitevec
export OMOP_EMB_SQLITE_PATH=/data/omop_emb.db
export OMOP_CDM_DB_URL=postgresql+psycopg://user:pass@host:5432/omop_cdm

omop-emb embeddings add-embeddings --api-base http://localhost:11434/v1 --api-key ollama \
    --model nomic-embed-text:v1.5
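The `--api-base`, `--api-key` and `--model` flags point at an OpenAI-compatible embeddings service; Ollama serves one under `/v1`. Assuming omop-emb calls the standard `POST {api-base}/embeddings` route, each batch of concept names becomes a request like the one built below. This is a sketch of the wire format for orientation, not omop-emb's client code, and the concept names are illustrative.

```python
import json
import urllib.request


def build_embedding_request(api_base: str, api_key: str, model: str,
                            texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request."""
    url = api_base.rstrip("/") + "/embeddings"
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Ollama ignores the key, but OpenAI-compatible clients still send one.
            "Authorization": f"Bearer {api_key}",
        },
    )


req = build_embedding_request(
    "http://localhost:11434/v1", "ollama", "nomic-embed-text:v1.5",
    ["Essential hypertension", "Type 2 diabetes mellitus"],
)
print(req.full_url)  # http://localhost:11434/v1/embeddings
# Sending it with urllib.request.urlopen(req) to a running Ollama returns JSON whose
# "data" list holds one {"embedding": [...]} entry per input text.
```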

Search:

omop-emb embeddings search --api-base http://localhost:11434/v1 --api-key ollama \
    --model nomic-embed-text:v1.5 \
    --query "hypertension" --query "type 2 diabetes" \
    --standard-only --domain Condition --k 5
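Conceptually, `--standard-only --domain Condition --k 5` narrows the candidate set before ranking by similarity. The sketch below reproduces that idea with the standard library's `sqlite3`. Vectors are stored as packed little-endian float32 BLOBs (a compact format sqlite-vec also accepts), filters use the OMOP `concept` columns `domain_id` and `standard_concept`, and ranking is a brute-force cosine scan in Python. It does not load sqlite-vec or use omop-emb's schema; the `concept_embedding` table layout and the three-dimensional toy vectors are hypothetical.

```python
import math
import sqlite3
import struct
from typing import Optional


def pack(vec: list[float]) -> bytes:
    """Serialise a vector as little-endian float32."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack(blob: bytes) -> list[float]:
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(conn: sqlite3.Connection, query: list[float], k: int = 5,
           domain: Optional[str] = None, standard_only: bool = False):
    """Filter in SQL, then rank the surviving rows by cosine similarity."""
    sql = "SELECT concept_id, concept_name, embedding FROM concept_embedding WHERE 1=1"
    params: list[str] = []
    if domain is not None:
        sql += " AND domain_id = ?"
        params.append(domain)
    if standard_only:
        sql += " AND standard_concept = 'S'"  # OMOP marks standard concepts with 'S'
    scored = [(cosine(query, unpack(blob)), cid, name)
              for cid, name, blob in conn.execute(sql, params)]
    scored.sort(reverse=True)
    return [(cid, name, round(score, 3)) for score, cid, name in scored[:k]]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_embedding (concept_id INTEGER PRIMARY KEY, concept_name TEXT,"
    " domain_id TEXT, standard_concept TEXT, embedding BLOB)"
)
rows = [
    (1, "Essential hypertension", "Condition", "S", [0.9, 0.1, 0.0]),
    (2, "Hypertensive disorder", "Condition", None, [0.8, 0.2, 0.0]),
    (3, "Type 2 diabetes mellitus", "Condition", "S", [0.1, 0.9, 0.1]),
    (4, "Blood pressure measurement", "Measurement", "S", [0.7, 0.0, 0.3]),
]
conn.executemany(
    "INSERT INTO concept_embedding VALUES (?, ?, ?, ?, ?)",
    [(cid, name, dom, std, pack(vec)) for cid, name, dom, std, vec in rows],
)
print(search(conn, [1.0, 0.0, 0.0], k=2, domain="Condition", standard_only=True))
```

Filtering first keeps non-standard or out-of-domain concepts from crowding the top k; an index-backed backend would replace the Python scan with an in-database nearest-neighbour query.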

pgvector with HNSW index:

export OMOP_EMB_BACKEND=pgvector
export OMOP_EMB_DB_HOST=localhost
export OMOP_EMB_DB_USER=omop_emb
export OMOP_EMB_DB_PASSWORD=omop_emb
export OMOP_EMB_DB_NAME=omop_emb

omop-emb embeddings add-embeddings --api-base http://localhost:11434/v1 --api-key ollama \
    --model nomic-embed-text:v1.5
omop-emb maintenance rebuild-index --model nomic-embed-text:v1.5 --index-type hnsw --metric-type cosine
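On the PostgreSQL side, an HNSW index with a cosine metric corresponds to a pgvector index built with the matching operator class, and queries must order by the matching distance operator for the planner to use it. The helper below only generates SQL strings to show that pairing. The table and column names (`concept_embedding`, `embedding`) are hypothetical, the metric keys other than `cosine` are assumed names rather than confirmed `--metric-type` values, and `m = 16, ef_construction = 64` are pgvector's documented defaults, not settings confirmed for omop-emb.

```python
# pgvector operator class and distance operator for each metric (keys are illustrative).
PGVECTOR_METRICS = {
    "cosine": ("vector_cosine_ops", "<=>"),
    "l2": ("vector_l2_ops", "<->"),
    "inner_product": ("vector_ip_ops", "<#>"),  # <#> returns the negative inner product
}


def hnsw_sql(table: str, column: str, metric: str,
             m: int = 16, ef_construction: int = 64) -> tuple[str, str]:
    """Return (CREATE INDEX statement, k-NN query template) for one metric."""
    opclass, operator = PGVECTOR_METRICS[metric]
    # Identifiers are interpolated directly, so they must come from trusted config.
    create = (
        f"CREATE INDEX ON {table} USING hnsw ({column} {opclass}) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )
    # psycopg-style placeholders; binding a Python list as a vector needs the
    # pgvector adapter or an explicit ::vector cast.
    knn = (
        f"SELECT concept_id, {column} {operator} %(query)s AS distance "
        f"FROM {table} ORDER BY {column} {operator} %(query)s LIMIT %(k)s;"
    )
    return create, knn


create, knn = hnsw_sql("concept_embedding", "embedding", "cosine")
print(create)
# CREATE INDEX ON concept_embedding USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
```

An index built with one operator class is not used by queries ordered by a different operator, which is why metric choice at index-build time matters.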

Environment variables

| Variable | Default | Description |
| --- | --- | --- |
| OMOP_EMB_BACKEND | sqlitevec | Backend: sqlitevec or pgvector. |
| OMOP_EMB_SQLITE_PATH | | sqlite-vec: database file path (or :memory:). |
| OMOP_EMB_DB_HOST | | pgvector: PostgreSQL host. |
| OMOP_EMB_DB_PORT | 5432 | pgvector: PostgreSQL port. |
| OMOP_EMB_DB_USER | | pgvector: database user. |
| OMOP_EMB_DB_PASSWORD | | pgvector: database password. |
| OMOP_EMB_DB_NAME | | pgvector: database name. |
| OMOP_EMB_DB_URL | | pgvector: full SQLAlchemy URL (overrides the individual variables). |
| OMOP_CDM_DB_URL | | OMOP CDM connection (required for ingestion commands only). |
| OMOP_EMB_FAISS_CACHE_DIR | | Default FAISS cache directory (alternative to --faiss-cache-dir). |
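The precedence rules in the table can be summarised in code. The function below is a hypothetical illustration, and omop-emb's own settings loader may differ. It defaults the backend to `sqlitevec`, treats `OMOP_EMB_SQLITE_PATH` as required because the table lists no default (an assumption), and for pgvector prefers `OMOP_EMB_DB_URL` over the individual variables. The `postgresql+psycopg` driver prefix is borrowed from the `OMOP_CDM_DB_URL` example in the quick start and is also an assumption.

```python
from urllib.parse import quote


def resolve_backend(env: dict[str, str]) -> tuple[str, str]:
    """Hypothetical resolver: return (backend, connection target) from OMOP_EMB_* vars."""
    backend = env.get("OMOP_EMB_BACKEND", "sqlitevec")
    if backend == "sqlitevec":
        path = env.get("OMOP_EMB_SQLITE_PATH")
        if not path:  # assumed required, since no default is documented
            raise ValueError("OMOP_EMB_SQLITE_PATH must be set (a file path or :memory:)")
        return backend, path
    if backend == "pgvector":
        # A full URL overrides the individual connection variables.
        if env.get("OMOP_EMB_DB_URL"):
            return backend, env["OMOP_EMB_DB_URL"]
        # Percent-encode credentials so characters such as '@' cannot break the URL.
        user = quote(env["OMOP_EMB_DB_USER"], safe="")
        password = quote(env["OMOP_EMB_DB_PASSWORD"], safe="")
        host = env["OMOP_EMB_DB_HOST"]
        port = env.get("OMOP_EMB_DB_PORT", "5432")
        name = env["OMOP_EMB_DB_NAME"]
        return backend, f"postgresql+psycopg://{user}:{password}@{host}:{port}/{name}"
    raise ValueError(f"unknown OMOP_EMB_BACKEND: {backend!r}")
```

Under these assumptions, `resolve_backend(dict(os.environ))` after the pgvector `export` lines in the quick start would yield `postgresql+psycopg://omop_emb:omop_emb@localhost:5432/omop_emb`.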

See the Configuration Reference for the complete list, including asymmetric embedding prefixes and driver overrides.
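Asymmetric embedding models encode queries and documents differently, usually by prepending a task prefix to the text. For example, nomic-embed-text v1.5 (the model used in the quick start) is documented to expect `search_query: ` on queries and `search_document: ` on the texts being searched. The sketch below shows only the idea: the `PREFIXES` mapping and `with_prefix` helper are illustrative, and the variable names omop-emb uses to configure prefixes are in the Configuration Reference, not reproduced here.

```python
# Illustrative prefixes for nomic-embed-text v1.5; other models use different ones or none.
PREFIXES = {"query": "search_query: ", "document": "search_document: "}


def with_prefix(texts: list[str], role: str) -> list[str]:
    """Prepend the role-specific prefix the model was trained with."""
    prefix = PREFIXES[role]
    return [prefix + t for t in texts]


# Concept names would be embedded as documents at ingestion time ...
docs = with_prefix(["Essential hypertension"], "document")
# ... and free-text searches as queries at search time.
queries = with_prefix(["hypertension"], "query")
print(docs, queries)
# ['search_document: Essential hypertension'] ['search_query: hypertension']
```

The prefixes need to be applied consistently: stored concepts embedded under one convention and queries under another will typically rank poorly.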

Documentation

Full documentation: https://AustralianCancerDataNetwork.github.io/omop-emb

Roadmap

  • sqlite-vec backend (default, zero-config)
  • pgvector backend (PostgreSQL)
  • HNSW index support for pgvector
  • FAISS sidecar (approximate nearest-neighbour read acceleration)
  • FAISS export / import CLI (export-faiss-cache, import-faiss-cache)
  • In-DB concept filtering (domain, vocabulary, standard status, active status)
  • Transparent FAISS fast path in EmbeddingReaderInterface
  • Extensive backend and registry testing
  • FAISS GPU support
  • pgvectorscale support
  • Vector quantisation for more efficient storage


Download files


Source Distribution

omop_emb-1.0.1.tar.gz (191.1 kB)

Uploaded Source

Built Distribution


omop_emb-1.0.1-py3-none-any.whl (80.2 kB)

Uploaded Python 3

File details

Details for the file omop_emb-1.0.1.tar.gz.

File metadata

  • Download URL: omop_emb-1.0.1.tar.gz
  • Upload date:
  • Size: 191.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.16

File hashes

Hashes for omop_emb-1.0.1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | b7f2d1c00a4cf07dfae205239318e37c43fef80969ba9504644d16bd0e08f552 |
| MD5 | f0ebe180c56b9b49f996358fe8baf716 |
| BLAKE2b-256 | 448b0a8f87775e09ea2e69f7eee041c886b33255164321ee4b2fc578a10c91d8 |


File details

Details for the file omop_emb-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: omop_emb-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 80.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.16

File hashes

Hashes for omop_emb-1.0.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 35b2c12eb1ebc016397d49523eeb3fe93fb1b20994c9c2df1ef9f11c0be3629d |
| MD5 | 41bd1ae7686c836726ce306cd375bed5 |
| BLAKE2b-256 | a0c70f52aef5ebae3764b33c4addce79082a836ce4eda08cf256fa4dc7cdd580 |
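To check a downloaded file against the published digests, hash it locally and compare. A minimal sketch with Python's `hashlib`; the two SHA256 values are the ones listed for the 1.0.1 release, and the file path is wherever the download was saved.

```python
import hashlib
from pathlib import Path
from typing import Optional

# Published SHA256 digests for the omop-emb 1.0.1 release files.
EXPECTED_SHA256 = {
    "omop_emb-1.0.1.tar.gz": "b7f2d1c00a4cf07dfae205239318e37c43fef80969ba9504644d16bd0e08f552",
    "omop_emb-1.0.1-py3-none-any.whl": "35b2c12eb1ebc016397d49523eeb3fe93fb1b20994c9c2df1ef9f11c0be3629d",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: Optional[str] = None) -> bool:
    """Compare against an explicit digest, or look one up by the file's name."""
    expected = expected or EXPECTED_SHA256[path.name]
    return sha256_of(path) == expected
```

For example, `verify(Path("omop_emb-1.0.1-py3-none-any.whl"))` returns `True` for an intact wheel. pip can enforce the same check at install time through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).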

