Detect when your RAG vector index silently drifts from reality — pure Python, no external services

embedding-drift-lib

Detect when your RAG vector index silently drifts from reality.

Pure Python library — no external services, no API keys, no database required. 3 dependencies: numpy, scikit-learn, scipy.

Install

pip install embedding-drift-lib

Quick Start

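from embedding_drift_lib import DriftAnalyser, DriftConfig
import numpy as np

# Placeholder inputs so the snippet runs end to end; replace them with your own
# arrays (assumed shape: n_vectors x embedding_dim).
rng = np.random.default_rng(0)
reference_vectors = rng.normal(size=(1000, 1536))  # vectors stored in your index
current_vectors   = rng.normal(size=(200, 1536))   # recent query vectors

config   = DriftConfig(index_name="my-rag")
analyser = DriftAnalyser(config)

# Pass your index vectors and recent query vectors
report = analyser.analyse(reference_vectors, current_vectors)

print(report.drift_type)       # "none" | "mild" | "significant" | "severe"
print(report.mmd_score)        # 0.0 = identical, > 0.20 = severe drift
print(report.primary_cause)    # "model_drift" | "query_shift" | "data_staleness"
print(report.recommendation)   # human-readable action to take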

Wrap Your Embedding Function (Zero-Touch Monitoring)

import asyncio

from embedding_drift_lib import DriftMonitor, DriftConfig
from openai import AsyncOpenAI


async def main(index_vectors):
    client  = AsyncOpenAI()
    monitor = DriftMonitor(DriftConfig(index_name="my-rag"))

    # Create baseline from your existing index (run once)
    await monitor.create_snapshot(index_vectors, label="v1-baseline")

    # Wrap your embed call; your RAG code does not change
    embed = monitor.wrap(client.embeddings.create)

    # Use normally; drift analysis runs in the background automatically
    response = await embed(model="text-embedding-3-small", input=["user query"])

    # Check anytime
    return monitor.last_report


# index_vectors: the embeddings already stored in your index
# report = asyncio.run(main(index_vectors))

What It Detects

Drift Type	Signal	Example
Model drift	High MMD + concentrated Wasserstein	OpenAI silently updates model weights
Query shift	High coverage gap + low MMD	Users ask about a new topic your corpus never covered
Data staleness	High MMD + diffuse Wasserstein	Corpus ages relative to current user queries
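
The table above can be read as a small decision rule. The sketch below is only an illustration of that reading, not the library's actual logic: the function name `diagnose`, the argument names, and the cutoff for a "concentrated" Wasserstein profile (3.0) are all hypothetical, while the MMD and coverage-gap cutoffs are borrowed from the default configuration values shown in this README.

```python
def diagnose(mmd_score, coverage_gap, concentration_ratio,
             mmd_high=0.08, gap_high=0.15, concentration_high=3.0):
    """Hypothetical reading of the signal table; cutoffs are assumptions."""
    if mmd_score >= mmd_high:
        # High MMD: concentrated shift suggests a model change,
        # diffuse shift suggests an ageing corpus.
        if concentration_ratio >= concentration_high:
            return "model_drift"
        return "data_staleness"
    if coverage_gap >= gap_high:
        # Low MMD but many uncovered queries.
        return "query_shift"
    return None  # no cause to report


print(diagnose(0.30, 0.05, 8.0))
print(diagnose(0.30, 0.05, 1.2))
print(diagnose(0.01, 0.40, 1.0))
```

Checking MMD first is a design choice made for this sketch; the source does not say which signal takes priority when several fire at once.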

Configuration

config = DriftConfig(
    index_name                   = "my-rag",
    pca_components               = 50,      # reduce embedding dims before MMD
    n_permutations               = 200,     # permutation test iterations
    sample_rate                  = 0.05,    # sample 5% of queries (default)
    elevated_sample_rate         = 0.30,    # 30% when drift suspected
    analysis_batch_size          = 200,     # analyse every N queries
    mmd_mild_threshold           = 0.02,
    mmd_significant_threshold    = 0.08,
    mmd_severe_threshold         = 0.20,
    coverage_gap_alert_threshold = 0.15,    # alert when 15%+ queries uncovered
    snapshot_dir                 = "./snapshots",
)
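
To make the three MMD thresholds concrete, here is a minimal sketch of how a score could be mapped to a drift label. The function `classify_drift` is hypothetical and is not part of the package's documented API; the strict `>` comparison follows the "> 0.20 = severe drift" comment in the Quick Start, and the same boundary handling is assumed for the other two thresholds.

```python
def classify_drift(mmd_score, mild=0.02, significant=0.08, severe=0.20):
    """Map an MMD score to a drift label (assumed boundary handling)."""
    if mmd_score > severe:
        return "severe"
    if mmd_score > significant:
        return "significant"
    if mmd_score > mild:
        return "mild"
    return "none"


for score in (0.01, 0.05, 0.12, 0.35):
    print(score, classify_drift(score))
```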

How It Works

Your RAG App
    |
    | query -> embed() -> vector search -> LLM -> answer
    v
DriftMonitor wrapper (transparent)
    |
    +-- Samples 5% of embedding calls (adaptive: 30% when drift suspected)
    |
    +-- Every 200 samples, runs DriftAnalyser:
    |       1. PCA projection      1536-dim -> 50-dim
    |       2. MMD^2               primary drift metric
    |       3. Wasserstein         which semantic directions shifted
    |       4. Coverage gap        query shift detection
    |       5. Permutation test    200 shuffles, p < 0.05
    |
    +-- Classifies: none / mild / significant / severe
    +-- Diagnoses:  model_drift / query_shift / data_staleness
    +-- Returns DriftReport
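
The sampling step in the diagram can be sketched as follows. This is an assumption about how such a wrapper might behave, not the package's implementation: the class `AdaptiveSampler`, its attributes, and the `offer` method are hypothetical names, and only the rates (5% and 30%) and the batch size (200) come from this README.

```python
import random


class AdaptiveSampler:
    """Hypothetical sketch: keep a fraction of vectors, emit full batches."""

    def __init__(self, base_rate=0.05, elevated_rate=0.30,
                 batch_size=200, seed=None):
        self.base_rate = base_rate
        self.elevated_rate = elevated_rate
        self.batch_size = batch_size
        self.drift_suspected = False  # flip to True to sample more densely
        self.buffer = []
        self._rng = random.Random(seed)

    def offer(self, vector):
        """Maybe keep the vector; return a full batch when ready, else None."""
        rate = self.elevated_rate if self.drift_suspected else self.base_rate
        if self._rng.random() < rate:
            self.buffer.append(vector)
        if len(self.buffer) >= self.batch_size:
            batch, self.buffer = self.buffer, []
            return batch
        return None


sampler = AdaptiveSampler(seed=0)
batches = sum(sampler.offer([float(i)]) is not None for i in range(20_000))
print("batches ready for analysis:", batches)
```

At a 5% rate, a 200-sample batch fills roughly once every 4,000 embedding calls; raising the rate to 30% shortens that to roughly 670 calls.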

The Math

MMD (Maximum Mean Discrepancy) — non-parametric test measuring distance between two distributions without assuming any shape. Uses RBF kernel with median heuristic bandwidth.
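
As an illustration of the MMD description above, the following is a minimal sketch of a squared-MMD estimate with an RBF kernel and a median-heuristic bandwidth. It uses the biased (V-statistic) estimator, which is one common choice; the source does not say which estimator the library uses, and the function names here are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist


def median_bandwidth(x, y):
    """Median heuristic: median pairwise distance in the pooled sample."""
    pooled = np.vstack([x, y])
    dists = cdist(pooled, pooled, metric="euclidean")
    upper = dists[np.triu_indices_from(dists, k=1)]  # each pair once
    return float(np.median(upper))


def rbf_kernel(a, b, sigma):
    """k(a, b) = exp(-||a - b||^2 / (2 * sigma^2)) for all row pairs."""
    return np.exp(-cdist(a, b, metric="sqeuclidean") / (2.0 * sigma ** 2))


def mmd2_biased(x, y, sigma=None):
    """Biased estimate of squared MMD between samples x and y."""
    if sigma is None:
        sigma = median_bandwidth(x, y)
    k_xx = rbf_kernel(x, x, sigma)
    k_yy = rbf_kernel(y, y, sigma)
    k_xy = rbf_kernel(x, y, sigma)
    return float(k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean())


rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 8))
same_dist = rng.normal(size=(200, 8))
shifted   = rng.normal(loc=1.0, size=(200, 8))
print("same distribution:", mmd2_biased(reference, same_dist))
print("shifted mean:     ", mmd2_biased(reference, shifted))
```

The sketch assumes the pooled sample contains at least two distinct points; otherwise the median distance is zero and the kernel is undefined.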

Wasserstein distance — computed per PCA dimension. Concentration ratio (max/mean) distinguishes sudden model swap (high concentration) from gradual data staleness (diffuse).
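
The per-dimension computation and the max/mean ratio can be sketched with `scipy.stats.wasserstein_distance`. The helper names are hypothetical, and the inputs are assumed to be already projected into PCA space, with one column per component.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def per_dimension_wasserstein(reference, current):
    """1-D Wasserstein distance for each column (PCA component)."""
    return np.array([
        wasserstein_distance(reference[:, j], current[:, j])
        for j in range(reference.shape[1])
    ])


def concentration_ratio(distances):
    """max / mean: near 1 when the shift is diffuse, large when concentrated."""
    return float(distances.max() / distances.mean())


rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 10))

# Concentrated shift: only component 3 moves.
concentrated = rng.normal(size=(500, 10))
concentrated[:, 3] += 3.0

# Diffuse shift: every component moves a little.
diffuse = rng.normal(loc=0.5, size=(500, 10))

print(concentration_ratio(per_dimension_wasserstein(reference, concentrated)))
print(concentration_ratio(per_dimension_wasserstein(reference, diffuse)))
```

With 10 components the ratio is bounded above by 10, so any cutoff between "concentrated" and "diffuse" has to be chosen relative to the number of components.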

Coverage gap — fraction of queries whose best cosine similarity to any index document falls below threshold. Catches query shift that MMD misses.
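
A minimal sketch of the coverage-gap calculation follows. The function name and the similarity threshold of 0.5 are assumptions made for illustration; the source does not state which similarity threshold the library uses.

```python
import numpy as np


def coverage_gap(queries, documents, threshold=0.5):
    """Fraction of queries whose best cosine similarity is below threshold."""
    # Normalise rows so that a dot product equals cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = documents / np.linalg.norm(documents, axis=1, keepdims=True)
    best = (q @ d.T).max(axis=1)  # best-matching document per query
    return float((best < threshold).mean())


documents = np.array([[1.0, 0.0], [0.0, 1.0]])
queries = np.array([
    [1.0, 0.0],    # matches the first document exactly
    [0.0, 1.0],    # matches the second document exactly
    [-1.0, -1.0],  # points away from both documents
    [1.0, 1.0],    # halfway between them, similarity about 0.707
])
print(coverage_gap(queries, documents))  # 0.25: one of four queries uncovered
```

The sketch assumes no all-zero vectors, since those cannot be normalised. For a large index the full similarity matrix may not fit in memory, in which case the queries can be processed in chunks.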

Permutation test — 200 label shuffles build a null distribution. Observed MMD must exceed the 95th percentile of the null to be reported as statistically significant.
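
The permutation procedure can be sketched generically for any two-sample statistic. The names below are hypothetical, and `mean_gap` is a deliberately simple stand-in for an MMD estimate so that the example stays short; any function of two sample arrays that returns a float can be passed instead.

```python
import numpy as np


def permutation_test(x, y, statistic, n_permutations=200, alpha=0.05, seed=0):
    """Shuffle sample labels to build a null distribution for `statistic`."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    pooled = np.vstack([x, y])
    n = len(x)
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        perm = rng.permutation(len(pooled))  # random relabelling
        null[i] = statistic(pooled[perm[:n]], pooled[perm[n:]])
    critical = float(np.quantile(null, 1.0 - alpha))  # 95th percentile
    # Add-one correction keeps the p-value strictly above zero.
    p_value = float((1 + np.sum(null >= observed)) / (1 + n_permutations))
    return {
        "observed": float(observed),
        "critical": critical,
        "p_value": p_value,
        "significant": bool(observed > critical),
    }


def mean_gap(a, b):
    """Stand-in statistic: distance between the two sample means."""
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))


rng = np.random.default_rng(1)
reference = rng.normal(size=(100, 5))
shifted = rng.normal(loc=1.0, size=(100, 5))
print(permutation_test(reference, shifted, mean_gap))
```

With 200 shuffles the smallest attainable p-value under this correction is 1/201, which is enough resolution for a 0.05 significance level.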

Author

Sarthak Pande — GitHub

License

MIT

Release history

0.1.1 (May 31, 2026)
0.1.0 (May 31, 2026, this version)

Download files

Source Distribution

embedding_drift_lib-0.1.0.tar.gz (16.7 kB)

Uploaded May 31, 2026 Source

Built Distribution

embedding_drift_lib-0.1.0-py3-none-any.whl (18.5 kB)

Uploaded May 31, 2026 Python 3

File details

Details for the file embedding_drift_lib-0.1.0.tar.gz.

File metadata

Download URL: embedding_drift_lib-0.1.0.tar.gz
Upload date: May 31, 2026
Size: 16.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for embedding_drift_lib-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`b881ee50b8798055ab1f0e282ee73f7d09f4030653757a467380a2422f9300d3`
MD5	`0d43b2139396fa414539887c7fdb9f6b`
BLAKE2b-256	`861c0f3606e5dc43c41f9b65f7b9b2971c6723554df8af9a3a838a94449e3a1b`

File details

Details for the file embedding_drift_lib-0.1.0-py3-none-any.whl.

File metadata

Download URL: embedding_drift_lib-0.1.0-py3-none-any.whl
Upload date: May 31, 2026
Size: 18.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for embedding_drift_lib-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5a12f73f166042286d2ee6fe0f37c6c639052d3aba7e6e4ba2c0bd195e7ea0ee`
MD5	`3daa3f791764371976d956cf33fb282b`
BLAKE2b-256	`10cfd7538c885c539ce4edb06a497f8dcf5e33a753a2f92f0e0b3882225c0c4f`