Embedding pipeline ops + drift detection for production RAG: index manifests, version assertions, neighbor-stability eval, Drift-Adapter for in-place model migrations.

embspec

Embedding pipeline ops + drift detection for production RAG.

The single failure mode this library prevents: a query-encoder upgrade ships before the index is re-encoded, and every health check keeps returning 200 OK while retrieval accuracy silently collapses. The decompressed.io RAG observability post-mortem (2026-03-09) describes this exact bug: $15K of emergency re-encoding plus 2-5 days of engineer time before someone diagnosed it.

pip install embspec

from embspec import IndexManifest, embed_assert

manifest = IndexManifest.load("s3://my-rag/index-prod/manifest.json")

@embed_assert(manifest, model_id="amazon.titan-embed-text-v2:0", dimension=1024)
def search(query: str):
    # embed_query and opensearch stand in for your own encoder and vector-store client
    qv = embed_query(query)
    return opensearch.knn_search(qv, ...)

If the query encoder ever drifts off the manifest, search raises EmbeddingVersionMismatch instead of silently returning bad results.
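To make the fast-fail idea concrete, here is a minimal sketch of a version-assertion decorator. It is not embspec's implementation: `Spec`, `VersionMismatch`, and `assert_matches` are hypothetical names invented for illustration, and the sketch assumes the check runs on every call and compares only the model id and dimension.

```python
import functools
import logging
from dataclasses import dataclass

logger = logging.getLogger("embspec_sketch")


class VersionMismatch(RuntimeError):
    """Hypothetical stand-in for an encoder/index mismatch error."""


@dataclass(frozen=True)
class Spec:
    # Hypothetical, minimal stand-in for what a manifest records.
    model_id: str
    dimension: int


def assert_matches(index_spec: Spec, *, model_id: str, dimension: int, mode: str = "raise"):
    """Decorator factory: compare the declared query encoder to the index spec."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            problems = []
            if model_id != index_spec.model_id:
                problems.append(f"model_id {model_id!r} != {index_spec.model_id!r}")
            if dimension != index_spec.dimension:
                problems.append(f"dimension {dimension} != {index_spec.dimension}")
            if problems:
                message = "query encoder does not match index: " + "; ".join(problems)
                if mode == "raise":
                    raise VersionMismatch(message)
                logger.warning(message)  # any other mode: report but keep serving
            return fn(*args, **kwargs)
        return wrapper
    return decorator


index_spec = Spec(model_id="amazon.titan-embed-text-v2:0", dimension=1024)


@assert_matches(index_spec, model_id="amazon.titan-embed-text-v2:0", dimension=1024)
def search_ok(query: str) -> str:
    return f"results for {query}"


@assert_matches(index_spec, model_id="some-newer-model", dimension=1536)
def search_drifted(query: str) -> str:
    return f"results for {query}"
```

Calling `search_ok` goes straight through, while `search_drifted` fails before any retrieval happens, which is the behavior that turns a silent quality regression into a loud error.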

What's in v0.1

| Primitive | What it does | Anchored in |
|---|---|---|
| IndexManifest + EmbeddingSpec | Single-file source of truth for "what does this index contain": model id, dimension, version, normalization | decompressed.io post-mortem |
| assert_compatible() / embed_assert() decorator | Fast-fail on encoder/index drift; raise or warn modes | decompressed.io post-mortem |
| DriftAdapter | Linear adapter from new-model embeddings to old-model space; lets you swap the query encoder without re-encoding the corpus | Drift-Adapter, Vejendla 2025 (arxiv:2509.23471) |
| neighbor_stability() | Compare two retrievers on a frozen probe set; reports overlap, Jaccard, regression list, deploy-safety verdict | RAGOps survey, Xu et al. 2025 (arxiv:2506.03401) |

Why not Evidently / Phoenix / WhyLogs?

  • Evidently — tabular-ML drift heritage; LLM additions are recent and platform-shaped. Not a drop-in primitive.
  • Phoenix — embedding-drift visualization is a sub-feature of a full observability platform. You adopt the platform.
  • WhyLogs — generic data-logging primitive; not embedding-aware; last commit 2025-01.
  • embspec — three small primitives (IndexManifest, DriftAdapter, neighbor_stability) you compose with whatever vector DB and tracer you already have. No platform, no UI, no agent framework.

Usage

Manifest + version assertion

from datetime import datetime, timezone
from embspec import IndexManifest, EmbeddingSpec

# When you build the index, write a manifest alongside it
manifest = IndexManifest(
    index_name="prod-v3",
    embedding=EmbeddingSpec(
        model_id="amazon.titan-embed-text-v2:0",
        dimension=1024,
        normalization="l2",
    ),
    created_at=datetime.now(timezone.utc),
    doc_count=8_000_000,
)
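manifest.save("s3://my-rag/index-prod/manifest.json")
# A local path works as well. Caution: manifest.save uses pathlib.Path.write_text,
# and pathlib does not understand s3:// URIs, so confirm S3 support in your
# installed version before relying on the call above.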

# At query time, assert the encoder matches before searching
from embspec import embed_assert

@embed_assert(
    "s3://my-rag/index-prod/manifest.json",  # path or IndexManifest
    model_id="amazon.titan-embed-text-v2:0",
    dimension=1024,
    mode="raise",  # or "log" for canary rollout
)
def search(query: str) -> list[dict]:
    qv = embed_query(query)
    return opensearch.knn_search(qv, ...)
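For orientation, a manifest of this kind can be modeled as a small dataclass serialized to JSON. The sketch below is an assumption about the general shape, not embspec's actual file format: `SketchSpec` and `SketchManifest` are hypothetical names, the field set simply mirrors the constructor arguments shown above, and it only handles local paths.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)
class SketchSpec:
    model_id: str
    dimension: int
    normalization: str


@dataclass(frozen=True)
class SketchManifest:
    index_name: str
    embedding: SketchSpec
    created_at: str  # ISO 8601 timestamp, stored as text so it is JSON-safe
    doc_count: int

    def save(self, path: Path) -> None:
        # asdict() converts the nested SketchSpec into a plain dict as well
        path.write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def load(cls, path: Path) -> "SketchManifest":
        raw = json.loads(path.read_text())
        return cls(
            index_name=raw["index_name"],
            embedding=SketchSpec(**raw["embedding"]),
            created_at=raw["created_at"],
            doc_count=raw["doc_count"],
        )


original = SketchManifest(
    index_name="prod-v3",
    embedding=SketchSpec("amazon.titan-embed-text-v2:0", 1024, "l2"),
    created_at=datetime.now(timezone.utc).isoformat(),
    doc_count=8_000_000,
)

# Round-trip through a temporary local file
with tempfile.TemporaryDirectory() as tmp:
    manifest_path = Path(tmp) / "manifest.json"
    original.save(manifest_path)
    restored = SketchManifest.load(manifest_path)
```

Keeping the manifest as one human-readable file next to the index means the same record can be diffed in review and checked at query time.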

Drift-Adapter for in-place model migration

When you want to upgrade the query encoder without re-encoding 8M docs:

from embspec import DriftAdapter
import numpy as np

# Sample, e.g., 50K docs and embed them with both old and new models
old_emb = embed_with_old_model(sample_texts)  # shape (50000, 1024)
new_emb = embed_with_new_model(sample_texts)  # shape (50000, 1536)

adapter = DriftAdapter.fit(
    new_embeddings=new_emb,
    old_embeddings=old_emb,
    regularization=0.01,  # ridge; helps when new_emb is rank-deficient
)
adapter.save("s3://my-rag/adapters/v3-to-v4.npz")

# At query time, embed with the new model then transform into old space
qv_new = embed_with_new_model(query)
qv_compatible = adapter.transform(qv_new)  # shape (1024,)
results = opensearch.knn_search(qv_compatible, ...)

Per Vejendla 2025, this typically recovers 95-99% of the retrieval recall of a full re-embedding at roughly 1% of the cost of re-encoding the full corpus.
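One way to picture a linear adapter with ridge regularization is as a closed-form least-squares fit from new-model space to old-model space. The sketch below assumes a plain ridge regression with no bias term and runs on synthetic data; `fit_linear_adapter` is a hypothetical helper, not the DriftAdapter implementation, and the dimensions are shrunk so it runs quickly.

```python
import numpy as np


def fit_linear_adapter(
    new_emb: np.ndarray, old_emb: np.ndarray, regularization: float = 0.01
) -> np.ndarray:
    """Return W of shape (d_new, d_old) minimizing
    ||new_emb @ W - old_emb||^2 + regularization * ||W||^2."""
    d_new = new_emb.shape[1]
    gram = new_emb.T @ new_emb + regularization * np.eye(d_new)
    # Solve the regularized normal equations instead of inverting explicitly
    return np.linalg.solve(gram, new_emb.T @ old_emb)


# Synthetic paired embeddings: the "old" space is a noisy linear image of the "new" one
rng = np.random.default_rng(0)
n, d_new, d_old = 2000, 48, 32
new_emb = rng.normal(size=(n, d_new))
true_map = rng.normal(size=(d_new, d_old))
old_emb = new_emb @ true_map + 0.01 * rng.normal(size=(n, d_old))

W = fit_linear_adapter(new_emb, old_emb)

# Map a fresh new-model query vector into the old space
query_new = rng.normal(size=d_new)
query_old_space = query_new @ W  # shape (d_old,)

# Compare against the noise-free mapping of the same query
expected = query_new @ true_map
cosine = float(
    query_old_space @ expected
    / (np.linalg.norm(query_old_space) * np.linalg.norm(expected))
)
```

On real encoders the relationship between the two spaces is only approximately linear, so the fit quality should be measured on held-out pairs rather than assumed from a synthetic case like this one.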

Neighbor stability for safe migrations

from embspec import neighbor_stability

# Run a fixed probe set (probes maps probe id -> query text) against both indexes
old_results = {pid: retrieve_from_v3(q) for pid, q in probes.items()}
new_results = {pid: retrieve_from_v4(q) for pid, q in probes.items()}

report = neighbor_stability(old_results, new_results, k=10)
print(f"mean overlap@10: {report.mean_overlap_at_k:.1%}")
print(f"regressions: {report.regression_count}/{report.n_probes}")

if report.is_safe_to_deploy(min_mean_overlap=0.85, max_regression_fraction=0.05):
    deploy_v4()
else:
    investigate(report.regression_probe_ids)
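The metrics named in the report can be illustrated with a few lines of set arithmetic. This is a sketch under stated assumptions, not the library's code: `overlap_at_k`, `jaccard_at_k`, and `stability_summary` are hypothetical helpers, overlap is divided by k, and a probe counts as a regression when its overlap falls below an arbitrary threshold of 0.5.

```python
from statistics import mean


def overlap_at_k(old_ids: list[str], new_ids: list[str], k: int) -> float:
    """Fraction of the k result slots shared by the old and new top-k lists."""
    old_top, new_top = set(old_ids[:k]), set(new_ids[:k])
    return len(old_top & new_top) / k


def jaccard_at_k(old_ids: list[str], new_ids: list[str], k: int) -> float:
    """Intersection over union of the two top-k sets."""
    old_top, new_top = set(old_ids[:k]), set(new_ids[:k])
    union = old_top | new_top
    return len(old_top & new_top) / len(union) if union else 1.0


def stability_summary(
    old_results: dict[str, list[str]],
    new_results: dict[str, list[str]],
    k: int,
    regression_threshold: float = 0.5,  # assumed cutoff, chosen for illustration
) -> dict:
    overlaps = {
        pid: overlap_at_k(old_results[pid], new_results[pid], k)
        for pid in old_results
    }
    regressions = sorted(
        pid for pid, score in overlaps.items() if score < regression_threshold
    )
    return {"mean_overlap": mean(overlaps.values()), "regressions": regressions}


# Two probes: p1 keeps 3 of 4 neighbors, p2 keeps only 1 of 4
old_results = {"p1": ["a", "b", "c", "d"], "p2": ["e", "f", "g", "h"]}
new_results = {"p1": ["a", "b", "c", "x"], "p2": ["y", "z", "e", "w"]}

summary = stability_summary(old_results, new_results, k=4)
```

Neither metric needs relevance labels: both only ask how much the neighbor sets moved, which is why a frozen probe set is enough to gate a migration.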

What it explicitly does NOT do

  • Not a vector database.
  • Not a RAG framework. No retriever, no chunker, no generator.
  • Not a generic ML drift library. Embedding-and-retrieval-shaped only.
  • Not an eval framework. neighbor_stability is the one judgment you can make without a labeled gold set; for richer evals use ragas, trulens, or a tracer.
  • Does not host or serve embeddings.

Roadmap

  • v0.2: dual_write() context manager for blue/green index migrations across OpenSearch / pgvector / Pinecone / Qdrant.
  • v0.3: ChunkingExperiment A/B harness with optional LLM-judge.
  • v0.4: integration helpers for AWS Bedrock embedding models, OpenAI, Cohere, Voyage.

License

Apache-2.0. See LICENSE.
