Embedding pipeline ops + drift detection for production RAG: index manifests, version assertions, neighbor-stability eval, Drift-Adapter for in-place model migrations.
embspec
Embedding pipeline ops + drift detection for production RAG.
The single failure mode this library prevents: a query-encoder upgrade ships before the index is re-encoded, so every health check stays 200 OK while retrieval accuracy silently collapses. The decompressed.io RAG observability post-mortem (2026-03-09) describes this exact bug: $15K of emergency re-encoding plus 2-5 days of engineer time before someone diagnosed it.
```shell
pip install embspec
```

```python
from embspec import IndexManifest, embed_assert

manifest = IndexManifest.load("s3://my-rag/index-prod/manifest.json")

# embed_query and opensearch stand for your application's own encoder and client
@embed_assert(manifest, model_id="amazon.titan-embed-text-v2:0", dimension=1024)
def search(query: str):
    qv = embed_query(query)
    return opensearch.knn_search(qv, ...)
```
If the query encoder ever drifts off the manifest, search raises EmbeddingVersionMismatch instead of silently returning bad results.
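To make the fail-fast behaviour concrete, here is a minimal, standard-library sketch of how a version-assertion decorator of this kind could work. It is not embspec's implementation: `Spec`, `guard`, and `VersionMismatch` are hypothetical stand-ins for `EmbeddingSpec`, `embed_assert`, and `EmbeddingVersionMismatch`, and the choice to check on every call is an assumption.

```python
import functools
import logging
from dataclasses import dataclass

logger = logging.getLogger("embedding_guard")


class VersionMismatch(RuntimeError):
    """Hypothetical error: the query encoder does not match the index."""


@dataclass(frozen=True)
class Spec:
    # Hypothetical stand-in for the fields a manifest records about the encoder.
    model_id: str
    dimension: int


def guard(index_spec: Spec, *, model_id: str, dimension: int, mode: str = "raise"):
    """Compare the declared query encoder with the index spec before each call."""
    if mode not in ("raise", "log"):
        raise ValueError(f"unknown mode: {mode!r}")

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            declared = Spec(model_id=model_id, dimension=dimension)
            if declared != index_spec:
                message = f"query encoder {declared} does not match index {index_spec}"
                if mode == "raise":
                    raise VersionMismatch(message)
                logger.warning(message)  # "log" mode: report and keep serving
            return fn(*args, **kwargs)

        return wrapper

    return decorator


index_spec = Spec(model_id="amazon.titan-embed-text-v2:0", dimension=1024)


@guard(index_spec, model_id="amazon.titan-embed-text-v2:0", dimension=1024)
def search_ok(query: str) -> str:
    return f"results for {query}"


@guard(index_spec, model_id="some-newer-model", dimension=1536)
def search_drifted(query: str) -> str:
    return f"results for {query}"
```

Calling `search_ok("hello")` returns normally, while `search_drifted("hello")` raises before any retrieval happens, which is the property that turns a silent accuracy collapse into a loud failure.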
What's in v0.1
| Primitive | What it does | Anchored in |
|---|---|---|
| IndexManifest + EmbeddingSpec | Single-file source of truth for "what does this index contain": model id, dimension, version, normalization | decompressed.io post-mortem |
| assert_compatible() / embed_assert() decorator | Fast-fail on encoder/index drift; raise or warn modes | same |
| DriftAdapter | Linear adapter from new-model embeddings to old-model space; lets you swap the query encoder without re-encoding the corpus | Drift-Adapter, Vejendla 2025 (arxiv:2509.23471) |
| neighbor_stability() | Compare two retrievers on a frozen probe set; reports overlap, Jaccard, regression list, deploy-safety verdict | RAGOps survey, Xu et al. 2025 (arxiv:2506.03401) |
Why not Evidently / Phoenix / WhyLogs?
- Evidently: tabular-ML drift heritage; LLM additions are recent and platform-shaped. Not a drop-in primitive.
- Phoenix: embedding-drift visualization is a sub-feature of a full observability platform. You adopt the platform.
- WhyLogs: generic data-logging primitive; not embedding-aware; last commit 2025-01.
- embspec: three small primitives (IndexManifest, DriftAdapter, neighbor_stability) you compose with whatever vector DB and tracer you already have. No platform, no UI, no agent framework.
Usage
Manifest + version assertion
```python
from datetime import datetime, timezone
from embspec import IndexManifest, EmbeddingSpec

# When you build the index, write a manifest alongside it
manifest = IndexManifest(
    index_name="prod-v3",
    embedding=EmbeddingSpec(
        model_id="amazon.titan-embed-text-v2:0",
        dimension=1024,
        normalization="l2",
    ),
    created_at=datetime.now(timezone.utc),
    doc_count=8_000_000,
)
manifest.save("s3://my-rag/index-prod/manifest.json")
# (or any local path; manifest.save uses pathlib.Path.write_text)

# At query time, assert the encoder matches before searching
from embspec import embed_assert

@embed_assert(
    "s3://my-rag/index-prod/manifest.json",  # path or IndexManifest
    model_id="amazon.titan-embed-text-v2:0",
    dimension=1024,
    mode="raise",  # or "log" for canary rollout
)
def search(query: str) -> list[dict]:
    qv = embed_query(query)
    return opensearch.knn_search(qv, ...)
```
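For a sense of what a single-file manifest involves, the following sketch round-trips the same fields through JSON using only the standard library. It is an illustration under assumptions, not embspec's on-disk format: `ManifestSketch` and `SpecSketch` are hypothetical names, and storing `created_at` as an ISO 8601 string is a choice made here for simplicity.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)
class SpecSketch:
    # Hypothetical mirror of the encoder fields shown above.
    model_id: str
    dimension: int
    normalization: str


@dataclass(frozen=True)
class ManifestSketch:
    index_name: str
    embedding: SpecSketch
    created_at: str  # ISO 8601 timestamp, kept as a string so it is JSON-safe
    doc_count: int

    def save(self, path: Path) -> None:
        # asdict() converts the nested dataclass into a nested dict
        path.write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def load(cls, path: Path) -> "ManifestSketch":
        raw = json.loads(path.read_text())
        return cls(
            index_name=raw["index_name"],
            embedding=SpecSketch(**raw["embedding"]),
            created_at=raw["created_at"],
            doc_count=raw["doc_count"],
        )


original = ManifestSketch(
    index_name="prod-v3",
    embedding=SpecSketch("amazon.titan-embed-text-v2:0", 1024, "l2"),
    created_at=datetime.now(timezone.utc).isoformat(),
    doc_count=8_000_000,
)

with tempfile.TemporaryDirectory() as tmp:
    manifest_path = Path(tmp) / "manifest.json"
    original.save(manifest_path)
    restored = ManifestSketch.load(manifest_path)
```

Because the manifest is plain JSON living next to the index, any service that can read the file can check its own encoder against it without talking to the vector database.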
Drift-Adapter for in-place model migration
When you want to upgrade the query encoder without re-encoding 8M docs:
```python
from embspec import DriftAdapter
import numpy as np

# Sample, e.g., 50K docs and embed them with both old and new models
old_emb = embed_with_old_model(sample_texts)  # shape (50000, 1024)
new_emb = embed_with_new_model(sample_texts)  # shape (50000, 1536)

adapter = DriftAdapter.fit(
    new_embeddings=new_emb,
    old_embeddings=old_emb,
    regularization=0.01,  # ridge; helps when new_emb is rank-deficient
)
adapter.save("s3://my-rag/adapters/v3-to-v4.npz")

# At query time, embed with the new model then transform into old space
qv_new = embed_with_new_model(query)
qv_compatible = adapter.transform(qv_new)  # shape (1024,)
results = opensearch.knn_search(qv_compatible, ...)
```
Per Vejendla 2025, this typically recovers 95-99% of the retrieval recall of a full re-encode at ~1% of the cost of re-encoding the full corpus.
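The idea behind a linear adapter can be sketched with plain NumPy as a ridge-regularized least-squares fit from the new embedding space to the old one. This is a sketch under assumptions, not DriftAdapter's actual code: the function names, the closed-form solve, and the small synthetic dimensions (48 and 32 instead of 1536 and 1024) are all chosen here for illustration.

```python
import numpy as np


def fit_linear_adapter(new_emb: np.ndarray, old_emb: np.ndarray, reg: float = 0.01) -> np.ndarray:
    """Return W of shape (d_new, d_old) minimising
    ||new_emb @ W - old_emb||^2 + reg * ||W||^2 (closed-form ridge solution)."""
    d_new = new_emb.shape[1]
    gram = new_emb.T @ new_emb + reg * np.eye(d_new)  # the ridge term keeps this invertible
    return np.linalg.solve(gram, new_emb.T @ old_emb)


def to_old_space(W: np.ndarray, vec: np.ndarray) -> np.ndarray:
    """Map one new-model vector (or a batch of rows) into the old-model space."""
    return vec @ W


# Synthetic paired embeddings: the "old" space is a noisy linear image of the "new" one.
rng = np.random.default_rng(0)
d_new, d_old, n = 48, 32, 2000
true_map = rng.normal(size=(d_new, d_old))
new_emb = rng.normal(size=(n, d_new))
old_emb = new_emb @ true_map + 0.01 * rng.normal(size=(n, d_old))

W = fit_linear_adapter(new_emb, old_emb, reg=0.01)

# A held-out query embedded with the "new" model, mapped into the "old" space.
query_new = rng.normal(size=d_new)
query_mapped = to_old_space(W, query_new)
query_target = query_new @ true_map
cosine = float(
    query_mapped @ query_target
    / (np.linalg.norm(query_mapped) * np.linalg.norm(query_target))
)
```

Real encoders are not exact linear images of each other, so the fit on production data will be looser than on this synthetic example. If the index expects L2-normalized vectors, the mapped query would presumably need re-normalizing before search; whether `adapter.transform` does that is not stated here.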
Neighbor stability for safe migrations
```python
from embspec import neighbor_stability

# Run a fixed probe set against both indexes
old_results = {pid: retrieve_from_v3(q) for pid, q in probes.items()}
new_results = {pid: retrieve_from_v4(q) for pid, q in probes.items()}

report = neighbor_stability(old_results, new_results, k=10)
print(f"mean overlap@10: {report.mean_overlap_at_k:.1%}")
print(f"regressions: {report.regression_count}/{report.n_probes}")

if report.is_safe_to_deploy(min_mean_overlap=0.85, max_regression_fraction=0.05):
    deploy_v4()
else:
    investigate(report.regression_probe_ids)
```
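The metrics in the report can be illustrated with a few lines of standard-library Python. The sketch below is one plausible reading, not embspec's definitions: the function names are hypothetical, overlap is taken relative to `k` (assuming every result list holds at least `k` ids), and a "regression" is assumed to mean a probe whose overlap falls below a threshold.

```python
from statistics import mean


def overlap_at_k(old_ids, new_ids, k):
    """Fraction of the old top-k ids that also appear in the new top-k."""
    return len(set(old_ids[:k]) & set(new_ids[:k])) / k


def jaccard_at_k(old_ids, new_ids, k):
    """Intersection over union of the two top-k id sets."""
    old_top, new_top = set(old_ids[:k]), set(new_ids[:k])
    union = old_top | new_top
    return len(old_top & new_top) / len(union) if union else 1.0


def stability_summary(old_results, new_results, k, regression_threshold=0.5):
    """Summarise neighbor stability over probes present in both result sets."""
    probe_ids = sorted(old_results.keys() & new_results.keys())
    overlaps = {p: overlap_at_k(old_results[p], new_results[p], k) for p in probe_ids}
    jaccards = {p: jaccard_at_k(old_results[p], new_results[p], k) for p in probe_ids}
    return {
        "mean_overlap": mean(overlaps.values()),
        "mean_jaccard": mean(jaccards.values()),
        # Assumed definition: a probe regresses when its overlap drops below the threshold.
        "regressions": [p for p in probe_ids if overlaps[p] < regression_threshold],
    }


# Two probes with ranked document ids from the old and new retrievers.
old = {"q1": ["a", "b", "c", "d"], "q2": ["e", "f", "g", "h"]}
new = {"q1": ["a", "b", "c", "x"], "q2": ["y", "z", "e", "w"]}
summary = stability_summary(old, new, k=4)
```

Here `q1` keeps three of its four neighbors while `q2` keeps only one, so `q2` is the probe flagged for investigation. None of this needs relevance labels, only the two ranked id lists per probe.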
What it explicitly does NOT do
- Not a vector database.
- Not a RAG framework. No retriever, no chunker, no generator.
- Not a generic ML drift library. Embedding-and-retrieval-shaped only.
- Not an eval framework. neighbor_stability is the one judgment you can make without a labeled gold set; for richer evals use ragas, trulens, or a tracer.
- Does not host or serve embeddings.
Roadmap
- v0.2: dual_write() context manager for blue/green index migrations across OpenSearch / pgvector / Pinecone / Qdrant.
- v0.3: ChunkingExperiment A/B harness with optional LLM-judge.
- v0.4: integration helpers for AWS Bedrock embedding models, OpenAI, Cohere, Voyage.
License
Apache-2.0. See LICENSE.