ragdrift

5-dimensional drift detection for production RAG systems. Rust core, Python frontend.

The problem

You have a RAG system in production. It worked when you launched. Then, quietly, it stopped working as well. By the time the support tickets arrive and you finally get an eval-set rerun on the calendar, the regression has been live for three weeks.

The reason is that retrieval quality drifts in five different places, and none of them produce a loud signal on their own. The embedding model gets re-trained and now maps the same input to a slightly different region. The corpus gets reindexed and the document distribution shifts. Users find your product and start asking different questions. The re-ranker score distribution drifts because of one of the above. Each of these can degrade answer quality without breaking a single test.

ragdrift watches all five and alerts on the one that moves.

Install

pip install ragdrift-py                    # core (imported as ragdrift)
pip install 'ragdrift-py[opensearch,aws]'  # OpenSearch adapter + CloudWatch exporter
pip install 'ragdrift-py[dev]'             # plus pytest, mypy, ruff, maturin

30-second quickstart

import numpy as np
import ragdrift

rng = np.random.default_rng(0)

# Baseline: a frozen sample from when retrieval quality was known-good.
baseline = ragdrift.BaselineSnapshot(
    embeddings=rng.standard_normal((4096, 384)).astype(np.float32),
    confidence_scores=rng.uniform(0.85, 0.99, size=4096).astype(np.float64),
)

# Current: today's window. In production, pull from your vector store.
current_emb = rng.standard_normal((4096, 384)).astype(np.float32) + 1.5
current_conf = rng.uniform(0.55, 0.75, size=4096).astype(np.float64)

monitor = ragdrift.RagDriftMonitor(baseline)
report = monitor.check(embeddings=current_emb, confidence_scores=current_conf)

if report.any_exceeded():
    for s in report.scores:
        print(f"[{'DRIFT' if s.exceeded else 'ok':5s}] {s.dimension:11s} "
              f"score={s.score:.4f} method={s.method}")

Run python examples/quickstart.py for a synthetic end-to-end demo.
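
To make the confidence-score part of the quickstart concrete, here is a minimal NumPy sketch of a two-sample Kolmogorov-Smirnov statistic applied to the same synthetic score ranges. It is an illustration only: the helper name ks_statistic is hypothetical, and nothing here claims that ragdrift uses this test or this code for its confidence dimension.

```python
import numpy as np


def ks_statistic(baseline: np.ndarray, current: np.ndarray) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    baseline = np.sort(np.asarray(baseline, dtype=np.float64))
    current = np.sort(np.asarray(current, dtype=np.float64))
    # Evaluate both empirical CDFs at every observed value.
    grid = np.concatenate([baseline, current])
    cdf_base = np.searchsorted(baseline, grid, side="right") / baseline.size
    cdf_curr = np.searchsorted(current, grid, side="right") / current.size
    return float(np.max(np.abs(cdf_base - cdf_curr)))


rng = np.random.default_rng(0)
baseline_conf = rng.uniform(0.85, 0.99, size=4096)  # known-good window
current_conf = rng.uniform(0.55, 0.75, size=4096)   # degraded window

ks_shifted = ks_statistic(baseline_conf, current_conf)
ks_same = ks_statistic(baseline_conf, baseline_conf)
print(f"shifted window: {ks_shifted:.3f}")
print(f"same window:    {ks_same:.3f}")
```

Because the two synthetic ranges do not overlap, the statistic reaches its maximum of 1.0 for the shifted window and is 0.0 when a window is compared with itself. Real score distributions overlap, so production values fall somewhere in between and need a threshold chosen against a known-good period.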

Why not X

  • Arize Phoenix. Excellent eval and tracing. Less focused on multi-dimensional batch drift; its drift surface is embeddings-only and tied to the Phoenix collector.
  • Evidently. Great at tabular data drift, with broad statistical coverage, but no native treatment of embedding or query drift in the RAG sense.
  • WhyLabs / whylogs. Excellent profiling primitive. The whylogs profile format is optimized for compaction; getting an embedding distribution test out of it is more work than calling MMD directly.
  • NannyML. Strong for supervised tabular drift with a holdout label. Not a fit for the unlabeled embedding/query side of RAG.

ragdrift is the smaller-scope option: five concrete dimensions, one batch function call per dimension, fast Rust core, no service to run.
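
For readers unfamiliar with MMD (maximum mean discrepancy), mentioned in the list above, the following is a small NumPy sketch of the biased squared-MMD estimate with an RBF kernel and a median-heuristic bandwidth. The function name rbf_mmd2 and the bandwidth choice are assumptions made for illustration; this is not ragdrift's Rust implementation and may differ from it in kernel, estimator, and scaling.

```python
from typing import Optional

import numpy as np


def rbf_mmd2(x: np.ndarray, y: np.ndarray, gamma: Optional[float] = None) -> float:
    """Biased estimate of squared MMD between samples x and y (RBF kernel)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    z = np.vstack([x, y])
    # Pairwise squared Euclidean distances over the pooled sample.
    sq_norms = np.sum(z * z, axis=1)
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * (z @ z.T), 0.0)
    if gamma is None:
        # Median heuristic: set the bandwidth from the median pairwise distance.
        gamma = 1.0 / float(np.median(d2[d2 > 0]))
    k = np.exp(-gamma * d2)
    n = x.shape[0]
    k_xx, k_yy, k_xy = k[:n, :n], k[n:, n:], k[:n, n:]
    return float(k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean())


rng = np.random.default_rng(0)
base = rng.standard_normal((256, 32))
same = rng.standard_normal((256, 32))           # same distribution, fresh draw
shifted = rng.standard_normal((256, 32)) + 1.5  # mean-shifted embeddings

mmd_same = rbf_mmd2(base, same)
mmd_shifted = rbf_mmd2(base, shifted)
print(f"same distribution: {mmd_same:.4f}")
print(f"shifted:           {mmd_shifted:.4f}")
```

The estimate stays close to zero for two draws from the same distribution and grows when the embedding cloud moves. Note that this version builds the full pairwise kernel matrix, so memory grows quadratically with the window size; it is meant for small samples, not for 4096-row production windows without subsampling.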

Architecture

+----------------+     +------------------+     +-------------------+
| python facade  | --> | ragdrift._native | --> | ragdrift-core     |
| (typed)        |     | (PyO3 bindings)  |     | (Rust statistics) |
+----------------+     +------------------+     +-------------------+
       |                                                 ^
       v                                                 |
+----------------+                                       |
| adapters/      |  pull baseline + current  ------------+
|   opensearch   |  embeddings, features, etc.
|   pgvector     |
|   pinecone     |
+----------------+
       |
       v
+----------------+
| exporters/     |  publish DriftReport
|   cloudwatch   |
|   prometheus   |
|   datadog      |
+----------------+

All numerics live in Rust. The Python layer is a thin typed facade plus adapters and exporters. One abi3-py310 wheel per platform covers Python 3.10–3.13.
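
As a rough picture of what the exporter stage in the diagram does, the sketch below turns per-dimension scores into Prometheus text exposition format. Everything in it is hypothetical: DimensionScore is a stand-in that only mirrors the fields the quickstart reads from report.scores (dimension, score, exceeded, method), and the metric names ragdrift_score and ragdrift_exceeded are invented for the example. The shipped exporters may expose a different interface and different metric names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DimensionScore:
    """Hypothetical stand-in for one entry of report.scores."""
    dimension: str
    score: float
    exceeded: bool
    method: str


def to_prometheus_text(scores: Iterable[DimensionScore]) -> str:
    """Render scores as gauges in Prometheus text exposition format."""
    scores = list(scores)
    lines = ["# TYPE ragdrift_score gauge"]
    for s in scores:
        labels = f'dimension="{s.dimension}",method="{s.method}"'
        lines.append(f"ragdrift_score{{{labels}}} {s.score:.6f}")
    lines.append("# TYPE ragdrift_exceeded gauge")
    for s in scores:
        # Booleans are exported as 0/1 so alert rules can compare against 1.
        lines.append(f'ragdrift_exceeded{{dimension="{s.dimension}"}} {int(s.exceeded)}')
    return "\n".join(lines) + "\n"


# Illustrative values only; dimension and method names are made up here.
sample = [
    DimensionScore("embedding", 0.5, True, "mmd"),
    DimensionScore("confidence", 0.02, False, "ks"),
]
text = to_prometheus_text(sample)
print(text)
```

Keeping the exporter a pure function from scores to text makes it easy to test without a running metrics backend; the transport (HTTP endpoint, push gateway, CloudWatch call) can then wrap it separately.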

Status

0.x. The public API may break between minor versions while the library finds its shape; patch releases within a given 0.x line stay backward compatible.

Contributing

See CONTRIBUTING.md. The local quality gates are:

cargo fmt --check
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all-features
maturin develop
pytest -v
mypy --strict python/ragdrift
ruff check . && ruff format --check .

License

Dual-licensed under MIT and Apache-2.0. Pick whichever is friendlier to your downstream. See LICENSE-MIT and LICENSE-APACHE.
