context-drift-detector-py

License: MIT

Detect topic drift between user intent, retrieved context, and AI answers. A fast, lexical Jaccard-overlap heuristic for "did the model wander off?" -- useful as a cheap first-pass guardrail before reaching for an embedding-based check. Zero runtime dependencies.

Python port of @mukundakatta/context-drift-detector.

Install

pip install context-drift-detector-py

Usage

from context_drift_detector import detect

intent  = "What is the capital of France?"
context = ["Paris is the capital of France. It sits on the Seine."]
answer  = "Paris is the capital of France."

report = detect(intent, context, answer)
report.drift              # False
report.drift_score        # 0.0 - 1.0 (higher = more drift)
report.signals            # dict of jaccard overlaps
report.signals["answer_to_context"]  # 0.0 - 1.0

When drift is real:

report = detect(
    intent="What is the capital of France?",
    context_chunks=["Paris is the capital of France."],
    answer="Cats love tuna and naps.",
)
report.drift           # True
report.drift_score     # high (e.g. > 0.65)

API

detect(
    intent: str,
    context_chunks: str | Sequence[str],
    answer: str,
    *,
    threshold: float = 0.65,
    min_term_len: int = 3,
) -> DriftReport

DriftReport fields:

| Field | Meaning |
| --- | --- |
| drift | True iff drift_score > threshold. |
| drift_score | 0.0-1.0; weighted blend of answer-to-context (60%) and answer-to-intent (40%) overlap, inverted. |
| signals.intent_to_context | Jaccard overlap between intent and retrieved context. |
| signals.answer_to_context | Jaccard overlap between answer and retrieved context. |
| signals.answer_to_intent | Jaccard overlap between answer and intent. |
| intent_terms / context_terms / answer_terms | Frozensets of the terms used. |
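
As a rough illustration of how drift_score and drift relate to the signals, the sketch below reads "inverted" as one minus the 60/40 weighted blend. That reading, the helper names blended_drift_score and is_drift, and the clamping step are all assumptions made for illustration; the package's exact arithmetic may differ.

```python
def blended_drift_score(answer_to_context: float, answer_to_intent: float) -> float:
    """Hypothetical helper: invert a 60/40 weighted blend of two overlaps.

    Assumes "inverted" means 1 - blend, so high overlap gives a low score.
    """
    similarity = 0.6 * answer_to_context + 0.4 * answer_to_intent
    # Clamp to [0.0, 1.0] to absorb floating-point noise.
    return min(1.0, max(0.0, 1.0 - similarity))


def is_drift(drift_score: float, threshold: float = 0.65) -> bool:
    """Mirror the documented rule: drift is True iff drift_score > threshold."""
    return drift_score > threshold


# An answer sharing nothing with context or intent scores the maximum.
unrelated = blended_drift_score(0.0, 0.0)
# An answer that fully overlaps both scores the minimum.
grounded = blended_drift_score(1.0, 1.0)
print(unrelated, is_drift(unrelated))
print(grounded, is_drift(grounded))
```

Under this reading, the context overlap carries more weight than the intent overlap, so an answer that stays close to the retrieved chunks is penalized less for rephrasing the question.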

detect_context_drift(...) is also exported as an alias of detect, matching the JS package's naming.

How it works

Each input is tokenized into a set of lowercase alphanumeric runs at least min_term_len characters long, and the pairwise Jaccard overlaps are computed over those sets. Empty inputs short-circuit to drift-free; this is intentional, so that a totally absent retrieval isn't flagged as drift on its own.
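
The tokenize-then-compare step described above can be sketched in a few lines. This is a minimal approximation, not the package's source: the helper names terms and jaccard are hypothetical, the regular expression assumes ASCII letters and digits, and no stopword filtering or empty-input short-circuit is modeled.

```python
from __future__ import annotations

import re


def terms(text: str, min_term_len: int = 3) -> frozenset[str]:
    """Hypothetical tokenizer: lowercase alphanumeric runs of at least min_term_len chars."""
    runs = re.findall(r"[a-z0-9]+", text.lower())
    return frozenset(run for run in runs if len(run) >= min_term_len)


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard overlap: |a & b| / |a | b|, defined here as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


intent_terms = terms("What is the capital of France?")
on_topic = terms("Paris is the capital of France.")
off_topic = terms("Cats love tuna and naps.")

# Shared terms: the, capital, france (3) out of 5 distinct terms.
print(jaccard(intent_terms, on_topic))   # 0.6
# No shared terms at all.
print(jaccard(intent_terms, off_topic))  # 0.0
```

Because the comparison is purely lexical, short function words such as "the" count toward overlap in this sketch, which is one reason the scores are best treated as a coarse signal.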

This is a cheap heuristic -- it doesn't catch paraphrases, synonyms, or semantically grounded contradictions. Use it as a fast first filter, then invest in an embedding/LLM-as-judge check for borderline cases.
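
One way to wire the "fast first filter" idea is to trust the lexical score when it is far from the threshold and escalate only the borderline band. The sketch below is an assumption-laden illustration: triage, check, the margin parameter, and the deep_check callback are hypothetical names, not part of this package, and the 0.15 margin is an arbitrary starting point.

```python
from typing import Callable


def triage(drift_score: float, threshold: float = 0.65, margin: float = 0.15) -> str:
    """Classify a lexical drift score as 'pass', 'drift', or 'escalate'.

    Scores within `margin` of the threshold are treated as borderline.
    """
    if abs(drift_score - threshold) <= margin:
        return "escalate"
    return "drift" if drift_score > threshold else "pass"


def check(
    drift_score: float,
    deep_check: Callable[[], bool],
    threshold: float = 0.65,
    margin: float = 0.15,
) -> bool:
    """Return True if drift is detected, calling deep_check only for borderline scores.

    deep_check stands in for an embedding or LLM-as-judge check (hypothetical).
    """
    verdict = triage(drift_score, threshold, margin)
    if verdict == "escalate":
        return deep_check()
    return verdict == "drift"


calls = []

def fake_deep_check() -> bool:
    # Placeholder for an expensive semantic check; records that it ran.
    calls.append("deep")
    return True

print(check(0.10, fake_deep_check))  # clearly on topic, no escalation
print(check(0.95, fake_deep_check))  # clearly drifting, no escalation
print(check(0.70, fake_deep_check))  # borderline, escalated
print(len(calls))                    # deep check ran once
```

The drift_score passed in would come from report.drift_score; the expensive check then runs only on the fraction of answers the lexical heuristic cannot settle.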

See the JS sibling's README for the full design notes.
