
llm-watchdog

Production-grade silent failure detection for LLM applications.

Traditional monitoring (Datadog, New Relic) reports a 200 OK in 1.2 seconds, but it cannot detect hallucinations, PII leaks, topic drift, or quality degradation in your LLM responses. llm-watchdog fills that gap.

pip install llm-watchdog

Why llm-watchdog?

| Problem | Traditional APM | llm-watchdog |
|---|---|---|
| Hallucination risk | Blind | Scored 0–1 |
| PII leaks in output | Blind | Detected + alerted |
| Topic drift | Blind | Keyword + coverage guard |
| Toxicity | Blind | Pattern-matched |
| Quality degradation | Blind | Refusal + repetition check |
| Semantic drift over time | Blind | PSI-based drift detector |

Quickstart

from llm_watchdog import llm_watchdoger

watcher = llm_watchdoger()

result = watcher.watch(
    prompt="What is the capital of France?",
    response="The capital of France is Paris.",
)

print(result.passed)          # True
print(result.overall_score)   # 0.0
print(result.overall_risk)    # RiskLevel.LOW

Alert Hooks

from llm_watchdog import llm_watchdoger, AlertEvent

watcher = llm_watchdoger(pii_threshold=0.1)

def my_alert(event: AlertEvent):
    print(f"ALERT: {event.failure_type.value} — score={event.score:.2f}")

watcher.on_alert(my_alert)
watcher.watch("Tell me about John", "Contact john@example.com at 555-555-5555")
# ALERT: pii_leak — score=0.40

Async Support

import asyncio
from llm_watchdog import llm_watchdoger

watcher = llm_watchdoger()

async def main():
    result = await watcher.awatch("prompt", "response")
    print(result.overall_risk)

asyncio.run(main())

Batch Watching

from llm_watchdog.advanced import batch_watch, abatch_watch

pairs = [("prompt1", "response1"), ("prompt2", "response2")]
results = batch_watch(watcher, pairs, max_workers=8)

Topic Guard

watcher = llm_watchdoger(
    topic_allowed=["python", "code", "programming", "function"],
    topic_blocked=["politics", "religion", "violence"],
)
result = watcher.watch("How do I code?", "This involves violent politics.")
# topic drift detected
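As a mental model, a keyword + coverage topic guard can be sketched in plain Python. This illustrates the idea only, not llm-watchdog's internals; the `min_coverage` knob is an assumption:

```python
def topic_guard(response, allowed=None, blocked=None, min_coverage=0.5):
    """Fail on any blocked keyword; otherwise require coverage of allowed ones."""
    text = response.lower()
    if blocked and any(word in text for word in blocked):
        return False  # hard block on forbidden topics
    if allowed:
        hits = sum(1 for word in allowed if word in text)
        return hits / len(allowed) >= min_coverage
    return True

topic_guard("This involves violent politics.", blocked=["politics", "violence"])
# False: a blocked keyword appears in the response
```

A real guard would tokenize and stem rather than substring-match, but the pass/fail logic is the same shape.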

Advanced Features

Caching

from llm_watchdog.advanced import WatchCache
cache = WatchCache(max_size=512, ttl=300)
cached_watch = cache.memoize(watcher)
result = cached_watch("prompt", "response")
print(cache.stats())
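For intuition, a size- and TTL-bounded memoizer along the lines of `WatchCache` might look like the sketch below. The LRU eviction, stats fields, and key choice are assumptions, not the library's actual implementation:

```python
import time
from collections import OrderedDict

class TTLCache:
    def __init__(self, max_size=512, ttl=300):
        self.max_size, self.ttl = max_size, ttl
        self._data = OrderedDict()  # (prompt, response) -> (timestamp, value)
        self.hits = self.misses = 0

    def memoize(self, fn):
        def wrapped(prompt, response):
            key = (prompt, response)
            entry = self._data.get(key)
            if entry and time.monotonic() - entry[0] < self.ttl:
                self.hits += 1
                self._data.move_to_end(key)  # refresh LRU position
                return entry[1]
            self.misses += 1
            value = fn(prompt, response)
            self._data[key] = (time.monotonic(), value)
            if len(self._data) > self.max_size:
                self._data.popitem(last=False)  # evict the oldest entry
            return value
        return wrapped

    def stats(self):
        return {"size": len(self._data), "hits": self.hits, "misses": self.misses}
```

Caching is effective here because watch results are deterministic per (prompt, response) pair, so repeats are free.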

Drift Detection

from llm_watchdog.advanced import DriftDetector
detector = DriftDetector(threshold=0.1)
detector.set_baseline([0.1, 0.2, 0.1, 0.15])
print(detector.is_drifting([0.5, 0.6, 0.4, 0.55]))  # True
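The PSI (Population Stability Index) statistic behind such a detector can be computed from scratch. This sketch uses equal-width bins over the pooled range, which is one common choice; the library's binning scheme may differ:

```python
import math

def psi(baseline, current, bins=10, eps=1e-6):
    # Pool both samples to fix a common binning range.
    lo = min(min(baseline), min(current))
    hi = max(max(baseline), max(current))
    span = (hi - lo) or eps  # guard against a degenerate range

    def bin_fracs(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / span * bins), bins - 1)
            counts[idx] += 1
        # Replace empty bins with eps so the logarithm is defined.
        return [(c / len(values)) or eps for c in counts]

    p, q = bin_fracs(baseline), bin_fracs(current)
    # PSI = sum_i (p_i - q_i) * ln(p_i / q_i)
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))
```

A common rule of thumb reads PSI below 0.1 as stable, 0.1–0.2 as moderate shift, and above 0.2 as significant drift.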

Pipeline

from llm_watchdog.advanced import WatchPipeline
pipeline = WatchPipeline()
pipeline.add_step("log", lambda r: r)
pipeline.filter(lambda r: r.passed)
result = pipeline.run(watch_result)
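Conceptually, such a pipeline is a chain of transformations gated by predicates. A toy sketch of the pattern (method names mirror the example above, but the internals and filter ordering are assumptions):

```python
class MiniPipeline:
    def __init__(self):
        self.steps = []    # (name, fn) transformations applied in order
        self.filters = []  # predicates; any failing one halts the run

    def add_step(self, name, fn):
        self.steps.append((name, fn))

    def filter(self, pred):
        self.filters.append(pred)

    def run(self, item):
        if not all(pred(item) for pred in self.filters):
            return None  # filtered out before any step runs
        for _, fn in self.steps:
            item = fn(item)
        return item
```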

PII Scrubbing

from llm_watchdog.advanced import PIIScrubber
scrubber = PIIScrubber()
clean = scrubber.mask("Email me at alice@example.com or call 555-123-4567")
# "Email me at [REDACTED_EMAIL] or call [REDACTED_PHONE]"
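Under the hood, this kind of scrubbing is typically regex substitution. A minimal sketch with deliberately simplified patterns (a production scrubber handles many more formats than these two):

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")

def mask(text):
    """Replace matched PII with redaction placeholders."""
    text = EMAIL.sub("[REDACTED_EMAIL]", text)
    return PHONE.sub("[REDACTED_PHONE]", text)

mask("Email me at alice@example.com or call 555-123-4567")
# "Email me at [REDACTED_EMAIL] or call [REDACTED_PHONE]"
```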

Regression Tracking

from llm_watchdog.advanced import RegressionTracker
tracker = RegressionTracker(tolerance=0.05)
tracker.record("deploy_v1", 0.10)
tracker.record("deploy_v2", 0.25)
print(tracker.is_regressing())  # True
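The tolerance check itself is simple. A sketch of the idea as the example implies it, comparing each recorded score to the previous one (internals assumed, not the library's code):

```python
class ToleranceTracker:
    def __init__(self, tolerance=0.05):
        self.tolerance = tolerance
        self.records = []  # (label, score) pairs in recording order

    def record(self, label, score):
        self.records.append((label, score))

    def is_regressing(self):
        if len(self.records) < 2:
            return False  # nothing to compare against yet
        prev, curr = self.records[-2][1], self.records[-1][1]
        return curr - prev > self.tolerance  # score rose more than tolerated
```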

Agent Session Monitoring

from llm_watchdog.advanced import AgentWatchSession
session = AgentWatchSession(watcher, max_risk_budget=2.0)
for prompt, response in agent_turns:
    session.watch_turn(prompt, response)
    if session.is_over_budget():
        raise RuntimeError("Agent risk budget exceeded")
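The risk-budget bookkeeping can be pictured as a running sum that trips once it exceeds the budget. A sketch of the concept, not `AgentWatchSession`'s actual implementation:

```python
class RiskBudget:
    def __init__(self, max_risk_budget=2.0):
        self.max_risk_budget = max_risk_budget
        self.spent = 0.0  # cumulative risk across all turns

    def add_turn(self, score):
        self.spent += score

    def is_over_budget(self):
        return self.spent > self.max_risk_budget
```

The point of a cumulative budget is that many individually tolerable turns can still add up to an unacceptable session.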

FastAPI Middleware

from fastapi import FastAPI
from llm_watchdog import llm_watchdoger
from llm_watchdog.middleware import create_fastapi_middleware

app = FastAPI()
watcher = llm_watchdoger()
app.add_middleware(create_fastapi_middleware(watcher))

CLI

llm-watchdog --prompt "What is the capital?" --response "Paris is the capital."
llm-watchdog --prompt "Tell me about John" --response "Call 555-123-4567" --json

Configuration

| Parameter | Default | Description |
|---|---|---|
| `hallucination_threshold` | 0.5 | Flag responses scoring above this |
| `pii_threshold` | 0.1 | PII score above this triggers a flag |
| `toxicity_threshold` | 0.3 | Toxic-content score threshold |
| `quality_threshold` | 0.4 | Low-quality response threshold |
| `topic_allowed` | None | Keywords expected in responses |
| `topic_blocked` | None | Keywords that always trigger a flag |
| `block_on_critical` | False | Raise an exception on CRITICAL risk |

Installation

pip install llm-watchdog                    # core only
pip install llm-watchdog[fastapi]           # with FastAPI middleware
pip install llm-watchdog[flask]             # with Flask middleware
pip install llm-watchdog[opentelemetry]     # with OTEL tracing
pip install llm-watchdog[all]               # everything


