
llm-guard-kit

Real-time reliability monitoring, failure diagnosis, and self-repair for LLM agents.



What it does

llm-guard-kit wraps any ReAct / tool-calling LLM agent with a four-tier reliability stack — no labels required on day one:

| Tier | Component         | What it does                                                                          |
|------|-------------------|---------------------------------------------------------------------------------------|
| 0    | LabelFreeScorer   | Risk score per query in <15 ms. Zero cold-start using behavioral signals.              |
| 1    | QppgMonitor       | Drop-in agent monitor. Auto-calibrates, fires alerts, exports reports.                 |
| 3    | FailureTaxonomist | Diagnoses why a chain failed (retrieval failure, excessive search, hallucination, …).  |
| 4    | SelfHealer        | Converts failure diagnosis into prompt injections that repair the agent mid-run.       |

Validated AUROC (HotpotQA multi-hop QA, 200 chains):

| Condition                                    | Within-domain | Cross-domain |
|----------------------------------------------|---------------|--------------|
| n=0 chains: behavioral signals only (SC2)    | 0.879         | 0.570        |
| n≥5 chains: + GMM density (SC8)              | 0.883         | 0.664        |
| n≥50 labeled: + QARA obs-pool adapter        | 0.742         | 0.675        |
| + LLM judge (gpt-4o-mini, J2 = SC8 + judge)  | 0.895         | 0.660        |

Install

pip install llm-guard-kit                    # core (no API key needed)
pip install "llm-guard-kit[qara]"            # + QARA supervised adapter (torch)
pip install "llm-guard-kit[server]"          # + FastAPI HTTP server

Requires Python 3.9+. No API key required for zero-label monitoring.


Quick start — zero labels, zero cold-start

from qppg_service import QppgMonitor

monitor = QppgMonitor(threshold=0.65)   # fires above this risk score

# Call after every agent run
alert = monitor.track(
    question    = "Which city is older, Rome or Athens?",
    steps       = agent_steps,           # list of {thought, action_type, action_arg, observation}
    final_answer= agent.final_answer,
    finished    = True,
)

if alert:
    print(f"HIGH RISK ({alert.risk_score:.2f}): {alert.recommendation}")

# Export a full monitoring report
print(monitor.export_report())
monitor.export_csv("agent_risk_log.csv")

Works on query 1. No training. No labels. AUROC 0.879 within-domain.


Full pipeline — detect → diagnose → repair

from qppg_service import QppgMonitor, FailureTaxonomist, SelfHealer

monitor = QppgMonitor(threshold=0.65)
tx      = FailureTaxonomist()
healer  = SelfHealer()

alert = monitor.track(question, steps, final_answer, finished=True)

if alert:
    # Diagnose WHY it failed
    failure = tx.classify(question, steps, final_answer, finished=True)
    print(failure.primary_mode)   # "EXCESSIVE_SEARCH" | "RETRIEVAL_FAILURE" | ...
    print(failure.explanation)    # human-readable explanation

    # Get a repair prompt to inject into the agent
    action = healer.suggest(failure, question, steps, final_answer)
    print(action.action_type)       # "FORCE_FINISH" | "REPHRASE_QUERY" | ...
    print(action.prompt_injection)  # ready to inject as next agent message
    print(action.urgency)           # "HIGH" | "MEDIUM" | "LOW"
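
The prompt_injection string is meant to go back to the agent as its next input. Exactly how depends on your agent framework; continuing from the block above, the sketch below assumes a hypothetical agent object with a messages list and a run_step() method (neither is part of llm-guard-kit), then re-scores the chain after the repair attempt.

# Hypothetical agent interface -- adapt to whatever framework drives your agent.
if alert and action.urgency == "HIGH":
    # Inject the repair prompt as the next message the agent sees,
    # then let it take one more step with the corrected guidance.
    agent.messages.append({"role": "user", "content": action.prompt_injection})
    steps.append(agent.run_step())

    # Re-score the repaired chain.
    alert = monitor.track(question, steps, agent.final_answer, finished=True)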

Failure modes detected:

| Mode                  | Trigger                                       | Suggested repair            |
|-----------------------|-----------------------------------------------|-----------------------------|
| RETRIEVAL_FAILURE     | mean cosine(obs, question) < 0.35             | REPHRASE_QUERY              |
| EXCESSIVE_SEARCH      | > 4 search steps                              | CONSOLIDATE or FORCE_FINISH |
| CONFLICTING_EVIDENCE  | high thought variance + high query diversity  | CONSOLIDATE                 |
| INSUFFICIENT_EVIDENCE | weak retrieval + ≥ 2 searches                 | ADDITIONAL_SEARCH           |
| ANSWER_UNSUPPORTED    | answer words absent from reasoning            | VERIFY_ANSWER               |
| PREMATURE_STOP        | ≤ 1 search, no clean finish                   | ADDITIONAL_SEARCH (urgent)  |
| LOW_RISK              | no flags                                      | NO_ACTION                   |
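
Each trigger is a cheap heuristic over the chain itself. As an illustration only (not the library's implementation, and the embedding model llm-guard-kit uses internally is not specified here), the RETRIEVAL_FAILURE rule can be reproduced with any sentence-embedding model. The sketch below uses sentence-transformers and reuses the steps list from the examples above.

# Illustration of the RETRIEVAL_FAILURE rule: mean cosine(observation, question) < 0.35.
# The model name "all-MiniLM-L6-v2" is an assumption, not the library's internal choice.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

observations = [s["observation"] for s in steps
                if s["action_type"] == "Search" and s["observation"]]

if observations:
    q_emb    = model.encode(question, convert_to_tensor=True)
    obs_emb  = model.encode(observations, convert_to_tensor=True)
    mean_sim = util.cos_sim(q_emb, obs_emb).mean().item()
    if mean_sim < 0.35:
        print(f"Likely RETRIEVAL_FAILURE: mean cosine similarity {mean_sim:.2f}")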

Progressive calibration

As you accumulate agent logs, the scorer automatically improves:

# After 5+ chains (any domain) — activates GMM density estimation
monitor.calibrate(chains)                    # list of {question, steps, final_answer, finished}

# After 50+ labeled chains — activates QARA supervised obs-pool adapter
monitor.calibrate(chains, labeled=True)      # chains must have "correct": True/False

# Check current status and expected AUROC
print(monitor.scorer.status())
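
Each entry in chains is the same dict you pass to track(), and the labeled stage additionally needs a correct flag, as noted in the comments above. A minimal sketch of building that list from your own run log (agent_run_log is a placeholder for wherever you store past runs, not part of llm-guard-kit):

# Accumulate chains in the shape calibrate() expects (per the comments above).
chains = []
for record in agent_run_log:                 # your own storage of past runs (placeholder)
    chains.append({
        "question":     record["question"],
        "steps":        record["steps"],
        "final_answer": record["final_answer"],
        "finished":     record["finished"],
        "correct":      record["correct"],   # only required for labeled=True
    })

if len(chains) >= 50:
    monitor.calibrate(chains, labeled=True)  # QARA supervised adapter
elif len(chains) >= 5:
    monitor.calibrate(chains)                # GMM density estimation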

Retrieval quality diagnostics

A standalone signal that tells you which search steps are failing:

from qppg_service import LabelFreeScorer

scorer = LabelFreeScorer()
rq = scorer.retrieval_quality(question, steps)
# {"mean_sim": 0.41, "min_sim": 0.22, "quality_label": "POOR", "per_step": [...]}

Correct agents average mean_sim = 0.554; wrong agents 0.458 (Δ+0.096, p<0.01).
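
That gap suggests a simple gate: treat chains whose mean_sim sits near or below the wrong-agent average as candidates for a re-query. Continuing from the block above, a minimal sketch using only the fields shown there (the 0.46 cutoff is illustrative, not a library default):

# Use the retrieval-quality signal as a cheap gate before trusting an answer.
rq = scorer.retrieval_quality(question, steps)

if rq["quality_label"] == "POOR" or rq["mean_sim"] < 0.46:   # illustrative cutoff
    print(f"Weak retrieval (mean_sim={rq['mean_sim']:.2f}, min_sim={rq['min_sim']:.2f}); "
          "consider rephrasing the search queries before accepting the answer.")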


HTTP API (for multi-language / microservice deployments)

pip install "llm-guard-kit[server]"
python -m qppg_service.server --host 0.0.0.0 --port 8080
curl -X POST http://localhost:8080/score \
  -H "Content-Type: application/json" \
  -d '{"question": "...", "steps": [...], "final_answer": "..."}'

Legacy API (v0.1.x — still supported)

from llm_guard import LLMGuard

guard = LLMGuard(api_key="sk-ant-...")
guard.fit(correct_questions=["What is the capital of France?", ...])
result = guard.query("What is 15% of 240?")
print(result.risk_score)   # 0.12 (lower = lower failure risk)

The original LLMGuard class (exp21–23, within-domain AUROC 0.966–1.000 on MATH/HumanEval/TriviaQA) remains fully functional.


Agent step format

steps = [
    {
        "thought":      "I need to find when the Eiffel Tower was built.",
        "action_type":  "Search",
        "action_arg":   "Eiffel Tower construction date",
        "observation":  "The Eiffel Tower was built between 1887 and 1889..."
    },
    {
        "thought":      "I now have the answer.",
        "action_type":  "Finish",
        "action_arg":   "1889",
        "observation":  ""
    }
]
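
If your framework records traces in some other shape (tuples, objects, events), a thin adapter is all that is needed. The sketch below assumes a hypothetical trace of (thought, action, argument, observation) tuples; to_guard_steps and my_agent_trace are placeholders, not part of llm-guard-kit.

# Hypothetical adapter: raw (thought, action, argument, observation) tuples -> llm-guard-kit step dicts.
def to_guard_steps(raw_trace):
    return [
        {
            "thought":     thought,
            "action_type": action,       # e.g. "Search" or "Finish"
            "action_arg":  argument,
            "observation": observation or "",
        }
        for (thought, action, argument, observation) in raw_trace
    ]

steps = to_guard_steps(my_agent_trace)   # my_agent_trace: your framework's own log (placeholder)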

Research background

Built on experiments exp18–45, validated against HotpotQA, NaturalQuestions, TriviaQA, and GSM8K:

  • Behavioral signals (step count, completion, answer gap): AUROC 0.879 — zero calibration
  • GMM density estimation on chain embeddings: +0.004 within, +0.094 cross-domain
  • QARA supervised adapter on observation-pool embeddings: best cross-domain (0.675, p=0.025)
  • LLM-as-judge (gpt-4o-mini, exp41): 0.895 within-domain when combined with SC8

Paper draft: docs/research_paper.md


License

MIT
