
llm-guard

Predict, diagnose, and repair LLM failures automatically.



What it does

llm-guard wraps any LLM call with a three-stage reliability layer:

  1. Predict — scores every query for failure risk in <15ms before the LLM responds
  2. Diagnose — clusters accumulated failures into a labeled error taxonomy
  3. Heal — synthesises targeted repair instructions from failure patterns; applies them automatically on future queries

Validated results (Claude Haiku, internal benchmarks):

Benchmark   Task type    AUROC   Precision@10
MATH-500    Math         0.966   100%
HumanEval   Code         0.993   100%
TriviaQA    Factual QA   0.992   100%

Cost: <$0.25 to validate on 664 benchmark problems.


Install

pip install llm-guard

Requires Python 3.9+ and an Anthropic API key.


Quick start — three calibration paths

Path A: You have labeled correct examples

from llm_guard import LLMGuard

guard = LLMGuard(api_key="sk-ant-...")

# Fit on questions your LLM is known to handle correctly
guard.fit(correct_questions=[
    "What is the capital of France?",
    "What is 12 * 15?",
    # ... 50+ examples recommended
])

result = guard.query("What is 15% of 240?")
print(result.answer)      # "36"
print(result.confidence)  # "high" | "medium" | "low"
print(result.risk_score)  # 0.12  (lower = more familiar = lower failure risk)

Path B: No labels — use self-consistency

guard = LLMGuard(api_key="sk-ant-...")

# Runs each question 5 times; those with 80%+ agreement are "probably correct"
guard.fit_from_consistency(
    questions=my_question_pool,  # 100–500 questions
    n_samples=5,
    agreement_threshold=0.8,
)

result = guard.query("Explain the water cycle.")
print(result.confidence)  # "high"
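
The labeling rule itself is straightforward. A minimal sketch, assuming exact-match comparison of normalized answers (the library may compare answers more loosely):

from collections import Counter

def agreement(answers, threshold=0.8):
    """Keep a question as "probably correct" when the modal answer
    reaches the agreement threshold across samples."""
    counts = Counter(a.strip().lower() for a in answers)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(answers) >= threshold

agreement(["36", "36", "36", "36", "35"])  # 4/5 = 0.8 agreement -> True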

Path C: Automated verifier (code, math, SQL, schema)

def python_verifier(question, response):
    """Return True if the generated code compiles and executes without raising."""
    # Note: exec'ing untrusted model output should be sandboxed in production.
    try:
        exec(compile(response, "<llm>", "exec"), {})
        return True
    except Exception:
        return False

guard = LLMGuard(api_key="sk-ant-...")
guard.fit_from_execution(
    questions=coding_questions,
    verifier_fn=python_verifier,
)

result = guard.query("Write a function that reverses a string.")
print(result.answer)
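
Because verifier_fn is just a callable taking (question, response), the same path covers other checkable outputs. A sketch of a SQL verifier, assuming SCHEMA_DDL holds your schema's CREATE statements (the name is illustrative, not part of the library):

import sqlite3

def sql_verifier(question, response):
    """Return True if the generated SQL runs against an in-memory copy of the schema."""
    try:
        conn = sqlite3.connect(":memory:")
        conn.executescript(SCHEMA_DDL)  # illustrative: your schema's DDL
        conn.execute(response)
        return True
    except sqlite3.Error:
        return False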

Error Autopsy

Cluster accumulated failures into a labeled taxonomy (read-only, does not modify guard state):

clusters = guard.diagnose(
    failed_questions=failed_qs,
    model_answers=model_answers,
    correct_answers=correct_answers,   # optional but enables suggested_fix
)

for c in clusters:
    print(f"Cluster {c['cluster_id']} ({c['size']} failures): {c['label']}")
    print(f"  Fix: {c.get('suggested_fix', 'n/a')}")

Example output:

Cluster 0 (12 failures): The model misreads multi-step word problems,
  computing intermediate values correctly but applying them to the wrong sub-question.
  Fix: Explicitly label each sub-goal before computing.
Cluster 1 (8 failures): Off-by-one errors in loop boundary conditions.
  Fix: Always verify that loop indices match the stated range inclusivity.

Prompt Healer

Learn from failures and auto-apply targeted repairs on future queries in the same error cluster:

guard.learn_from_errors(
    failed_questions=failed_qs,
    model_answers=model_answers,
    correct_answers=correct_answers,
)

# Future queries near a known failure cluster get the repair instruction injected automatically
result = guard.query("If a train travels 60 mph for 2.5 hours, how far does it go?")
print(result.tool_used)   # "error_fix_0"  ← repair tool was applied
print(result.confidence)  # "medium"

GuardResult fields

Field         Type         Description
answer        str          LLM response text
risk_score    float        Mean KNN distance; higher = more likely to fail
confidence    str          "high" / "medium" / "low"
tool_used     str | None   Repair tool ID if applied
cluster_id    int | None   Error cluster ID if matched
was_retried   bool         True if a resource-failure retry fired
raw_response  str          Full LLM response (currently identical to answer)
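
One way an application might consume these fields (the routing policy below is illustrative, not part of the library):

result = guard.query(user_question)
if result.confidence == "low":
    answer = escalate_to_stronger_model(user_question)  # hypothetical fallback
else:
    answer = result.answer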

Constructor parameters

guard = LLMGuard(
    api_key="sk-ant-...",           # Anthropic key (or set ANTHROPIC_API_KEY)
    model="claude-haiku-4-5-20251001",  # any Claude model
    embedding_model="all-MiniLM-L6-v2", # sentence-transformers model
    n_neighbors=5,                  # k for KNN scoring
)

How it works

The failure predictor uses KNN anomaly scoring on sentence-transformer embeddings:

  1. During calibration, embed all known-correct questions → build a KNN index
  2. At query time, embed the new question → compute mean distance to k nearest correct examples
  3. High distance = unfamiliar territory = high failure risk (AUROC 0.966–0.993)

Risk thresholds are auto-calibrated from the training distribution (75th and 95th percentile), so they work across any domain without manual tuning.
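
Stripped of the library plumbing, the scorer amounts to a few lines. An illustrative sketch using sentence-transformers and scikit-learn directly (variable names are ours, not the library's internals):

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Calibration: embed known-correct questions and index them
train = embedder.encode(correct_questions)        # assumed list[str]
knn = NearestNeighbors(n_neighbors=5).fit(train)

# Auto-calibrate thresholds from the training distribution
# (self-distances are included here; acceptable for a sketch)
scores = knn.kneighbors(train)[0].mean(axis=1)
medium_cut, high_cut = np.percentile(scores, [75, 95])

# Query time: mean distance to the k nearest correct examples
risk = knn.kneighbors(embedder.encode(["What is 15% of 240?"]))[0].mean()
confidence = "high" if risk < medium_cut else "medium" if risk < high_cut else "low"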

Failure-type detection (applied at medium/high risk; see the sketch after this list):

  • stop_reason == "max_tokens" → resource failure → retry with 2x tokens (no tool)
  • Otherwise → reasoning failure → apply synthesised cluster repair tool
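
A rough sketch of that branch against the Anthropic Python SDK (the repair-tool application is elided; only the retry path is shown):

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def answer_with_retry(prompt, max_tokens=1024):
    resp = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    )
    if resp.stop_reason == "max_tokens" and max_tokens < 8192:
        # Resource failure: retry with double the token budget
        return answer_with_retry(prompt, max_tokens * 2)
    # Otherwise a reasoning failure at medium/high risk would get the
    # synthesised cluster repair instruction injected before the call.
    return resp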

Limitations

  • Calibration quality matters. fit() requires ≥6 correct examples; fit_from_consistency() works best when baseline accuracy is >70%. With very low baseline accuracy, few questions will agree across samples.
  • Embeddings are language-level. The predictor detects unfamiliar phrasing, not unfamiliar reasoning steps. Two questions that look similar but require different reasoning may get similar scores.
  • Repair tools are heuristic. learn_from_errors() synthesises prompt additions using the LLM — they help on average but are not guaranteed to fix every instance of a cluster.
  • Currently Anthropic-only. OpenAI/other provider support is on the roadmap.
  • Not a security filter. This tool predicts factual/reasoning failures, not prompt injection or jailbreaks.

Roadmap

  • OpenAI and Ollama provider support
  • Async/streaming API
  • Save/load guard state (.save() / .load())
  • Score-only mode (no LLM call required)
  • Dashboard for failure cluster visualization

License

MIT. See LICENSE.


Citation

If you use this in research:

Majumder, A. (2025). LLM Reliability Guard: KNN-based failure prediction
for large language models. AUROC 0.966–0.993 on math, code, and factual QA.
https://github.com/avighan/qppg
