Validate, repair, and retry LLM structured outputs
Project description
outputguard
Stop wrestling with broken LLM JSON. Validate, repair, and retry — automatically.
The Problem
LLMs produce broken JSON constantly. They wrap it in markdown fences, leave trailing commas, use Python True/False instead of true/false, sprinkle in NaN, truncate mid-object when they hit token limits, and helpfully add commentary around the JSON you asked for. Every AI application ends up writing the same brittle json.loads() + try/except + regex gauntlet.
The Solution
import outputguard
schema = {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"}
},
"required": ["name", "age"]
}
# Typical LLM output — fenced, trailing comma, single quotes
llm_output = '''```json
{'name': 'Alice', 'age': 30,}
```'''
result = outputguard.validate_and_repair(llm_output, schema)
print(result.valid) # True
print(result.data) # {'name': 'Alice', 'age': 30}
print(result.strategies_applied) # ['strip_fences', 'fix_quotes', 'fix_commas']
Fourteen repair strategies, JSON Schema validation, retry prompt generation, and a CLI — in one tiny package with three dependencies.
Installation
pip install outputguard
Or with uv:
uv add outputguard
Quick Start
Validate & Repair
The most common pattern — validate against a schema, auto-repair if broken, get clean data back:
import outputguard
result = outputguard.validate_and_repair(llm_output, schema)
if result.valid:
process(result.data) # Clean, validated dict
if result.repaired:
log(result.strategies_applied) # What was fixed
else:
handle_errors(result.errors) # Detailed error paths
Repair Only
When you just need parseable JSON and don't have a schema:
result = outputguard.repair(broken_json)
print(result.text) # Clean JSON string
print(result.strategies_applied) # ['fix_booleans', 'fix_commas']
Validate Only
Check JSON against a schema without attempting repair:
result = outputguard.validate(llm_output, schema)
for error in result.errors:
print(f"{error.path}: {error.message}")
# $.age: 'thirty' is not of type 'integer'
Retry Loop
When repair is not enough, generate a correction prompt and send it back to the LLM:
import outputguard
def get_structured_output(llm, prompt, schema, max_retries=3):
for attempt in range(max_retries + 1):
raw = llm.generate(prompt)
result = outputguard.validate_and_repair(raw, schema)
if result.valid:
return result.data
# Generate a targeted correction prompt
prompt = outputguard.retry_prompt(raw, schema, result.errors)
raise RuntimeError("Failed to get valid output")
The retry prompt tells the LLM exactly what went wrong — which fields are missing, which types are incorrect, and what the schema expects. Works with any LLM provider.
CLI
# Validate JSON against a schema
outputguard validate output.json -s schema.json
# Validate with auto-repair
outputguard validate output.json -s schema.json --repair
# Repair only (no schema)
outputguard repair output.json
# Pipe from stdin
echo '{name: "Alice", age: 30,}' | outputguard repair -
# Generate a retry prompt
outputguard retry-prompt output.json -s schema.json
# List all repair strategies
outputguard strategies
What It Fixes
Fourteen strategies, applied in order. Each one targets a specific class of LLM JSON malformation:
| # | Strategy | Before | After |
|---|---|---|---|
| 1 | fix_encoding |
Ċ{ĊĠ"a":Ġ1Ċ} |
{"a": 1} |
| 2 | strip_fences |
```json\n{"a": 1}\n``` |
{"a": 1} |
| 3 | extract_json |
Sure! Here's the JSON: {"a": 1} Let me know! |
{"a": 1} |
| 4 | remove_comments |
{"a": 1} // a comment |
{"a": 1} |
| 5 | fix_commas |
{"a": 1, "b": 2,} |
{"a": 1, "b": 2} |
| 6 | fix_quotes |
{'a': 'hello'} |
{"a": "hello"} |
| 7 | fix_keys |
{a: 1, b: 2} |
{"a": 1, "b": 2} |
| 8 | fix_values |
{"a": NaN, "b": Infinity} |
{"a": null, "b": null} |
| 9 | fix_booleans |
{"a": True, "b": None} |
{"a": true, "b": null} |
| 10 | fix_truncated |
{"a": 1, "b": "hel |
{"a": 1, "b": "hel"} |
| 11 | fix_ellipsis |
{"items": [1, 2, ...]} |
{"items": [1, 2]} |
| 12 | fix_unicode |
{"a": "\u00"} |
{"a": "�"} |
| 13 | fix_inner_quotes |
{"a": " "hello" "} |
{"a": " \"hello\" "} |
| 14 | fix_closers |
{"a": [1, 2, 3 |
{"a": [1, 2, 3]} |
| 15 | fix_newlines |
{"a": "line1↵line2"} |
{"a": "line1\nline2"} |
Tested Against 288 Real LLM Models
We tested outputguard against every text-generation model on OpenRouter — 288 models across 40+ providers.
Result: 100% success rate. Every model's output was either valid JSON or successfully repaired.
| Count | |
|---|---|
| Models tested | 288 |
| Valid immediately | 225 (78%) |
| Repaired by outputguard | 63 (22%) |
The 63 repaired outputs were fixed automatically — mostly strip_fences (markdown code fences are the #1 LLM JSON issue), plus extract_json, fix_truncated, and fix_encoding.
4 models were excluded from testing due to broken API responses (tokenizer corruption, truncated streaming) — not JSON issues.
Highlighted model results (click to expand)
| Model | Provider | Result | Fix Applied |
|---|---|---|---|
| GPT-5 Mini | OpenAI | ✅ Clean | — |
| GPT-5 Pro | OpenAI | ✅ Clean | — |
| GPT-4.1 Mini | OpenAI | ✅ Clean | — |
| Claude Sonnet 4.6 | Anthropic | ✅ Clean | — |
| Claude Opus 4.7 | Anthropic | ✅ Clean | — |
| Claude Haiku 4.5 | Anthropic | 🛠️ Repaired | strip_fences |
| Gemini 2.5 Flash | ✅ Clean | — | |
| Gemini 2.5 Pro | 🛠️ Repaired | strip_fences |
|
| Gemini 3.1 Flash Lite | ✅ Clean | — | |
| Grok 4.1 Fast | xAI | ✅ Clean | — |
| Grok 4.3 | xAI | ✅ Clean | — |
| Mistral Medium 3.5 | Mistral | ✅ Clean | — |
| Mistral Large | Mistral | ✅ Clean | — |
| DeepSeek v4 Pro | DeepSeek | ✅ Clean | — |
| DeepSeek v3.2 | DeepSeek | 🛠️ Repaired | strip_fences |
| Llama 4 Maverick | Meta | ✅ Clean | — |
| Llama 4 Scout | Meta | 🛠️ Repaired | strip_fences |
| Qwen 3.6 Flash | Alibaba | ✅ Clean | — |
| Qwen 3 Max | Alibaba | ✅ Clean | — |
| Kimi K2.6 | Moonshot | ✅ Clean | — |
| GLM 5.1 | Zhipu | ✅ Clean | — |
| Command A | Cohere | ✅ Clean | — |
| Phi-4 | Microsoft | 🛠️ Repaired | strip_fences |
| Nova Premier | Amazon | 🛠️ Repaired | strip_fences |
| Seed 1.6 | ByteDance | ✅ Clean | — |
| Mercury 2 | Inception | ✅ Clean | — |
All 288 raw model outputs are committed as test fixtures. Run
python -m tests.real_model_runner sweepto re-test against every model yourself.
Test Suite
1,347 tests across 7 testing dimensions:
| Category | Tests | What it covers |
|---|---|---|
| Strategy exhaustive | 159 | Every strategy pushed to edge cases |
| Adversarial & fuzzing | 286 | 141 chaotic inputs, concurrency, performance |
| API contracts | 145 | parse(), exceptions, reports, CLI, registry |
| LLM corpus | 119 | Real failure patterns from 7 model families |
| Combinations | 115 | Multi-strategy interactions, ordering, idempotency |
| Real model fixtures | 576 | Actual outputs from 288 LLM models |
| Core & integration | 414 | Strategies, validator, repairer, guard, stress |
uv run pytest tests/ -q
# 1,884 passed in 1.42s
Configuration
Use the OutputGuard class for fine-grained control over which strategies run:
from outputguard import OutputGuard
# Strict mode — only fix formatting, not content
strict = OutputGuard(
strategies=["strip_fences", "fix_commas"],
max_repair_attempts=1,
)
result = strict.validate_and_repair(text, schema)
# Aggressive mode — all strategies, more attempts
aggressive = OutputGuard(
strategies=None, # All 13 strategies (default)
max_repair_attempts=5,
)
result = aggressive.validate_and_repair(text, schema)
RepairReport
For debugging and observability, RepairReport gives you a full breakdown of what happened:
from outputguard.report import RepairReport
report = RepairReport(
original_text=original,
final_text=repaired,
success=True,
steps=steps,
)
print(report.summary)
# Repaired using 2 strategy(ies): strip_fences, fix_commas
print(report.confidence) # 0.8 — fewer strategies = higher confidence
print(report.diff) # Unified diff from original to repaired
print(report.step_diffs()) # Per-strategy diffs for verbose logging
Confidence scoring is a heuristic from 0.0 to 1.0. It decreases as more strategies are needed and as the text changes more. Useful for deciding whether to trust a repair or escalate to a retry.
API Reference
Module-level Functions
| Function | Returns | Description |
|---|---|---|
validate(text, schema) |
ValidationResult |
Validate JSON against a schema |
repair(text) |
RepairResult |
Auto-repair malformed JSON |
validate_and_repair(text, schema) |
ValidationResult |
Validate, repair if needed, re-validate |
retry_prompt(text, schema, errors) |
str |
Generate a correction prompt for the LLM |
Classes
| Class | Description |
|---|---|
OutputGuard |
Configurable pipeline with strategy selection and retry limits |
ValidationResult |
Result with valid, data, errors, repaired, strategies_applied |
RepairResult |
Result with repaired, text, strategies_applied, parse_error |
ValidationError |
Error detail with message, path, schema_path, value |
RepairReport |
Detailed report with diff, confidence, summary, step_diffs() |
Exceptions
| Exception | Description |
|---|---|
OutputGuardError |
Base exception |
ParseError |
JSON could not be parsed even after repair |
SchemaValidationError |
JSON parsed but does not match the schema |
RepairError |
Repair was attempted but failed |
StrategyError |
A specific repair strategy encountered an error |
CLI Reference
outputguard [COMMAND] [OPTIONS]
| Command | Description |
|---|---|
validate INPUT -s SCHEMA |
Validate JSON against a schema |
validate INPUT -s SCHEMA --repair |
Validate with auto-repair |
repair INPUT |
Repair malformed JSON |
repair INPUT --strategies strip_fences,fix_commas |
Repair with specific strategies |
retry-prompt INPUT -s SCHEMA |
Generate a correction prompt |
strategies |
List all available strategies |
All commands accept -f json for machine-readable output, -o FILE to write to a file, and - as INPUT to read from stdin.
Why outputguard?
json.loads() + regex |
outputguard | |
|---|---|---|
| Repair strategies | Roll your own | 15, tested and ordered |
| Schema validation | Separate library | Built in (jsonschema) |
| Retry prompts | Write your own | One function call |
| Confidence scoring | No | Yes |
| Truncated JSON | Breaks | Recovers |
| Tests | Probably zero | 1,884 (incl. 288 real LLM models) |
| LLM dependencies | — | None (works with any provider) |
| Footprint | — | 3 deps: click, jsonschema, rich |
outputguard has no opinion about which LLM you use. It operates on strings and schemas — plug it into OpenAI, Anthropic, local models, or anything else.
Examples
See the examples/ directory for complete, runnable scripts:
- basic_usage.py — Core validate/repair workflow
- retry_loop.py — Retry pattern with correction prompts
- custom_pipeline.py — Custom strategy configuration
- batch_processing.py — Process multiple outputs with statistics
Contributing
Contributions are welcome. Please open an issue first to discuss what you'd like to change.
git clone https://github.com/ndcorder/outputguard.git
cd outputguard
uv sync --dev
uv run pytest tests/ -v
TypeScript / JavaScript
Looking for a JS/TS version? See outputguard-js — same 13 strategies, same API shape, TypeScript-native.
License
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file outputguard-1.0.0.tar.gz.
File metadata
- Download URL: outputguard-1.0.0.tar.gz
- Upload date:
- Size: 156.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
97c3f24fc9fd8bdeaf43f48738b17295e7311d76249bff8f2027e37bc1ff95bc
|
|
| MD5 |
97cf1bcb7708d70344ae4aeb5b628be3
|
|
| BLAKE2b-256 |
40ab8d8fe067a191d8500a3d033ba0e8c8a6f6b365cf623f355793c99727c0da
|
Provenance
The following attestation bundles were made for outputguard-1.0.0.tar.gz:
Publisher:
ci.yml on ndcorder/outputguard
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
outputguard-1.0.0.tar.gz -
Subject digest:
97c3f24fc9fd8bdeaf43f48738b17295e7311d76249bff8f2027e37bc1ff95bc - Sigstore transparency entry: 1489045232
- Sigstore integration time:
-
Permalink:
ndcorder/outputguard@b3b68785214bf8060be580e4e9c05b041feb2c6d -
Branch / Tag:
refs/tags/v1.0.0 - Owner: https://github.com/ndcorder
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
ci.yml@b3b68785214bf8060be580e4e9c05b041feb2c6d -
Trigger Event:
push
-
Statement type:
File details
Details for the file outputguard-1.0.0-py3-none-any.whl.
File metadata
- Download URL: outputguard-1.0.0-py3-none-any.whl
- Upload date:
- Size: 29.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f8c38c84fc1eebfe1bf612a0b8f8907839c27ac401a2a92ff1fd3c024f9a82b2
|
|
| MD5 |
155616e0e597d08e619276ef8bcf0909
|
|
| BLAKE2b-256 |
305bb678d5491f3dc3661afb19c8f43a4dd0bbe7c1a29b1df3bc5da7decdc462
|
Provenance
The following attestation bundles were made for outputguard-1.0.0-py3-none-any.whl:
Publisher:
ci.yml on ndcorder/outputguard
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
outputguard-1.0.0-py3-none-any.whl -
Subject digest:
f8c38c84fc1eebfe1bf612a0b8f8907839c27ac401a2a92ff1fd3c024f9a82b2 - Sigstore transparency entry: 1489045785
- Sigstore integration time:
-
Permalink:
ndcorder/outputguard@b3b68785214bf8060be580e4e9c05b041feb2c6d -
Branch / Tag:
refs/tags/v1.0.0 - Owner: https://github.com/ndcorder
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
ci.yml@b3b68785214bf8060be580e4e9c05b041feb2c6d -
Trigger Event:
push
-
Statement type: