Validate, repair, and retry LLM structured outputs

outputguard

Stop wrestling with broken LLM JSON. Validate, repair, and retry — automatically.


The Problem

LLMs produce broken JSON constantly. They wrap it in markdown fences, leave trailing commas, use Python True/False instead of true/false, sprinkle in NaN, truncate mid-object when they hit token limits, and helpfully add commentary around the JSON you asked for. Every AI application ends up writing the same brittle json.loads() + try/except + regex gauntlet.
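To make these failure modes concrete, here is a small standard-library sketch (it does not use outputguard). The sample strings are illustrative, not captured model output, and `try_parse` is a hypothetical helper name.

```python
import json

FENCE = "`" * 3  # built at runtime so the sample can contain a markdown fence

# Illustrative examples of the malformations described above.
samples = {
    "markdown fence": FENCE + 'json\n{"a": 1}\n' + FENCE,
    "trailing comma": '{"a": 1,}',
    "Python literals": '{"a": True, "b": None}',
    "single quotes": "{'a': 'hello'}",
    "truncated": '{"a": 1, "b": "hel',
    "commentary": 'Sure! Here is the JSON: {"a": 1}',
}


def try_parse(text):
    """Return True if json.loads accepts the text, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


failures = [name for name, text in samples.items() if not try_parse(text)]
print(failures)  # every sample above is rejected by json.loads
```

NaN is left out of the samples on purpose: Python's `json.loads` accepts `NaN` by default even though it is not valid JSON, so that case fails later, in whatever consumes the data.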

The Solution

import outputguard

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["name", "age"]
}

# Typical LLM output — fenced, trailing comma, single quotes
llm_output = '''```json
{'name': 'Alice', 'age': 30,}
```'''

result = outputguard.validate_and_repair(llm_output, schema)
print(result.valid)              # True
print(result.data)               # {'name': 'Alice', 'age': 30}
print(result.strategies_applied) # ['strip_fences', 'fix_quotes', 'fix_commas']

Fifteen repair strategies, JSON Schema validation, retry prompt generation, and a CLI, all in one small package with three dependencies.

Installation

pip install outputguard

Or with uv:

uv add outputguard

Quick Start

Validate & Repair

The most common pattern — validate against a schema, auto-repair if broken, get clean data back:

import outputguard

result = outputguard.validate_and_repair(llm_output, schema)

if result.valid:
    process(result.data)                  # Clean, validated dict
    if result.repaired:
        log(result.strategies_applied)    # What was fixed
else:
    handle_errors(result.errors)          # Detailed error paths

Repair Only

When you just need parseable JSON and don't have a schema:

result = outputguard.repair(broken_json)
print(result.text)                # Clean JSON string
print(result.strategies_applied)  # ['fix_booleans', 'fix_commas']

Validate Only

Check JSON against a schema without attempting repair:

result = outputguard.validate(llm_output, schema)
for error in result.errors:
    print(f"{error.path}: {error.message}")
    # $.age: 'thirty' is not of type 'integer'

Retry Loop

When repair is not enough, generate a correction prompt and send it back to the LLM:

import outputguard

def get_structured_output(llm, prompt, schema, max_retries=3):
    for attempt in range(max_retries + 1):
        raw = llm.generate(prompt)
        result = outputguard.validate_and_repair(raw, schema)

        if result.valid:
            return result.data

        # Generate a targeted correction prompt
        prompt = outputguard.retry_prompt(raw, schema, result.errors)

    raise RuntimeError("Failed to get valid output")

The retry prompt tells the LLM exactly what went wrong — which fields are missing, which types are incorrect, and what the schema expects. Works with any LLM provider.
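As a rough illustration of what such a correction prompt could contain, here is a standalone sketch. It is not the text that outputguard.retry_prompt actually produces; `build_retry_prompt` and the `(path, message)` tuple format are assumptions made for this example.

```python
import json


def build_retry_prompt(raw_output, schema, errors):
    """Assemble a correction prompt from validation errors.

    `errors` is assumed to be a list of (path, message) tuples,
    e.g. ("$.age", "'thirty' is not of type 'integer'").
    """
    lines = [
        "Your previous response did not match the required JSON schema.",
        "",
        "Problems found:",
    ]
    lines += [f"- {path}: {message}" for path, message in errors]
    lines += [
        "",
        "Required schema:",
        json.dumps(schema, indent=2),
        "",
        "Previous response:",
        raw_output,
        "",
        "Reply with corrected JSON only, with no commentary or code fences.",
    ]
    return "\n".join(lines)


example_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}
example_prompt = build_retry_prompt(
    '{"name": "Alice", "age": "thirty"}',
    example_schema,
    [("$.age", "'thirty' is not of type 'integer'")],
)
print(example_prompt)
```

Listing each error path next to the schema gives the model a specific target to fix, which is usually more effective than resending the original prompt unchanged.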

CLI

# Validate JSON against a schema
outputguard validate output.json -s schema.json

# Validate with auto-repair
outputguard validate output.json -s schema.json --repair

# Repair only (no schema)
outputguard repair output.json

# Pipe from stdin
echo '{name: "Alice", age: 30,}' | outputguard repair -

# Generate a retry prompt
outputguard retry-prompt output.json -s schema.json

# List all repair strategies
outputguard strategies

What It Fixes

Fifteen strategies, applied in order. Each one targets a specific class of LLM JSON malformation:

| # | Strategy | Before | After |
|---|---|---|---|
| 1 | fix_encoding | Ċ{ĊĠ"a":Ġ1Ċ} | {"a": 1} |
| 2 | strip_fences | ```json\n{"a": 1}\n``` | {"a": 1} |
| 3 | extract_json | Sure! Here's the JSON: {"a": 1} Let me know! | {"a": 1} |
| 4 | remove_comments | {"a": 1} // a comment | {"a": 1} |
| 5 | fix_commas | {"a": 1, "b": 2,} | {"a": 1, "b": 2} |
| 6 | fix_quotes | {'a': 'hello'} | {"a": "hello"} |
| 7 | fix_keys | {a: 1, b: 2} | {"a": 1, "b": 2} |
| 8 | fix_values | {"a": NaN, "b": Infinity} | {"a": null, "b": null} |
| 9 | fix_booleans | {"a": True, "b": None} | {"a": true, "b": null} |
| 10 | fix_truncated | {"a": 1, "b": "hel | {"a": 1, "b": "hel"} |
| 11 | fix_ellipsis | {"items": [1, 2, ...]} | {"items": [1, 2]} |
| 12 | fix_unicode | {"a": "\u00"} | {"a": "�"} |
| 13 | fix_inner_quotes | {"a": " "hello" "} | {"a": " \"hello\" "} |
| 14 | fix_closers | {"a": [1, 2, 3 | {"a": [1, 2, 3]} |
| 15 | fix_newlines | {"a": "line1↵line2"} | {"a": "line1\nline2"} |
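The idea of an ordered repair pipeline can be sketched with the standard library alone. The four functions below are simplified stand-ins written for this example, not outputguard's implementations, and `repair_sketch` is a hypothetical name; the real strategies presumably handle many more edge cases.

```python
import json
import re

FENCE = "`" * 3
# Matches either a complete JSON string (left untouched) or the target token.
STRING = r'"(?:\\.|[^"\\])*"'


def strip_fences(text):
    """Remove a surrounding markdown code fence, if there is one."""
    pattern = r"^\s*" + FENCE + r"[a-zA-Z]*[ \t]*\n(.*?)\n?[ \t]*" + FENCE + r"\s*$"
    match = re.match(pattern, text, re.DOTALL)
    return match.group(1) if match else text


def fix_commas(text):
    """Drop commas that directly precede a closing brace or bracket."""
    def repl(m):
        return m.group(1) if m.group(1) is not None else m.group(0)
    return re.sub(STRING + r"|,(\s*[}\]])", repl, text)


def fix_booleans(text):
    """Map Python literals outside strings to their JSON equivalents."""
    mapping = {"True": "true", "False": "false", "None": "null"}
    def repl(m):
        return mapping[m.group(1)] if m.group(1) else m.group(0)
    return re.sub(STRING + r"|\b(True|False|None)\b", repl, text)


def close_truncated(text):
    """Close an unterminated string and any open objects or arrays."""
    stack, in_string, escaped = [], False, False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    return text + ('"' if in_string else "") + "".join(reversed(stack))


# Order matters: fences must go before the content-level fixes can see the JSON.
STRATEGIES = [
    ("strip_fences", strip_fences),
    ("fix_commas", fix_commas),
    ("fix_booleans", fix_booleans),
    ("fix_truncated", close_truncated),
]


def repair_sketch(text):
    """Apply each strategy in order and record the ones that changed the text."""
    applied = []
    for name, strategy in STRATEGIES:
        fixed = strategy(text)
        if fixed != text:
            applied.append(name)
            text = fixed
    return text, applied


raw = FENCE + 'json\n{"ok": True, "items": [1, 2,],}\n' + FENCE
repaired, applied = repair_sketch(raw)
print(applied)               # ['strip_fences', 'fix_commas', 'fix_booleans']
print(json.loads(repaired))  # {'ok': True, 'items': [1, 2]}
```

Matching whole string literals first is what keeps the comma and boolean fixes from rewriting text inside values. The truncation closer is deliberately naive: input cut off right after a colon or comma would still fail to parse.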

Tested Against 288 Real LLM Models

We tested outputguard against every text-generation model on OpenRouter — 288 models across 40+ providers.

Result: 100% success rate. Every model's output was either valid JSON or successfully repaired.

| | Count |
|---|---|
| Models tested | 288 |
| Valid immediately | 225 (78%) |
| Repaired by outputguard | 63 (22%) |

The 63 repaired outputs were fixed automatically — mostly strip_fences (markdown code fences are the #1 LLM JSON issue), plus extract_json, fix_truncated, and fix_encoding.

4 models were excluded from testing due to broken API responses (tokenizer corruption, truncated streaming) — not JSON issues.

Highlighted model results

| Model | Provider | Result | Fix Applied |
|---|---|---|---|
| GPT-5 Mini | OpenAI | ✅ Clean | |
| GPT-5 Pro | OpenAI | ✅ Clean | |
| GPT-4.1 Mini | OpenAI | ✅ Clean | |
| Claude Sonnet 4.6 | Anthropic | ✅ Clean | |
| Claude Opus 4.7 | Anthropic | ✅ Clean | |
| Claude Haiku 4.5 | Anthropic | 🛠️ Repaired | strip_fences |
| Gemini 2.5 Flash | Google | ✅ Clean | |
| Gemini 2.5 Pro | Google | 🛠️ Repaired | strip_fences |
| Gemini 3.1 Flash Lite | Google | ✅ Clean | |
| Grok 4.1 Fast | xAI | ✅ Clean | |
| Grok 4.3 | xAI | ✅ Clean | |
| Mistral Medium 3.5 | Mistral | ✅ Clean | |
| Mistral Large | Mistral | ✅ Clean | |
| DeepSeek v4 Pro | DeepSeek | ✅ Clean | |
| DeepSeek v3.2 | DeepSeek | 🛠️ Repaired | strip_fences |
| Llama 4 Maverick | Meta | ✅ Clean | |
| Llama 4 Scout | Meta | 🛠️ Repaired | strip_fences |
| Qwen 3.6 Flash | Alibaba | ✅ Clean | |
| Qwen 3 Max | Alibaba | ✅ Clean | |
| Kimi K2.6 | Moonshot | ✅ Clean | |
| GLM 5.1 | Zhipu | ✅ Clean | |
| Command A | Cohere | ✅ Clean | |
| Phi-4 | Microsoft | 🛠️ Repaired | strip_fences |
| Nova Premier | Amazon | 🛠️ Repaired | strip_fences |
| Seed 1.6 | ByteDance | ✅ Clean | |
| Mercury 2 | Inception | ✅ Clean | |

All 288 raw model outputs are committed as test fixtures. Run python -m tests.real_model_runner sweep to re-test against every model yourself.

Test Suite

1,884 tests across 7 testing dimensions:

| Category | Tests | What it covers |
|---|---|---|
| Strategy exhaustive | 159 | Every strategy pushed to edge cases |
| Adversarial & fuzzing | 286 | 141 chaotic inputs, concurrency, performance |
| API contracts | 145 | parse(), exceptions, reports, CLI, registry |
| LLM corpus | 119 | Real failure patterns from 7 model families |
| Combinations | 115 | Multi-strategy interactions, ordering, idempotency |
| Real model fixtures | 576 | Actual outputs from 288 LLM models |
| Core & integration | 414 | Strategies, validator, repairer, guard, stress |

uv run pytest tests/ -q
# 1,884 passed in 1.42s

Configuration

Use the OutputGuard class for fine-grained control over which strategies run:

from outputguard import OutputGuard

# Strict mode — only fix formatting, not content
strict = OutputGuard(
    strategies=["strip_fences", "fix_commas"],
    max_repair_attempts=1,
)
result = strict.validate_and_repair(text, schema)

# Aggressive mode — all strategies, more attempts
aggressive = OutputGuard(
    strategies=None,          # All 15 strategies (default)
    max_repair_attempts=5,
)
result = aggressive.validate_and_repair(text, schema)

RepairReport

For debugging and observability, RepairReport gives you a full breakdown of what happened:

from outputguard.report import RepairReport

report = RepairReport(
    original_text=original,
    final_text=repaired,
    success=True,
    steps=steps,
)

print(report.summary)
# Repaired using 2 strategy(ies): strip_fences, fix_commas

print(report.confidence)   # 0.8 — fewer strategies = higher confidence
print(report.diff)         # Unified diff from original to repaired
print(report.step_diffs()) # Per-strategy diffs for verbose logging

Confidence scoring is a heuristic from 0.0 to 1.0. It decreases as more strategies are needed and as the text changes more. Useful for deciding whether to trust a repair or escalate to a retry.
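A heuristic with these two properties can be sketched with `difflib`. The formula, the 0.1 per-strategy penalty, and the function names below are assumptions for illustration, not outputguard's actual scoring.

```python
import difflib


def confidence_sketch(original, repaired, strategies_applied):
    """Score in [0.0, 1.0] that drops with more strategies and larger edits."""
    if not strategies_applied:
        return 1.0  # nothing was changed
    # How much of the text survived the repair (1.0 means identical).
    similarity = difflib.SequenceMatcher(None, original, repaired).ratio()
    # Hypothetical penalty: each strategy used costs 10%.
    strategy_factor = max(0.0, 1.0 - 0.1 * len(strategies_applied))
    return max(0.0, min(1.0, similarity * strategy_factor))


def unified_diff_sketch(original, repaired):
    """Unified diff from the original text to the repaired text."""
    return "\n".join(
        difflib.unified_diff(
            original.splitlines(),
            repaired.splitlines(),
            fromfile="original",
            tofile="repaired",
            lineterm="",
        )
    )


light = confidence_sketch('{"a": 1,}', '{"a": 1}', ["fix_commas"])
heavy = confidence_sketch(
    "Sure! {'a': 1,} Hope that helps.",
    '{"a": 1}',
    ["extract_json", "fix_quotes", "fix_commas"],
)
print(light > heavy)  # True: the lighter repair scores higher
print(unified_diff_sketch('{"a": 1,}', '{"a": 1}'))
```

A caller could compare the score against a threshold of its own choosing: accept the repaired data above it, and fall back to a retry prompt below it.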

API Reference

Module-level Functions

| Function | Returns | Description |
|---|---|---|
| validate(text, schema) | ValidationResult | Validate JSON against a schema |
| repair(text) | RepairResult | Auto-repair malformed JSON |
| validate_and_repair(text, schema) | ValidationResult | Validate, repair if needed, re-validate |
| retry_prompt(text, schema, errors) | str | Generate a correction prompt for the LLM |

Classes

| Class | Description |
|---|---|
| OutputGuard | Configurable pipeline with strategy selection and retry limits |
| ValidationResult | Result with valid, data, errors, repaired, strategies_applied |
| RepairResult | Result with repaired, text, strategies_applied, parse_error |
| ValidationError | Error detail with message, path, schema_path, value |
| RepairReport | Detailed report with diff, confidence, summary, step_diffs() |

Exceptions

| Exception | Description |
|---|---|
| OutputGuardError | Base exception |
| ParseError | JSON could not be parsed even after repair |
| SchemaValidationError | JSON parsed but does not match the schema |
| RepairError | Repair was attempted but failed |
| StrategyError | A specific repair strategy encountered an error |

CLI Reference

outputguard [COMMAND] [OPTIONS]

| Command | Description |
|---|---|
| validate INPUT -s SCHEMA | Validate JSON against a schema |
| validate INPUT -s SCHEMA --repair | Validate with auto-repair |
| repair INPUT | Repair malformed JSON |
| repair INPUT --strategies strip_fences,fix_commas | Repair with specific strategies |
| retry-prompt INPUT -s SCHEMA | Generate a correction prompt |
| strategies | List all available strategies |

All commands accept -f json for machine-readable output, -o FILE to write to a file, and - as INPUT to read from stdin.

Why outputguard?

| | json.loads() + regex | outputguard |
|---|---|---|
| Repair strategies | Roll your own | 15, tested and ordered |
| Schema validation | Separate library | Built in (jsonschema) |
| Retry prompts | Write your own | One function call |
| Confidence scoring | No | Yes |
| Truncated JSON | Breaks | Recovers |
| Tests | Probably zero | 1,884 (incl. 288 real LLM models) |
| LLM dependencies | | None (works with any provider) |
| Footprint | | 3 deps: click, jsonschema, rich |

outputguard has no opinion about which LLM you use. It operates on strings and schemas — plug it into OpenAI, Anthropic, local models, or anything else.

Examples

See the examples/ directory for complete, runnable scripts.

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

git clone https://github.com/ndcorder/outputguard.git
cd outputguard
uv sync --dev
uv run pytest tests/ -v

TypeScript / JavaScript

Looking for a JS/TS version? See outputguard-js: same repair strategies, same API shape, TypeScript-native.

License

MIT
