Skip to main content

Validate, repair, and retry LLM structured outputs

Project description

outputguard

Stop wrestling with broken LLM JSON. Validate, repair, and retry — automatically.

Python CI License: MIT Tests


The Problem

LLMs produce broken JSON constantly. They wrap it in markdown fences, leave trailing commas, use Python True/False instead of true/false, sprinkle in NaN, truncate mid-object when they hit token limits, and helpfully add commentary around the JSON you asked for. Every AI application ends up writing the same brittle json.loads() + try/except + regex gauntlet.

The Solution

import outputguard

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["name", "age"]
}

# Typical LLM output — fenced, trailing comma, single quotes
llm_output = '''```json
{'name': 'Alice', 'age': 30,}
```'''

result = outputguard.validate_and_repair(llm_output, schema)
print(result.valid)              # True
print(result.data)               # {'name': 'Alice', 'age': 30}
print(result.strategies_applied) # ['strip_fences', 'fix_quotes', 'fix_commas']

Fourteen repair strategies, JSON Schema validation, retry prompt generation, and a CLI — in one tiny package with three dependencies.

Installation

pip install outputguard

Or with uv:

uv add outputguard

Quick Start

Validate & Repair

The most common pattern — validate against a schema, auto-repair if broken, get clean data back:

import outputguard

result = outputguard.validate_and_repair(llm_output, schema)

if result.valid:
    process(result.data)                  # Clean, validated dict
    if result.repaired:
        log(result.strategies_applied)    # What was fixed
else:
    handle_errors(result.errors)          # Detailed error paths

Repair Only

When you just need parseable JSON and don't have a schema:

result = outputguard.repair(broken_json)
print(result.text)                # Clean JSON string
print(result.strategies_applied)  # ['fix_booleans', 'fix_commas']

Validate Only

Check JSON against a schema without attempting repair:

result = outputguard.validate(llm_output, schema)
for error in result.errors:
    print(f"{error.path}: {error.message}")
    # $.age: 'thirty' is not of type 'integer'

Retry Loop

When repair is not enough, generate a correction prompt and send it back to the LLM:

import outputguard

def get_structured_output(llm, prompt, schema, max_retries=3):
    for attempt in range(max_retries + 1):
        raw = llm.generate(prompt)
        result = outputguard.validate_and_repair(raw, schema)

        if result.valid:
            return result.data

        # Generate a targeted correction prompt
        prompt = outputguard.retry_prompt(raw, schema, result.errors)

    raise RuntimeError("Failed to get valid output")

The retry prompt tells the LLM exactly what went wrong — which fields are missing, which types are incorrect, and what the schema expects. Works with any LLM provider.

CLI

# Validate JSON against a schema
outputguard validate output.json -s schema.json

# Validate with auto-repair
outputguard validate output.json -s schema.json --repair

# Repair only (no schema)
outputguard repair output.json

# Pipe from stdin
echo '{name: "Alice", age: 30,}' | outputguard repair -

# Generate a retry prompt
outputguard retry-prompt output.json -s schema.json

# List all repair strategies
outputguard strategies

What It Fixes

Fourteen strategies, applied in order. Each one targets a specific class of LLM JSON malformation:

# Strategy Before After
1 strip_fences ```json\n{"a": 1}\n``` {"a": 1}
2 extract_json Sure! Here's the JSON: {"a": 1} Let me know! {"a": 1}
3 remove_comments {"a": 1} // a comment {"a": 1}
4 fix_commas {"a": 1, "b": 2,} {"a": 1, "b": 2}
5 fix_quotes {'a': 'hello'} {"a": "hello"}
6 fix_keys {a: 1, b: 2} {"a": 1, "b": 2}
7 fix_values {"a": NaN, "b": Infinity} {"a": null, "b": null}
8 fix_booleans {"a": True, "b": None} {"a": true, "b": null}
9 fix_truncated {"a": 1, "b": "hel {"a": 1, "b": "hel"}
10 fix_ellipsis {"items": [1, 2, ...]} {"items": [1, 2]}
11 fix_unicode {"a": "\u00"} {"a": "�"}
12 fix_inner_quotes {"a": " "hello" "} {"a": " \"hello\" "}
13 fix_closers {"a": [1, 2, 3 {"a": [1, 2, 3]}
14 fix_newlines {"a": "line1↵line2"} {"a": "line1\nline2"}

Tested Against 290 Real LLM Models

We tested outputguard against every text-generation model on OpenRouter — 290 models across 40+ providers.

Result: 99% success rate (286/290 models handled correctly)

Count
Models tested 290
Valid immediately 225 (78%)
Repaired by outputguard 61 (21%)
Failed 4 (1%)

The 61 repaired outputs were fixed automatically — 55 needed strip_fences, 4 needed extract_json, 2 needed fix_truncated. The 4 failures were models that returned corrupted/garbled output (tokenizer issues, not JSON problems).

Highlighted model results (click to expand)
Model Provider Result Fix Applied
GPT-5 Mini OpenAI ✅ Clean
GPT-5 Pro OpenAI ✅ Clean
GPT-4.1 Mini OpenAI ✅ Clean
Claude Sonnet 4.6 Anthropic ✅ Clean
Claude Opus 4.7 Anthropic ✅ Clean
Claude Haiku 4.5 Anthropic 🛠️ Repaired strip_fences
Gemini 2.5 Flash Google ✅ Clean
Gemini 2.5 Pro Google 🛠️ Repaired strip_fences
Gemini 3.1 Flash Lite Google ✅ Clean
Grok 4.1 Fast xAI ✅ Clean
Grok 4.3 xAI ✅ Clean
Mistral Medium 3.5 Mistral ✅ Clean
Mistral Large Mistral ✅ Clean
DeepSeek v4 Pro DeepSeek ✅ Clean
DeepSeek v3.2 DeepSeek 🛠️ Repaired strip_fences
Llama 4 Maverick Meta ✅ Clean
Llama 4 Scout Meta 🛠️ Repaired strip_fences
Qwen 3.6 Flash Alibaba ✅ Clean
Qwen 3 Max Alibaba ✅ Clean
Kimi K2.6 Moonshot ✅ Clean
GLM 5.1 Zhipu ✅ Clean
Command A Cohere ✅ Clean
Phi-4 Microsoft 🛠️ Repaired strip_fences
Nova Premier Amazon 🛠️ Repaired strip_fences
Seed 1.6 ByteDance ✅ Clean
Mercury 2 Inception ✅ Clean

All 290 raw model outputs are committed as test fixtures. Run python -m tests.real_model_runner sweep to re-test against every model yourself.

Test Suite

1,347 tests across 7 testing dimensions:

Category Tests What it covers
Strategy exhaustive 159 Every strategy pushed to edge cases
Adversarial & fuzzing 286 141 chaotic inputs, concurrency, performance
API contracts 145 parse(), exceptions, reports, CLI, registry
LLM corpus 119 Real failure patterns from 7 model families
Combinations 115 Multi-strategy interactions, ordering, idempotency
Real model fixtures 580 Actual outputs from 290 LLM models
Core & integration 414 Strategies, validator, repairer, guard, stress
uv run pytest tests/ -q
# 1,881 passed in 1.52s

Configuration

Use the OutputGuard class for fine-grained control over which strategies run:

from outputguard import OutputGuard

# Strict mode — only fix formatting, not content
strict = OutputGuard(
    strategies=["strip_fences", "fix_commas"],
    max_repair_attempts=1,
)
result = strict.validate_and_repair(text, schema)

# Aggressive mode — all strategies, more attempts
aggressive = OutputGuard(
    strategies=None,          # All 13 strategies (default)
    max_repair_attempts=5,
)
result = aggressive.validate_and_repair(text, schema)

RepairReport

For debugging and observability, RepairReport gives you a full breakdown of what happened:

from outputguard.report import RepairReport

report = RepairReport(
    original_text=original,
    final_text=repaired,
    success=True,
    steps=steps,
)

print(report.summary)
# Repaired using 2 strategy(ies): strip_fences, fix_commas

print(report.confidence)   # 0.8 — fewer strategies = higher confidence
print(report.diff)         # Unified diff from original to repaired
print(report.step_diffs()) # Per-strategy diffs for verbose logging

Confidence scoring is a heuristic from 0.0 to 1.0. It decreases as more strategies are needed and as the text changes more. Useful for deciding whether to trust a repair or escalate to a retry.

API Reference

Module-level Functions

Function Returns Description
validate(text, schema) ValidationResult Validate JSON against a schema
repair(text) RepairResult Auto-repair malformed JSON
validate_and_repair(text, schema) ValidationResult Validate, repair if needed, re-validate
retry_prompt(text, schema, errors) str Generate a correction prompt for the LLM

Classes

Class Description
OutputGuard Configurable pipeline with strategy selection and retry limits
ValidationResult Result with valid, data, errors, repaired, strategies_applied
RepairResult Result with repaired, text, strategies_applied, parse_error
ValidationError Error detail with message, path, schema_path, value
RepairReport Detailed report with diff, confidence, summary, step_diffs()

Exceptions

Exception Description
OutputGuardError Base exception
ParseError JSON could not be parsed even after repair
SchemaValidationError JSON parsed but does not match the schema
RepairError Repair was attempted but failed
StrategyError A specific repair strategy encountered an error

CLI Reference

outputguard [COMMAND] [OPTIONS]
Command Description
validate INPUT -s SCHEMA Validate JSON against a schema
validate INPUT -s SCHEMA --repair Validate with auto-repair
repair INPUT Repair malformed JSON
repair INPUT --strategies strip_fences,fix_commas Repair with specific strategies
retry-prompt INPUT -s SCHEMA Generate a correction prompt
strategies List all available strategies

All commands accept -f json for machine-readable output, -o FILE to write to a file, and - as INPUT to read from stdin.

Why outputguard?

json.loads() + regex outputguard
Repair strategies Roll your own 14, tested and ordered
Schema validation Separate library Built in (jsonschema)
Retry prompts Write your own One function call
Confidence scoring No Yes
Truncated JSON Breaks Recovers
Tests Probably zero 1,347 (including real LLM outputs)
LLM dependencies None (works with any provider)
Footprint 3 deps: click, jsonschema, rich

outputguard has no opinion about which LLM you use. It operates on strings and schemas — plug it into OpenAI, Anthropic, local models, or anything else.

Examples

See the examples/ directory for complete, runnable scripts:

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

git clone https://github.com/ndcorder/outputguard.git
cd outputguard
uv sync --dev
uv run pytest tests/ -v

TypeScript / JavaScript

Looking for a JS/TS version? See outputguard-js — same 13 strategies, same API shape, TypeScript-native.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

outputguard-0.2.0.tar.gz (156.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

outputguard-0.2.0-py3-none-any.whl (28.7 kB view details)

Uploaded Python 3

File details

Details for the file outputguard-0.2.0.tar.gz.

File metadata

  • Download URL: outputguard-0.2.0.tar.gz
  • Upload date:
  • Size: 156.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for outputguard-0.2.0.tar.gz
Algorithm Hash digest
SHA256 f791c2e70d94fa9f1a4b1fab3db4c34a9dbbce6f2b3a3c503881bf6a19d21fbd
MD5 1e51468a77abb044d36117dac572f18f
BLAKE2b-256 13e74c9b307efd766a97b31ba405c323d84fcd14f7b7eb59e7e43195cd7367aa

See more details on using hashes here.

Provenance

The following attestation bundles were made for outputguard-0.2.0.tar.gz:

Publisher: ci.yml on ndcorder/outputguard

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file outputguard-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: outputguard-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 28.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for outputguard-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0e40239973039eb8b76af79753bcf4ec090c47596f900c56ab4170d46b94b8b4
MD5 28c2923fc15e2aabab54586f3c1fa9f6
BLAKE2b-256 6abdee340f78b59280be28bb426de10a8e19a80a824641615fcb5683f7298b23

See more details on using hashes here.

Provenance

The following attestation bundles were made for outputguard-0.2.0-py3-none-any.whl:

Publisher: ci.yml on ndcorder/outputguard

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page