Validate, repair, and retry LLM structured outputs
outputguard
Stop wrestling with broken LLM structured output. Validate, repair, and retry — automatically.
The Problem
LLMs produce broken structured output constantly. JSON is the common case, but models also return YAML, TOML, Python-style literals when forced JSON is off, markdown fences, comments, trailing commas, NaN, truncated objects, and helpful commentary around the data you asked for. Every AI application ends up writing the same brittle parser + try/except + regex gauntlet.
The Solution
import outputguard
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["name", "age"]
}
# Typical LLM output — fenced, trailing comma, single quotes
llm_output = '''```json
{'name': 'Alice', 'age': 30,}
```'''
result = outputguard.validate_and_repair(llm_output, schema)
print(result.valid) # True
print(result.data) # {'name': 'Alice', 'age': 30}
print(result.strategies_applied) # ['strip_fences', 'fix_quotes', 'fix_commas']
Fifteen repair strategies, JSON Schema validation, retry prompt generation, and a CLI — now for JSON, YAML, TOML, Python literals, and auto-detected forced-JSON-off output.
Installation
pip install outputguard
Or with uv:
uv add outputguard
Documentation
Start with the README for a fast overview, then use the focused guides when you need exact behavior, API signatures, or command examples:
- API guide - choose the right function and understand result objects.
- Formats guide - JSON, YAML, TOML, Python literals, auto, and forced-json-off.
- Guarded generation guide - wrap an LLM call with validation, repair, retry, and observability.
- Batch processing guide - validate or repair many outputs in one call or from the CLI.
- CLI guide - commands, flags, examples, and exit codes.
- Changelog - release notes and 2.0 migration notes.
What's New in 2.0
OutputGuard 2.0 keeps JSON as the default path, so existing 1.x code continues to work without passing new options. The new capabilities are opt-in:
- Format-aware validation and repair with format="json", "yaml", "toml", "python-literal", "auto", and "forced-json-off".
- Guarded generation helpers that call your LLM function, validate the response, optionally repair it, and retry with structured feedback.
- Batch APIs and a batch CLI command for evals, logs, and offline audits.
- More explicit reports and errors for failed guarded-generation runs.
Choosing the Right API
| Goal | API |
|---|---|
| Validate and repair one model output | validate_and_repair() |
| Repair without a full validation workflow | repair() |
| Check validity only | validate() |
| Get parsed Python data or raise | parse() |
| Build a validation-aware retry loop | retry_prompt() |
| Wrap an LLM generation function | guarded_generate() / guarded_generate_async() |
| Validate many outputs | validate_batch() |
| Repair many outputs | repair_batch() |
Quick Start
Validate & Repair
The most common pattern — validate against a schema, auto-repair if broken, get clean data back:
import outputguard
result = outputguard.validate_and_repair(llm_output, schema)
if result.valid:
    process(result.data)  # Clean, validated dict
    if result.repaired:
        log(result.strategies_applied)  # What was fixed
else:
    handle_errors(result.errors)  # Detailed error paths
Repair Only
When you just need parseable structured output and don't have a schema:
result = outputguard.repair(broken_json)
print(result.text) # Clean JSON string by default
print(result.strategies_applied) # ['fix_booleans', 'fix_commas']
Validate Only
Check structured output against a schema without attempting repair:
result = outputguard.validate(llm_output, schema)
for error in result.errors:
    print(f"{error.path}: {error.message}")
    # $.age: 'thirty' is not of type 'integer'
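The `$.age` path in the comment above follows a JSONPath-like convention. As a rough illustration of how such paths can be built, here is a standard-library sketch that checks only `type`, `required`, and `properties`. The function `check` and its tuple-based error format are hypothetical and far simpler than real JSON Schema validation; they are not outputguard's implementation.

```python
# Hypothetical mini-validator: only a handful of JSON Schema keywords are handled.
TYPE_MAP = {"string": str, "integer": int, "object": dict, "array": list}


def check(value, schema, path="$"):
    """Return a list of (path, message) tuples for type/required violations."""
    errors = []
    expected = schema.get("type")
    py_type = TYPE_MAP.get(expected)
    if py_type is not None:
        # bool is a subclass of int in Python, so reject it for "integer".
        is_bool_as_int = expected == "integer" and isinstance(value, bool)
        if not isinstance(value, py_type) or is_bool_as_int:
            errors.append((path, f"{value!r} is not of type {expected!r}"))
            return errors
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append((path, f"{key!r} is a required property"))
        for key, subschema in schema.get("properties", {}).items():
            if key in value:
                # Extend the path with the property name as we descend.
                errors.extend(check(value[key], subschema, f"{path}.{key}"))
    return errors


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

for path, message in check({"name": "Alice", "age": "thirty"}, schema):
    print(f"{path}: {message}")
# $.age: 'thirty' is not of type 'integer'
```

Carrying the path down through the recursion is what lets each error point at the exact field that failed.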
Retry Loop
When repair is not enough, generate a correction prompt and send it back to the LLM:
import outputguard
def get_structured_output(llm, prompt, schema, max_retries=3):
    for attempt in range(max_retries + 1):
        raw = llm.generate(prompt)
        result = outputguard.validate_and_repair(raw, schema)
        if result.valid:
            return result.data
        # Generate a targeted correction prompt
        prompt = outputguard.retry_prompt(raw, schema, result.errors)
    raise RuntimeError("Failed to get valid output")
The retry prompt tells the LLM exactly what went wrong — which fields are missing, which types are incorrect, and what the schema expects. Works with any LLM provider.
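To give a feel for what such a correction prompt might contain, here is a minimal standard-library sketch. The function `build_retry_prompt`, its wording, and the `(path, message)` error tuples are all assumptions made for illustration; the text that `retry_prompt()` actually produces may differ.

```python
import json


def build_retry_prompt(raw: str, schema: dict, errors: list) -> str:
    """Compose a correction prompt listing each validation error (hypothetical format)."""
    lines = ["Your previous response was not valid. Problems found:"]
    # One bullet per error so the model can see exactly which field failed.
    lines += [f"- {path}: {message}" for path, message in errors]
    lines.append("Previous response:")
    lines.append(raw)
    lines.append("Return only JSON matching this schema:")
    lines.append(json.dumps(schema, indent=2))
    return "\n".join(lines)


schema = {"type": "object", "required": ["name", "age"]}
errors = [("$", "'age' is a required property")]
prompt = build_retry_prompt('{"name": "Alice"}', schema, errors)
print(prompt.splitlines()[1])
# - $: 'age' is a required property
```

Including the failed response and the schema alongside the error list keeps the prompt self-contained, so it can be sent to any provider without extra conversation state.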
Guarded Generation
For production retry loops, use guarded_generate() to wrap any LLM client without adding provider dependencies:
import outputguard
result = outputguard.guarded_generate(
    prompt="Return a user object as JSON",
    schema=schema,
    max_retries=3,
    generate=lambda prompt, context: llm.generate(prompt),
)
if result.valid:
    print(result.data)
    print(len(result.attempts))
else:
    print(result.errors)
guarded_generate() validates each generation, repairs when possible, feeds targeted retry prompts back to the generator, and returns every attempt for observability. Pass repair=False for strict validation-only loops or throw_on_failure=True when invalid output should raise GuardedGenerationError.
Async clients can use guarded_generate_async() with the same options.
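The control flow described above (generate, check, record the attempt, retry with feedback) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: `Attempt`, `LoopResult`, `guarded_loop`, and the contents of the context dict are hypothetical names, and plain `json.loads` stands in for schema validation and repair.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Attempt:
    prompt: str
    raw: str
    error: Optional[str]  # None when the attempt parsed cleanly


@dataclass
class LoopResult:
    valid: bool
    data: Any = None
    attempts: list = field(default_factory=list)


def guarded_loop(prompt: str, generate: Callable[[str, dict], str],
                 max_retries: int = 3) -> LoopResult:
    """Call generate() until its output parses as JSON or retries run out."""
    result = LoopResult(valid=False)
    for attempt_number in range(max_retries + 1):
        raw = generate(prompt, {"attempt": attempt_number})
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            # Record the failure and feed the parse error back into the prompt.
            result.attempts.append(Attempt(prompt, raw, str(exc)))
            prompt = f"{prompt}\n\nYour last reply failed to parse ({exc}). Return only valid JSON."
            continue
        result.attempts.append(Attempt(prompt, raw, None))
        result.valid, result.data = True, data
        return result
    return result


# Stub generator standing in for an LLM client: fails once, then succeeds.
replies = iter(["not json", '{"name": "Alice", "age": 30}'])
outcome = guarded_loop("Return a user object as JSON", lambda p, ctx: next(replies))
print(outcome.valid, len(outcome.attempts))  # True 2
```

Keeping every attempt on the result object, including failed ones, is what makes the loop observable after the fact.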
Supported Formats
JSON remains the default, so existing code keeps working. Pass format= to parse and repair other data formats:
yaml_result = outputguard.validate_and_repair(
    "```yaml\nname: Alice\nage: 30\n```",
    schema,
    format="yaml",
)
toml_data = outputguard.parse('name = "Alice"\nage = 30', schema, format="toml")
python_data = outputguard.parse("{'name': 'Alice', 'age': 30}", schema, format="python")
# Use auto or forced-json-off when the model is not constrained to JSON.
auto_data = outputguard.parse("name: Alice\nage: 30", schema, format="forced-json-off")
Supported input formats are json, yaml/yml, toml, python/python-literal, auto, and forced-json-off.
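As a rough illustration of what format auto-detection involves, the sketch below tries JSON first and falls back to a Python literal, using only the standard library. The function `parse_auto` is hypothetical, and the detection order outputguard actually uses is not documented here; YAML and TOML are left out because they need parsers beyond `json` and `ast`.

```python
import ast
import json


def parse_auto(text: str):
    """Try JSON first, then a Python literal; return (format_name, data)."""
    try:
        return "json", json.loads(text)
    except json.JSONDecodeError:
        pass
    try:
        # literal_eval only accepts literals, so arbitrary code is not executed.
        return "python-literal", ast.literal_eval(text)
    except (ValueError, SyntaxError):
        raise ValueError("text is neither JSON nor a Python literal") from None


print(parse_auto('{"name": "Alice"}'))
# ('json', {'name': 'Alice'})
print(parse_auto("{'name': 'Alice', 'ok': True}"))
# ('python-literal', {'name': 'Alice', 'ok': True})
```

Trying the strictest format first matters: valid JSON such as `{"a": 1}` is also a valid Python literal, so reversing the order would mislabel it.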
Batch Processing
Use batch helpers when validating fixture sets, eval outputs, or logs:
batch = outputguard.validate_batch(outputs, schema, repair=True, format="auto")
print(batch.summary)
# BatchSummary(total=..., valid=..., invalid=..., repaired=..., ...)
repaired = outputguard.repair_batch(outputs)
print(repaired.summary.strategy_counts)
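The aggregate counts a batch summary reports can be approximated in a few lines. The sketch below is an assumption-laden illustration rather than the library's code: `summarize` and its dictionary keys are hypothetical, and `json.loads` stands in for schema validation.

```python
import json
from collections import Counter


def summarize(outputs: list) -> dict:
    """Count how many strings parse as JSON and tally failure messages."""
    valid = 0
    failures = Counter()
    for text in outputs:
        try:
            json.loads(text)
            valid += 1
        except json.JSONDecodeError as exc:
            failures[exc.msg] += 1  # group failures by parser message
    total = len(outputs)
    return {
        "total": total,
        "valid": valid,
        "invalid": total - valid,
        "success_rate": valid / total if total else 0.0,
        "failure_counts": dict(failures),
    }


summary = summarize(['{"a": 1}', '{"a": 1,}', "[1, 2"])
print(summary["total"], summary["valid"], summary["invalid"])  # 3 1 2
```

Grouping failures by message gives a quick view of which malformation dominates a log or eval set.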
CLI
# Validate JSON against a schema
outputguard validate output.json -s schema.json
# Validate YAML, TOML, Python literal, or auto-detected output
outputguard validate output.yaml -s schema.json --input-format yaml
outputguard validate output.toml -s schema.json --input-format toml
outputguard validate output.txt -s schema.json --input-format forced-json-off
# Validate with auto-repair
outputguard validate output.json -s schema.json --repair
# Repair only (no schema)
outputguard repair output.json
outputguard repair output.yaml --input-format yaml
# Validate a JSON array of output strings
outputguard batch outputs.json -s schema.json --repair -f json
# Pipe from stdin
echo '{name: "Alice", age: 30,}' | outputguard repair -
# Generate a retry prompt
outputguard retry-prompt output.json -s schema.json
# List all repair strategies
outputguard strategies
What It Fixes
Fifteen strategies, applied in order. Most target JSON-family malformations; generic strategies such as strip_fences also repair fenced YAML, TOML, and Python literal output without converting it to JSON.
| # | Strategy | Before | After |
|---|---|---|---|
| 1 | fix_encoding | Ċ{ĊĠ"a":Ġ1Ċ} | {"a": 1} |
| 2 | strip_fences | ```json\n{"a": 1}\n``` | {"a": 1} |
| 3 | extract_json | Sure! Here's the JSON: {"a": 1} Let me know! | {"a": 1} |
| 4 | remove_comments | {"a": 1} // a comment | {"a": 1} |
| 5 | fix_commas | {"a": 1, "b": 2,} | {"a": 1, "b": 2} |
| 6 | fix_quotes | {'a': 'hello'} | {"a": "hello"} |
| 7 | fix_keys | {a: 1, b: 2} | {"a": 1, "b": 2} |
| 8 | fix_values | {"a": NaN, "b": Infinity} | {"a": null, "b": null} |
| 9 | fix_booleans | {"a": True, "b": None} | {"a": true, "b": null} |
| 10 | fix_truncated | {"a": 1, "b": "hel | {"a": 1, "b": "hel"} |
| 11 | fix_ellipsis | {"items": [1, 2, ...]} | {"items": [1, 2]} |
| 12 | fix_unicode | {"a": "\u00"} | {"a": "�"} |
| 13 | fix_inner_quotes | {"a": " "hello" "} | {"a": " \"hello\" "} |
| 14 | fix_closers | {"a": [1, 2, 3 | {"a": [1, 2, 3]} |
| 15 | fix_newlines | {"a": "line1↵line2"} | {"a": "line1\nline2"} |
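The idea of running strategies in a fixed order until the text parses can be sketched with three simplified stand-ins. These regex-based functions are hypothetical illustrations that reuse the strategy names from the table; they are not outputguard's implementations and, unlike a real repairer, they do not protect content inside string values.

```python
import json
import re


def strip_fences(text: str) -> str:
    """Remove a surrounding markdown code fence, if present."""
    match = re.match(r"^\s*`{3}[\w-]*\s*\n(.*?)\n?\s*`{3}\s*$", text, re.DOTALL)
    return match.group(1) if match else text


def fix_commas(text: str) -> str:
    """Drop commas that directly precede a closing brace or bracket."""
    return re.sub(r",\s*([}\]])", r"\1", text)


def fix_booleans(text: str) -> str:
    """Map Python-style True/False/None to JSON literals."""
    mapping = {"True": "true", "False": "false", "None": "null"}
    return re.sub(r"\b(True|False|None)\b", lambda m: mapping[m.group(1)], text)


PIPELINE = [strip_fences, fix_commas, fix_booleans]


def repair(text: str):
    """Apply strategies in order, stopping early once the text parses."""
    applied = []
    for strategy in PIPELINE:
        try:
            json.loads(text)
            break  # already valid, no further strategies needed
        except json.JSONDecodeError:
            pass
        fixed = strategy(text)
        if fixed != text:
            applied.append(strategy.__name__)
            text = fixed
    return text, applied


FENCE = "`" * 3
raw = f'{FENCE}json\n{{"a": True, "b": [1, 2,],}}\n{FENCE}'
text, applied = repair(raw)
print(text)     # {"a": true, "b": [1, 2]}
print(applied)  # ['strip_fences', 'fix_commas', 'fix_booleans']
```

Ordering matters here: the fence has to come off before the comma and boolean fixes can leave something `json.loads` accepts, and recording only the strategies that changed the text keeps the applied list meaningful.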
Tested Against 288 Real LLM Models
We tested outputguard against every text-generation model on OpenRouter — 288 models across 40+ providers.
Result: 100% success rate. Every model's output was either valid JSON or successfully repaired.
| Metric | Count |
|---|---|
| Models tested | 288 |
| Valid immediately | 225 (78%) |
| Repaired by outputguard | 63 (22%) |
The 63 repaired outputs were fixed automatically — mostly strip_fences (markdown code fences are the #1 LLM JSON issue), plus extract_json, fix_truncated, and fix_encoding.
4 models were excluded from testing due to broken API responses (tokenizer corruption, truncated streaming) — not JSON issues.
Highlighted model results
| Model | Provider | Result | Fix Applied |
|---|---|---|---|
| GPT-5 Mini | OpenAI | ✅ Clean | — |
| GPT-5 Pro | OpenAI | ✅ Clean | — |
| GPT-4.1 Mini | OpenAI | ✅ Clean | — |
| Claude Sonnet 4.6 | Anthropic | ✅ Clean | — |
| Claude Opus 4.7 | Anthropic | ✅ Clean | — |
| Claude Haiku 4.5 | Anthropic | 🛠️ Repaired | strip_fences |
| Gemini 2.5 Flash | Google | ✅ Clean | — |
| Gemini 2.5 Pro | Google | 🛠️ Repaired | strip_fences |
| Gemini 3.1 Flash Lite | Google | ✅ Clean | — |
| Grok 4.1 Fast | xAI | ✅ Clean | — |
| Grok 4.3 | xAI | ✅ Clean | — |
| Mistral Medium 3.5 | Mistral | ✅ Clean | — |
| Mistral Large | Mistral | ✅ Clean | — |
| DeepSeek v4 Pro | DeepSeek | ✅ Clean | — |
| DeepSeek v3.2 | DeepSeek | 🛠️ Repaired | strip_fences |
| Llama 4 Maverick | Meta | ✅ Clean | — |
| Llama 4 Scout | Meta | 🛠️ Repaired | strip_fences |
| Qwen 3.6 Flash | Alibaba | ✅ Clean | — |
| Qwen 3 Max | Alibaba | ✅ Clean | — |
| Kimi K2.6 | Moonshot | ✅ Clean | — |
| GLM 5.1 | Zhipu | ✅ Clean | — |
| Command A | Cohere | ✅ Clean | — |
| Phi-4 | Microsoft | 🛠️ Repaired | strip_fences |
| Nova Premier | Amazon | 🛠️ Repaired | strip_fences |
| Seed 1.6 | ByteDance | ✅ Clean | — |
| Mercury 2 | Inception | ✅ Clean | — |
All 288 raw model outputs are committed as test fixtures. Run python -m tests.real_model_runner sweep to re-test against every model yourself.
Test Suite
1,996 tests across 9 testing dimensions:
| Category | Tests | What it covers |
|---|---|---|
| Strategy exhaustive | 159 | Every strategy pushed to edge cases |
| Adversarial & fuzzing | 286 | 141 chaotic inputs, concurrency, performance |
| API contracts | 145 | parse(), exceptions, reports, CLI, registry |
| LLM corpus | 119 | Real failure patterns from 7 model families |
| Combinations | 115 | Multi-strategy interactions, ordering, idempotency |
| Real model fixtures | 576 | Actual outputs from 288 LLM models |
| Core & integration | 414 | Strategies, validator, repairer, guard, stress |
| Format matrix | 74 | Every public JSON API surface repeated for YAML, TOML, Python literals, auto, aliases, and forced-JSON-off |
| 2.0 orchestration | 10 | Guarded generation, async generation, batch helpers, and batch CLI |
uv run pytest tests/ -q
# 1,996 passed
Configuration
Use the OutputGuard class for fine-grained control over which strategies run:
from outputguard import OutputGuard
# Strict mode — only fix formatting, not content
strict = OutputGuard(
    strategies=["strip_fences", "fix_commas"],
    max_repair_attempts=1,
)
result = strict.validate_and_repair(text, schema)
# Aggressive mode — all strategies, more attempts
aggressive = OutputGuard(
    strategies=None,  # All 15 strategies (default)
    max_repair_attempts=5,
)
result = aggressive.validate_and_repair(text, schema)
# YAML mode — preserves YAML syntax when repairing fenced output
yaml_guard = OutputGuard(format="yaml")
result = yaml_guard.validate_and_repair("```yaml\nname: Alice\nage: 30\n```", schema)
RepairReport
For debugging and observability, RepairReport gives you a full breakdown of what happened:
from outputguard.report import RepairReport
report = RepairReport(
    original_text=original,
    final_text=repaired,
    success=True,
    steps=steps,
)
print(report.summary)
# Repaired using 2 strategy(ies): strip_fences, fix_commas
print(report.confidence) # 0.8 — fewer strategies = higher confidence
print(report.diff) # Unified diff from original to repaired
print(report.step_diffs()) # Per-strategy diffs for verbose logging
Confidence scoring is a heuristic from 0.0 to 1.0. It decreases as more strategies are needed and as the text changes more. Useful for deciding whether to trust a repair or escalate to a retry.
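One way to picture a heuristic with these properties is text similarity minus a per-strategy penalty. The sketch below is purely illustrative: the formula, the 0.05 penalty, and the `confidence` and `unified` helpers are assumptions, not the scoring outputguard actually uses.

```python
import difflib


def confidence(original: str, repaired: str, strategies: list) -> float:
    """Hypothetical score: similarity minus a per-strategy penalty, clamped to [0, 1]."""
    similarity = difflib.SequenceMatcher(None, original, repaired).ratio()
    penalty = 0.05 * len(strategies)  # assumed weight, chosen for illustration
    return max(0.0, min(1.0, similarity - penalty))


def unified(original: str, repaired: str) -> str:
    """Unified diff from the original text to the repaired text."""
    return "\n".join(difflib.unified_diff(
        original.splitlines(), repaired.splitlines(),
        fromfile="original", tofile="repaired", lineterm="",
    ))


original = '{"a": 1, "b": 2,}'
repaired = '{"a": 1, "b": 2}'

print(confidence(original, original, []))  # 1.0
score = confidence(original, repaired, ["fix_commas"])
print(0.0 < score < 1.0)  # True
print(unified(original, repaired))
```

A caller could compare such a score against a threshold of its own choosing, accepting the repair above it and falling back to a retry below it.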
API Reference
Module-level Functions
| Function | Returns | Description |
|---|---|---|
| validate(text, schema, format="json") | ValidationResult | Validate structured output against a schema |
| repair(text, format="json") | RepairResult | Auto-repair malformed structured output |
| validate_and_repair(text, schema, format="json") | ValidationResult | Validate, repair if needed, re-validate |
| parse(text, schema, format="json") | dict \| list | Get parsed Python data or raise |
| retry_prompt(text, schema, errors, format="json") | str | Generate a correction prompt for the LLM |
| guarded_generate(...) | GuardedGenerateResult | Retry an arbitrary generator until output validates |
| guarded_generate_async(...) | GuardedGenerateResult | Async variant for async LLM clients |
| validate_batch(texts, schema, ...) | BatchValidationResult | Validate many outputs and return aggregate diagnostics |
| repair_batch(texts, ...) | BatchRepairResult | Repair many outputs and return aggregate diagnostics |
Classes
| Class | Description |
|---|---|
| OutputGuard | Configurable pipeline with strategy selection, retry limits, and default format |
| GuardedGenerateResult | Result with valid, data, text, attempts, errors, repaired, strategies_applied, exhausted, format |
| BatchSummary | Summary with total, valid, invalid, repaired, parse_failures, schema_failures, success_rate, strategy_counts, formats |
| ValidationResult | Result with valid, data, errors, repaired, strategies_applied, format |
| RepairResult | Result with repaired, text, strategies_applied, parse_error, format |
| ValidationError | Error detail with message, path, schema_path, value |
| RepairReport | Detailed report with diff, confidence, summary, step_diffs() |
Exceptions
| Exception | Description |
|---|---|
| OutputGuardError | Base exception |
| ParseError | Structured output could not be parsed even after repair |
| SchemaValidationError | Structured output parsed but does not match the schema |
| GuardedGenerationError | guarded_generate(..., throw_on_failure=True) could not get valid output |
| RepairError | Repair was attempted but failed |
| StrategyError | A specific repair strategy encountered an error |
CLI Reference
outputguard [COMMAND] [OPTIONS]
| Command | Description |
|---|---|
| validate INPUT -s SCHEMA | Validate structured output against a schema |
| validate INPUT -s SCHEMA --repair | Validate with auto-repair |
| validate INPUT -s SCHEMA --input-format yaml | Validate YAML instead of JSON |
| repair INPUT | Repair malformed structured output |
| repair INPUT --strategies strip_fences,fix_commas | Repair with specific strategies |
| repair INPUT --input-format forced-json-off | Repair auto-detected non-JSON output |
| batch INPUT -s SCHEMA --repair | Validate a JSON array of output strings |
| retry-prompt INPUT -s SCHEMA | Generate a correction prompt |
| strategies | List all available strategies |
All commands accept --input-format for the data format, -f json for machine-readable command output, -o FILE to write to a file, and - as INPUT to read from stdin.
Why outputguard?
| | json.loads() + regex | outputguard |
|---|---|---|
| Repair strategies | Roll your own | 15, tested and ordered |
| Schema validation | Separate library | Built in (jsonschema) |
| Retry prompts | Write your own | One function call |
| Retry orchestration | Write a custom loop | guarded_generate() / guarded_generate_async() |
| Batch processing | Ad hoc scripts | validate_batch(), repair_batch(), CLI batch |
| Confidence scoring | No | Yes |
| Truncated JSON | Breaks | Recovers |
| Tests | Probably zero | 1,996 (incl. 288 real LLM models and format matrix coverage) |
| LLM dependencies | — | None (works with any provider) |
| Footprint | — | Small runtime set: click, jsonschema, PyYAML, rich, plus tomli on Python 3.10 |
outputguard has no opinion about which LLM you use or whether JSON mode is available. It operates on strings and schemas — plug it into OpenAI, Anthropic, local models, or anything else.
Examples
See the examples/ directory for complete, runnable scripts:
- basic_usage.py — Core validate/repair workflow
- retry_loop.py — Retry pattern with correction prompts
- guarded_generation.py — Provider-agnostic guarded generation
- custom_pipeline.py — Custom strategy configuration
- batch_processing.py — Process multiple outputs with statistics
Contributing
Contributions are welcome. Please open an issue first to discuss what you'd like to change.
git clone https://github.com/ndcorder/outputguard.git
cd outputguard
uv sync --dev
uv run pytest tests/ -v
TypeScript / JavaScript
Looking for a JS/TS version? See outputguard-js — same core API shape, TypeScript-native.
License