LLM JSON Repair
Robust JSON parsing for LLM outputs with automatic repair and field extraction.
LLMs (such as Claude and GPT) often produce malformed JSON. Common problems include:
- Trailing commas in arrays and objects
- Unquoted property names
- Truncated output from context length limits
- Markdown wrapping (```json blocks)
- JavaScript literals (undefined, NaN)
This library handles all of these issues automatically.
Installation
pip install llm-json-repair
Or install from source:
pip install -e .
Quick Start
from llm_json_repair import parse_json
# Handles trailing commas
result = parse_json('{"items": [1, 2, 3,]}')
print(result.data) # {'items': [1, 2, 3]}
# Extracts from markdown code blocks
text = '```json\n{"status": "ok"}\n```'
result = parse_json(text)
print(result.data) # {'status': 'ok'}
# Reports what was fixed
result = parse_json('{"items": [1, 2, 3,]}')
print(result.was_repaired) # True
print(result.repair_actions) # ['removed_trailing_commas']
Features
Automatic Repair
The parse_json() function automatically fixes common issues:
from llm_json_repair import parse_json
# Trailing commas
parse_json('{"a": 1,}').data # {'a': 1}
# Unquoted keys
parse_json('{foo: "bar"}').data # {'foo': 'bar'}
# Missing closing brackets
parse_json('{"items": [1, 2').data # {'items': [1, 2]}
# JavaScript undefined/NaN
parse_json('{"x": undefined}').data # {'x': None}
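To illustrate the kind of rewriting these repairs involve, here is a minimal standard-library sketch. It is not the library's implementation: the name `sketch_repair` and the action label `replaced_js_literals` are assumptions made for this example. It uses a single regular expression that matches whole string literals first, so commas and literals inside strings are left alone.

```python
import json
import re

# One pass over the text; alternatives are tried left to right at each position.
_PATTERN = re.compile(
    r'("(?:\\.|[^"\\])*")'     # 1: a complete string literal (kept as-is)
    r'|,\s*([}\]])'            # 2: trailing comma before a closing bracket
    r'|\b(undefined|NaN)\b'    # 3: JavaScript-only literal
)


def sketch_repair(text):
    """Hypothetical helper: return (repaired_text, list_of_actions)."""
    actions = []

    def fix(match):
        if match.group(1) is not None:
            return match.group(1)  # never touch string contents
        if match.group(2) is not None:
            if "removed_trailing_commas" not in actions:
                actions.append("removed_trailing_commas")
            return match.group(2)  # keep the bracket, drop the comma
        if "replaced_js_literals" not in actions:  # hypothetical label
            actions.append("replaced_js_literals")
        return "null"

    return _PATTERN.sub(fix, text), actions


repaired, actions = sketch_repair(
    '{"items": [1, 2, 3,], "x": undefined, "note": "a,]"}'
)
print(json.loads(repaired))
print(actions)
```

A string that was cut off before its closing quote does not match the first alternative, so this sketch may still alter text inside a truncated string.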
Field Extraction for Truncated Responses
When JSON is too broken to parse, extract specific fields:
from llm_json_repair import FieldExtractor, extract_field
# LLM response was truncated mid-JSON
malformed = '''{"facts": ["fact1", "fact2"],
"confidence": 0.8,
"reasoning": "Based on the ana'''
# Extract what we can
extractor = FieldExtractor()
extractor.add_string_array("facts")
extractor.add_number("confidence")
result = extractor.extract(malformed)
print(result["facts"]) # ['fact1', 'fact2']
print(result["confidence"]) # 0.8
# Or use convenience function
facts = extract_field(malformed, "facts", "string_array")
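One way such field extraction could work, shown here only as a sketch and not as the library's actual method, is to locate the key with a regular expression and let the standard library's `json.JSONDecoder.raw_decode` parse a single value from that position, ignoring whatever broken text follows. The name `sketch_extract_field` is hypothetical.

```python
import json
import re

_DECODER = json.JSONDecoder()


def sketch_extract_field(text, field_name):
    """Hypothetical helper: first complete value stored under field_name, or None."""
    pattern = re.compile(r'"%s"\s*:\s*' % re.escape(field_name))
    for match in pattern.finditer(text):
        try:
            # raw_decode parses exactly one JSON value starting at the given
            # index and does not care about trailing (possibly broken) text.
            value, _end = _DECODER.raw_decode(text, match.end())
        except json.JSONDecodeError:
            continue  # value was cut off or malformed; try the next occurrence
        return value
    return None


malformed = '''{"facts": ["fact1", "fact2"],
"confidence": 0.8,
"reasoning": "Based on the ana'''

print(sketch_extract_field(malformed, "facts"))
print(sketch_extract_field(malformed, "confidence"))
print(sketch_extract_field(malformed, "reasoning"))
```

Fields whose values were fully emitted before the truncation point are recovered; the cut-off `reasoning` string yields None. The sketch assumes the key text does not also appear inside an earlier string value.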
Strict Mode
Raise an exception instead of returning None for unparseable input:
from llm_json_repair import parse_json, ParseError
try:
    result = parse_json("not json", strict=True)
except ParseError as e:
    print(f"Failed: {e}")
    print(f"Tried: {e.attempts}")
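The strict/lenient split and the `attempts` attribute can be pictured with a small stand-alone sketch. Everything here is an assumption for illustration: `SketchParseError`, `sketch_parse`, and the strategy names are hypothetical and do not describe how the library tracks its attempts internally.

```python
import json
import re


class SketchParseError(ValueError):
    """Hypothetical error that remembers which strategies were tried."""

    def __init__(self, message, attempts):
        super().__init__(message)
        self.attempts = attempts


def sketch_parse(text, strategies, *, strict=False):
    attempts = []
    for name, transform in strategies:
        attempts.append(name)
        try:
            return json.loads(transform(text))
        except json.JSONDecodeError:
            continue  # fall through to the next strategy
    if strict:
        raise SketchParseError("could not parse input as JSON", attempts)
    return None


# Hypothetical strategy list: parse unchanged, then retry without trailing commas.
STRATEGIES = [
    ("as_is", lambda s: s),
    ("drop_trailing_commas", lambda s: re.sub(r",\s*([}\]])", r"\1", s)),
]

print(sketch_parse('{"a": 1,}', STRATEGIES))
try:
    sketch_parse("not json", STRATEGIES, strict=True)
except SketchParseError as e:
    print(f"Failed: {e}; tried: {e.attempts}")
```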
API Reference
parse_json(text, *, strict=False, extract_from_text=True)
Main entry point for parsing JSON from LLM output.
Parameters:
- text: The text containing JSON to parse
- strict: If True, raise ParseError on failure instead of returning None
- extract_from_text: If True, try to extract JSON from markdown/prose
Returns: ParseResult with:
- data: The parsed JSON data (or None if parsing failed)
- was_repaired: Whether repairs were needed
- repair_actions: List of repairs applied
- original_text: The original input
- repaired_text: The text after repairs
repair_json(text)
Low-level function to apply repairs without parsing.
Returns: Tuple of (repaired_text, list_of_repairs)
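As a rough picture of a repair that returns this kind of tuple, the sketch below closes brackets left open by truncated output. It is an assumption-laden illustration rather than the library's code: `sketch_close_brackets` and the repair labels `closed_string` and `closed_brackets` are hypothetical names.

```python
import json


def sketch_close_brackets(text):
    """Hypothetical helper: append missing closers, return (text, repairs)."""
    closers = {"{": "}", "[": "]"}
    stack = []          # closers still owed, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in closers:
            stack.append(closers[ch])
        elif ch in "}]" and stack and stack[-1] == ch:
            stack.pop()

    repairs = []
    if in_string:
        text += '"'     # terminate a string cut off mid-value
        repairs.append("closed_string")
    if stack:
        text += "".join(reversed(stack))
        repairs.append("closed_brackets")
    return text, repairs


fixed, repairs = sketch_close_brackets('{"items": [1, 2')
print(fixed, repairs)
print(json.loads(fixed))
```

This sketch does not handle input that ends right after a key or a colon (for example `{"a":`), where a value would also have to be invented or the key dropped.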
extract_json_from_text(text)
Extract JSON from text that may contain markdown or prose.
Returns: The extracted JSON string, or None
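A simplified version of this extraction step might look like the following. This is a sketch under stated assumptions, not the library's implementation: `sketch_extract_json` is a hypothetical name, it only looks at the first fenced block, and its prose fallback simply slices from the first opening bracket to the last closing one.

```python
import re

# A fence opened with ``` or ```json, followed by a newline.
_FENCE = re.compile(r"```(?:json)?[ \t]*\n(.*?)```", re.DOTALL)


def sketch_extract_json(text):
    """Hypothetical helper: return the JSON-looking part of text, or None."""
    fenced = _FENCE.search(text)
    if fenced:
        return fenced.group(1).strip()

    # Fallback for JSON embedded in prose.
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    if not starts:
        return None
    start = min(starts)
    end = max(text.rfind("}"), text.rfind("]"))
    # If no closer follows the opener, the output was probably truncated;
    # return everything from the opener so a later repair step can try.
    return text[start:end + 1] if end > start else text[start:]


print(sketch_extract_json('```json\n{"status": "ok"}\n```'))
print(sketch_extract_json('Here you go: {"a": [1, 2]} Hope that helps!'))
print(sketch_extract_json("no json here"))
```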
FieldExtractor
Builder for extracting specific fields from malformed JSON.
extractor = FieldExtractor()
extractor.add_string("name")
extractor.add_number("count")
extractor.add_boolean("active")
extractor.add_string_array("tags")
extractor.add_object_array("items")
extractor.add_object("metadata")
result = extractor.extract(text)
Convenience Functions
extract_field(text, field_name, field_type="auto")
extract_array(text, field_name)
extract_object(text, field_name)
Testing
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run with coverage
pytest --cov=llm_json_repair
License
MIT