LLM JSON Repair

Python 3.9+ | License: MIT

Robust JSON parsing for LLM outputs with automatic repair and field extraction.

LLMs (such as Claude and GPT) often produce malformed JSON. Common problems include:

  • Trailing commas in arrays and objects
  • Unquoted property names
  • Truncated output from context length limits
  • Markdown wrapping (```json blocks)
  • JavaScript literals (undefined, NaN)

This library handles all of these issues automatically.
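
To see why a repair step is needed, the following sketch feeds one sample of each problem to Python's standard `json` module. It uses only the standard library; the helper name `stdlib_accepts` and the sample strings are illustrative and not part of this package.

```python
import json

FENCE = "`" * 3  # markdown code-fence marker, built indirectly

# One illustrative input per problem listed above.
samples = {
    "trailing comma": '{"items": [1, 2, 3,]}',
    "unquoted key": '{foo: "bar"}',
    "truncated": '{"items": [1, 2',
    "markdown fence": FENCE + 'json\n{"status": "ok"}\n' + FENCE,
    "undefined literal": '{"x": undefined}',
}


def stdlib_accepts(text: str) -> bool:
    """Return True if json.loads parses the text without error."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


for label, text in samples.items():
    status = "ok" if stdlib_accepts(text) else "rejected"
    print(f"{label}: {status}")
```

Every sample is rejected by the strict parser. Note that Python's `json.loads` accepts `NaN` and `Infinity` by default, which is why the last sample uses `undefined` instead.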

Installation

pip install llm-json-repair

Or install from source:

pip install -e .

Quick Start

from llm_json_repair import parse_json

# Handles trailing commas
result = parse_json('{"items": [1, 2, 3,]}')
print(result.data)  # {'items': [1, 2, 3]}

# Extracts from markdown code blocks
text = '```json\n{"status": "ok"}\n```'
fenced = parse_json(text)
print(fenced.data)  # {'status': 'ok'}

# Reports what was fixed (for the trailing-comma example above)
print(result.was_repaired)      # True
print(result.repair_actions)    # ['removed_trailing_commas']

Features

Automatic Repair

The parse_json() function automatically fixes common issues:

from llm_json_repair import parse_json

# Trailing commas
parse_json('{"a": 1,}').data  # {'a': 1}

# Unquoted keys
parse_json('{foo: "bar"}').data  # {'foo': 'bar'}

# Missing closing brackets
parse_json('{"items": [1, 2').data  # {'items': [1, 2]}

# JavaScript undefined/NaN
parse_json('{"x": undefined}').data  # {'x': None}
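
For intuition, repairs of this kind can be approximated with a few text passes followed by a normal parse. The sketch below is not this library's implementation: `naive_repair`, `parse_with_repair`, and every action name other than `removed_trailing_commas` are hypothetical, and the regex passes are deliberately simple (they do not track string boundaries), so the sketch only applies them when a strict parse has already failed.

```python
import json
import re


def close_open_brackets(text: str) -> str:
    """Append closers for brackets (and a string) left open at the end."""
    stack = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    if in_string:
        text += '"'
    return text + "".join(reversed(stack))


def naive_repair(text: str):
    """Apply each pass in order; return (repaired_text, actions_applied)."""
    steps = [
        ("closed_brackets", close_open_brackets),
        ("removed_trailing_commas",
         lambda s: re.sub(r",\s*([}\]])", r"\1", s)),
        ("quoted_keys",
         lambda s: re.sub(r'([{,]\s*)([A-Za-z_]\w*)(\s*:)', r'\1"\2"\3', s)),
        ("replaced_undefined",
         lambda s: re.sub(r"\bundefined\b", "null", s)),
    ]
    actions = []
    for name, fix in steps:
        fixed = fix(text)
        if fixed != text:
            actions.append(name)
            text = fixed
    return text, actions


def parse_with_repair(text: str):
    """Try a strict parse first; repair only if that fails."""
    try:
        return json.loads(text), []
    except json.JSONDecodeError:
        repaired, actions = naive_repair(text)
        return json.loads(repaired), actions
```

Trying the strict parse first keeps valid input untouched, which matters here because the regex passes could otherwise rewrite text inside string values.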

Field Extraction for Truncated Responses

When JSON is too broken to parse, extract specific fields:

from llm_json_repair import FieldExtractor, extract_field

# LLM response was truncated mid-JSON
malformed = '''{"facts": ["fact1", "fact2"],
                "confidence": 0.8,
                "reasoning": "Based on the ana'''

# Extract what we can
extractor = FieldExtractor()
extractor.add_string_array("facts")
extractor.add_number("confidence")

result = extractor.extract(malformed)
print(result["facts"])       # ['fact1', 'fact2']
print(result["confidence"])  # 0.8

# Or use the convenience function
facts = extract_field(malformed, "facts", "string_array")

Strict Mode

Raise a ParseError instead of returning a result whose data is None when the input cannot be parsed:

from llm_json_repair import parse_json, ParseError

try:
    result = parse_json("not json", strict=True)
except ParseError as e:
    print(f"Failed: {e}")
    print(f"Tried: {e.attempts}")
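
One way to picture the strict/lenient split is a chain of parsing strategies that records each attempt and only raises when asked to. The sketch below is an assumption about the general pattern, not this library's code: `SketchParseError`, `parse_lenient`, and the strategy names are hypothetical.

```python
import json


class SketchParseError(ValueError):
    """Hypothetical error that remembers which strategies were tried."""

    def __init__(self, message, attempts):
        super().__init__(message)
        self.attempts = attempts


def _outermost_braces(text: str) -> str:
    """Trim to the span between the first '{' and the last '}'."""
    start, end = text.find("{"), text.rfind("}")
    return text[start:end + 1] if 0 <= start < end else text


def parse_lenient(text: str, *, strict: bool = False):
    strategies = [
        ("direct", lambda s: s),
        ("outermost_braces", _outermost_braces),
    ]
    attempts = []
    for name, prepare in strategies:
        attempts.append(name)
        try:
            return json.loads(prepare(text))
        except json.JSONDecodeError:
            continue
    if strict:
        raise SketchParseError("no strategy produced valid JSON", attempts)
    return None
```

Strict mode suits pipelines where a silent `None` would be mistaken for a valid empty answer; the lenient default suits best-effort processing.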

API Reference

parse_json(text, *, strict=False, extract_from_text=True)

Main entry point for parsing JSON from LLM output.

Parameters:

  • text: The text containing JSON to parse
  • strict: If True, raise ParseError on failure instead of returning a result whose data is None
  • extract_from_text: If True, try to extract JSON from markdown/prose

Returns: ParseResult with:

  • data: The parsed JSON data (or None if parsing failed)
  • was_repaired: Whether repairs were needed
  • repair_actions: List of repairs applied
  • original_text: The original input
  • repaired_text: The text after repairs
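
The fields listed above map naturally onto a small record type. The dataclass below is only a stand-in for experimenting with that shape; `ParseResultSketch` and `describe` are hypothetical names, and the library's actual class definition may differ.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class ParseResultSketch:
    """Stand-in mirroring the documented ParseResult fields."""
    data: Any
    was_repaired: bool
    repair_actions: List[str] = field(default_factory=list)
    original_text: str = ""
    repaired_text: Optional[str] = None


def describe(result: ParseResultSketch) -> str:
    """Summarize a result for logging."""
    if result.data is None:
        return "parse failed"
    if result.was_repaired:
        return "parsed after: " + ", ".join(result.repair_actions)
    return "parsed cleanly"


ok = ParseResultSketch(
    data={"a": 1},
    was_repaired=True,
    repair_actions=["removed_trailing_commas"],
    original_text='{"a": 1,}',
    repaired_text='{"a": 1}',
)
print(describe(ok))
```

In this sketch a `data` value of None cannot be told apart from a successfully parsed JSON `null`, so callers that expect `null` as a valid answer may prefer strict mode.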

repair_json(text)

Low-level function to apply repairs without parsing.

Returns: Tuple of (repaired_text, list_of_repairs)

extract_json_from_text(text)

Extract JSON from text that may contain markdown or prose.

Returns: The extracted JSON string, or None
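
A rough idea of how such extraction can work: prefer the contents of a fenced code block, and otherwise fall back to the span between the first opening bracket and the last matching closer. This is a minimal sketch under those assumptions, not the library's algorithm; `find_json_candidate` is a hypothetical name.

```python
import re

FENCE = "`" * 3  # markdown code-fence marker, built indirectly

# A fenced block, optionally tagged "json", captured lazily up to the closer.
FENCE_RE = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*\n(.*?)" + re.escape(FENCE),
    re.DOTALL | re.IGNORECASE,
)


def find_json_candidate(text: str):
    """Return the most likely JSON substring, or None if nothing looks like JSON."""
    fenced = FENCE_RE.search(text)
    if fenced:
        return fenced.group(1).strip()

    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    if not starts:
        return None
    start = min(starts)
    closer = "}" if text[start] == "{" else "]"
    end = text.rfind(closer)
    # If no closer follows, keep the truncated tail so a repair pass can finish it.
    return text[start:end + 1] if end > start else text[start:]


reply = "Sure! Here it is:\n" + FENCE + 'json\n{"status": "ok"}\n' + FENCE
print(find_json_candidate(reply))
```

The returned string is only a candidate; it still needs to be parsed (and possibly repaired) afterwards.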

FieldExtractor

Builder for extracting specific fields from malformed JSON.

extractor = FieldExtractor()
extractor.add_string("name")
extractor.add_number("count")
extractor.add_boolean("active")
extractor.add_string_array("tags")
extractor.add_object_array("items")
extractor.add_object("metadata")

result = extractor.extract(text)

Convenience Functions

extract_field(text, field_name, field_type="auto")
extract_array(text, field_name)
extract_object(text, field_name)

Testing

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=llm_json_repair

License

MIT
