Fast, minimal JSON repair implemented in Rust (PyO3), focusing on truncation and trailing comma fixes.

llm_json_utils

Rust/Python utilities for deterministic JSON cleanup and schema‑guided extraction from messy LLM/log output. Exposed via PyO3 + maturin as the module llm_json_utils (PyPI package name llm_json_utils, repo llm_json_utils).

For the Simplified Chinese documentation, see README.zh-CN.md

APIs in this crate

  • repair_json(text: str) -> Any - strict, minimal JSON repair.
  • JsonExtractor(schema) - finds a schema-shaped object inside noisy bytes/strings and returns Python values.

repair_json: deterministic structural patcher

  • Auto-closes truncated objects/arrays at EOF and tolerates trailing commas.
  • Ignores // and # line comments, /*...*/ block comments, and fenced code blocks so you can feed Markdown directly.
  • Parses numbers like Python: ints -> int, floats -> float, huge ints -> Python int (arbitrary precision).
  • Preserves unknown escapes and broken \u sequences instead of dropping data.
  • Raises ValueError on real structural errors (missing :, mismatched delimiters, etc.) rather than guessing user intent.
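The bullets above can be approximated in pure Python. The sketch below is not the crate's Rust implementation. It only illustrates the idea of a structural patcher: track open containers, drop a comma that sits directly before a closer, append the closers still owed at EOF, and raise ValueError on a mismatched delimiter. The names close_truncated and repair_sketch are hypothetical, and comments or Markdown fences are deliberately left unhandled.

```python
import json


def _drop_trailing_comma(out: list) -> None:
    """Remove a comma that directly precedes a closer, ignoring whitespace."""
    i = len(out) - 1
    while i >= 0 and out[i].isspace():
        i -= 1
    if i >= 0 and out[i] == ",":
        del out[i]


def close_truncated(text: str) -> str:
    """Patch trailing commas and unclosed containers (illustrative only)."""
    out = []            # characters of the patched document
    stack = []          # closers still owed, innermost last
    in_string = False
    escaped = False
    closers = {"{": "}", "[": "]"}

    for ch in text:
        if in_string:
            # Inside a string nothing is structural; only track escapes.
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in closers:
            stack.append(closers[ch])
        elif ch in "}]":
            # A closer must match the innermost open container.
            if not stack or stack.pop() != ch:
                raise ValueError(f"mismatched delimiter {ch!r}")
            _drop_trailing_comma(out)
        out.append(ch)

    if in_string:
        # Assumption of this sketch: a cut-off string is not guessed at.
        raise ValueError("unterminated string at end of input")
    while stack:
        _drop_trailing_comma(out)
        out.append(stack.pop())
    return "".join(out)


def repair_sketch(text: str):
    """Patch, then hand over to the strict stdlib parser."""
    # json.JSONDecodeError subclasses ValueError, so real errors stay loud.
    return json.loads(close_truncated(text))


print(repair_sketch('{"a": 1, "b": [1,2,],}'))
print(repair_sketch('{"a": [1, 2'))
# The stdlib parser already maps huge integers to arbitrary-precision int.
print(repair_sketch('{"n": 123456789012345678901234567890'))
```

Anything the patcher does not recognise, such as a missing colon, is passed through unchanged and rejected by json.loads, which keeps the sketch on the "fail rather than guess" side.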

JsonExtractor: schema-guided extraction for LLM/log text

  • Accepts a minimal JSON-Schema-like dict (type, properties, items, optional required), builds Aho-Corasick anchors for field names, then hunts for the first object that matches the schema.
  • Robust to the typical noise around LLM replies: missing/extra commas, truncated containers, stray %/units after numbers, unescaped quotes, single/full-width quotes, and thousand separators in numbers.
  • Works on bytes to avoid encoding surprises; scans for { automatically and stops once a schema-shaped object is parsed.
  • Enforces safety valves: recursion depth capped at 128 and strings capped at 1 MB; missing required fields surface as ValueError.
  • Will not synthesize fields or coerce unknown literals; it only extracts what the schema anchors allow.
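To make "hunts for the first object that matches the schema" concrete, here is a minimal stdlib-only sketch. It is an assumption-laden stand-in, not how JsonExtractor works internally: it uses json.JSONDecoder.raw_decode at every { instead of Aho-Corasick anchors, it tolerates none of the noise listed above, and the names matches and find_first_matching are hypothetical.

```python
import json

# Minimal mapping from schema type names to Python types (sketch only).
_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def matches(value, schema: dict) -> bool:
    """Check a value against the type/properties/items/required subset."""
    expected = schema.get("type")
    if expected is not None:
        py_type = _TYPES.get(expected)
        if py_type is None or not isinstance(value, py_type):
            return False
        # bool is an int subclass in Python; keep it out of numeric slots.
        if expected in ("number", "integer") and isinstance(value, bool):
            return False
    if isinstance(value, dict):
        if any(key not in value for key in schema.get("required", [])):
            return False
        props = schema.get("properties", {})
        return all(matches(value[k], props[k]) for k in props if k in value)
    if isinstance(value, list) and "items" in schema:
        return all(matches(item, schema["items"]) for item in value)
    return True


def find_first_matching(text: str, schema: dict):
    """Return the first JSON object in text that fits the schema."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            value, _end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            value = None  # not valid JSON here; try the next brace
        if isinstance(value, dict) and matches(value, schema):
            return value
        pos = text.find("{", pos + 1)
    raise ValueError("no schema-shaped object found")


schema = {
    "type": "object",
    "properties": {"summary": {"type": "string"}, "score": {"type": "number"}},
    "required": ["summary"],
}
text = 'Thoughts... {"note": 1} then {"summary": "Done", "score": 95.5} Thanks!'
print(find_first_matching(text, schema))
```

The first object is skipped because it lacks the required summary field, which mirrors the "schema as the guardrail" idea: structure is only accepted when the schema vouches for it.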

Design principles

  • Deterministic fixes only - patch small, well-defined structural glitches; fail loudly on ambiguous input.
  • Schema as the guardrail - extraction is anchored by known field names so we avoid "hallucinating" structure from arbitrary prose.
  • Fast and small - hand-rolled recursive descent with zero per-character allocations on the hot path.
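As an illustration of the first principle, consider the numeric noise mentioned earlier (thousand separators, a stray % or unit). The sketch below accepts only well-formed comma groups and raises on anything else. The exact rules are an assumption made for this example, and parse_loose_number is a hypothetical helper, not part of llm_json_utils.

```python
import re

# Comma groups must be exactly three digits wide; anything else is rejected.
_NUMBER = re.compile(
    r"""
    \s*
    (?P<sign>[+-]?)
    (?P<int>\d{1,3}(?:,\d{3})+|\d+)
    (?P<frac>\.\d+)?
    (?P<exp>[eE][+-]?\d+)?
    \s*(?:%|[A-Za-z]+)?     # stray percent sign or unit, discarded
    \s*
    """,
    re.VERBOSE,
)


def parse_loose_number(token: str):
    """Parse '1,234.5 %' style tokens; raise ValueError when unsure."""
    m = _NUMBER.fullmatch(token)
    if m is None:
        raise ValueError(f"ambiguous numeric token: {token!r}")
    digits = m["sign"] + m["int"].replace(",", "")
    if m["frac"] or m["exp"]:
        return float(digits + (m["frac"] or "") + (m["exp"] or ""))
    return int(digits)  # Python ints are arbitrary precision


print(parse_loose_number("95.5 %"))
print(parse_loose_number("1,234,567"))
```

A token such as "1,23,4" matches neither alternative and raises ValueError, so the caller sees the problem instead of a silently guessed value.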

Python usage

Strict repair:

from llm_json_utils import repair_json

obj = repair_json('{"a": 1, "b": [1,2,],} // trailing comma is fine')
assert obj == {"a": 1, "b": [1, 2]}

Schema-guided extraction:

from llm_json_utils import JsonExtractor

schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "score": {"type": "number"},
    },
    "required": ["summary"],
}

extractor = JsonExtractor(schema)
blob = b"Thoughts... {'summary': 'Done', 'score': 95.5 %} Thanks!"
data = extractor.extract(blob)
assert data["summary"] == "Done"
assert data["score"] == 95.5

Build locally

pip install maturin
maturin develop
python - <<'PY'
from llm_json_utils import repair_json
print(repair_json('{"x": 1,}'))
PY

Download files

Source Distribution

  • llm_json_utils-0.2.0.tar.gz (28.4 kB): source

Built Distributions

  • llm_json_utils-0.2.0-cp39-abi3-win_amd64.whl (283.2 kB): CPython 3.9+, Windows x86-64
  • llm_json_utils-0.2.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (457.0 kB): CPython 3.9+, manylinux (glibc 2.17+) x86-64
  • llm_json_utils-0.2.0-cp39-abi3-macosx_11_0_arm64.whl (375.4 kB): CPython 3.9+, macOS 11.0+ ARM64
  • llm_json_utils-0.2.0-cp39-abi3-macosx_10_12_x86_64.whl (411.2 kB): CPython 3.9+, macOS 10.12+ x86-64

File details

Details for the file llm_json_utils-0.2.0.tar.gz.

File metadata

  • Download URL: llm_json_utils-0.2.0.tar.gz
  • Upload date:
  • Size: 28.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llm_json_utils-0.2.0.tar.gz
Algorithm Hash digest
SHA256 04cd6f2c86fb1576bde323d424ab2101d91345dbb86069e1f62a5a4d20122469
MD5 651c855fa867ea13ddc24f0811d7e946
BLAKE2b-256 e1f233a997d61421b31f6de8bbd0417766fe2340faa929acb3c62f96c35b7102
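A downloaded file can be checked against the digests listed here with Python's hashlib. This is a generic sketch rather than anything shipped with the package; file_digests and verify_sha256 are hypothetical helper names.

```python
import hashlib


def file_digests(path) -> dict:
    """Return the SHA256 and BLAKE2b-256 hex digests of a file."""
    sha256 = hashlib.sha256()
    blake2b = hashlib.blake2b(digest_size=32)  # 32 bytes = 256 bits
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large wheels do not need to fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            sha256.update(chunk)
            blake2b.update(chunk)
    return {"sha256": sha256.hexdigest(), "blake2b_256": blake2b.hexdigest()}


def verify_sha256(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published hex digest."""
    return file_digests(path)["sha256"] == expected_hex.strip().lower()
```

For the source distribution, verify_sha256 would be called with the path to the downloaded llm_json_utils-0.2.0.tar.gz and the SHA256 value from the table above.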

File details

Details for the file llm_json_utils-0.2.0-cp39-abi3-win_amd64.whl.

File hashes

Hashes for llm_json_utils-0.2.0-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 68908cecba80c94e9f42b8d5236420c42595a7c3854a99cdca085acee330bd42
MD5 c4f42146e7a45e79ec861b5026ed9bc2
BLAKE2b-256 86ed78d181d8d87bd54c8e44855011eb2f7c1dc79aec6cbef7acbb60e705528f

File details

Details for the file llm_json_utils-0.2.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for llm_json_utils-0.2.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d1030f38033dedee6e76e75b8c711271beba91f9676029108a2baea083c93027
MD5 9ef6afd26e6b5e057669d112987c75ba
BLAKE2b-256 f9ea3e0e335dfb341e6626cc0f30af7c16e8cfbd176593661973e3654db42864

File details

Details for the file llm_json_utils-0.2.0-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for llm_json_utils-0.2.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 39d02954706c055fe31842619f4e4ff0919f8619a1e915d4703b8e128bfdffc5
MD5 aee5210a8d9208307e8070c3c83c5adc
BLAKE2b-256 18536ee1e3848e233ad1a3e753f0d901ca8f21367ddb8cfdfda36c71d00ef1a1

File details

Details for the file llm_json_utils-0.2.0-cp39-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for llm_json_utils-0.2.0-cp39-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 52b4ec5f5867818895092db2a6dd7311c118a2ff977fe03be377113492f3191d
MD5 12b9effdaf9b97119f085282a27891ea
BLAKE2b-256 2b619b3209139abd8527ed59c93c618449dba372eb6b87cdd7242249b407a0f4
