Fast, minimal JSON repair implemented in Rust (PyO3), focusing on truncation and trailing comma fixes.
llm_json_utils
Rust/Python utilities for deterministic JSON cleanup and schema‑guided extraction from messy LLM/log output. Exposed via PyO3 + maturin as the module llm_json_utils (PyPI package name llm_json_utils, repo llm_json_utils).
For Simplified Chinese documentation, see README.zh-CN.md
APIs in this crate
- `repair_json(text: str) -> Any` - strict, minimal JSON repair.
- `JsonExtractor(schema)` - finds a schema-shaped object inside noisy bytes/strings and returns Python values.
repair_json: deterministic structural patcher
- Auto-closes truncated objects/arrays at EOF and tolerates trailing commas.
- Ignores `//` and `#` line comments, `/*...*/` block comments, and fenced code blocks, so you can feed Markdown directly.
- Parses numbers like Python: ints -> `int`, floats -> `float`, huge ints -> Python `int` (arbitrary precision).
- Preserves unknown escapes and broken `\u` sequences instead of dropping data.
- Raises `ValueError` on real structural errors (missing `:`, mismatched delimiters, etc.) rather than guessing user intent.
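To make the first and last bullets concrete, here is a minimal pure-Python sketch of stack-based auto-closing with trailing-comma removal. It is not the crate's Rust implementation: `repair_sketch` and `_drop_trailing_comma` are hypothetical names, comments and fenced blocks are not handled, and the final parse is delegated to the standard `json` module.

```python
import json
from typing import Any

# Opening delimiter -> the closer we expect to see later.
CLOSER = {"{": "}", "[": "]"}


def _drop_trailing_comma(out: list[str]) -> None:
    """Remove a dangling comma (plus whitespace after it) from the buffer end."""
    i = len(out) - 1
    while i >= 0 and out[i].isspace():
        i -= 1
    if i >= 0 and out[i] == ",":
        del out[i:]


def repair_sketch(text: str) -> Any:
    """Hypothetical helper: close truncated containers, drop trailing commas."""
    out: list[str] = []
    stack: list[str] = []  # expected closers, innermost last
    in_string = escaped = False
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in CLOSER:
            stack.append(CLOSER[ch])
        elif ch in "}]":
            # Fail loudly on mismatched delimiters instead of guessing.
            if not stack or stack.pop() != ch:
                raise ValueError(f"mismatched delimiter {ch!r}")
            _drop_trailing_comma(out)
        out.append(ch)
    if in_string:
        if escaped:
            out.pop()  # a lone trailing backslash cannot be completed
        out.append('"')
    _drop_trailing_comma(out)
    out.extend(reversed(stack))  # close innermost containers first
    # json.loads raises a ValueError subclass on anything still malformed.
    return json.loads("".join(out))
```

In this sketch, input cut off after a key or a colon still fails at the `json.loads` step, which mirrors the "raise rather than guess" rule described above.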
JsonExtractor: schema-guided extraction for LLM/log text
- Accepts a minimal JSON-Schema-like dict (`type`, `properties`, `items`, optional `required`), builds Aho-Corasick anchors for field names, then hunts for the first object that matches the schema.
- Robust to the typical noise around LLM replies: missing/extra commas, truncated containers, stray `%`/units after numbers, unescaped quotes, single/full-width quotes, and thousand separators in numbers.
- Works on bytes to avoid encoding surprises; scans for `{` automatically and stops once a schema-shaped object is parsed.
- Enforces safety valves: recursion depth capped at 128 and strings capped at 1 MB; missing `required` fields surface as `ValueError`.
- Will not synthesize fields or coerce unknown literals; it only extracts what the schema anchors allow.
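The anchoring idea can be illustrated with a much smaller stand-in. The sketch below swaps the Aho-Corasick scan for one regular expression per field name and only handles flat `string` and `number` properties; `extract_sketch` is a hypothetical name, and none of this reflects how the Rust extractor is actually written.

```python
import re
from typing import Any

QUOTE = r"""["']"""  # accept single or double quotes around keys and strings
# Either a number with thousand separators (1,234.5) or a plain number.
NUMBER = re.compile(r"-?\d{1,3}(?:,\d{3})+(?:\.\d+)?|-?\d+(?:\.\d+)?")
STRING = re.compile(r"""(["'])(.*?)\1""")


def extract_sketch(schema: dict, blob: bytes) -> dict[str, Any]:
    """Hypothetical helper: pull schema-named fields out of noisy bytes."""
    text = blob.decode("utf-8", errors="replace")
    result: dict[str, Any] = {}
    for name, spec in schema.get("properties", {}).items():
        # Anchor: the quoted field name followed by a colon.
        anchor = re.search(QUOTE + re.escape(name) + QUOTE + r"\s*:\s*", text)
        if anchor is None:
            continue
        rest = text[anchor.end():]
        if spec.get("type") == "number":
            m = NUMBER.match(rest)
            if m:
                # Trailing units such as " %" are simply left unmatched.
                result[name] = float(m.group().replace(",", ""))
        elif spec.get("type") == "string":
            m = STRING.match(rest)
            if m:
                result[name] = m.group(2)
    # Only fields named in the schema are ever returned; nothing is invented.
    missing = [k for k in schema.get("required", []) if k not in result]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return result
```

Because every lookup starts from a known field name, prose around the object never contributes keys of its own, which is the guardrail the bullets above describe.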
Design principles
- Deterministic fixes only - patch small, well-defined structural glitches; fail loudly on ambiguous input.
- Schema as the guardrail - extraction is anchored by known field names so we avoid "hallucinating" structure from arbitrary prose.
- Fast and small - hand-rolled recursive descent with zero per-character allocations on the hot path.
Python usage
Strict repair:
from llm_json_utils import repair_json
obj = repair_json('{"a": 1, "b": [1,2,],} // trailing comma is fine')
assert obj == {"a": 1, "b": [1, 2]}
Schema-guided extraction:
from llm_json_utils import JsonExtractor
schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "score": {"type": "number"},
    },
    "required": ["summary"],
}
extractor = JsonExtractor(schema)
blob = b"Thoughts... {'summary': 'Done', 'score': 95.5 %} Thanks!"
data = extractor.extract(blob)
assert data["summary"] == "Done"
assert data["score"] == 95.5
Build locally
pip install maturin
maturin develop
python - <<'PY'
from llm_json_utils import repair_json
print(repair_json('{"x": 1,}'))
PY