Mends the JSON your LLM almost wrote - fast single-pass repair with true incremental streaming
jsonmend
Mends the JSON your LLM almost wrote.
Truncated tool calls, markdown fences, single quotes, bare keys, Python literals, comments, trailing commas, prose around the payload — jsonmend turns them into valid JSON. It is a drop-in replacement for json_repair that is 5–10× faster on batch repair, ~50× faster on streaming, ships a true incremental streaming API (O(new bytes) per chunk, not O(buffer)), and is the reference implementation of an open, cross-language conformance corpus for JSON repair.
Pure Python, zero dependencies, zero binaries. Works on CPython 3.9–3.14,
PyPy, Pyodide/WASM, AWS Lambda — anywhere pip install works.
pip install jsonmend
Scoreboard
JSON repair has no standard: the same broken input is repaired differently by the Python and JavaScript incumbents, which is a real source of production bugs. The jsonmend conformance corpus (460 cases, 19 categories, CC0) defines repair semantics as data — including the genuinely ambiguous cases, where every defensible answer is accepted.
| | jsonmend 0.1.0 | json_repair 0.60.1 | jsonrepair 3.14.0 (JS) |
|---|---|---|---|
| corpus pass rate | 460/460 (100%) | 320/460 (69.6%) | 354/460 (77.0%) |
Per-category breakdown: corpus/scoreboard.md.
Reproduce: python tools/referee.py --write (needs pip install json_repair and npm install jsonrepair, dev-only).
Performance
Median of 7, three independent rounds within ±5%, Python 3.12, M-series
macOS. All inputs are broken JSON (the json.loads fast path never
runs). Verified-then-timed: outputs are checked equal before timing.
Reproduce: python tools/bench.py --verify && python tools/bench.py.
| workload | size | jsonmend | json_repair | speedup |
|---|---|---|---|---|
| truncated tool call | 1 KB | 0.027 ms | 0.199 ms | 7.3× |
| truncated row payload | 75 KB | 1.48 ms | 12.6 ms | 8.5× |
| markdown-fenced output | 49 KB | 0.25 ms | 2.6 ms | 10.6× |
| dirty (quotes/keys/literals) | 5 KB | 0.38 ms | 2.6 ms | 7.0× |
Streaming is a different complexity class
A streaming UI re-renders the partial value on every chunk. With a batch
repairer you must re-parse the whole buffer each time — O(n²) total. The
stateful Mender only pays for the new bytes:
| workload | jsonmend Mender | json_repair (stream_stable=True) | speedup |
|---|---|---|---|
| 150 KB in 4 KB chunks | 6.9 ms | 323 ms | 47× |
| 10 MB in 4 KB chunks | 1.2 s | est. >20 min (quadratic) | n/a |
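To make the complexity claim concrete, the sketch below counts how many bytes each strategy has to scan. It assumes a batch repairer re-reads the entire buffer after every chunk, while an incremental one reads each byte once. The function name `bytes_scanned` is hypothetical, and the count ignores constant factors, so it illustrates the growth rate rather than predicting the timings in the table.

```python
import math


def bytes_scanned(total_bytes, chunk_bytes):
    """Return (re-parse total, incremental total) in bytes scanned."""
    chunks = math.ceil(total_bytes / chunk_bytes)
    # Batch strategy: after chunk k the whole buffer so far is parsed again.
    reparse = sum(min(k * chunk_bytes, total_bytes) for k in range(1, chunks + 1))
    # Incremental strategy: every byte is looked at exactly once.
    incremental = total_bytes
    return reparse, incremental


for label, size in (("150 KB", 150 * 1024), ("10 MB", 10 * 1024 * 1024)):
    reparse, incremental = bytes_scanned(size, 4 * 1024)
    print(f"{label}: re-parse scans {reparse / incremental:.1f}x more bytes")
```

Under these assumptions the 150 KB workload scans roughly 20 times more bytes when re-parsing, and the 10 MB workload roughly 1,280 times more, which is why the gap widens as the stream grows.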
Usage
Drop-in for json_repair
# before
from json_repair import repair_json, loads
# after — same call sites
from jsonmend import repair_json, loads
repair_json("{'name': 'John', age: 31") # '{"name": "John", "age": 31}'
loads('```json\n{"ok": true,}\n```') # {'ok': True}
repair_json(json_str, return_objects=..., skip_json_loads=..., ensure_ascii=..., **json_dumps_args), loads, load(fd), and from_file(path) match json_repair's signatures. Valid JSON short-circuits through C-speed json.loads.
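The short-circuit for valid input can be pictured as a try-first wrapper around the standard library parser. This is a minimal sketch of the general pattern, not jsonmend's actual code: `loads_with_fast_path` and the `repair` callback are hypothetical names, and the stand-in repairer below only strips a trailing comma.

```python
import json


def loads_with_fast_path(text, repair):
    """Try the C-accelerated parser first; repair only when it rejects the input."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Slow path: hand the broken text to a repair function (assumed here).
        return repair(text)


calls = []


def toy_repair(text):
    # Hypothetical stand-in for a real repairer: drop one trailing comma.
    calls.append(text)
    return json.loads(text.replace(",}", "}"))


print(loads_with_fast_path('{"ok": true}', toy_repair))   # fast path, repair not called
print(loads_with_fast_path('{"ok": true,}', toy_repair))  # falls back to toy_repair
```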
Streaming
from jsonmend import Mender
m = Mender()
for chunk in llm_stream:       # feed as the tokens arrive
    partial = m.feed(chunk)    # best-effort value, O(new bytes)
    render(partial)            # e.g. {"answer": "The capital of Fr"}
value = m.close()              # final mended value
feed() returns a live view that grows in place — including the string
that is currently streaming in. Any chunking gives byte-identical results
to batch repair (property-tested over the whole corpus).
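The idea of keeping state between chunks, so that each call only scans new text, can be illustrated with a toy scanner. This is not jsonmend's implementation: `ToyCloser` is a hypothetical class that only handles truncation inside a string or right after a complete value, it does not cope with a dangling backslash, comma, or colon, and it uses a per-character loop purely for readability.

```python
import json


class ToyCloser:
    """Toy resumable scanner that remembers open strings and brackets."""

    def __init__(self):
        self.parts = []         # chunks received so far
        self.stack = []         # closing brackets still owed
        self.in_string = False
        self.escaped = False

    def feed(self, chunk):
        # Only the new chunk is scanned; earlier text is never revisited.
        for ch in chunk:
            if self.in_string:
                if self.escaped:
                    self.escaped = False
                elif ch == "\\":
                    self.escaped = True
                elif ch == '"':
                    self.in_string = False
            elif ch == '"':
                self.in_string = True
            elif ch == "{":
                self.stack.append("}")
            elif ch == "[":
                self.stack.append("]")
            elif ch in "}]" and self.stack:
                self.stack.pop()
        self.parts.append(chunk)

    def closed_text(self):
        # Close an open string first, then the containers, innermost first.
        tail = '"' if self.in_string else ""
        return "".join(self.parts) + tail + "".join(reversed(self.stack))


closer = ToyCloser()
for chunk in ['{"answer": "The ca', 'pital of Fr']:
    closer.feed(chunk)
print(json.loads(closer.closed_text()))
```

Because the state carried between calls is independent of where the chunk boundaries fall, feeding the same text in pieces of any size yields the same closed text, which is the kind of chunking invariance the paragraph above describes.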
Strict mode
from jsonmend import loads, JSONMendError
loads("complete garbage") # "" (json_repair-compatible)
loads("complete garbage", strict=True) # raises JSONMendError
What it fixes
truncated objects/arrays/strings/numbers/literals · markdown fences with
prose around them · single/smart/backtick quotes · unescaped inner quotes
· missing quotes · bare keys and values ·
True/False/None/undefined/NaN/Infinity · //, #, /* */ comments ·
trailing/missing/extra commas · missing colons · mismatched brackets ·
concatenated/NDJSON documents · string concatenation ("a" + "b") ·
JSONP/MongoDB wrappers (ObjectId("…")) · Python tuples/sets · ellipsis
placeholders · non-string keys · BOM and exotic whitespace · escaped-JSON
documents ({\"a\": 1}) · broken \u escapes and surrogate pairs ·
100k-deep nesting (no recursion anywhere)
Why it's fast
- One resumable state machine serves batch and streaming: batch is a single feed that never suspends, so there is no streaming tax.
- Strings cost one str.find + one slice when clean; never a per-character Python loop.
- Speculative C parsing: complete sub-trees inside broken documents are recognized and handed to the C json scanner, with a salvage step that parses the longest clean prefix of a broken container in one shot. Semantics-affecting inputs (NaN, control chars, surrogate escapes) fall back to the machine, so behavior never changes.
- Bounded backtracking: a string-close decision can revisit one recorded candidate quote, never rescan; adversarial quote storms stay linear (tested).
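The string fast path mentioned in the list above can be sketched as follows. This is an illustration under assumptions, not the library's code: `scan_clean_string` is a hypothetical helper, it treats any backslash as a reason to bail out to a slower path, and it uses an extra membership test on top of the single find and slice.

```python
def scan_clean_string(text, start):
    """Scan a double-quoted string whose opening quote is at text[start].

    Returns (value, index_after_closing_quote), or None when the string is
    unterminated or contains a backslash and needs a slower, careful path.
    """
    end = text.find('"', start + 1)   # one C-level search for the closing quote
    if end == -1:
        return None                   # truncated string: not the clean case
    body = text[start + 1:end]        # one slice, no per-character loop
    if "\\" in body:
        return None                   # escapes present: defer to the slow path
    return body, end + 1


print(scan_clean_string('{"name": "John"}', 1))
```

The point of the pattern is that the common case, a string with no escapes, is handled entirely by built-in string operations implemented in C.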
Guarantees
- Output is always valid RFC 8259 JSON (or ""/an exception). Unlike json_repair, NaN/Infinity never leak into the output text; they serialize as null (loads still gives you the floats).
- Output is always UTF-8 encodable (lone surrogates are replaced).
- Never crashes, never recurses: fuzzed and property-tested, 100k-deep inputs are fine.
- Mender.close() ≡ batch result, for every chunking (property-tested).
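The first guarantee matters because Python's own serializer emits NaN and Infinity by default, which RFC 8259 does not allow. The sketch below shows one way to get the "serialize as null" behavior with the standard library alone. `nullify_non_finite` is a hypothetical helper, not part of jsonmend, and it is written recursively for brevity, so unlike the library it would not survive very deep nesting.

```python
import json
import math


def nullify_non_finite(value):
    """Return a copy of value with NaN and +/-Infinity replaced by None."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: nullify_non_finite(item) for key, item in value.items()}
    if isinstance(value, list):
        return [nullify_non_finite(item) for item in value]
    return value


data = {"x": float("nan"), "y": [1.5, float("inf")]}
# allow_nan=False makes json.dumps raise if a non-finite float slipped through.
print(json.dumps(nullify_non_finite(data), allow_nan=False))
```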
Honest differences vs json_repair
- logging=True is not supported (it is incompatible with the single-pass design and is one reason json_repair is slow); a no-op shim raises TypeError so you notice.
- Schema-guided repair (schema=) is not implemented in v0.1.
- json_repair's stream_stable=True flag changes how truncated escapes render mid-stream; jsonmend's Mender is always stream-stable.
- On ambiguous corpus cases the libraries may legitimately differ; jsonmend's choices are documented case-by-case in the corpus rationales.
The corpus is the point
If you maintain a JSON-repair library in any language: please steal corpus/. It is CC0, the format is three fields, and 460 cases with rationales are more valuable than any of our engines. Cross-language agreement on repair semantics helps everyone shipping LLM systems.
License
MIT. The conformance corpus is CC0.