
Mends the JSON your LLM almost wrote - fast single-pass repair with true incremental streaming


jsonmend

Mends the JSON your LLM almost wrote.

Truncated tool calls, markdown fences, single quotes, bare keys, Python literals, comments, trailing commas, prose around the payload: jsonmend turns them into valid JSON. It is a drop-in replacement for json_repair that is 5–10× faster on batch repair and ~50× faster on streaming. It ships a true incremental streaming API (O(new bytes) per chunk, not O(buffer)), and it is the reference implementation of an open, cross-language conformance corpus for JSON repair.

Pure Python, zero dependencies, zero binaries. Works on CPython 3.9–3.14, PyPy, Pyodide/WASM, AWS Lambda — anywhere pip install works.

pip install jsonmend

Scoreboard

JSON repair has no standard: the same broken input is repaired differently by the Python and JavaScript incumbents, which is a real source of production bugs. The jsonmend conformance corpus (460 cases, 19 categories, CC0) defines repair semantics as data — including the genuinely ambiguous cases, where every defensible answer is accepted.

| | jsonmend 0.1.0 | json_repair 0.60.1 | jsonrepair 3.14.0 (JS) |
|---|---|---|---|
| corpus pass rate | 460/460 (100%) | 320/460 (69.6%) | 354/460 (77.0%) |

Per-category breakdown: corpus/scoreboard.md. Reproduce: python tools/referee.py --write (needs pip install json_repair and npm install jsonrepair, dev-only).

Performance

Each timing is the median of 7 runs, and three independent rounds agreed within ±5% (Python 3.12, M-series macOS). All inputs are broken JSON, so the json.loads fast path never runs. Results are verified, then timed: both libraries' outputs are checked for equality before any timing starts. Reproduce: python tools/bench.py --verify && python tools/bench.py.

| workload | size | jsonmend | json_repair | speedup |
|---|---|---|---|---|
| truncated tool call | 1 KB | 0.027 ms | 0.199 ms | 7.3× |
| truncated row payload | 75 KB | 1.48 ms | 12.6 ms | 8.5× |
| markdown-fenced output | 49 KB | 0.25 ms | 2.6 ms | 10.6× |
| dirty (quotes/keys/literals) | 5 KB | 0.38 ms | 2.6 ms | 7.0× |
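The verify-then-time methodology can be sketched with the standard library alone. This is a hypothetical helper, not tools/bench.py: the name `verify_then_time` and its `runs`/`number` parameters are assumptions. It refuses to time two callables that disagree on any input, then reports the median seconds per call for each and the resulting speedup.

```python
import statistics
import timeit


def verify_then_time(fn_a, fn_b, inputs, runs=7, number=100):
    """Hypothetical benchmark helper: verify equal outputs, then time both.

    Returns (seconds_per_call_a, seconds_per_call_b, speedup_of_a_over_b).
    """
    # Step 1: verify. Timing two functions that produce different results
    # would compare different amounts of work.
    for text in inputs:
        a, b = fn_a(text), fn_b(text)
        if a != b:
            raise ValueError(f"outputs differ for {text!r}: {a!r} != {b!r}")

    def median_per_call(fn):
        # timeit.repeat returns one total per run; each total covers
        # `number` passes over every input.
        totals = timeit.repeat(
            lambda: [fn(t) for t in inputs], repeat=runs, number=number
        )
        return statistics.median(totals) / (number * len(inputs))

    # Step 2: time, using the median to damp outliers.
    t_a, t_b = median_per_call(fn_a), median_per_call(fn_b)
    return t_a, t_b, t_b / t_a
```

Any two callables with the same signature can be compared, for example two repair functions fed the same broken inputs.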

Streaming is a different complexity class

A streaming UI re-renders the partial value on every chunk. With a batch repairer you must re-parse the whole buffer each time, so k chunks cost 1 + 2 + ... + k chunk-lengths of work: O(n²) in the total size n. The stateful Mender only pays for the new bytes:

| workload | jsonmend Mender | json_repair (stream_stable=True) | speedup |
|---|---|---|---|
| 150 KB in 4 KB chunks | 6.9 ms | 323 ms | 47× |
| 10 MB in 4 KB chunks | 1.2 s | est. >20 min (quadratic) | |
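The quadratic blow-up follows from simple arithmetic: re-repairing the buffer after chunk i touches about i·c bytes, so k chunks of size c cost c·k(k+1)/2 bytes in total, while an incremental repairer touches each byte roughly once. The sketch below counts bytes scanned, not time; the helper name is hypothetical and it assumes work is proportional to bytes scanned, which ignores constant factors.

```python
def bytes_scanned(total_bytes, chunk_bytes):
    """Return (batch, incremental) bytes scanned while streaming a document.

    batch: the whole buffer is re-repaired after every chunk (quadratic).
    incremental: each new chunk is scanned once (linear).
    Assumes work is proportional to bytes scanned.
    """
    k = -(-total_bytes // chunk_bytes)  # number of chunks, rounded up
    batch = 0
    for i in range(1, k + 1):
        # Buffer size after chunk i; the last chunk may be partial.
        batch += min(i * chunk_bytes, total_bytes)
    return batch, total_bytes


batch, incremental = bytes_scanned(10 * 1024 * 1024, 4096)
```

Treating KB and MB as 1,024-based, the 10 MB row means 2,560 chunks and about 13.4 billion bytes scanned by the batch approach, versus 10 MiB incrementally: roughly 1,280 times more work before any constant factors.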

Usage

Drop-in for json_repair

# before
from json_repair import repair_json, loads
# after — same call sites
from jsonmend import repair_json, loads

repair_json("{'name': 'John', age: 31")     # '{"name": "John", "age": 31}'
loads('```json\n{"ok": true,}\n```')         # {'ok': True}

The functions repair_json(json_str, return_objects=..., skip_json_loads=..., ensure_ascii=..., **json_dumps_args), loads, load(fd) and from_file(path) all match json_repair's signatures. Input that is already valid JSON short-circuits through the C-accelerated json.loads.
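The short-circuit pays off because most model output is already valid. A minimal sketch of the pattern, not jsonmend's code: `repair_with_fast_path` and the `slow_repair` callable are hypothetical stand-ins, and only the `skip_json_loads` name mirrors the signature above. A custom `parse_constant` keeps NaN/Infinity, which json.loads accepts but RFC 8259 does not, off the fast path.

```python
import json


def _reject_constant(name):
    # NaN / Infinity / -Infinity are not RFC 8259 JSON, so treat them as broken.
    raise ValueError(f"non-standard constant: {name}")


def repair_with_fast_path(text, slow_repair, skip_json_loads=False):
    """Hypothetical wrapper: valid JSON never reaches `slow_repair`.

    On CPython, json.loads uses the C accelerator, so the common case costs
    one C parse plus one json.dumps.
    """
    if not skip_json_loads:
        try:
            return json.dumps(json.loads(text, parse_constant=_reject_constant))
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            pass
    return slow_repair(text)
```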

Streaming

from jsonmend import Mender

m = Mender()
for chunk in llm_stream:           # any iterable of text chunks, e.g. a model's token stream
    partial = m.feed(chunk)        # best-effort value, O(new bytes)
    render(partial)                # your UI hook; e.g. {"answer": "The capital of Fr"}
value = m.close()                  # final mended value

feed() returns a live view that grows in place, including the string that is currently streaming in. However the input is chunked, the mended result is byte-identical to batch repair (property-tested over the whole corpus).
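To make "O(new bytes)" and "any chunking gives the same result" concrete, here is a deliberately tiny incremental repairer. It is a hypothetical toy, not jsonmend's Mender: it only completes JSON cut off inside a string or right after a complete value, and unlike Mender its feed() returns nothing. All state lives in the object, so each feed() scans only its own chunk, and a batch run is simply one feed() followed by close().

```python
class ToyCloser:
    """Hypothetical toy, not jsonmend's Mender.

    Completes JSON that was truncated inside a string or after a complete
    value, by carrying bracket and string state across feed() calls.
    """

    _CLOSERS = {"{": "}", "[": "]"}

    def __init__(self):
        self._parts = []          # raw chunks, joined only once in close()
        self._stack = []          # currently open "{" / "[" in order
        self._in_string = False
        self._escape = False      # last char was a backslash inside a string

    def feed(self, chunk):
        # Scans only the new chunk: O(len(chunk)) per call, whatever the buffer size.
        for ch in chunk:
            if self._in_string:
                if self._escape:
                    self._escape = False
                elif ch == "\\":
                    self._escape = True
                elif ch == '"':
                    self._in_string = False
            elif ch == '"':
                self._in_string = True
            elif ch in "{[":
                self._stack.append(ch)
            elif ch in "}]" and self._stack:
                self._stack.pop()
        self._parts.append(chunk)

    def close(self):
        text = "".join(self._parts)
        if self._escape:
            text = text[:-1]      # drop a dangling backslash
        if self._in_string:
            text += '"'           # close the truncated string
        return text + "".join(self._CLOSERS[b] for b in reversed(self._stack))


def same_for_every_split(doc):
    """Property check: every two-chunk split gives the one-feed (batch) result."""
    batch = ToyCloser()
    batch.feed(doc)
    expected = batch.close()
    for i in range(len(doc) + 1):
        m = ToyCloser()
        m.feed(doc[:i])
        m.feed(doc[i:])
        if m.close() != expected:
            return False
    return True
```

Because the state machine consumes characters one at a time and never looks back, the split point cannot change the outcome; the property check makes that explicit for every two-chunk split of a given input.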

Strict mode

from jsonmend import loads, JSONMendError

loads("complete garbage")                  # "" (json_repair-compatible)
loads("complete garbage", strict=True)     # raises JSONMendError

What it fixes

truncated objects/arrays/strings/numbers/literals · markdown fences with prose around them · single/smart/backtick quotes · unescaped inner quotes · missing quotes · bare keys and values · True/False/None/undefined/NaN/Infinity · //, #, /* */ comments · trailing/missing/extra commas · missing colons · mismatched brackets · concatenated/NDJSON documents · string concatenation ("a" + "b") · JSONP/MongoDB wrappers (ObjectId("…")) · Python tuples/sets · ellipsis placeholders · non-string keys · BOM and exotic whitespace · escaped-JSON documents ({\"a\": 1}) · broken \u escapes and surrogate pairs · 100k-deep nesting (no recursion anywhere)
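As an illustration of one item on this list, pulling the payload out of a markdown fence with prose around it, including a fence truncated before its closing backticks, can be sketched with a regular expression. The helper name is hypothetical and this is not jsonmend's implementation; it only isolates the payload, which may still need repair.

```python
import re

# Opening ``` with an optional language tag, then the body up to the closing
# fence or, if the output was truncated, the end of the text.
_FENCE = re.compile(r"```[A-Za-z]*[ \t]*\n(.*?)(?:```|\Z)", re.DOTALL)


def extract_fenced(text):
    """Return the body of the first fenced block, or the stripped text if
    there is no fence (hypothetical helper)."""
    match = _FENCE.search(text)
    return match.group(1).strip() if match else text.strip()
```

For the loads example earlier, this yields `{"ok": true,}`, which still carries the trailing comma that a repairer must then remove.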

Why it's fast

  • One resumable state machine serves batch and streaming — batch is a single feed that never suspends, so there is no streaming tax.
  • Strings cost one str.find + one slice when clean; never a per-character Python loop.
  • Speculative C parsing: complete sub-trees inside broken documents are recognized and handed to the C json scanner, with a salvage step that parses the longest clean prefix of a broken container in one shot. Semantics-affecting inputs (NaN, control chars, surrogate escapes) fall back to the machine, so behavior never changes.
  • Bounded backtracking: a string-close decision can revisit one recorded candidate quote, never rescan; adversarial quote storms stay linear (tested).
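Two of these fast paths can be approximated with the standard library. This is a hypothetical sketch: the function names, and detecting a "clean" string by the absence of a backslash, are assumptions. `JSONDecoder.raw_decode` parses a value and returns where it stopped, which is what recognizing a complete sub-tree inside a broken document needs; CPython's implementation also accepts a start index, although the documentation only shows `raw_decode(s)`. A custom `parse_constant` sends NaN/Infinity to the fallback, and the default `strict=True` rejects raw control characters; surrogate escapes are not handled here.

```python
import json


def _refuse_constant(name):
    # NaN/Infinity change semantics, so refuse them and let the slow path decide.
    raise ValueError(f"non-standard constant: {name}")


# strict=True (the default) makes the scanner reject raw control characters.
_DECODER = json.JSONDecoder(parse_constant=_refuse_constant)


def try_subtree(text, pos):
    """Speculatively hand text[pos:] to the C scanner.

    Returns (value, end_index) when a complete, standard JSON value starts
    exactly at text[pos], else None so the caller can fall back.
    """
    try:
        return _DECODER.raw_decode(text, pos)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


def scan_clean_string(text, pos):
    """Fast path for a string whose opening quote is at text[pos].

    A clean string costs one str.find for the closing quote, one C-level
    search for backslashes, and one slice. Returns (value, end_index), or
    None when the string is truncated or contains escapes.
    """
    end = text.find('"', pos + 1)
    if end == -1 or text.find("\\", pos + 1, end) != -1:
        return None
    return text[pos + 1:end], end + 1
```

In a broken document such as `[1, {"a": [2, 3]}, {"b": tru`, the scanner hands back the complete object at index 4 in one call, while the truncated object at index 19 falls back to the slow path.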

Guarantees

  • Output is always valid RFC 8259 JSON (or "" for unrecoverable input, or an exception with strict=True). Unlike json_repair, NaN/Infinity never leak into the output text: they serialize as null (loads still gives you the floats).
  • Output is always UTF-8 encodable (lone surrogates are replaced).
  • Never crashes, never recurses: fuzzed and property-tested, 100k-deep inputs are fine.
  • Mender.close() ≡ batch result, for every chunking (property-tested).
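The first two guarantees can be mirrored on any Python value with the standard library. A hypothetical sketch, not jsonmend's serializer: non-finite floats become null, and strings go through a UTF-16 round-trip that joins valid surrogate pairs and replaces lone surrogates with U+FFFD (whether jsonmend uses the same replacement character is an assumption). The helper recurses for brevity, unlike jsonmend.

```python
import json
import math


def _clean_str(s):
    # Round-trip through UTF-16: valid surrogate pairs become one code point,
    # lone surrogates are decoded as U+FFFD.
    return s.encode("utf-16", "surrogatepass").decode("utf-16", "replace")


def _clean(value):
    """Copy `value`, mapping non-finite floats to None and fixing strings.
    Recursive for brevity; a deep-input version would use an explicit stack."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, str):
        return _clean_str(value)
    if isinstance(value, dict):
        return {(_clean_str(k) if isinstance(k, str) else k): _clean(v)
                for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [_clean(v) for v in value]
    return value


def strict_dumps(value):
    """Serialize to RFC 8259 JSON text that is always UTF-8 encodable."""
    # allow_nan=False raises if a NaN/Infinity ever slips past _clean.
    return json.dumps(_clean(value), allow_nan=False, ensure_ascii=False)
```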

Honest differences vs json_repair

  • logging=True is not supported: it is incompatible with the single-pass design and is one reason json_repair is slow. The parameter survives only as a no-op shim, and passing logging=True raises TypeError so you notice.
  • Schema-guided repair (schema=) is not implemented in v0.1.
  • json_repair's stream_stable=True flag changes how truncated escapes render mid-stream; jsonmend's Mender is always stream-stable.
  • On ambiguous corpus cases the libraries may legitimately differ; jsonmend's choices are documented case-by-case in the corpus rationales.

The corpus is the point

If you maintain a JSON-repair library in any language: please steal corpus/. It is CC0, the format is three fields, and 460 cases with rationales are more valuable than any of our engines. Cross-language agreement on repair semantics helps everyone shipping LLM systems.
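Because the corpus encodes semantics as data, a conformance runner for any engine can stay small. The field names below (`input`, `accept`, `rationale`) are hypothetical stand-ins for the three corpus fields, as is the assumption that `accept` lists every defensible output; the cases are illustrative and not taken from the corpus. Comparing parsed values rather than raw text, so that whitespace differences do not count as failures, is also a design choice of this sketch.

```python
import json

# Hypothetical case layout and illustrative cases; the real corpus may differ.
CASES = [
    {"input": "{'a': 1,}", "accept": ['{"a": 1}'],
     "rationale": "single quotes and a trailing comma"},
    {"input": '{"a": "b', "accept": ['{"a": "b"}'],
     "rationale": "truncated string"},
    {"input": "[tr", "accept": ["[true]", "[]"],
     "rationale": "ambiguous: complete the literal or drop it"},
]


def passes(case, output):
    """A case passes if the engine's output parses to any accepted value."""
    try:
        got = json.loads(output)
    except ValueError:
        return False
    return any(got == json.loads(ok) for ok in case["accept"])


def pass_rate(engine, cases):
    """Run `engine` (str -> JSON text) over all cases; return (passed, total)."""
    passed = sum(passes(case, engine(case["input"])) for case in cases)
    return passed, len(cases)
```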

License

MIT. The conformance corpus is CC0.


Download files


Source Distribution

jsonmend-0.1.0.tar.gz (28.2 kB view details)

Uploaded Source

Built Distribution


jsonmend-0.1.0-py3-none-any.whl (21.6 kB view details)

Uploaded Python 3

File details

Details for the file jsonmend-0.1.0.tar.gz.

File metadata

  • Download URL: jsonmend-0.1.0.tar.gz
  • Size: 28.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jsonmend-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f9998356fb2d6da0bc6a397bb0dd081d546b43524424cd842bf6c8120c17a3db
MD5 53e21a567e072fd45069f5681c13a9b2
BLAKE2b-256 da7d3aa4c3a3d60d8701e7f8f5dd23486dd883cd56f55681da32002a4926ccc5


Provenance

The following attestation bundles were made for jsonmend-0.1.0.tar.gz:

Publisher: release.yml on adam2go/jsonmend

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file jsonmend-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: jsonmend-0.1.0-py3-none-any.whl
  • Size: 21.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jsonmend-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5a3627e12f7a4aca283bdbe9c787f45cf9fe264f5c0f1a12c666776877776e17
MD5 55cea93d8e1bd60143559a457f9e5ae2
BLAKE2b-256 fd63cdd4b50d216e2a7e864eeb3c81ffabf1d426594043726e2125555df69157


Provenance

The following attestation bundles were made for jsonmend-0.1.0-py3-none-any.whl:

Publisher: release.yml on adam2go/jsonmend

