Per-grammar-role loss decomposition for fine-tuned structured JSON output

valjson

Per-grammar-role loss analysis for structured JSON output from language models.

Fine-tuning your LLM for JSON? Your aggregate metrics might be hiding per-field regressions.

pip install valjson

What it does

valjson is the observability layer for structured JSON output. It meets you wherever you are:

You have                | Command                                                                                | Needs model?
Messy text with JSON    | valjson --extract --data output.txt                                                    | No
JSON + schema           | valjson --validate --schema s.json --data output.jsonl                                 | No
Broken JSON + schema    | valjson --fix --schema s.json --data output.jsonl                                      | No
JSON + schema           | valjson --anatomy --schema s.json --data output.jsonl                                  | No
JSON + schema + gold    | valjson --compare --schema s.json --data output.jsonl --gold truth.jsonl               | No
Two output sets         | valjson --diff --schema s.json --data a.jsonl --data2 b.jsonl                          | No
Per-field probabilities | valjson --gate --data probs.jsonl                                                      | No
Model + checkpoint      | valjson --checkpoint lora/ --schema s.json --data test.jsonl                           | Yes

The problem

Standard fine-tuning + grammar-constrained decoding produces valid JSON. Aggregate loss improves. But:

STRUCTURAL         5.33 -> 0.00     -100%   OK
KEY                0.47 -> 0.00     -100%   OK
BOOLEAN            0.46 -> 1.05     +130%   !! REGRESSION
TOTAL              0.55 -> 0.17      -69%

Aggregate loss improved 69%, yet the loss on BOOLEAN values rose 130%. valjson catches this.
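The percentage column can be reproduced from the before and after losses. The sketch below is an illustration only, not valjson's implementation: `relative_change` and `find_regressions` are hypothetical names, and it assumes a role counts as a regression whenever its loss rises. Because the values in the table are rounded to two decimals, recomputed percentages can differ slightly from those shown (BOOLEAN comes out near +128% here).

```python
def relative_change(before: float, after: float) -> float:
    """Return the relative change in loss, e.g. -0.69 for a 69% drop."""
    if before == 0:
        raise ValueError("baseline loss must be non-zero")
    return (after - before) / before


def find_regressions(losses: dict[str, tuple[float, float]]) -> list[str]:
    """Return roles whose loss increased (assumed regression criterion)."""
    return [role for role, (before, after) in losses.items() if after > before]


# Per-role (before, after) losses copied from the table above.
losses = {
    "STRUCTURAL": (5.33, 0.00),
    "KEY": (0.47, 0.00),
    "BOOLEAN": (0.46, 1.05),
}
total = (0.55, 0.17)

for role, (before, after) in losses.items():
    print(f"{role:<12} {relative_change(before, after):+.0%}")
print(f"{'TOTAL':<12} {relative_change(*total):+.0%}")
print("regressions:", find_regressions(losses))
```

The aggregate can improve while one role degrades because the total is dominated by the roles that contribute the most loss before fine-tuning, here STRUCTURAL.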

Per-field accuracy vs gold

When you have human-labeled gold JSON records, --compare gives you per-field accuracy without needing a model. The comparison is role-aware, so a blown free-text field does not drown the signal on the constrained fields you actually care about:

valjson --compare \
    --schema schema.json \
    --data generated.jsonl \
    --gold gold.jsonl \
    --ignore-role STRING,ARRAY
  • --match-by <key>: pair records by ID (wrapper-first lookup), not line order. Unmatched IDs are reported separately.
  • --ignore-role STRING,ARRAY: focus on BOOLEAN / ENUM / NUMBER fields, where exact equality is meaningful.
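Conceptually, a role-aware comparison pairs generated and gold records by ID, then scores each field by exact equality while skipping the ignored roles. The following is a minimal sketch under stated assumptions, not valjson's code: `field_accuracy`, the `field_roles` mapping, and the flat record layout with an `id` key are all hypothetical.

```python
from collections import defaultdict


def field_accuracy(generated, gold, field_roles, match_by="id", ignore_roles=()):
    """Exact-match accuracy per field, pairing records by `match_by`.

    Returns (accuracy_by_field, unmatched_ids).
    """
    gold_by_id = {rec[match_by]: rec for rec in gold}
    hits = defaultdict(int)
    totals = defaultdict(int)
    unmatched = []

    for rec in generated:
        truth = gold_by_id.get(rec[match_by])
        if truth is None:
            unmatched.append(rec[match_by])  # no gold record with this ID
            continue
        for field, role in field_roles.items():
            if role in ignore_roles:
                continue  # e.g. free text, where exact equality is too strict
            totals[field] += 1
            hits[field] += rec.get(field) == truth.get(field)

    accuracy = {field: hits[field] / totals[field] for field in totals}
    return accuracy, unmatched


# Hypothetical schema roles and records, for illustration only.
field_roles = {"refundable": "BOOLEAN", "status": "ENUM", "note": "STRING"}
gold = [
    {"id": "a", "refundable": True, "status": "submitted", "note": "ok"},
    {"id": "b", "refundable": False, "status": "draft", "note": "late"},
]
generated = [
    {"id": "b", "refundable": True, "status": "draft", "note": "LATE!"},
    {"id": "a", "refundable": True, "status": "submitted", "note": "fine"},
    {"id": "c", "refundable": True, "status": "draft", "note": ""},
]

accuracy, unmatched = field_accuracy(
    generated, gold, field_roles, ignore_roles=("STRING", "ARRAY")
)
print(accuracy)
print(unmatched)
```

Pairing by ID rather than line order keeps the score meaningful when the two files are sorted differently, as `generated` and `gold` are here.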

Evidential gating: abstain when the model is unsure

When the model returns a probability distribution over allowed values for each constrained field, --gate decides per record whether to commit, abstain, or reject based on the margin between the top two values:

valjson --gate \
    --data probs.jsonl \
    --margin-threshold 0.30

Input format (one record per line; wrapped here for readability):

{"id": "rec-001",
 "probs": {
   "refundable": {"True": 0.553, "False": 0.447},
   "status":     {"submitted": 0.9, "pending": 0.1, "draft": 0.0}
 }}

A field's margin is top_prob − second_prob. If the margin is at least the threshold, the field is committed; otherwise it abstains. Per-field abstention rates are reported. Fields with persistently high abstention are diagnostic of underspecified training targets (the regression pattern this paper documents).
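The margin rule can be sketched in a few lines. This is an illustration of the rule as described, not valjson's implementation: `gate_field` is a hypothetical name, a field with a single allowed value is assumed to have a second probability of 0, and the "reject" outcome is left out because its criterion is not specified here.

```python
from typing import Optional


def gate_field(
    probs: dict[str, float], margin_threshold: float
) -> tuple[str, Optional[str], float]:
    """Return (decision, committed_value, margin) for one constrained field."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    top_value, top_prob = ranked[0]
    # Assumption: with only one allowed value, the runner-up probability is 0.
    second_prob = ranked[1][1] if len(ranked) > 1 else 0.0
    margin = top_prob - second_prob
    if margin >= margin_threshold:
        return "commit", top_value, margin
    return "abstain", None, margin


# The example record from the input format above.
record = {
    "id": "rec-001",
    "probs": {
        "refundable": {"True": 0.553, "False": 0.447},
        "status": {"submitted": 0.9, "pending": 0.1, "draft": 0.0},
    },
}

decisions = {
    field: gate_field(probs, margin_threshold=0.30)
    for field, probs in record["probs"].items()
}
for field, (decision, value, margin) in decisions.items():
    print(f"{field:<12} {decision:<8} value={value} margin={margin:.3f}")
```

With a threshold of 0.30, `refundable` abstains (margin 0.106) while `status` commits to `submitted` (margin 0.8).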

Quick start

See QUICK_START.md for a hands-on walkthrough from messy output to full analysis.

Python API

from valjson import analyze

report = analyze(
    model_name="Qwen/Qwen2.5-0.5B-Instruct",
    checkpoint="my_lora/",
    schema="schema.json",
    data="test.jsonl",
)
print(report)

if report.regressions:
    print(f"REGRESSIONS: {[r.role for r in report.regressions]}")

Exit code is 1 if regressions are detected, so the check can gate a CI/CD pipeline.
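When calling the Python API from your own CI script, the same convention can be reproduced by hand. The sketch below is an assumption-laden illustration: `regression_exit_code` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that mimic only the `regressions` and `role` attributes used in the snippet above, so the example runs without a model.

```python
from types import SimpleNamespace


def regression_exit_code(report) -> int:
    """Return 1 if the report lists any regressions, else 0."""
    return 1 if report.regressions else 0


# Stand-in reports; a real one would come from valjson's analyze().
clean = SimpleNamespace(regressions=[])
regressed = SimpleNamespace(regressions=[SimpleNamespace(role="BOOLEAN")])

print(regression_exit_code(clean))
print(regression_exit_code(regressed))
```

In a real pipeline, the script would end with `sys.exit(regression_exit_code(report))`, where `report` is the value returned by `analyze(...)`.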

Grammar Roles

Role       | Description             | Examples
STRUCTURAL | JSON syntax             | { } [ ] : ,
QUOTE      | String delimiters       | "
KEY        | Object key characters   | city, cuisine
ENUM_VALUE | Categorical values      | Italian, Economy
BOOLEAN    | Boolean strings         | True, False
NUMBER     | Numeric characters      | 42, 3.14
FREE_TEXT  | Non-categorical content | names, addresses
WHITESPACE | Formatting              | spaces, newlines
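To make the value-level roles concrete, the sketch below assigns a role to each leaf value of a parsed record. It is a simplified illustration, not how valjson assigns roles: `role_of_value` and the `enums` mapping are hypothetical, it works on whole values rather than tokens, and it skips STRUCTURAL, QUOTE, KEY, and WHITESPACE, which come from the JSON syntax rather than from the values.

```python
def role_of_value(value, allowed_values=()):
    """Assign a value-level grammar role to one JSON leaf value."""
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool) or value in ("True", "False"):
        return "BOOLEAN"
    if isinstance(value, (int, float)):
        return "NUMBER"
    if isinstance(value, str):
        return "ENUM_VALUE" if value in allowed_values else "FREE_TEXT"
    raise TypeError(f"not a leaf value: {value!r}")


# Hypothetical record and enum constraints, for illustration only.
record = {"city": "Rome", "cuisine": "Italian", "vegan": "False", "rating": 4.5}
enums = {"cuisine": ["Italian", "Thai"]}

roles = {
    key: role_of_value(value, enums.get(key, ()))
    for key, value in record.items()
}
print(roles)
```

Whether a string is ENUM_VALUE or FREE_TEXT depends on the schema, not on the string itself, which is why the allowed values are passed in per field.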

License

MIT
