Per-grammar-role loss decomposition for fine-tuned structured JSON output
valjson
Per-grammar-role loss analysis for structured JSON output from language models.
Fine-tuning your LLM for JSON? Your aggregate metrics might be hiding per-field regressions.
pip install valjson
What it does
valjson is the observability layer for structured JSON output. It meets you wherever you are:
| You have | Command | Needs model? |
|---|---|---|
| Messy text with JSON | valjson --extract --data output.txt | No |
| JSON + schema | valjson --validate --schema s.json --data output.jsonl | No |
| Broken JSON + schema | valjson --fix --schema s.json --data output.jsonl | No |
| JSON + schema | valjson --anatomy --schema s.json --data output.jsonl | No |
| JSON + schema + gold | valjson --compare --schema s.json --data output.jsonl --gold truth.jsonl | No |
| Two output sets | valjson --diff --schema s.json --data a.jsonl --data2 b.jsonl | No |
| Per-field probabilities | valjson --gate --data probs.jsonl | No |
| Model + checkpoint | valjson --checkpoint lora/ --schema s.json --data test.jsonl | Yes |
The problem
Standard fine-tuning + grammar-constrained decoding produces valid JSON. Aggregate loss improves. But:
STRUCTURAL 5.33 -> 0.00 -100% OK
KEY 0.47 -> 0.00 -100% OK
BOOLEAN 0.46 -> 1.05 +130% !! REGRESSION
TOTAL 0.55 -> 0.17 -69%
Aggregate loss improved 69%. Boolean prediction got 130% worse. valjson catches this.
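The effect above can be reproduced on toy numbers. The following is a minimal sketch, not valjson's implementation: it assumes each token already carries a grammar-role tag and a loss value, and the helper names `per_role_mean_loss` and `find_regressions` as well as the loss values are hypothetical, chosen only to show how a falling total can mask a rising per-role loss.

```python
from collections import defaultdict


def per_role_mean_loss(token_losses):
    """Return the mean loss per role, plus a token-weighted TOTAL.

    token_losses: iterable of (role, loss) pairs, one per token.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for role, loss in token_losses:
        sums[role] += loss
        counts[role] += 1
    report = {role: sums[role] / counts[role] for role in sums}
    report["TOTAL"] = sum(sums.values()) / sum(counts.values())
    return report


def find_regressions(before, after):
    """List roles whose mean loss increased, regardless of TOTAL."""
    return sorted(
        role
        for role in after
        if role != "TOTAL" and role in before and after[role] > before[role]
    )


# Hypothetical per-token losses before and after fine-tuning.
base = [("STRUCTURAL", 5.0), ("STRUCTURAL", 5.0),
        ("KEY", 0.5), ("KEY", 0.5), ("BOOLEAN", 0.5)]
tuned = [("STRUCTURAL", 0.0), ("STRUCTURAL", 0.0),
         ("KEY", 0.0), ("KEY", 0.0), ("BOOLEAN", 1.0)]

before = per_role_mean_loss(base)
after = per_role_mean_loss(tuned)
regressions = find_regressions(before, after)
print(before["TOTAL"], after["TOTAL"], regressions)
```

Because structural and key tokens dominate the token count, their improvement drives the total down even though the single boolean token got worse.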
Per-field accuracy vs gold
When you have human-labeled gold JSONs, --compare gives you per-field accuracy
without needing a model — role-aware, so a blown free-text field does not drown
the signal on the constrained fields you actually care about:
valjson --compare \
    --schema schema.json \
    --data generated.jsonl \
    --gold gold.jsonl \
    --ignore-role STRING,ARRAY
- --match-by <key>: pair records by ID (wrapper-first lookup), not line order.
- --ignore-role STRING,ARRAY: focus on BOOLEAN / ENUM / NUMBER fields where exact equality is meaningful.

Unmatched IDs are reported separately.
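The idea of ID-based pairing with role filtering can be sketched in a few lines. This is an illustration under stated assumptions, not the behavior of valjson itself: the function `compare_by_id`, the `field_roles` mapping (field name to role), and the sample records are all hypothetical, and the wrapper-first lookup is not modeled.

```python
def compare_by_id(generated, gold, field_roles, ignore_roles=(), key="id"):
    """Per-field exact-match accuracy, pairing records by `key`.

    field_roles maps each field name to a role label (assumed to be
    derived from the schema elsewhere). Fields whose role is in
    ignore_roles are skipped. Returns (accuracy, unmatched_ids).
    """
    gold_by_id = {rec[key]: rec for rec in gold}
    hits, totals, unmatched = {}, {}, []
    for rec in generated:
        ref = gold_by_id.get(rec[key])
        if ref is None:
            # No gold record with this ID: report it separately.
            unmatched.append(rec[key])
            continue
        for field, role in field_roles.items():
            if role in ignore_roles:
                continue
            totals[field] = totals.get(field, 0) + 1
            hits[field] = hits.get(field, 0) + int(rec.get(field) == ref.get(field))
    accuracy = {field: hits[field] / totals[field] for field in totals}
    return accuracy, unmatched


# Hypothetical data: gold is in a different order than the generated output.
generated = [
    {"id": "a", "refundable": True, "status": "pending", "note": "x"},
    {"id": "b", "refundable": False, "status": "draft", "note": "y"},
    {"id": "z", "refundable": True, "status": "draft", "note": "?"},
]
gold = [
    {"id": "b", "refundable": True, "status": "draft", "note": "yy"},
    {"id": "a", "refundable": True, "status": "pending", "note": "xx"},
]
field_roles = {"refundable": "BOOLEAN", "status": "ENUM", "note": "STRING"}

accuracy, unmatched = compare_by_id(
    generated, gold, field_roles, ignore_roles={"STRING", "ARRAY"}
)
print(accuracy, unmatched)
```

Pairing by line order would have compared record "a" against gold "b"; pairing by ID avoids that, and the free-text `note` field never enters the accuracy figures.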
Evidential gating: abstain when the model is unsure
When the model returns a probability distribution over allowed values for each
constrained field, --gate decides per record whether to commit, abstain,
or reject based on the margin between the top two values:
valjson --gate \
    --data probs.jsonl \
    --margin-threshold 0.30
Input format (one record per line):
{"id": "rec-001",
"probs": {
"refundable": {"True": 0.553, "False": 0.447},
"status": {"submitted": 0.9, "pending": 0.1, "draft": 0.0}
}}
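A field's margin = top_prob − second_prob. If margin ≥ threshold, the
field is committed; otherwise it abstains. In the record above, with a
threshold of 0.30, refundable has a margin of 0.553 − 0.447 = 0.106 and
abstains, while status has a margin of 0.9 − 0.1 = 0.8 and is committed.
Per-field abstention rates are reported: fields with persistently high
abstention are diagnostic of underspecified training targets (the regression
pattern documented in the paper linked below).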
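The margin rule is simple enough to sketch directly. The code below is an illustrative approximation rather than valjson's own logic: `gate_record` and `abstention_rates` are hypothetical names, each field is assumed to have at least one candidate value, and only the commit and abstain outcomes are modeled because the text does not state when a record is rejected.

```python
def gate_record(record, margin_threshold=0.30):
    """Decide per field whether to commit to the top value or abstain.

    Returns {field: (decision, value_or_None, margin)}.
    """
    decisions = {}
    for field, dist in record["probs"].items():
        ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
        top_value, top_prob = ranked[0]
        # A field with a single candidate has no runner-up.
        second_prob = ranked[1][1] if len(ranked) > 1 else 0.0
        margin = top_prob - second_prob
        if margin >= margin_threshold:
            decisions[field] = ("commit", top_value, margin)
        else:
            decisions[field] = ("abstain", None, margin)
    return decisions


def abstention_rates(records, margin_threshold=0.30):
    """Fraction of records in which each field abstained."""
    abstained, seen = {}, {}
    for record in records:
        for field, (decision, _, _) in gate_record(record, margin_threshold).items():
            seen[field] = seen.get(field, 0) + 1
            abstained[field] = abstained.get(field, 0) + int(decision == "abstain")
    return {field: abstained[field] / seen[field] for field in seen}


# The example record from the input format above.
record = {
    "id": "rec-001",
    "probs": {
        "refundable": {"True": 0.553, "False": 0.447},
        "status": {"submitted": 0.9, "pending": 0.1, "draft": 0.0},
    },
}

decisions = gate_record(record, margin_threshold=0.30)
rates = abstention_rates([record], margin_threshold=0.30)
print(decisions, rates)
```

A field that abstains across most records, as `refundable` does here, is the kind of signal the per-field abstention report is meant to surface.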
Quick start
See QUICK_START.md for a hands-on walkthrough from messy output to full analysis.
Python API
from valjson import analyze

report = analyze(
    model_name="Qwen/Qwen2.5-0.5B-Instruct",
    checkpoint="my_lora/",
    schema="schema.json",
    data="test.jsonl",
)
print(report)
if report.regressions:
    print(f"REGRESSIONS: {[r.role for r in report.regressions]}")
The exit code is 1 if regressions are detected, so the check can gate a CI/CD pipeline.
Grammar Roles
| Role | Description | Examples |
|---|---|---|
| STRUCTURAL | JSON syntax | { } [ ] : , |
| QUOTE | String delimiters | " |
| KEY | Object key characters | city, cuisine |
| ENUM_VALUE | Categorical values | Italian, Economy |
| BOOLEAN | Boolean strings | True, False |
| NUMBER | Numeric characters | 42, 3.14 |
| FREE_TEXT | Non-categorical content | names, addresses |
| WHITESPACE | Formatting | spaces, newlines |
Links
- Quick Start Tutorial
- Fixing JSON Fine-Tuning — mitigations, services
- Paper — "Valid JSON, Wrong Answer" (2026)
License
MIT