Per-grammar-role loss decomposition for fine-tuned structured JSON output
valjson
Per-grammar-role loss analysis for structured JSON output from language models.
Fine-tuning your LLM for JSON? Your aggregate metrics might be hiding per-field regressions.
pip install valjson
What it does
valjson is the observability layer for structured JSON output. It meets you wherever you are:
| You have | Command | Needs model? |
|---|---|---|
| Messy text with JSON | `valjson --extract --data output.txt` | No |
| JSON + schema | `valjson --validate --schema s.json --data output.jsonl` | No |
| Broken JSON + schema | `valjson --fix --schema s.json --data output.jsonl` | No |
| JSON + schema | `valjson --anatomy --schema s.json --data output.jsonl` | No |
| Two output sets | `valjson --diff --schema s.json --data a.jsonl --data2 b.jsonl` | No |
| Model + checkpoint | `valjson --checkpoint lora/ --schema s.json --data test.jsonl` | Yes |
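To make the first row concrete, here is a minimal sketch of what pulling JSON out of messy text can look like. It is not valjson's implementation; the function name `extract_json_objects` is hypothetical, and it relies only on the standard library's `json.JSONDecoder.raw_decode`.

```python
import json


def extract_json_objects(text: str) -> list:
    """Return every JSON object or array that parses cleanly inside `text`.

    Hypothetical helper for illustration; valjson's --extract may work differently.
    """
    decoder = json.JSONDecoder()
    found = []
    i = 0
    while i < len(text):
        if text[i] in "{[":
            try:
                # raw_decode parses one JSON value starting at index i and
                # returns it together with the index just past its end.
                value, end = decoder.raw_decode(text, i)
            except json.JSONDecodeError:
                i += 1  # not valid JSON from here; try the next character
                continue
            found.append(value)
            i = end
        else:
            i += 1
    return found


messy = 'Sure! Here is the result: {"city": "Rome", "open": true} Hope that helps.'
print(extract_json_objects(messy))
```

Scanning only from `{` and `[` skips bare numbers or words in the surrounding prose, which would otherwise be picked up as valid JSON scalars.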
The problem
Standard fine-tuning combined with grammar-constrained decoding produces valid JSON, and aggregate loss improves. But the per-role losses can tell a different story:
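| Role | Loss before | Loss after | Change | Status |
|---|---|---|---|---|
| STRUCTURAL | 5.33 | 0.00 | -100% | OK |
| KEY | 0.47 | 0.00 | -100% | OK |
| BOOLEAN | 0.46 | 1.05 | +130% | !! REGRESSION |
| TOTAL | 0.55 | 0.17 | -69% | |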
Aggregate loss improved 69%. Boolean prediction got 130% worse. valjson catches this.
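The arithmetic behind this effect can be sketched in a few lines. The example below assumes you already have one loss value and one role label per token. The helper names and the toy numbers are made up for illustration and are neither valjson's API nor its measurements.

```python
from collections import defaultdict


def per_role_mean_loss(token_losses, token_roles):
    """Mean loss per grammar role, plus the overall mean under "TOTAL"."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for loss, role in zip(token_losses, token_roles):
        sums[role] += loss
        counts[role] += 1
    means = {role: sums[role] / counts[role] for role in sums}
    means["TOTAL"] = sum(token_losses) / len(token_losses)
    return means


def relative_change(before, after):
    """Relative change per role; roles with zero baseline loss are skipped."""
    return {
        role: (after[role] - before[role]) / before[role]
        for role in before
        if before[role] > 0
    }


# Toy data (hypothetical): six tokens, two per role.
roles = ["STRUCTURAL", "STRUCTURAL", "KEY", "KEY", "BOOLEAN", "BOOLEAN"]
base_losses = [4.0, 4.0, 0.5, 0.5, 0.5, 0.5]
tuned_losses = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]

before = per_role_mean_loss(base_losses, roles)
after = per_role_mean_loss(tuned_losses, roles)
change = relative_change(before, after)

for role, delta in change.items():
    flag = "REGRESSION" if delta > 0 and role != "TOTAL" else "OK"
    print(f"{role:<11} {before[role]:.2f} -> {after[role]:.2f} {delta:+.0%} {flag}")
```

In this toy case the BOOLEAN loss doubles while TOTAL falls by about 80%, because the large drop on structural tokens dominates the average.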
Quick start
See QUICK_START.md for a hands-on walkthrough from messy output to full analysis.
Python API
from valjson import analyze

report = analyze(
    model_name="Qwen/Qwen2.5-0.5B-Instruct",
    checkpoint="my_lora/",
    schema="schema.json",
    data="test.jsonl",
)
print(report)
if report.regressions:
    print(f"REGRESSIONS: {[r.role for r in report.regressions]}")
The exit code is 1 if regressions are detected, so the check can gate a CI/CD pipeline.
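If you call the Python API from your own script rather than relying on the exit code, a gate might look like the sketch below. `regression_exit_code` is a hypothetical helper, not part of valjson. It assumes only what the example above shows: that the report exposes `regressions` and that each entry has a `role`. A stub object stands in for a real report so the snippet runs without a model.

```python
from types import SimpleNamespace


def regression_exit_code(report) -> int:
    """Return 1 if the report lists any regressions, else 0 (hypothetical helper)."""
    regressions = list(report.regressions or [])
    for r in regressions:
        print(f"REGRESSION in role: {r.role}")
    return 1 if regressions else 0


# Stub standing in for the object returned by valjson's analyze().
stub = SimpleNamespace(regressions=[SimpleNamespace(role="BOOLEAN")])
print(regression_exit_code(stub))

# In a real CI script (needs valjson, a model and a checkpoint):
#   import sys
#   from valjson import analyze
#   report = analyze(model_name="Qwen/Qwen2.5-0.5B-Instruct", checkpoint="my_lora/",
#                    schema="schema.json", data="test.jsonl")
#   sys.exit(regression_exit_code(report))
```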
Grammar Roles
| Role | Description | Examples |
|---|---|---|
| STRUCTURAL | JSON syntax | { } [ ] : , |
| QUOTE | String delimiters | " |
| KEY | Object key characters | city, cuisine |
| ENUM_VALUE | Categorical values | Italian, Economy |
| BOOLEAN | Boolean strings | True, False |
| NUMBER | Numeric characters | 42, 3.14 |
| FREE_TEXT | Non-categorical content | names, addresses |
| WHITESPACE | Formatting | spaces, newlines |
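To show how such roles can be assigned, here is a rough character-level tagger. It is a sketch under stated assumptions, not valjson's implementation: the function `tag_roles` and its `enum_values` parameter are hypothetical, enum membership is passed in by hand instead of being read from a schema, and `null` (which has no role in the table) is arbitrarily tagged FREE_TEXT.

```python
def tag_roles(text, enum_values=frozenset()):
    """Label each character of a JSON document with a grammar role."""
    roles = []
    stack = []          # open containers, "{" or "["
    expect_key = False  # True when the next string inside an object is a key
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in " \t\r\n":
            roles.append("WHITESPACE")
            i += 1
        elif ch in "{[":
            stack.append(ch)
            expect_key = ch == "{"
            roles.append("STRUCTURAL")
            i += 1
        elif ch in "}]":
            if stack:
                stack.pop()
            expect_key = False
            roles.append("STRUCTURAL")
            i += 1
        elif ch == ",":
            # After a comma, a key follows only if we are directly inside an object.
            expect_key = bool(stack) and stack[-1] == "{"
            roles.append("STRUCTURAL")
            i += 1
        elif ch == ":":
            expect_key = False
            roles.append("STRUCTURAL")
            i += 1
        elif ch == '"':
            # Find the closing quote, skipping backslash-escaped characters.
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            body = text[i + 1:j]
            if expect_key:
                role = "KEY"
            elif body in enum_values:
                role = "ENUM_VALUE"
            elif body in ("True", "False"):
                role = "BOOLEAN"
            else:
                role = "FREE_TEXT"
            roles.append("QUOTE")
            roles.extend([role] * len(body))
            if j < n:
                roles.append("QUOTE")
            i = j + 1
        else:
            # Bare literal: a number, true/false, or null.
            j = i
            while j < n and text[j] not in ' \t\r\n,]}:"{[':
                j += 1
            literal = text[i:j]
            if literal in ("true", "false"):
                role = "BOOLEAN"
            elif literal == "null":
                role = "FREE_TEXT"  # arbitrary choice; the table has no role for null
            else:
                role = "NUMBER"
            roles.extend([role] * len(literal))
            i = j
    return roles


doc = '{"cuisine": "Italian", "open": true, "seats": 42}'
for ch, role in zip(doc, tag_roles(doc, enum_values={"Italian"})):
    print(repr(ch), role)
```

A model's tokenizer rarely splits on these boundaries, so a real pipeline would still need a rule for mapping character roles onto tokens, for example by taking the role of a token's first character.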
Links
- Quick Start Tutorial
- Fixing JSON Fine-Tuning — mitigations, services
- Paper — "Valid JSON, Wrong Answer" (2026)
License
MIT