
Per-grammar-role loss decomposition for fine-tuned structured JSON output


valjson

Per-grammar-role loss analysis for structured JSON output from language models.

Fine-tuning your LLM for JSON? Your aggregate metrics might be hiding per-field regressions.

pip install valjson

What it does

valjson is the observability layer for structured JSON output. It meets you wherever you are:

You have              Command                                                        Needs model?
Messy text with JSON  valjson --extract --data output.txt                            No
JSON + schema         valjson --validate --schema s.json --data output.jsonl         No
Broken JSON + schema  valjson --fix --schema s.json --data output.jsonl              No
JSON + schema         valjson --anatomy --schema s.json --data output.jsonl          No
Two output sets       valjson --diff --schema s.json --data a.jsonl --data2 b.jsonl  No
Model + checkpoint    valjson --checkpoint lora/ --schema s.json --data test.jsonl   Yes
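
To make the "messy text with JSON" row concrete, here is a minimal sketch of pulling JSON objects out of surrounding chatter. It uses only the standard library; the function name extract_json_objects is hypothetical, and this is not necessarily how valjson --extract works internally.

```python
import json


def extract_json_objects(text):
    """Return every top-level JSON object that can be decoded from text.

    Illustrative sketch only: scans for '{' and asks the standard-library
    decoder to parse from there, skipping positions that do not parse.
    """
    decoder = json.JSONDecoder()
    found = []
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            break
        try:
            # raw_decode returns the parsed value and the index just past it.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here, keep scanning
            continue
        found.append(obj)
        pos = end
    return found


messy = 'Sure! Here it is: {"city": "Rome", "open": true} Hope that helps. {broken'
objects = extract_json_objects(messy)
print(objects)
```

The trailing "{broken" fragment is skipped rather than repaired; repairing malformed output is the job of the --fix row above.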

The problem

Standard fine-tuning plus grammar-constrained decoding produces valid JSON, and aggregate loss improves. A per-role breakdown can still show a regression:

ROLE               LOSS (before -> after)  CHANGE  STATUS
STRUCTURAL         5.33 -> 0.00             -100%   OK
KEY                0.47 -> 0.00             -100%   OK
BOOLEAN            0.46 -> 1.05             +130%   !! REGRESSION
TOTAL              0.55 -> 0.17              -69%

Aggregate loss improved by 69%, yet BOOLEAN loss rose by 130%. valjson catches this.
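
The arithmetic behind that table can be sketched in a few lines. The helper names relative_change and find_regressions are hypothetical, and the "any increase counts as a regression" threshold is an assumption, not valjson's documented rule. The inputs are the rounded losses shown above, so BOOLEAN comes out near +128% rather than exactly the +130% in the table.

```python
def relative_change(before, after):
    """Percentage change in loss from before to after fine-tuning."""
    return (after - before) / before * 100.0


def find_regressions(losses, threshold_pct=0.0):
    """Return roles whose loss increased by more than threshold_pct.

    Assumption: TOTAL is the aggregate row and is not itself a role.
    """
    return [
        role
        for role, (before, after) in losses.items()
        if role != "TOTAL" and relative_change(before, after) > threshold_pct
    ]


# Rounded per-role losses copied from the table above.
losses = {
    "STRUCTURAL": (5.33, 0.00),
    "KEY": (0.47, 0.00),
    "BOOLEAN": (0.46, 1.05),
    "TOTAL": (0.55, 0.17),
}

for role, (before, after) in losses.items():
    print(f"{role:<12} {before:.2f} -> {after:.2f}  {relative_change(before, after):+.0f}%")

print("regressions:", find_regressions(losses))
```

The aggregate row alone would pass any "loss went down" check, which is why the per-role comparison is the useful one.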

Quick start

See QUICK_START.md for a hands-on walkthrough from messy output to full analysis.

Python API

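from valjson import analyze

# Compare the base model against the fine-tuned checkpoint on the test set.
report = analyze(
    model_name="Qwen/Qwen2.5-0.5B-Instruct",
    checkpoint="my_lora/",
    schema="schema.json",
    data="test.jsonl",
)
print(report)

# report.regressions lists the grammar roles whose loss got worse.
if report.regressions:
    print(f"REGRESSIONS: {[r.role for r in report.regressions]}")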

The exit code is 1 if regressions are detected, so valjson can serve as a pass/fail gate in CI/CD.
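
When driving the Python API from your own CI script, the same pass/fail behaviour can be reproduced by hand. The sketch below relies only on analyze, report.regressions and r.role as shown above; the helpers exit_code_for and main are hypothetical, and it assumes report.regressions is empty when nothing regressed.

```python
from types import SimpleNamespace


def exit_code_for(report) -> int:
    """Map a report to a process exit code: 1 if any role regressed, else 0."""
    return 1 if report.regressions else 0


def main() -> int:
    # Imported lazily so the helper above can be used without valjson installed.
    from valjson import analyze

    report = analyze(
        model_name="Qwen/Qwen2.5-0.5B-Instruct",
        checkpoint="my_lora/",
        schema="schema.json",
        data="test.jsonl",
    )
    print(report)
    for r in report.regressions:
        print(f"regressed role: {r.role}")
    return exit_code_for(report)


# Stand-in objects that mimic the attributes used above, for a dry run.
regressed = SimpleNamespace(regressions=[SimpleNamespace(role="BOOLEAN")])
clean = SimpleNamespace(regressions=[])
print(exit_code_for(regressed), exit_code_for(clean))
```

In a real pipeline the script's entry point would call sys.exit(main()), so a regression fails the job.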

Grammar Roles

Role        Description              Examples
STRUCTURAL  JSON syntax              { } [ ] : ,
QUOTE       String delimiters        "
KEY         Object key characters    city, cuisine
ENUM_VALUE  Categorical values       Italian, Economy
BOOLEAN     Boolean strings          True, False
NUMBER      Numeric characters       42, 3.14
FREE_TEXT   Non-categorical content  names, addresses
WHITESPACE  Formatting               spaces, newlines
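
To illustrate how these roles partition a JSON document, here is a minimal character-level tagger. It is a sketch under stated assumptions, not valjson's implementation: tag_roles and its enum_values parameter are hypothetical (a real tool would read enum values from the schema), booleans are assumed to be the bare JSON literals true and false, the input is assumed to be valid JSON, and null is left unsupported because the table assigns it no role.

```python
from collections import Counter


def tag_roles(text, enum_values=frozenset()):
    """Return a list of (character, role) pairs for a valid JSON document."""
    tagged = []
    stack = []          # open containers, "{" or "["
    expect_key = False  # True when the next string is an object key
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in " \t\r\n":
            tagged.append((ch, "WHITESPACE"))
            i += 1
        elif ch in "{}[]:,":
            if ch in "{[":
                stack.append(ch)
            elif ch in "}]":
                stack.pop()
            # A key follows "{" or a "," that sits directly inside an object.
            expect_key = bool(stack) and stack[-1] == "{" and ch in "{,"
            tagged.append((ch, "STRUCTURAL"))
            i += 1
        elif ch == '"':
            j = i + 1
            while text[j] != '"':
                j += 2 if text[j] == "\\" else 1  # skip escaped characters
            body = text[i + 1:j]
            if expect_key:
                role = "KEY"
            elif body in enum_values:
                role = "ENUM_VALUE"
            else:
                role = "FREE_TEXT"
            tagged.append(('"', "QUOTE"))
            tagged.extend((c, role) for c in body)
            tagged.append(('"', "QUOTE"))
            expect_key = False
            i = j + 1
        else:
            j = i
            while j < n and text[j] not in ' \t\r\n{}[]:,"':
                j += 1
            literal = text[i:j]
            if literal in ("true", "false"):
                role = "BOOLEAN"
            else:
                float(literal)  # raises ValueError for null or junk
                role = "NUMBER"
            tagged.extend((c, role) for c in literal)
            i = j
    return tagged


doc = '{"cuisine": "Italian", "open": true, "seats": 42}'
counts = Counter(role for _, role in tag_roles(doc, enum_values={"Italian"}))
print(dict(counts))
```

With a per-character role label like this, a per-token loss can be grouped by role and averaged, which is the kind of decomposition the table in "The problem" reports.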

License

MIT
