Per-grammar-role loss decomposition for fine-tuned structured JSON output

Development Status
- 3 - Alpha
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

slotloss

Per-grammar-role loss decomposition for fine-tuned structured JSON output.

Fine-tuning your LLM for JSON output? Your aggregate metrics might be lying to you.

slotloss decomposes fine-tuning loss by grammar role (structural tokens, schema keys, enum values, booleans, free text) and compares baseline vs fine-tuned performance. It reveals per-role regressions that aggregate metrics hide.

The Problem

Standard LoRA fine-tuning combined with grammar-constrained decoding produces valid JSON at all model scales. Aggregate loss improves. Everything looks great.

But at 32B parameters, fine-tuning can degrade specific grammar roles while aggregate loss improves:

slotloss: Per-Grammar-Role Loss Report
======================================================================

Role             Baseline   Fine-tuned     Change  Status
----------------------------------------------------------------------
STRUCTURAL         5.3298       0.0002     -100.0%  OK (-100%)
KEY                0.4736       0.0001     -100.0%  OK (-100%)
ENUM_VALUE         0.3313       0.3029       -8.6%  OK (-9%)
BOOLEAN            0.4568       1.0498     +129.8%  !! REGRESSION (+130%)
FREE_TEXT          1.3287       0.6289      -52.7%  OK (-53%)
----------------------------------------------------------------------
TOTAL              0.5544       0.1742      -68.6%

WARNING: 1 grammar role(s) REGRESSED after fine-tuning:
  BOOLEAN: 0.4568 -> 1.0498 (+130%)

Your model may be memorizing majority values for constrained fields.

Aggregate loss improved by 69%, while BOOLEAN loss got 130% worse. Without slotloss, you'd never know.
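
The Change column in the report above is a plain relative change against the baseline. The sketch below recomputes it from the numbers in the report. The names `REPORT`, `relative_change`, and `regressed_roles` are illustrative and not part of slotloss, and treating any increase in loss as a regression is an assumption about the threshold the tool uses.

```python
# Per-role (baseline, fine-tuned) losses copied from the report above.
REPORT = {
    "STRUCTURAL": (5.3298, 0.0002),
    "KEY": (0.4736, 0.0001),
    "ENUM_VALUE": (0.3313, 0.3029),
    "BOOLEAN": (0.4568, 1.0498),
    "FREE_TEXT": (1.3287, 0.6289),
}
TOTAL = (0.5544, 0.1742)


def relative_change(baseline, finetuned):
    """Percent change of the fine-tuned loss relative to the baseline loss."""
    return (finetuned - baseline) / baseline * 100.0


def regressed_roles(report):
    """Roles whose loss went up (assumed rule: any increase counts)."""
    return [role for role, (base, tuned) in report.items() if tuned > base]


for role, (base, tuned) in REPORT.items():
    print(f"{role:<12} {relative_change(base, tuned):+8.1f}%")
print(f"{'TOTAL':<12} {relative_change(*TOTAL):+8.1f}%")
print("Regressed:", regressed_roles(REPORT))
```

The aggregate row and the BOOLEAN row move in opposite directions, which is exactly the case an aggregate-only check would miss.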

Install

pip install slotloss

Usage

Command Line

# Compare baseline vs fine-tuned
slotloss --model Qwen/Qwen2.5-7B-Instruct \
    --checkpoint my_lora/ \
    --schema schema.json \
    --data test.jsonl \
    --device cuda

# Baseline only
slotloss --model Qwen/Qwen2.5-7B-Instruct \
    --schema schema.json \
    --data test.jsonl

# Save JSON report
slotloss --model Qwen/Qwen2.5-7B-Instruct \
    --checkpoint my_lora/ \
    --schema schema.json \
    --data test.jsonl \
    --output report.json

The exit code is 1 if any regressions are detected and 0 otherwise, so the command can gate a CI/CD pipeline.
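
One way to use the exit code as a pipeline gate is a small Python wrapper. This is a minimal sketch, assuming the `slotloss` command is on PATH and using only the flags shown above. The helper names `run_gate` and `main` are hypothetical. Any non-zero exit code is treated as a failure, so a crash in the tool also fails the gate.

```python
import subprocess
import sys

# The invocation from the examples above; paths are placeholders.
SLOTLOSS_CMD = [
    "slotloss",
    "--model", "Qwen/Qwen2.5-7B-Instruct",
    "--checkpoint", "my_lora/",
    "--schema", "schema.json",
    "--data", "test.jsonl",
    "--output", "report.json",
]


def run_gate(cmd):
    """Run a command and return True only when it exits with code 0."""
    result = subprocess.run(cmd, check=False)
    return result.returncode == 0


def main():
    """Fail the CI job when slotloss reports a regression."""
    if not run_gate(SLOTLOSS_CMD):
        print("slotloss reported a regression (or failed to run)")
        sys.exit(1)
    print("no per-role regressions detected")
```

Calling `main()` from a CI step propagates the failure to the pipeline.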

Python API

from slotloss import analyze

report = analyze(
    model_name="Qwen/Qwen2.5-7B-Instruct",
    checkpoint="my_lora/",
    schema="schema.json",
    data="test.jsonl",
    device="cuda",
)

print(report)  # formatted report with regression warnings

# Programmatic access
for comp in report.comparisons:
    print(f"{comp.role}: {comp.baseline_loss:.4f} -> {comp.finetuned_loss:.4f} ({comp.status})")

if report.regressions:
    print(f"REGRESSIONS: {[r.role for r in report.regressions]}")

Low-Level API

from slotloss import GrammarRole, assign_grammar_roles

# `schema` is the JSON Schema for the target (see Data Format);
# it must be defined before this call.
roles = assign_grammar_roles('{"city": "NYC", "cuisine": "Italian"}', schema)
# One role per character of the input string:
# [STRUCTURAL, QUOTE, KEY, KEY, KEY, KEY, QUOTE, STRUCTURAL, ...]

Data Format

Test data is JSONL with prompt and target_json fields. Note that target_json is a JSON-encoded string, not a nested object:

{"prompt": "Extract restaurant info...", "target_json": "{\"city\": \"NYC\"}"}

Schema is standard JSON Schema:

{
  "type": "object",
  "properties": {
    "city": {"type": "string"},
    "cuisine": {"type": "string", "enum": ["Mexican", "Italian"]},
    "has_wifi": {"type": "string", "enum": ["True", "False"]}
  }
}
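
Because target_json is a string containing JSON, it is easy to get the escaping wrong when writing records by hand. The sketch below builds a record with two `json.dumps` calls and checks enum fields against the schema above. `make_record` and `enum_violations` are hypothetical helpers, not slotloss functions, and the check only covers top-level enum properties.

```python
import json

# The example schema from this section.
SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "cuisine": {"type": "string", "enum": ["Mexican", "Italian"]},
        "has_wifi": {"type": "string", "enum": ["True", "False"]},
    },
}


def make_record(prompt, target):
    """Return one JSONL line; the target dict is encoded into a string field."""
    return json.dumps({"prompt": prompt, "target_json": json.dumps(target)})


def enum_violations(line, schema):
    """List (key, value) pairs whose value is not allowed by the schema enum."""
    record = json.loads(line)
    target = json.loads(record["target_json"])  # second decode: string -> dict
    problems = []
    for key, value in target.items():
        allowed = schema["properties"].get(key, {}).get("enum")
        if allowed is not None and value not in allowed:
            problems.append((key, value))
    return problems


good = make_record(
    "Extract restaurant info...",
    {"city": "NYC", "cuisine": "Italian", "has_wifi": "True"},
)
# A real boolean instead of the string "True" is rejected by the enum.
bad = make_record("Extract restaurant info...", {"has_wifi": True})
print(enum_violations(good, SCHEMA), enum_violations(bad, SCHEMA))
```

Note that booleans in this schema are the strings "True" and "False", so a JSON `true` in the target would not match.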

Grammar Roles

Role	Description	Examples
STRUCTURAL	JSON syntax	`{` `}` `[` `]` `:` `,`
QUOTE	String delimiters	`"`
KEY	Object key characters	`city`, `cuisine`
ENUM_VALUE	Categorical values	`Italian`, `Economy`
BOOLEAN	Boolean strings	`True`, `False`
NUMBER	Numeric characters	`42`, `3.14`
FREE_TEXT	Non-categorical content	names, addresses
WHITESPACE	Formatting	spaces, newlines
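
To make the table concrete, here is a toy character-level tagger. It is not the slotloss implementation: `ToyRole`, `value_role`, and `toy_assign_roles` are hypothetical names, and the sketch only handles flat objects without escape sequences or bare literals such as `true` or `null`. The rule that an enum made up only of "True" and "False" counts as BOOLEAN is an assumption based on the example schema.

```python
from enum import Enum


class ToyRole(Enum):
    STRUCTURAL = "STRUCTURAL"
    QUOTE = "QUOTE"
    KEY = "KEY"
    ENUM_VALUE = "ENUM_VALUE"
    BOOLEAN = "BOOLEAN"
    NUMBER = "NUMBER"
    FREE_TEXT = "FREE_TEXT"
    WHITESPACE = "WHITESPACE"


def value_role(key, schema):
    """Pick the role of a string value from the schema entry of its key."""
    allowed = schema.get("properties", {}).get(key, {}).get("enum")
    if not allowed:
        return ToyRole.FREE_TEXT
    if set(allowed) <= {"True", "False"}:
        return ToyRole.BOOLEAN
    return ToyRole.ENUM_VALUE


def toy_assign_roles(text, schema):
    """Tag every character of a flat JSON object with a ToyRole."""
    roles = []
    in_string = False
    expecting_key = True  # the next string is a key until ':' is seen
    current_key = ""
    string_role = ToyRole.KEY
    for ch in text:
        if ch == '"':
            roles.append(ToyRole.QUOTE)
            if not in_string:  # opening quote: decide what the string is
                if expecting_key:
                    current_key = ""
                    string_role = ToyRole.KEY
                else:
                    string_role = value_role(current_key, schema)
            in_string = not in_string
        elif in_string:
            roles.append(string_role)
            if string_role is ToyRole.KEY:
                current_key += ch
        elif ch in "{}[]:,":
            roles.append(ToyRole.STRUCTURAL)
            if ch == ":":
                expecting_key = False
            elif ch in ",{":
                expecting_key = True
        elif ch.isspace():
            roles.append(ToyRole.WHITESPACE)
        else:
            # Anything else outside a string is treated as part of a number.
            roles.append(ToyRole.NUMBER)
    return roles


SCHEMA = {
    "properties": {
        "city": {"type": "string"},
        "cuisine": {"type": "string", "enum": ["Mexican", "Italian"]},
        "has_wifi": {"type": "string", "enum": ["True", "False"]},
    }
}
TEXT = '{"city": "NYC", "cuisine": "Italian", "has_wifi": "True"}'
ROLES = toy_assign_roles(TEXT, SCHEMA)
print([role.name for role in ROLES[:8]])
```

The output has one role per input character, so per-character (or per-token) losses can later be grouped by role.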

Why Regressions Happen

Fine-tuning on small datasets biases the model toward training-set patterns. Structural tokens (trivial decisions) improve massively, dominating the aggregate gradient. Constrained fields like booleans and enums (genuine decisions) can overfit to majority values. Aggregate loss improves because the large gains on trivial roles outweigh the regression on substantive roles.

The regression emerges at scale: larger pretrained models have stronger existing competencies that fine-tuning can disrupt. The better the base model already is at a grammar role, the more fine-tuning has to lose.
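
The masking effect can be reproduced with simple arithmetic. The sketch below assumes the aggregate is a token-weighted mean of per-role losses, which is an assumption about how the total is formed. The per-role losses are taken from the report in The Problem, but the token counts are hypothetical, so the aggregate values here are not expected to match the TOTAL row of that report.

```python
# role: (hypothetical token count, baseline mean loss, fine-tuned mean loss)
ROLES = {
    "STRUCTURAL": (40, 5.3298, 0.0002),
    "KEY": (60, 0.4736, 0.0001),
    "ENUM_VALUE": (20, 0.3313, 0.3029),
    "BOOLEAN": (10, 0.4568, 1.0498),
    "FREE_TEXT": (70, 1.3287, 0.6289),
}

BASELINE, FINETUNED = 1, 2  # column indices into the tuples above


def aggregate(roles, column):
    """Token-weighted mean loss over all roles for the chosen column."""
    total_tokens = sum(row[0] for row in roles.values())
    weighted_sum = sum(row[0] * row[column] for row in roles.values())
    return weighted_sum / total_tokens


base = aggregate(ROLES, BASELINE)
tuned = aggregate(ROLES, FINETUNED)
print(f"aggregate: {base:.4f} -> {tuned:.4f}")
print(f"BOOLEAN:   {ROLES['BOOLEAN'][BASELINE]:.4f} -> {ROLES['BOOLEAN'][FINETUNED]:.4f}")
```

With few BOOLEAN tokens and many structural and key tokens, the large gains on the common roles outweigh the BOOLEAN regression in the mean.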

Paper

Baldwin (2026), "Valid JSON, Wrong Answer: Fine-Tuning Degrades Grammar-Role Performance at Scale Despite Improved Aggregate Loss."

License

MIT

Release history

- 0.3.0 (Apr 12, 2026)
- 0.2.0 (Apr 5, 2026)
- 0.1.0 (Apr 4, 2026), this version

Download files

Source Distribution

slotloss-0.1.0.tar.gz (11.9 kB)

Uploaded Apr 4, 2026 Source

Built Distribution

slotloss-0.1.0-py3-none-any.whl (11.1 kB)

Uploaded Apr 4, 2026 Python 3

File details

Details for the file slotloss-0.1.0.tar.gz.

File metadata

Download URL: slotloss-0.1.0.tar.gz
Upload date: Apr 4, 2026
Size: 11.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for slotloss-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`6abc9c3d7165cca3e23810db118d37242eeb4c139c1a2654179231bafd8f2a15`
MD5	`770bf6dc3f91bd936e278a42068222a8`
BLAKE2b-256	`fd61dd670c06d221ebb3c22bb56623fb17a869963214ebfaa68c593efc896040`

File details

Details for the file slotloss-0.1.0-py3-none-any.whl.

File metadata

Download URL: slotloss-0.1.0-py3-none-any.whl
Upload date: Apr 4, 2026
Size: 11.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for slotloss-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`54f9380678613518b7d5bbfc1b52643c8d92da9b7f8214ce7bd92b969c16fa51`
MD5	`8730fe1210ad644415707c3322e65551`
BLAKE2b-256	`cfb76282a51356da76da0c9dd743e7c14e8313544a90ef7f580af5a5d04c5dde`
