Deterministic quality scorer for AI agent instruction files. Multi-format (SKILL.md, CLAUDE.md, .cursorrules, AGENTS.md), 8-dimension scoring with security, anti-gaming detection, zero dependencies.
Schliff
Your AI instruction files silently degrade — and nothing catches it. A trigger phrase rots. An edge case slips. Your SKILL.md balloons past its token budget. No error, no red test — just an agent that quietly gets worse.
A deterministic quality scorer for AI instruction files. Same input, same score — every time, on every machine. Think the Ruff for SKILL.md, CLAUDE.md, and AGENTS.md. It measures the things linters miss, the same way every time, so degradation shows up as a number that drops instead of a bug you chase.
Schliff scores the instruction files that drive your AI agents — skills, system prompts, project memory — against an explicit, versioned rubric. No LLM judge in the critical path. No network. No randomness. Just a rule engine you can read, pin, and trust in CI.
pip install schliff
schliff score path/to/SKILL.md
schliff v8.1.0
structure ████████░░ 78/100 good
triggers ███████░░░ 72/100 good
quality ██████░░░░ 64/100 fair
edges █████░░░░░ 55/100 fair
efficiency ████████░░ 80/100 good
composability ███████░░░ 70/100 good
clarity ██████████ 100/100 perfect
Structural Score ██████████████░░░░░░ 71.2/100 [C]
Tokens: 740 / 1,000 (ok)
No model in the loop produced that number. Run it again on another laptop and you get 71.2 again. That is the whole point.
Why deterministic?
Most "AI quality" tools ask another LLM to grade your prompt. That makes the score non-reproducible (re-run it, get a different number), un-auditable (the rubric lives in a hidden prompt), and trivially gameable (write for the judge, not the user). A score you can't reproduce isn't a measurement — it's a vibe. You can't gate a release on a number that drifts.
Schliff takes the opposite position:
- Reproducible. The headline composite is computed from a canonical, versioned weight registry. Calibration is off by default, so `verify`, `badge`, and the leaderboard return the same score on your laptop and in CI.
- Auditable. Every dimension is a readable scorer in `scripts/scoring/`. The weights are a dict you can open. There is no hidden judge prompt.
- Anti-gaming by design. A dedicated guard layer (`guards.py`) plus per-scorer heuristics detect padding, keyword stuffing, and structure-mimicry instead of rewarding them.
- Zero core dependencies. Core Schliff is stdlib-only and runs on Python ≥ 3.9. (Optional `[evolve]` / `[judge]` extras pull in LLM clients for an opt-in smoke-test only, never for scoring.)
Because the number is stable, it does real work:
- Diff it across two commits to see exactly what a refactor cost or earned.
- Gate a pull request on a minimum score, with a non-zero exit code below the line.
- Compare two files side by side on the same rubric.
An optional LLM judge exists for exploratory work, but it is never part of the deterministic score. The number you gate on is rule-based, end to end.
The 8 scored dimensions
For the SKILL.md family, Schliff runs 8 scorers per file. 7 of them form the headline composite; security and runtime are reported as separate opt-in signals so a security warning never silently inflates or deflates your quality grade.
| Dimension | Weight | In headline? |
|---|---|---|
| structure | 0.15 | ✅ |
| triggers | 0.20 | ✅ |
| quality | 0.20 | ✅ |
| edges | 0.15 | ✅ |
| efficiency | 0.10 | ✅ |
| composability | 0.10 | ✅ |
| clarity | 0.05 | ✅ |
| security | 0.05 | Separate signal (gate threshold 70) |
| runtime | n/a | Separate signal (no profile weight) |
The seven headline weights are renormalized to sum to 1.0 — that is the canonical basis.
Note: `security` is a side signal for the `SKILL.md` / `CLAUDE.md` / `.cursorrules` / `AGENTS.md` family, but a core 0.15 headline dimension for the `system_prompt` format, which uses its own scorer set. Only `runtime` is excluded everywhere.
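The seven headline weights in the table add up to 0.95, which is why they are renormalized. A minimal sketch of that step follows; the names `HEADLINE_WEIGHTS` and `renormalize` are illustrative assumptions, not Schliff's actual registry API.

```python
# Headline weights copied from the table above; they sum to 0.95, not 1.0.
HEADLINE_WEIGHTS = {
    "structure": 0.15,
    "triggers": 0.20,
    "quality": 0.20,
    "edges": 0.15,
    "efficiency": 0.10,
    "composability": 0.10,
    "clarity": 0.05,
}


def renormalize(weights):
    """Scale the weights so they sum to 1.0 while keeping their ratios."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical name for the renormalized basis.
CANONICAL = renormalize(HEADLINE_WEIGHTS)
```

After renormalization, `triggers` and `quality` each carry 0.20 / 0.95 of the headline, and the relative ordering of the dimensions is unchanged.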
The composite: a full-denominator model
Schliff does not quietly renormalize across whatever you happened to measure. Unmeasured dimensions contribute 0 and stay in the denominator — so coverage gaps lower your ceiling instead of quietly disappearing. Your score ceiling equals your measurement coverage. Measure 4 of the 7 headline dimensions and your maximum possible score is capped accordingly, with an explicit warning:
ℹ Scored 4/7 dimensions — the score can't exceed 42% until the rest
are measured. Run /schliff:init to add an eval suite and score:
triggers, quality, edges.
This is deliberate. A partial measurement is an honest partial score, never a flattering one. Unmeasured work is missing points, not invisible. To lift the ceiling, measure more — don't hide the gap.
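The 42% ceiling in the warning above can be reproduced by hand. The sketch below assumes the composite is a weighted mean over the seven headline weights with unmeasured dimensions counted as 0; the function names are hypothetical and this is not Schliff's `composite.py`.

```python
# Headline weights from the dimension table (raw values, summing to 0.95).
WEIGHTS = {
    "structure": 0.15,
    "triggers": 0.20,
    "quality": 0.20,
    "edges": 0.15,
    "efficiency": 0.10,
    "composability": 0.10,
    "clarity": 0.05,
}


def composite(scores, weights=WEIGHTS):
    """Full-denominator composite: a missing dimension contributes 0
    but its weight stays in the denominator."""
    total = sum(weights.values())
    earned = sum(w * scores.get(name, 0.0) for name, w in weights.items())
    return earned / total


def ceiling(measured, weights=WEIGHTS):
    """Highest composite reachable when only `measured` dimensions are scored."""
    total = sum(weights.values())
    return 100.0 * sum(weights[name] for name in measured) / total


# The warning lists triggers, quality, and edges as unmeasured.
measured = ["structure", "efficiency", "composability", "clarity"]
print(round(ceiling(measured)))  # 42
```

The four measured dimensions hold 0.40 of the 0.95 total weight, so even perfect scores on all four cannot lift the composite past roughly 42.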
Grade scale
S ≥ 95 · A ≥ 85 · B ≥ 75 · C ≥ 65 · D ≥ 50 · E ≥ 35 · F < 35
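The scale reads as a simple threshold lookup. A short sketch, with `GRADE_THRESHOLDS` and `grade` as illustrative names rather than Schliff's own:

```python
# (minimum score, letter) pairs from the grade scale, highest first.
GRADE_THRESHOLDS = [(95, "S"), (85, "A"), (75, "B"), (65, "C"), (50, "D"), (35, "E")]


def grade(score):
    """Return the first letter whose minimum the score meets, else F."""
    for minimum, letter in GRADE_THRESHOLDS:
        if score >= minimum:
            return letter
    return "F"


print(grade(71.2))  # C
```

This agrees with the sample output earlier, where 71.2/100 is reported as `[C]`.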
Multi-format support
One engine, five instruction-file formats — each with its own token budget and scorer set:
| Format | Token budget | Scorers |
|---|---|---|
| SKILL.md | 1,000 | shared 8-scorer registry |
| CLAUDE.md | 2,000 | shared 8-scorer registry |
| .cursorrules | 500 | shared 8-scorer registry |
| AGENTS.md | 3,000 | shared 8-scorer registry |
| system prompts | 1,500 | dedicated set (structure_prompt, output_contract, efficiency, clarity, security, composability, completeness) |
Format is auto-detected; override with --format (skill, claude, cursor, agents, system-prompt).
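As a rough illustration of how a per-format budget check could produce the `Tokens:` line in the sample output, here is a sketch keyed by the `--format` values. How Schliff counts tokens is not covered here, and the `over` label and the inclusive boundary are assumptions.

```python
# Token budgets from the table above, keyed by the --format values.
TOKEN_BUDGETS = {
    "skill": 1000,
    "claude": 2000,
    "cursor": 500,
    "agents": 3000,
    "system-prompt": 1500,
}


def budget_line(fmt, tokens):
    """Format a budget summary; 'over' is a hypothetical status label."""
    budget = TOKEN_BUDGETS[fmt]
    status = "ok" if tokens <= budget else "over"
    return f"Tokens: {tokens:,} / {budget:,} ({status})"


print(budget_line("skill", 740))  # Tokens: 740 / 1,000 (ok)
```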
Install
pip install schliff # core, stdlib-only
pip install "schliff[evolve,judge]" # optional LLM-judge / evolve extras
| Install | Pulls in | When you need it |
|---|---|---|
| schliff | stdlib only | Scoring, verify, badge, CI: everything that gates a release |
| schliff[judge] | LLM client | Opt-in exploratory LLM-judge smoke-test (never scoring) |
| schliff[evolve] | LLM client | Opt-in autonomous-improvement extras |
GitHub Action
Gate pull requests on instruction-file quality:
# .github/workflows/schliff.yml
name: schliff
on: [pull_request]
jobs:
  score:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install schliff
      - run: schliff verify path/to/SKILL.md --min-score 75
schliff verify exits non-zero below the threshold — a clean CI gate.
pre-commit
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/Zandereins/schliff
    rev: v8.1.0
    hooks:
      - id: schliff-verify
        args: ['--min-score', '75']
CLI
schliff <command> [path] [options]
| Command | What it does |
|---|---|
| score | Score a file and print the grade bar |
| verify | CI gate: exit 0/1 based on a minimum score |
| doctor | Scan and grade every installed skill |
| badge | Generate a Markdown score badge |
| diff | Explain score changes between two git commits |
| compare | Compare two files side by side |
| suggest | Rank fixes by estimated score impact |
| report | Generate a Markdown score report |
| demo | Score a built-in bad skill to see Schliff in action |
| evolve | Improve an instruction file's score |
| version | Print the version |
The version is single-sourced: the CLI resolves it at runtime via importlib.metadata.version("schliff"), falling back to dev from a source checkout.
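The single-sourcing pattern described above usually looks like the following sketch. `importlib.metadata.version` and `PackageNotFoundError` are standard-library names; the wrapper `resolve_version` is a hypothetical helper, not necessarily what `cli.py` defines.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name="schliff", fallback="dev"):
    """Return the installed distribution's version, or the fallback
    when the package metadata is absent (e.g. a source checkout)."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return fallback
```

Because the version comes from the installed metadata, there is no second copy in the source tree to drift out of sync with the release tag.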
The autonomous improvement loop
Schliff doesn't just grade — it can close the loop. The improvement engine measures first, then fixes (not the other way around):
- Score the file across all dimensions.
- Generate deterministic patch gradients for the weakest dimensions.
- Apply the safe, rule-based patches automatically. About 32% of suggested fixes apply deterministically through the apply gate (confidence=high, single-edit; canonical measurement: `measure_patch_ratio.py`). The rest are handed to an optional LLM.
- Re-score and keep the change only if the score improved; otherwise revert.
- Stop on plateau detection or when the target is reached.
It also carries cross-session episodic memory (episodic_store.py), so improvement runs learn from prior attempts instead of repeating them. Drive it from Claude Code with /schliff:auto, or use schliff evolve directly.
→ 7 deterministic fixes available. Run `/schliff:auto` to apply.
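The score, patch, re-score, revert cycle described above can be sketched as a small keep-if-improved loop. Everything here is illustrative: `improve`, the `patience` plateau rule, and the toy scorer are assumptions, not the engine's real interfaces.

```python
from typing import Callable, Iterable, Tuple

Scorer = Callable[[str], float]
Patch = Callable[[str], str]


def improve(text: str, score: Scorer, patches: Iterable[Patch],
            target: float = 85.0, patience: int = 2) -> Tuple[str, float]:
    """Apply patches one at a time, keeping each only if the score rises."""
    best = score(text)  # measure first
    stale = 0           # consecutive patches that did not help
    for patch in patches:
        if best >= target or stale >= patience:
            break       # target reached or plateau detected
        candidate = patch(text)
        new_score = score(candidate)
        if new_score > best:
            text, best, stale = candidate, new_score, 0
        else:
            stale += 1  # revert by discarding the candidate
    return text, best


def toy_score(text: str) -> float:
    """Toy scorer for demonstration only; not one of Schliff's dimensions."""
    bonus = 20.0 if "## Examples" in text else 0.0
    bonus += 20.0 if "## Edge cases" in text else 0.0
    return 50.0 + bonus


patches = [
    lambda t: t + "\n## Examples\n",    # helps, kept
    lambda t: "",                       # hurts, reverted
    lambda t: t + "\n## Edge cases\n",  # helps, kept
]
final_text, final_score = improve("# Skill", toy_score, patches, target=90.0)
print(final_score)  # 90.0
```

The destructive middle patch is scored and thrown away, so a bad suggestion can cost time but never lower the file's score.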
How it works
The full methodology — scorer internals, the full-denominator composite, the anti-gaming guards, and the calibration model — lives in docs/SCORING.md. Calibration is strictly opt-in: ambient auto-calibrated weights apply only when SCHLIFF_CALIBRATED_WEIGHTS is set and only for the interactive score command, and Schliff emits a weight_source=calibrated warning flagging that such scores are not comparable to the canonical scale. Everything that gates a release stays canonical.
scripts/
├── cli.py # CLI entrypoint + dynamic version resolution
├── scoring/
│ ├── registry.py # canonical weights, scorer lists, headline exclusions
│ ├── composite.py # full-denominator composite model
│ ├── formats.py # format detection + token budgets
│ ├── guards.py # anti-gaming detection
│ └── structure.py · triggers.py · quality.py · edges.py · …
├── text_gradient.py # deterministic patch gradients (apply gate)
├── episodic_store.py # cross-session episodic memory
└── measure_patch_ratio.py # canonical source for the patch-ratio claim
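The opt-in calibration rule stated above reduces to a two-condition check. This sketch treats "set" as "present in the environment", which is an assumption, and `weight_source` is a hypothetical helper that only borrows the label values from the warning text.

```python
import os


def weight_source(command, env=None):
    """Return 'calibrated' only for the interactive score command with
    SCHLIFF_CALIBRATED_WEIGHTS present; everything else stays canonical."""
    env = os.environ if env is None else env
    if command == "score" and "SCHLIFF_CALIBRATED_WEIGHTS" in env:
        return "calibrated"
    return "canonical"


print(weight_source("verify", {"SCHLIFF_CALIBRATED_WEIGHTS": "1"}))  # canonical
```

Gating commands such as `verify` and `badge` fall through to the canonical branch regardless of the environment, which is what keeps CI scores comparable.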
Positioning
LLM-judge tools ask a model how good your prompt feels — a different answer every run. Schliff computes how good it measurably is — the same answer every run, in a number you can pin to a commit and gate a release on.
Ruff lints your Python. Biome lints your JS. Schliff lints the instruction files that drive your AI — deterministically, with no model in the loop.
Contributing & links
- ⭐ Star the repo: github.com/Zandereins/schliff
- 📖 Docs: `docs/SCORING.md`
- 🏆 Leaderboard: github.com/Zandereins/schliff
- 🧪 Playground: `schliff demo`
Validated by 1,198 tests (unit + integration) in skills/schliff/tests, with separate self and proof suites via test-self.sh and test-integration.sh.
License
MIT © Franz Paul