Autonomous skill improvement and measurement framework for Claude Code

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

FPaolo

These details have not been verified by PyPI

Project description

Schliff

The finishing cut for Claude Code skills.

Schliff improving the demo skill from 54.0 to 98.3

Baseline:  █████░░░░░░░░░░░░░░░  54.0/100  [D]
After 18x: ████████████████████  98.3/100  [S]

What changed:
  Structure         70 → 100     Added description, examples, concrete commands
  Efficiency        35 → 93      Removed hedging language, improved density
  Composability     30 → 90      Added scope, error behavior, dependencies
  Clarity           90 → 100     Resolved vague references

You wrote a skill. It worked. Three weeks later, triggers misfire, edge cases slip through, instructions contradict themselves. Schliff measures the damage (deterministic scoring, no LLM needed) and fixes it autonomously (Claude Code applies patches, measures delta, reverts regressions).

Try It — Demo in 3 minutes

Note: Schliff commands (/schliff:*) run inside Claude Code, not in a regular terminal. Claude's intelligence decides which patches to apply — the scorer is deterministic, the improvement loop uses the LLM.

# 1. Install once (terminal, ~1 min)
git clone https://github.com/Zandereins/schliff.git && bash schliff/install.sh

# 2. Score the included demo skill (Claude Code, ~10 sec)
/schliff:init demo/bad-skill/SKILL.md

# 3. Watch it improve the demo skill (Claude Code, ~2 min)
/schliff:auto

What you'll see on the demo skill: 18 autonomous iterations. Each one: patch → measure → keep or revert. Score climbs from 54 [D] to 98 [S]. Stops when ROI plateaus. Real-world skills take longer and may not reach [S] — complex skills plateau around [A] to [S] depending on their eval suite coverage.

Prerequisites: Python 3.9+, Bash, Git, jq

Already have skills? Run /schliff:doctor to scan all installed skills and show health grades + token costs.

What Schliff Fixes

Real improvements from the included demo skill:

Problem	What Schliff does	Result
Triggers misfire	Keyword matching + negative boundaries	0% → 89% accuracy
Missing structure	Added examples, edge cases, frontmatter	75 → 100/100
Vague instructions	Replaced hedging with concrete commands	35 → 93/100
No scope boundaries	Added handoff declarations + "do NOT use"	40 → 100/100

Automated. No human intervention. Stops when ROI plateaus.

This Is For You If

Skill Creator — Run /schliff:init on your v1 skill to get a baseline + eval suite
Skill Maintainer — Run /schliff:auto to grind a skill from [C] toward [S] overnight
Fleet Manager (10+ skills) — Run /schliff:doctor to scan everything, detect conflicts + token costs
Quality Gate — Run /schliff:eval before shipping, or use the GitHub Action in CI

Why It Works

Autonomous — Runs unattended. Applies patches, measures delta, reverts regressions, stops when ROI drops. No prompts, no babysitting.

Deterministic scoring — The 7-dimension scorer is pure Python, no LLM. Same input, same output. The improvement loop (/schliff:auto) runs inside Claude Code — Claude decides which patches to apply, but 60-70% of fixes follow deterministic rules (frontmatter, noise removal, TODO cleanup).

Empirical — 7 scoring dimensions (structure, triggers, quality, edges, efficiency, composability, clarity) + optional runtime validation against actual Claude behavior.

Learns — Episodic memory remembers which strategies worked across sessions. Predicts success before trying. Your 50th skill improves faster than your 1st.

Scales — MinHash + LSH mesh analysis detects trigger conflicts across 50+ skills in roughly O(n) time, without comparing every pair of skills. The Doctor command shows health grades for your entire skill collection.
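The MinHash + LSH idea can be illustrated with a small standard-library sketch. It is not Schliff's mesh implementation: the word-level tokenisation, the 64 hash functions, the 16 bands, and every function name below are assumptions chosen for illustration.

```python
import hashlib
from itertools import combinations
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-cased word set (assumed tokenisation, for illustration only)."""
    return set(text.lower().split())


def _hash(seed: int, token: str) -> int:
    """Seeded 64-bit hash built from SHA-1."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash(token_set: Set[str], num_hashes: int = 64) -> List[int]:
    """Signature: the minimum hash value per seed. Requires a non-empty set."""
    return [min(_hash(seed, tok) for tok in token_set) for seed in range(num_hashes)]


def lsh_candidates(signatures: Dict[str, List[int]], bands: int = 16) -> Set[Tuple[str, str]]:
    """Return pairs of names whose signatures agree on at least one band."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets: Dict[Tuple[int, Tuple[int, ...]], List[str]] = {}
    for name, sig in signatures.items():
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets.setdefault(key, []).append(name)
    pairs: Set[Tuple[str, str]] = set()
    for names in buckets.values():
        for a, b in combinations(sorted(names), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical trigger phrases for three skills.
skills = {
    "deploy-app": "deploy the app to production",
    "ship-app": "deploy the app to production",
    "format-code": "format python source files",
}
signatures = {name: minhash(tokens(text)) for name, text in skills.items()}
print(lsh_candidates(signatures))
```

Each skill is hashed once and dropped into buckets, so only skills that share a bucket are ever compared. That is where the near-linear behaviour comes from.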

Autoresearch for Claude Code

Inspired by Karpathy's autoresearch (50K+ stars) — Schliff applies the same autonomous improvement loop to Claude Code skills:

	Karpathy's autoresearch	Schliff
Target	ML training scripts	Claude Code SKILL.md files
Metric	1 (val_bpb)	7 dimensions
Patches	100% LLM	60-70% deterministic
Memory	None	Cross-session episodic store
Fleet	1 file	50+ skills (Doctor + Mesh)

Both run overnight. Both stop when ROI plateaus. Both improve unattended.

Commands

Core

Command	What It Does
`/schliff`	Full autonomous loop with GOAL + METRIC
`/schliff:doctor`	Scan ALL installed skills, show health summary
`/schliff:auto`	Self-driving auto-improve (mostly deterministic patches, no prompts)
`/schliff:init`	Bootstrap eval suite + baseline from any SKILL.md
`/schliff:report`	Generate shareable markdown report with badge

Analyze & Debug

Command	What It Does
`/schliff:analyze`	One-shot gap analysis with ranked recommendations
`/schliff:bench`	Establish quality baseline for a skill
`/schliff:eval`	Run eval suite assertions
`/schliff:mesh`	Detect trigger conflicts across all installed skills
`/schliff:triage`	Cluster failures, auto-generate fixes
`/schliff:log-failure`	Log a skill failure for later triage
`/schliff:update`	Update Schliff to latest version

How It Scores — 7 dimensions + optional runtime

Two scoring modes, plus the improvement loop that uses them:

Structural Score (default) — Instant, zero LLM cost. Pure Python analysis of file organization, trigger keywords, eval coverage, edge cases, efficiency, composability, and clarity. No API calls needed. Use schliff score SKILL.md from any terminal or /schliff:bench in Claude Code.

Runtime Score (--runtime) — Invokes Claude with test prompts, validates actual behavior against assertions. Requires Claude CLI. Use before shipping to production.

Improvement Loop (/schliff:auto) — Runs inside Claude Code. Claude reads the scorer output, picks the highest-impact fix, patches the SKILL.md, re-scores, keeps or reverts. This is where the LLM intelligence lives. The scorer is the ruler; Claude is the craftsman.
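The patch, re-score, keep-or-revert cycle described above can be sketched in a few lines. This is a toy model, not Schliff's loop: the scorer, the patches, and all names are hypothetical, and the real loop lets Claude choose the next patch from the scorer output.

```python
from typing import Callable, List, Tuple


def improve(text: str,
            score: Callable[[str], float],
            patches: List[Callable[[str], str]]) -> Tuple[str, float, int]:
    """Try each patch once; keep it only if the score rises, otherwise discard it."""
    best = score(text)
    kept = 0
    for patch in patches:
        candidate = patch(text)
        new_score = score(candidate)
        if new_score > best:
            # Improvement: the patched text becomes the new baseline.
            text, best = candidate, new_score
            kept += 1
        # No improvement: the candidate is dropped, which acts as the revert.
    return text, best, kept


def toy_score(text: str) -> float:
    """Hypothetical scorer: penalise hedging words, reward an example."""
    hedges = sum(text.lower().count(word) for word in ("maybe", "perhaps"))
    bonus = 20 if "Example" in text else 0
    return float(max(0, 60 - 10 * hedges + bonus))


patches = [
    lambda t: t.replace("maybe ", ""),          # removes hedging
    lambda t: t + "\nmaybe perhaps",            # a regression, gets reverted
    lambda t: t + "\nExample: run the tests",   # adds an example
]

final_text, final_score, kept = improve("You should maybe run tests.", toy_score, patches)
print(final_score, kept)
```

Because the scorer is deterministic, comparing `new_score` with `best` is enough to decide whether a patch stays.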

Dimension	Weight	What It Measures
Structure	15%	Frontmatter, headers, examples, progressive disclosure
Trigger Accuracy	20%	TF-IDF keyword overlap against eval suite prompts
Eval Coverage	20%	Assertion breadth and eval suite coverage
Edge Coverage	15%	Edge case definitions in eval suite
Token Efficiency	10%	Information density, signal-to-noise ratio
Composability	10%	Scope boundaries, handoff declarations
Clarity	5%	Contradiction detection, vague references, ambiguity
Runtime (opt-in)	10%	Actual Claude behavior against assertions
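The weights above sum to 95 without the runtime dimension and 105 with it, so one plausible reading is that the composite is a weighted average normalised over the dimensions actually measured. That normalisation is an assumption, as are the short dimension keys (with `quality` standing for Eval Coverage). A sketch:

```python
from typing import Dict

# Weights copied from the table above; the key names are assumed short forms.
WEIGHTS: Dict[str, float] = {
    "structure": 15, "triggers": 20, "quality": 20, "edges": 15,
    "efficiency": 10, "composability": 10, "clarity": 5, "runtime": 10,
}


def composite(scores: Dict[str, float]) -> float:
    """Weighted average over the dimensions present in `scores` (assumed normalisation)."""
    active = {name: value for name, value in scores.items() if name in WEIGHTS}
    total_weight = sum(WEIGHTS[name] for name in active)
    return sum(WEIGHTS[name] * value for name, value in active.items()) / total_weight


# Dimension scores taken from the Health Dashboard sample in this README.
dashboard = {
    "structure": 100, "triggers": 95, "quality": 91, "edges": 100,
    "efficiency": 84, "composability": 100, "clarity": 100,
}
print(round(composite(dashboard), 1))
```

Under this assumption the dashboard's seven dimension scores reproduce its 95.4 structural score, which suggests the reading is at least consistent with the sample output.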

Grades: S (>=95), A (>=85), B (>=75), C (>=65), D (>=50), E (>=35), F (<35).
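The grade thresholds translate directly into a lookup. The function name is illustrative and not part of the Schliff API:

```python
def grade(score: float) -> str:
    """Map a 0-100 score to a letter grade using the thresholds listed above."""
    for letter, floor in (("S", 95), ("A", 85), ("B", 75), ("C", 65), ("D", 50), ("E", 35)):
        if score >= floor:
            return letter
    return "F"


print(grade(54.0), grade(98.3))
```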

Full scoring methodology: docs/SCORING.md

Dashboard — Health overview for any skill

======================================================================
  Schliff Health Dashboard: schliff
======================================================================

  Structural Score: ███████████████████░  95.4/100  [S]
    [7/8 dimensions, 90% coverage]

  Dimensions:
    structure       ██████████  100/100
    triggers        █████████░   95/100
    quality         █████████░   91/100
    edges           ██████████  100/100
    efficiency      ████████░░   84/100
    composability   ██████████  100/100
    clarity         ██████████  100/100
======================================================================

Auto-Improve — Autonomous grinding with EMA-based stopping

Scoring baseline...
Baseline: 95.4/100 (7 dims)

--- Iteration 1 ---
Stopping: composite >= 98 (95.4)

  Schliff Auto-Improve Complete
  ──────────────────────────────────────────────────
  Score:  95 → 95.4/100  ███████████████████░  (+0.0)  [S]
  Iters:  0  |  Kept: 0  |  Time: 1s
  Stop:   composite >= 98 (95.4)
  (Already near-optimal — consider runtime eval for further gains)
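The README does not spell out the EMA-based stopping rule, so the following is only a guess at its shape: track an exponential moving average of the per-iteration score gain and stop once it falls below a floor. The `alpha` and `min_gain` values and the function name are hypothetical.

```python
from typing import Iterable, Optional


def iterations_until_plateau(deltas: Iterable[float],
                             alpha: float = 0.5,
                             min_gain: float = 0.5) -> int:
    """Count iterations run before the EMA of score gains drops below `min_gain`."""
    ema: Optional[float] = None
    count = 0
    for delta in deltas:
        count += 1
        # First gain seeds the average; later gains are blended in.
        ema = delta if ema is None else alpha * delta + (1 - alpha) * ema
        if ema < min_gain:
            break  # returns have plateaued
    return count


# Hypothetical per-iteration score gains: large early wins, then nothing.
print(iterations_until_plateau([8, 4, 2, 0, 0, 0, 0, 0]))
```

Smoothing the gains means a single zero-gain iteration does not end the run, while a sustained run of them does.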

Doctor — Scan all installed skills at once

======================================================================
  Schliff Doctor — Skill Health Check
======================================================================

  1 skills scanned | 1 healthy | 4 mesh issues

  Skill                      Score  Grade   Dims  Issues  Action
  --------------------------------------------------------------------
  schliff                   90    [A]    7/8       0  Healthy

  Mesh Health: 68/100 (4 cross-skill issues)
  Run /schliff:mesh for details.

  NOTE: Scores are STRUCTURAL — they measure file organization,
  not runtime effectiveness. Use --runtime for validated scoring.
======================================================================

What's New in v6.0

Feature	Description
Rebrand to Schliff	"The finishing cut" — German for polish/grind
Clarity as Default	7th dimension always active (contradictions, vague refs, ambiguity)
Token Cost Estimation	Doctor shows per-skill token cost + fleet total
GitHub Action	`Zandereins/schliff@v6` — CI quality gate with PR comments
pip CLI	`schliff score SKILL.md` — works without Claude Code
Actionable Doctor	Copy-paste commands with full skill paths
Trigger Confidence	Small eval suites (<8 triggers) capped at score 60
Context-aware Contradictions	"run tests" vs "run tests in production" distinguished
Anti-gaming	Empty headers, repetitive markers, binary composability fixed
443 Tests (unit + integration + proof)	+70 stress tests, +28 edge cases, +76 patterns, +20 golden files
40 Security Fixes	Shell injection, prompt injection, ReDoS, supply chain
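The Trigger Confidence row above describes a cap for small eval suites. Read literally, it could look like the sketch below; the function and parameter names are hypothetical.

```python
def capped_trigger_score(raw_score: float, num_triggers: int) -> float:
    """Cap the trigger score at 60 when the eval suite has fewer than 8 triggers."""
    if num_triggers < 8:
        return min(raw_score, 60.0)
    return raw_score


print(capped_trigger_score(92.0, 5), capped_trigger_score(92.0, 12))
```

The cap keeps a skill from earning a high trigger score on the strength of only a handful of test prompts.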

Quality & Security

Schliff scores itself — 7 dimensions, same engine, no exceptions.

Metric	Value	What This Means
Structural Score	95.4 / 100 [S]	Production-ready. 10 composability sub-checks, all passing.
Tests	443 passing	318 unit + 99 integration + 20 self + 6 proof. Every scorer rule tested.
Security	40 fixes	Shell injection, prompt injection, ReDoS, supply chain.
Dimensions	7 + runtime	Transparent, rule-based, explainable scoring.
Journey	v1.0 (62.5) → v6.0 (95.4)	6 major versions. Continuous improvement, no regressions.

Scoring methodology | Security details

GitHub Action

Score skills in CI. Block PRs that regress. The Codecov for SKILL.md files.

- uses: Zandereins/schliff@v6
  with:
    skill-path: '.claude/skills/my-skill/SKILL.md'
    minimum-score: '75'      # blocks PR if below
    comment-on-pr: 'true'    # posts score table on PR

CLI

Score any skill without Claude Code:

pip install schliff

schliff score path/to/SKILL.md          # score a skill
schliff score path/to/SKILL.md --json   # JSON output
schliff doctor                           # scan all installed skills
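A script could consume the `--json` output to enforce its own threshold. The JSON schema is not documented in this README, so the `composite_score` key below is a hypothetical placeholder; inspect the real output before relying on it.

```python
import json


def passes_gate(report_json: str, minimum: float = 75.0,
                key: str = "composite_score") -> bool:
    """Return True if the score stored under `key` (assumed name) meets `minimum`."""
    report = json.loads(report_json)
    return float(report[key]) >= minimum


# In practice the JSON text would come from the CLI, for example:
#   import subprocess
#   report_json = subprocess.run(
#       ["schliff", "score", "path/to/SKILL.md", "--json"],
#       capture_output=True, text=True, check=True,
#   ).stdout
sample = '{"composite_score": 82.5}'  # made-up payload for illustration
print(passes_gate(sample))
```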

Ecosystem

skill-creator builds a v1 skill. Schliff grinds it to production quality.

skill-creator → v1 SKILL.md → /schliff:auto → autonomous grinding → ship

skill-creator — generate the first draft
autoresearch — generalized autonomous research for Claude Code

Badge

Score your skill and add this to your README:

[![Schliff: 95 [S]](https://img.shields.io/badge/Schliff-95%2F100_%5BS%5D-brightgreen)](https://github.com/Zandereins/schliff)

Contributing

Found a bug in the scorer? Add a test case to eval-suite.json and open an issue. Want to improve scoring logic? Edit score-skill.py, run bash test-integration.sh, and PR the diff.

Next Steps

Try the 3-minute demo — see a skill go from [D] to [S]
Run /schliff:doctor on your own skills — instant health check
Add the GitHub Action to your CI — quality gate for every PR
Read the scoring methodology — understand what each dimension measures

Questions? Open an issue — we respond fast.

License

MIT — do whatever you want.

Built by Franz Paul with Claude Code.

Project details

Release history

7.2.0

Apr 24, 2026

7.1.1

Apr 18, 2026

7.1.0

Mar 27, 2026

7.0.0

Mar 26, 2026

6.3.0

Mar 26, 2026

6.2.0

Mar 25, 2026

6.1.0

Mar 24, 2026

This version

6.0.1

Mar 24, 2026

Download files

Download the file for your platform.

Source Distribution

schliff-6.0.1.tar.gz (187.0 kB)

Uploaded Mar 24, 2026 Source

Built Distribution

schliff-6.0.1-py3-none-any.whl (217.4 kB)

Uploaded Mar 24, 2026 Python 3

File details

Details for the file schliff-6.0.1.tar.gz.

File metadata

Download URL: schliff-6.0.1.tar.gz
Upload date: Mar 24, 2026
Size: 187.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for schliff-6.0.1.tar.gz
Algorithm	Hash digest
SHA256	`c8398a198680cd403fbce03cff5a760e1470e5f5bd1e23bec9dde404c504fb34`
MD5	`1391e97eeb4d86275047bccba473f845`
BLAKE2b-256	`85e7b400efa19acd0a999b450fe82adbcca8fa165bbf8cd09c791b1ef5796fc7`
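To check a downloaded archive against the SHA256 digest listed above, a standard-library sketch like the following works. The helper names are illustrative, and the file is assumed to be in the current directory.

```python
import hashlib

# SHA256 digest for schliff-6.0.1.tar.gz, copied from the table above.
EXPECTED_SHA256 = "c8398a198680cd403fbce03cff5a760e1470e5f5bd1e23bec9dde404c504fb34"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected hex string, ignoring case."""
    return sha256_of(path) == expected.lower()


# Usage, once the file has been downloaded:
#   print(matches("schliff-6.0.1.tar.gz"))
```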


Provenance

The following attestation bundles were made for schliff-6.0.1.tar.gz:

Publisher: publish.yml on Zandereins/schliff

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: schliff-6.0.1.tar.gz
- Subject digest: c8398a198680cd403fbce03cff5a760e1470e5f5bd1e23bec9dde404c504fb34
- Sigstore transparency entry: 1173991109
- Sigstore integration time: Mar 24, 2026
Source repository:
- Permalink: Zandereins/schliff@e1179ad004718fda280df2d765ca41215affc43c
- Branch / Tag: refs/tags/v6.0.1
- Owner: https://github.com/Zandereins
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e1179ad004718fda280df2d765ca41215affc43c
- Trigger Event: release

File details

Details for the file schliff-6.0.1-py3-none-any.whl.

File metadata

Download URL: schliff-6.0.1-py3-none-any.whl
Upload date: Mar 24, 2026
Size: 217.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for schliff-6.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4ab15672fb3f3140f3185778db87f66d0c47d0bdefacd7acf4e6332d7f057472`
MD5	`71e122d21a47e2301904d23a687272e2`
BLAKE2b-256	`3e8067c7af50fe2b2faa26eb0ce3e11c9c763f0a8f2576ebc723680de84e17c8`


Provenance

The following attestation bundles were made for schliff-6.0.1-py3-none-any.whl:

Publisher: publish.yml on Zandereins/schliff

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: schliff-6.0.1-py3-none-any.whl
- Subject digest: 4ab15672fb3f3140f3185778db87f66d0c47d0bdefacd7acf4e6332d7f057472
- Sigstore transparency entry: 1173991151
- Sigstore integration time: Mar 24, 2026
Source repository:
- Permalink: Zandereins/schliff@e1179ad004718fda280df2d765ca41215affc43c
- Branch / Tag: refs/tags/v6.0.1
- Owner: https://github.com/Zandereins
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e1179ad004718fda280df2d765ca41215affc43c
- Trigger Event: release
