
Evaluation-driven Claude Code skill development

skillet

Install

pip install pyskillet

Why

Anthropic recommends building evaluations before writing skills:

Create evaluations BEFORE writing extensive documentation. This ensures your Skill solves real problems rather than documenting imagined ones.

But Anthropic does not provide tooling to run them:

We do not currently provide a built-in way to run these evaluations.

skillet fills that gap.

Workflow

1. Capture evals with /skillet:add

When Claude does something wrong, capture it:

> Write a code review comment for this SQL query...

Claude: This code has a SQL injection vulnerability...

> /skillet:add

Claude: What did you expect instead?

> Should start with **issue** (blocking): using conventional comments format

Claude: Eval saved to ~/.skillet/evals/conventional-comments/001.yaml

2. Run baseline eval

skillet eval conventional-comments
Eval Results (baseline, no skill)
==================================
Evals: 5
Samples: 3 per eval
Total runs: 15

Pass rate: 0% (0/15)
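
The summary is simple arithmetic: each eval is sampled several times, so total runs = evals × samples, and the pass rate is the share of those runs that passed. A minimal sketch of that calculation follows. `EvalSummary` is a hypothetical name, not part of skillet, and rounding to a whole percent is an assumption (it matches the 93% shown under tuning, but the exact rounding skillet uses is not documented here).

```python
from dataclasses import dataclass


@dataclass
class EvalSummary:
    """Hypothetical container for the numbers printed in the report above."""

    evals: int
    samples: int
    passed: int

    def __post_init__(self) -> None:
        if not 0 <= self.passed <= self.total_runs:
            raise ValueError("passed must be between 0 and evals * samples")

    @property
    def total_runs(self) -> int:
        # Each eval is run `samples` times, so 5 evals x 3 samples = 15 runs.
        return self.evals * self.samples

    @property
    def pass_rate(self) -> float:
        # Share of individual runs that passed, as a percentage.
        return 100.0 * self.passed / self.total_runs if self.total_runs else 0.0

    def report(self) -> str:
        # Rounds to a whole percent (assumed), e.g. 14/15 -> "93%".
        return f"Pass rate: {self.pass_rate:.0f}% ({self.passed}/{self.total_runs})"


print(EvalSummary(evals=5, samples=3, passed=0).report())  # Pass rate: 0% (0/15)
```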

3. Create the skill

skillet create conventional-comments
Found 5 evals for 'conventional-comments', drafting SKILL.md...

Created ~/.claude/skills/conventional-comments/
└── SKILL.md (draft from 5 evals)

4. Run eval with the skill

skillet eval conventional-comments ~/.claude/skills/conventional-comments
Eval Results (with skill)
=========================
Skill: ~/.claude/skills/conventional-comments
Evals: 5
Samples: 3 per eval
Total runs: 15

Pass rate: 80% (12/15)
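
Comparing the two runs (which `skillet compare` does from its cache) comes down to subtracting pass rates. The sketch below shows only that arithmetic, assuming each eval's outcomes are available as a list of pass/fail booleans; the data layout and the `pass_rate`/`compare` helpers are hypothetical, not skillet's internals.

```python
def pass_rate(outcomes: list[bool]) -> float:
    """Percentage of runs that passed; 0.0 when there are no runs."""
    return 100.0 * sum(outcomes) / len(outcomes) if outcomes else 0.0


def compare(
    baseline: dict[str, list[bool]], with_skill: dict[str, list[bool]]
) -> tuple[dict[str, float], float]:
    """Return per-eval and overall pass-rate changes, in percentage points."""
    # An eval missing from the skill run is treated as 0% here (a design choice).
    per_eval = {
        eval_id: pass_rate(with_skill.get(eval_id, [])) - pass_rate(runs)
        for eval_id, runs in baseline.items()
    }
    # Overall rates are computed over all runs pooled together.
    all_base = [ok for runs in baseline.values() for ok in runs]
    all_skill = [ok for runs in with_skill.values() for ok in runs]
    return per_eval, pass_rate(all_skill) - pass_rate(all_base)


# Numbers matching the example above: 0/15 without the skill, 12/15 with it.
baseline = {f"{i:03d}": [False] * 3 for i in range(1, 6)}
with_skill = {
    "001": [True] * 3,
    "002": [True] * 3,
    "003": [True] * 3,
    "004": [True, True, False],
    "005": [True, False, False],
}
per_eval, overall = compare(baseline, with_skill)
print(f"Overall: {overall:+.0f} points")  # Overall: +80 points
```

Pooling all runs for the overall figure keeps it consistent with the "Pass rate: 80% (12/15)" line, while the per-eval deltas show which evals the skill still misses.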

5. Tune the skill

skillet tune conventional-comments ~/.claude/skills/conventional-comments
Round 1/5: Pass rate 80% (12/15)
  Improving skill...
Round 2/5: Pass rate 93% (14/15)
  Improving skill...
Round 3/5: Pass rate 100% (15/15)
  Target reached!

Best skill saved to ~/.claude/skills/conventional-comments/SKILL.md
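
The output above implies a loop: evaluate, stop once a target pass rate is reached, otherwise produce an improved draft, and keep the best version seen. Below is a sketch of that control flow only, with evaluation and improvement injected as plain callables. In skillet these steps involve running Claude, and its actual stopping and selection rules may differ. `tune_loop` and `TuneOutcome` are hypothetical; `max_rounds` and `target_pass_rate` mirror the `TuneConfig` fields shown in the Python API section.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TuneOutcome:
    best_skill: str
    best_pass_rate: float
    rounds: int


def tune_loop(
    skill: str,
    run_evals: Callable[[str], float],
    improve: Callable[[str, float], str],
    max_rounds: int = 5,
    target_pass_rate: float = 100.0,
) -> TuneOutcome:
    """Evaluate, stop once the target is met, otherwise revise; keep the best draft."""
    best_skill, best_rate = skill, float("-inf")
    for round_no in range(1, max_rounds + 1):
        rate = run_evals(skill)
        print(f"Round {round_no}/{max_rounds}: Pass rate {rate:.0f}%")
        if rate > best_rate:
            best_skill, best_rate = skill, rate
        if rate >= target_pass_rate:
            print("  Target reached!")
            return TuneOutcome(best_skill, best_rate, round_no)
        # Skip the revision on the last round, since it could never be evaluated.
        if round_no < max_rounds:
            print("  Improving skill...")
            skill = improve(skill, rate)
    return TuneOutcome(best_skill, best_rate, max_rounds)


# Stand-in callables replaying the pass rates from the run above.
rates = iter([80.0, 93.3, 100.0])
outcome = tune_loop("draft", run_evals=lambda s: next(rates), improve=lambda s, r: s + " (revised)")
print(outcome.rounds, outcome.best_pass_rate)  # 3 100.0
```

Tracking the best draft separately matters because a revision can score lower than its predecessor; returning the last draft instead would lose that earlier result.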

Commands

skillet eval <name> [skill]         # run evals (baseline or with skill)
skillet create <name>               # create skill from evals
skillet tune <name> <skill>         # iteratively improve skill
skillet compare <name> <skill>      # compare baseline vs skill from cache
skillet show <name>                 # inspect cached eval results
skillet lint <path>                 # lint a SKILL.md for common issues
skillet generate-evals <skill>      # generate candidate evals from a skill
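
The exact rules behind `skillet lint` are not listed here. As an illustration of the kind of checks a SKILL.md linter can run, here is a minimal sketch based on Anthropic's published SKILL.md requirements as we understand them: YAML frontmatter with a `name` (at most 64 characters of lowercase letters, numbers, and hyphens) and a non-empty `description` (at most 1024 characters). `lint_skill_md` is hypothetical, and its frontmatter parsing is deliberately naive (single-line `key: value` pairs only).

```python
import re

# Assumed constraints, taken from Anthropic's published SKILL.md requirements.
NAME_PATTERN = re.compile(r"[a-z0-9-]+")
MAX_NAME_LEN = 64
MAX_DESCRIPTION_LEN = 1024


def lint_skill_md(text: str) -> list[str]:
    """Return human-readable problems found in SKILL.md text (empty list if none)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing YAML frontmatter (file should start with '---')"]
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return ["frontmatter is not closed with '---'"]

    # Naive parse: only single-line `key: value` pairs are understood.
    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("\"'")

    problems = []
    name = fields.get("name", "")
    description = fields.get("description", "")
    if not name:
        problems.append("frontmatter is missing 'name'")
    elif len(name) > MAX_NAME_LEN or not NAME_PATTERN.fullmatch(name):
        problems.append("'name' should be at most 64 lowercase letters, digits, or hyphens")
    if not description:
        problems.append("frontmatter is missing 'description'")
    elif len(description) > MAX_DESCRIPTION_LEN:
        problems.append("'description' should be at most 1024 characters")
    if not "\n".join(lines[end + 1 :]).strip():
        problems.append("no instructions after the frontmatter")
    return problems


draft = "---\nname: conventional-comments\ndescription: Label review comments.\n---\n# Steps\n"
print(lint_skill_md(draft))  # []
```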

Evals

Evals are stored as YAML files in ~/.skillet/evals/<name>/, one file per eval:

# ~/.skillet/evals/conventional-comments/001.yaml
timestamp: 2025-01-15T10:30:00Z
name: conventional-comments
prompt: "Write a code review comment for..."
expected: "Should start with **issue** (blocking):"
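
`/skillet:add` writes these files for you. To make the layout concrete, here is a hypothetical helper that writes a file in the same shape: a UTC ISO-8601 timestamp, the eval set name, and quoted prompt and expectation strings. `save_eval` is not part of skillet, and the sequential zero-padded numbering (`001.yaml`, `002.yaml`, ...) is an assumption extrapolated from the single file name shown above.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_eval(root: Path, name: str, prompt: str, expected: str) -> Path:
    """Write the next numbered eval (001.yaml, 002.yaml, ...) under root/name/."""
    eval_dir = root / name
    eval_dir.mkdir(parents=True, exist_ok=True)
    taken = [int(p.stem) for p in eval_dir.glob("*.yaml") if p.stem.isdigit()]
    path = eval_dir / f"{max(taken, default=0) + 1:03d}.yaml"
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # json.dumps produces a double-quoted string with escapes, which YAML also accepts.
    # `name` is written unquoted, assuming a kebab-case identifier.
    path.write_text(
        f"timestamp: {timestamp}\n"
        f"name: {name}\n"
        f"prompt: {json.dumps(prompt)}\n"
        f"expected: {json.dumps(expected)}\n",
        encoding="utf-8",
    )
    return path


# Write into a temporary directory instead of ~/.skillet/evals for the demo.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_eval(
        Path(tmp),
        "conventional-comments",
        "Write a code review comment for...",
        "Should start with **issue** (blocking):",
    )
    print(saved.name)  # 001.yaml
```

Quoting through `json.dumps` avoids hand-escaping: prompts often contain colons, quotes, or newlines that would break an unquoted YAML scalar.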

Python API

import asyncio
from pathlib import Path
from skillet import evaluate

async def main():
    # Baseline (no skill)
    baseline = await evaluate("conventional-comments")
    print(f"Baseline: {baseline['pass_rate']}%")

    # With skill
    result = await evaluate(
        "conventional-comments",
        skill_path=Path("~/.claude/skills/conventional-comments").expanduser(),
    )
    print(f"With skill: {result['pass_rate']}%")

asyncio.run(main())
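
Since `evaluate()` returns a dict with a `pass_rate` key (a percentage, judging by the prints above), one way to use it in CI is to turn it into a process exit code. The helper and the 90% threshold below are hypothetical choices, not skillet features.

```python
import sys
from collections.abc import Mapping
from typing import Any


def check_pass_rate(result: Mapping[str, Any], minimum: float) -> int:
    """Map an evaluate() result to an exit code: 0 if pass_rate >= minimum, else 1."""
    rate = float(result["pass_rate"])  # assumed to be a percentage (0-100)
    if rate < minimum:
        print(f"FAIL: pass rate {rate:.1f}% is below {minimum:.1f}%", file=sys.stderr)
        return 1
    print(f"OK: pass rate {rate:.1f}% (minimum {minimum:.1f}%)")
    return 0


# Stand-in dict; in a real script this would be the value returned by evaluate().
exit_code = check_pass_rate({"pass_rate": 80.0}, minimum=90.0)
print(exit_code)  # 1
```

To gate a job, have `main()` return the `evaluate()` result and finish the script with `sys.exit(check_pass_rate(asyncio.run(main()), 90.0))`.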

Tune a skill programmatically:

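import asyncio
from pathlib import Path

from skillet import tune
from skillet.tune import TuneConfig

async def main():
    # Stop after 10 rounds or once 90% of runs pass, whichever comes first
    result = await tune(
        "conventional-comments",
        Path("~/.claude/skills/conventional-comments").expanduser(),
        config=TuneConfig(max_rounds=10, target_pass_rate=90.0),
    )
    print(f"Final pass rate: {result.result.final_pass_rate}%")

asyncio.run(main())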

See the Python API reference for all functions and options.

Documentation

Full documentation is available at the docs site.

Roadmap

See ROADMAP.md for future ideas and planned features.

License

MIT


Download files

Source Distribution

pyskillet-0.2.29.tar.gz (487.0 kB)

Built Distribution

pyskillet-0.2.29-py3-none-any.whl (202.0 kB)

File details

Details for the file pyskillet-0.2.29.tar.gz.

File metadata

  • Download URL: pyskillet-0.2.29.tar.gz
  • Size: 487.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv 0.11.7 (uv publish, CI on Ubuntu 24.04)

File hashes

Hashes for pyskillet-0.2.29.tar.gz
Algorithm Hash digest
SHA256 0772682892b1359ec95c5a818a847bff26237ac0111475b046e8e06d6d30bec5
MD5 e4e6e7cecaed4c3214e2ab3d95452a46
BLAKE2b-256 73d0cb6677d4e77abc92287c8e7e53198b61efe27cb9faa5276a91dc36e83f28

File details

Details for the file pyskillet-0.2.29-py3-none-any.whl.

File metadata

  • Download URL: pyskillet-0.2.29-py3-none-any.whl
  • Size: 202.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv 0.11.7 (uv publish, CI on Ubuntu 24.04)

File hashes

Hashes for pyskillet-0.2.29-py3-none-any.whl
Algorithm Hash digest
SHA256 c7a707f7c88cb2f50347c3b3f0068e437d28f0a6fbd5c67c2b577b427cccd2a7
MD5 015a43bb3d1313798f3e021b5687dd45
BLAKE2b-256 3007d7b0e2b6d13c4bdaa77a6a7c17c325ee1cf8566fe15e4836cac743a77a28
