Evaluation-driven Claude Code skill development

skillet

Evaluation-driven Claude Code skill development.

Install

pip install pyskillet

Why

Anthropic recommends building evaluations before writing skills:

Create evaluations BEFORE writing extensive documentation. This ensures your Skill solves real problems rather than documenting imagined ones.

But they don't provide tooling:

We do not currently provide a built-in way to run these evaluations.

skillet fills that gap.

Workflow

1. Capture evals with `/skillet:add`

When Claude does something wrong, capture it:

> Write a code review comment for this SQL query...

Claude: This code has a SQL injection vulnerability...

> /skillet:add

Claude: What did you expect instead?

> Should start with **issue** (blocking): using conventional comments format

Claude: Eval saved to ~/.skillet/evals/conventional-comments/001.yaml

2. Run baseline eval

skillet eval conventional-comments

Eval Results (baseline, no skill)
==================================
Evals: 5
Samples: 3 per eval
Total runs: 15

Pass rate: 0% (0/15)
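The numbers in the report fit together as simple arithmetic: 5 evals at 3 samples each gives 15 runs, and the pass rate is the share of those runs that passed. A minimal sketch of that calculation follows; the function name `pass_rate` is hypothetical, and it assumes skillet counts every sample as one equally weighted run, which the output suggests but does not confirm.

```python
def pass_rate(passed: int, evals: int, samples_per_eval: int) -> tuple[int, float]:
    """Return (total_runs, pass rate as a percentage).

    Hypothetical helper, not part of skillet: assumes each sample
    counts as one equally weighted run.
    """
    total_runs = evals * samples_per_eval
    if total_runs == 0:
        raise ValueError("need at least one eval and one sample")
    return total_runs, 100.0 * passed / total_runs


passed = 0
total, rate = pass_rate(passed, evals=5, samples_per_eval=3)
print(f"Pass rate: {rate:.0f}% ({passed}/{total})")
```

Running several samples per eval matters because model output varies between runs; a single sample could pass or fail by chance.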

3. Create the skill

skillet create conventional-comments

Found 5 evals for 'conventional-comments', drafting SKILL.md...

Created ~/.claude/skills/conventional-comments/
└── SKILL.md (draft from 5 evals)

4. Eval with skill

skillet eval conventional-comments ~/.claude/skills/conventional-comments

Eval Results (with skill)
=========================
Skill: ~/.claude/skills/conventional-comments
Evals: 5
Samples: 3 per eval
Total runs: 15

Pass rate: 80% (12/15)

5. Tune the skill

skillet tune conventional-comments ~/.claude/skills/conventional-comments

Round 1/5: Pass rate 80% (12/15)
  Improving skill...
Round 2/5: Pass rate 93% (14/15)
  Improving skill...
Round 3/5: Pass rate 100% (15/15)
  Target reached!

Best skill saved to ~/.claude/skills/conventional-comments/SKILL.md
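The round-by-round output above suggests an evaluate-then-improve loop that stops at a target pass rate or a round limit and keeps the best version seen. The sketch below illustrates that general pattern only; `tune_loop`, `evaluate_fn`, and `improve_fn` are hypothetical names, and skillet's actual implementation may differ.

```python
from typing import Callable


def tune_loop(
    skill: str,
    evaluate_fn: Callable[[str], float],      # hypothetical: returns pass rate in %
    improve_fn: Callable[[str, float], str],  # hypothetical: returns a revised skill
    max_rounds: int = 5,
    target_pass_rate: float = 100.0,
) -> tuple[str, float]:
    """Return the best skill text seen and its pass rate."""
    best_skill, best_rate = skill, -1.0
    for round_no in range(1, max_rounds + 1):
        rate = evaluate_fn(skill)
        print(f"Round {round_no}/{max_rounds}: Pass rate {rate:.0f}%")
        # Track the best version, since a revision can also make things worse.
        if rate > best_rate:
            best_skill, best_rate = skill, rate
        if rate >= target_pass_rate:
            print("  Target reached!")
            break
        if round_no < max_rounds:
            print("  Improving skill...")
            skill = improve_fn(skill, rate)
    return best_skill, best_rate


# Toy stand-ins: each "improvement" appends a marker, and the fake
# evaluator scores a skill by how many markers it carries.
scores = [80.0, 93.0, 100.0]
best, rate = tune_loop(
    "draft",
    evaluate_fn=lambda s: scores[min(s.count("+"), len(scores) - 1)],
    improve_fn=lambda s, _rate: s + "+",
)
```

Keeping the best-scoring version rather than the last one means a round that regresses does not overwrite earlier progress.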

Commands

skillet eval <name> [skill]         # run evals (baseline or with skill)
skillet create <name>               # create skill from evals
skillet tune <name> <skill>         # iteratively improve skill
skillet compare <name> <skill>      # compare baseline vs skill from cache
skillet show <name>                 # inspect cached eval results
skillet lint <path>                 # lint a SKILL.md for common issues
skillet generate-evals <skill>      # generate candidate evals from a skill

Evals

Evals are stored in ~/.skillet/evals/<name>/:

# ~/.skillet/evals/conventional-comments/001.yaml
timestamp: 2025-01-15T10:30:00Z
name: conventional-comments
prompt: "Write a code review comment for..."
expected: "Should start with **issue** (blocking):"
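If you process eval files yourself, it can help to validate them into a typed record. The sketch below assumes an eval has exactly the four fields shown above (the real format may allow more) and that the YAML has already been parsed into a dict, for example by a YAML library. `EvalCase` and `from_mapping` are hypothetical names, not part of skillet.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumption: these are the fields shown in the example eval file.
REQUIRED_FIELDS = ("timestamp", "name", "prompt", "expected")


@dataclass(frozen=True)
class EvalCase:
    """Hypothetical typed view of one eval file."""

    timestamp: datetime
    name: str
    prompt: str
    expected: str

    @classmethod
    def from_mapping(cls, data: dict) -> "EvalCase":
        missing = [key for key in REQUIRED_FIELDS if key not in data]
        if missing:
            raise ValueError(f"eval is missing fields: {', '.join(missing)}")
        ts = data["timestamp"]
        if isinstance(ts, str):
            # Accept the trailing "Z" (UTC) used in the example file.
            ts = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        return cls(ts, data["name"], data["prompt"], data["expected"])


case = EvalCase.from_mapping({
    "timestamp": "2025-01-15T10:30:00Z",
    "name": "conventional-comments",
    "prompt": "Write a code review comment for...",
    "expected": "Should start with **issue** (blocking):",
})
print(case.name, case.timestamp.isoformat())
```

The `isinstance` check is there because some YAML parsers convert an unquoted timestamp into a `datetime` on their own, while others leave it as a string.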

Python API

import asyncio
from pathlib import Path
from skillet import evaluate

async def main():
    # Baseline (no skill)
    baseline = await evaluate("conventional-comments")
    print(f"Baseline: {baseline['pass_rate']}%")

    # With skill
    result = await evaluate(
        "conventional-comments",
        skill_path=Path("~/.claude/skills/conventional-comments").expanduser(),
    )
    print(f"With skill: {result['pass_rate']}%")

asyncio.run(main())
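Once you have both results, a small check can confirm that the skill actually helps, for instance in CI. This sketch relies only on the `pass_rate` key used in the example above; `require_gain` and its `min_gain` threshold are hypothetical, not part of skillet.

```python
def require_gain(baseline: dict, with_skill: dict, min_gain: float = 10.0) -> float:
    """Return the pass-rate gain in percentage points, or raise if it is too small.

    Hypothetical helper: expects dicts with a 'pass_rate' key, as in the
    evaluate() example above.
    """
    gain = with_skill["pass_rate"] - baseline["pass_rate"]
    if gain < min_gain:
        raise AssertionError(
            f"skill improved pass rate by {gain:.1f} points, need {min_gain:.1f}"
        )
    return gain


# Values taken from the workflow output: 0% baseline, 80% with the skill.
gain = require_gain({"pass_rate": 0.0}, {"pass_rate": 80.0})
print(f"Skill improved the pass rate by {gain:.0f} points")
```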

Tune a skill programmatically:

import asyncio
from pathlib import Path
from skillet import tune
from skillet.tune import TuneConfig

async def main():
    # Run up to 10 rounds, targeting a 90% pass rate
    result = await tune(
        "conventional-comments",
        Path("~/.claude/skills/conventional-comments").expanduser(),
        config=TuneConfig(max_rounds=10, target_pass_rate=90.0),
    )
    print(f"Final pass rate: {result.result.final_pass_rate}%")

asyncio.run(main())

See the Python API reference for all functions and options.

Documentation

Full documentation is available at the docs site:

Getting Started
Concepts — Skills vs agents, capability vs regression, balanced problem sets
CLI Reference
Eval Format
Python API

Roadmap

See ROADMAP.md for future ideas and planned features.

License

MIT

Project details

Release history

0.2.30

Apr 25, 2026

0.2.29

Apr 25, 2026

This version

0.2.28

Feb 16, 2026

0.2.27

Feb 16, 2026

0.2.26

Feb 16, 2026

0.2.25

Feb 16, 2026

0.2.24

Feb 16, 2026

0.2.23

Feb 16, 2026

0.2.22

Feb 16, 2026

0.2.21

Feb 15, 2026

0.2.20

Feb 13, 2026

0.2.19

Feb 12, 2026

0.2.18

Feb 12, 2026

0.2.17

Feb 12, 2026

0.2.16

Feb 11, 2026

0.2.15

Feb 10, 2026

0.2.14

Feb 9, 2026

0.2.13

Jan 30, 2026

0.2.12

Jan 28, 2026

0.2.11

Jan 27, 2026

0.2.10

Jan 26, 2026

0.2.9

Jan 25, 2026

0.2.8

Jan 24, 2026

0.2.7

Jan 23, 2026

0.2.6

Jan 5, 2026

0.2.5

Dec 23, 2025

0.2.4

Dec 20, 2025

0.2.3

Dec 19, 2025

0.2.2

Dec 18, 2025

0.2.1

Dec 17, 2025

0.2.0

Dec 15, 2025

0.1.2

Dec 14, 2025

0.0.1

Dec 14, 2025

Download files

Source Distribution

pyskillet-0.2.28.tar.gz (485.9 kB view details)

Uploaded Feb 16, 2026 Source

Built Distribution

pyskillet-0.2.28-py3-none-any.whl (202.0 kB view details)

Uploaded Feb 16, 2026 Python 3

File details

Details for the file pyskillet-0.2.28.tar.gz.

File metadata

Download URL: pyskillet-0.2.28.tar.gz
Upload date: Feb 16, 2026
Size: 485.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.10.3 {"installer":{"name":"uv","version":"0.10.3","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for pyskillet-0.2.28.tar.gz
Algorithm	Hash digest
SHA256	`1ca4bd138366d8fde33859d597dde69ebd7337273f5fe3a116cf3cbbb02d9e69`
MD5	`a13ced1cb48bfc2615d4048ec0985556`
BLAKE2b-256	`fe48adde082741fe2a68b9430a9f437a208943b835d57204187187e2a514e456`

File details

Details for the file pyskillet-0.2.28-py3-none-any.whl.

File metadata

Download URL: pyskillet-0.2.28-py3-none-any.whl
Upload date: Feb 16, 2026
Size: 202.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.10.3 {"installer":{"name":"uv","version":"0.10.3","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for pyskillet-0.2.28-py3-none-any.whl
Algorithm	Hash digest
SHA256	`05eadd1764f4c1796e68bb849fcd055c4b67d940b965fba8b6b6aa26b452f2c9`
MD5	`5476069f7832e2ed5de8b9cc1b0691cc`
BLAKE2b-256	`35cf3e19bffe7623fdfb57897e1ef64556a3dd01e7b7ea27e5d5e52d219a137f`
