Evaluate agent skills via static analysis, trigger testing, and trace analysis. Run `sklab` after installing to scan your skills.
Skill Lab
Agent Skills Evaluation Framework
Your agent's skills are probably broken in at least one way — and you don't know it yet. Skill Lab catches skills that drain tokens, never fire, or leak data before they cause damage.
pip install skill-lab
Why Skill Lab
Performance: A badly written skill can triple your token usage with zero gain. We score every skill 0–100 and show exactly what it costs. Run `sklab evaluate ./my-skill`.
Security: A malicious skill can exfiltrate company data to an external endpoint. Static checks catch that before the conversation starts. Run `sklab scan ./my-skill`.
Trigger Testing: If your description doesn't have enough trigger examples, the skill sits there doing nothing. We generate and run ~13 tests automatically. Run `sklab trigger ./my-skill`.
Quick Start
# Install
pip install skill-lab
# First run — scans your repo and shows the getting started guide
sklab
Commands
| Command / Flag | Description |
|---|---|
| Evaluate | |
| `sklab evaluate ./my-skill` | Static checks + LLM quality review (0-100 scores) |
| `--verbose / -V` | Show all checks + LLM reasoning |
| `--skip-review` | Skip LLM review (static checks only) |
| `--model / -m <model>` | Choose LLM model for review (supports Anthropic, OpenAI, Gemini) |
| `--spec-only / -s` | Only run spec-required checks |
| `--format / -f json` | Output as JSON |
| `--output / -o <file>` | Write output to a file |
| `--all` | Evaluate every skill in the current directory |
| `--repo` | Evaluate every skill from the git repo root |
| Check | |
| `sklab check ./my-skill` | Quick pass/fail: exits 0 or 1, great for CI pipelines |
| `--spec-only / -s` | Only validate against the Agent Skills spec |
| `--all` | Validate every skill in the current directory |
| `--repo` | Validate every skill from the git repo root |
| Scan | |
| `sklab scan ./my-skill` | Security scan: shows BLOCK / SUS / ALLOW status per check |
| `--all` | Scan every skill in the current directory |
| Info | |
| `sklab info ./my-skill` | Skill metadata + token cost estimates (discovery vs activation) |
| `--json` | Output as JSON |
| `--field / -f <name>` | Extract a single field value |
| Prompt | |
| `sklab prompt ./skill-a` | Export skill(s) as a prompt for agent platforms |
| `--format / -f <fmt>` | Output format: xml (default), markdown, json |
| Stats | |
| `sklab stats` | Your personal usage history and score trends |
| `count` | Skill invocation counts for the current month |
| `score` | Score trend for all evaluated skills |
| `tokens` | Token usage per skill for the current month |
| Browse | |
| `sklab list-checks` | Browse all 37 checks across 5 dimensions |
| `--spec-only` | Only spec-required checks |
| `--suggestions-only` | Only quality suggestions |
| Trigger Testing (requires ANTHROPIC_API_KEY) | |
| `sklab generate ./my-skill` | Auto-generate ~13 trigger test cases via LLM |
| `--model <model-id>` | Anthropic model ID to use (e.g. claude-sonnet-4-6). The skill path is a positional argument that comes before this flag. |
| `--force` | Overwrite existing test file |
| `sklab trigger ./my-skill` | Run trigger tests against a live runtime |
| `--type <type>` | Filter by type: explicit, implicit, contextual, negative |
| Telemetry | |
| `sklab telemetry` | Show telemetry status |
| `enable` | Enable anonymous usage telemetry |
| `disable` | Disable anonymous usage telemetry |
| `show` | View recent events (`--limit / -n N`, `--json`) |
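Because `sklab check` reports pass/fail through its exit code, a CI step can gate on it directly. The sketch below shows one way to wrap the command from Python. The helper names `build_check_command` and `run_check` are hypothetical and not part of Skill Lab, and the code assumes `sklab` is installed and on `PATH`.

```python
import subprocess


def build_check_command(skill_path: str, spec_only: bool = False) -> list[str]:
    """Build the argv for `sklab check` (hypothetical helper, not part of sklab)."""
    cmd = ["sklab", "check", skill_path]
    if spec_only:
        # Flag taken from the command table: only validate against the spec.
        cmd.append("--spec-only")
    return cmd


def run_check(skill_path: str, spec_only: bool = False) -> bool:
    """Return True when `sklab check` exits 0, False on any other exit code."""
    result = subprocess.run(build_check_command(skill_path, spec_only))
    return result.returncode == 0
```

In a pipeline script, something like `sys.exit(0 if run_check("./my-skill") else 1)` would propagate the result to the CI runner. Calling `sklab check ./my-skill` directly as a shell step achieves the same thing; the wrapper is only useful if you want to add your own logic around it.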
What Gets Checked
37 checks across 5 dimensions. Run sklab list-checks to browse all of them with severity labels.
Structure (13)
- SKILL.md Exists · Valid Frontmatter · Standard Frontmatter Fields
- Allowed Tools Format · Compatibility Length · License Format · Metadata Format
- Scripts Folder Valid · Scripts Self-Contained · Scripts No Interactive Input · Scripts Help Support
- References Folder Valid · Files Outside Spec Dirs
Naming (3)
- Name Required · Name Format (kebab-case) · Name Matches Directory
Description (3)
- Description Required · Description Not Empty · Description Max Length
Content (13)
- Body Not Empty · Has Examples · Description Actionable · Line Budget · Token Budget
- Metadata Token Budget · Reference Depth · Asset Paths Exist · Script Paths Exist
- Scripts Referenced · Compatibility Prerequisites · Broken Internal Links · Orphaned Files
Security (5)
- Prompt Injection & Jailbreak · Evaluator Manipulation · Unicode Obfuscation · YAML Anomalies · Suspicious Size & Structure
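To make the Naming checks more concrete, here is a minimal sketch of what "Name Format (kebab-case)" and "Name Matches Directory" could look like. The regex and function names are assumptions for illustration only, not Skill Lab's actual implementation; the real rules may be stricter (for example, length limits).

```python
import re
from pathlib import Path

# Assumed kebab-case rule: lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_kebab_case(name: str) -> bool:
    """True if the whole name matches the assumed kebab-case pattern."""
    return KEBAB_CASE.fullmatch(name) is not None


def name_matches_directory(name: str, skill_dir: str) -> bool:
    """True if the skill name equals the final component of its directory path."""
    return Path(skill_dir).name == name
```

Under these assumptions, `my-skill` passes both checks for a skill stored in `./my-skill`, while `My_Skill` or `my--skill` would fail the format check.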
Trigger Testing
Skill Lab generates ~13 test cases per skill across 4 types — explicit, implicit, contextual, and negative — then runs them against a live LLM via Claude CLI.
Requires Claude CLI: npm install -g @anthropic-ai/claude-code
# .sklab/tests/triggers.yaml
skill: my-skill
test_cases:
  # should fire
  - id: explicit-1
    type: explicit
    prompt: "$my-skill do the thing"
    expected: trigger
  # should NOT fire
  - id: negative-1
    type: negative
    prompt: "unrelated question"
    expected: no_trigger
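A quick sanity check on a test file can catch typos before any API calls are spent. The sketch below validates already-parsed test cases (plain dicts, as a YAML loader would return them) against the four types and two expected values shown above. `validate_cases` is a hypothetical helper, not part of Skill Lab, and the rule that `negative` cases should expect `no_trigger` is an assumption drawn from the example.

```python
VALID_TYPES = {"explicit", "implicit", "contextual", "negative"}
VALID_EXPECTED = {"trigger", "no_trigger"}


def validate_cases(test_cases: list[dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    seen_ids = set()
    for case in test_cases:
        case_id = case.get("id", "<missing id>")
        if case_id in seen_ids:
            problems.append(f"{case_id}: duplicate id")
        seen_ids.add(case_id)
        if case.get("type") not in VALID_TYPES:
            problems.append(f"{case_id}: unknown type {case.get('type')!r}")
        if case.get("expected") not in VALID_EXPECTED:
            problems.append(f"{case_id}: unknown expected {case.get('expected')!r}")
        # Assumption: a negative case should never expect the skill to fire.
        if case.get("type") == "negative" and case.get("expected") == "trigger":
            problems.append(f"{case_id}: negative case expects trigger")
    return problems


# The two cases from the example file, written as parsed dicts.
cases = [
    {"id": "explicit-1", "type": "explicit",
     "prompt": "$my-skill do the thing", "expected": "trigger"},
    {"id": "negative-1", "type": "negative",
     "prompt": "unrelated question", "expected": "no_trigger"},
]
print(validate_cases(cases))  # prints []
```

Returning a list of problems rather than raising on the first one lets you report every issue in a file at once.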
Telemetry
sklab collects anonymous usage data (command names, duration, exit codes, scores, token counts). No skill content, file paths, or flag values are ever collected. To opt out:
sklab telemetry disable
See docs/PRIVACY.md for the full privacy policy.
Development
pip install -e ".[dev]"
pytest tests/ -v
mypy src/
ruff check src/
ruff format src/
Apache License 2.0