Evaluate agent skills via static analysis, trigger testing, and trace analysis. Run `sklab` after installing to scan your skills.

Skill Lab

Python 3.10+ · License: Apache 2.0

Agent Skills Evaluation Framework

Your agent's skills are probably broken in at least one way — and you don't know it yet. Skill Lab catches skills that drain tokens, never fire, or leak data before they cause damage.

pip install skill-lab

Why Skill Lab

Performance: A badly written skill can triple your token usage with zero gain. We score every skill 0–100 and show exactly what it costs.
sklab evaluate ./my-skill

Security: A malicious skill can exfiltrate company data to an external endpoint. Static checks catch that before the conversation starts.
sklab scan ./my-skill

Trigger Testing: If your description doesn't have enough trigger examples, the skill sits there doing nothing. We generate and run ~13 tests automatically.
sklab trigger ./my-skill

Quick Start

# Install
pip install skill-lab

# First run — scans your repo and shows the getting started guide
sklab

Commands

Command / Flag                Description

Evaluate
sklab evaluate ./my-skill     Static checks + LLM quality review (0-100 scores)
  --verbose / -V              Show all checks + LLM reasoning
  --skip-review               Skip LLM review (static checks only)
  --model / -m <model>        Choose LLM model for review (supports Anthropic, OpenAI, Gemini)
  --spec-only / -s            Only run spec-required checks
  --format / -f json          Output as JSON
  --output / -o <file>        Write output to a file
  --all                       Evaluate every skill in the current directory
  --repo                      Evaluate every skill from the git repo root

Check
sklab check ./my-skill        Quick pass/fail: exits 0 or 1, great for CI pipelines
  --spec-only / -s            Only validate against the Agent Skills spec
  --all                       Validate every skill in the current directory
  --repo                      Validate every skill from the git repo root

Scan
sklab scan ./my-skill         Security scan: shows BLOCK / SUS / ALLOW status per check
  --all                       Scan every skill in the current directory

Info
sklab info ./my-skill         Skill metadata + token cost estimates (discovery vs activation)
  --json                      Output as JSON
  --field / -f <name>         Extract a single field value

Prompt
sklab prompt ./skill-a        Export skill(s) as a prompt for agent platforms
  --format / -f <fmt>         Output format: xml (default), markdown, json

Stats
sklab stats                   Your personal usage history and score trends
  count                       Skill invocation counts for the current month
  score                       Score trend for all evaluated skills
  tokens                      Token usage per skill for the current month

Browse
sklab list-checks             Browse all 37 checks across 5 dimensions
  --spec-only                 Only spec-required checks
  --suggestions-only          Only quality suggestions

Trigger Testing (requires ANTHROPIC_API_KEY)
sklab generate ./my-skill     Auto-generate ~13 trigger test cases via LLM
  --model <model-id>          Anthropic model ID to use (e.g. claude-sonnet-4-6).
                              The skill path is a positional argument that comes before this flag.
  --force                     Overwrite existing test file
sklab trigger ./my-skill      Run trigger tests against a live runtime
  --type <type>               Filter by type: explicit, implicit, contextual, negative

Telemetry
sklab telemetry               Show telemetry status
  enable                      Enable anonymous usage telemetry
  disable                     Disable anonymous usage telemetry
  show                        View recent events (--limit / -n N, --json)
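
Because sklab check exits 0 or 1, it can gate a CI job directly. The sketch below is one way to wrap it from Python and collect the skills that fail. The helper name check_skills and the injectable run parameter are illustrative assumptions, not part of Skill Lab; only the sklab check <path> invocation and its exit codes come from the table above.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def check_skills(
    skill_dirs: Sequence[Path],
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> list[Path]:
    """Return the skill directories for which `sklab check` exited non-zero.

    `run` is injectable so the function can be exercised without sklab
    installed; by default it is subprocess.run, which raises
    FileNotFoundError if the `sklab` executable is not on PATH.
    """
    failed: list[Path] = []
    for skill_dir in skill_dirs:
        # `sklab check <path>` exits 0 on pass and 1 on fail.
        result = run(["sklab", "check", str(skill_dir)])
        if result.returncode != 0:
            failed.append(skill_dir)
    return failed


# Hypothetical usage in a CI script (directory names are placeholders):
#   failures = check_skills([Path("skills/my-skill"), Path("skills/other-skill")])
#   raise SystemExit(1 if failures else 0)
```

For a whole repository, the --all and --repo flags listed above likely make a wrapper like this unnecessary; it is mainly useful when you want per-skill reporting in your own format.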

What Gets Checked

37 checks across 5 dimensions. Run sklab list-checks to browse all of them with severity labels.

Structure (13)

  • SKILL.md Exists · Valid Frontmatter · Standard Frontmatter Fields
  • Allowed Tools Format · Compatibility Length · License Format · Metadata Format
  • Scripts Folder Valid · Scripts Self-Contained · Scripts No Interactive Input · Scripts Help Support
  • References Folder Valid · Files Outside Spec Dirs

Naming (3)

  • Name Required · Name Format (kebab-case) · Name Matches Directory
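
To make the three naming checks concrete, here is a minimal sketch of how they could be expressed. It assumes kebab-case means lowercase letters and digits separated by single hyphens; the exact rule Skill Lab applies is not stated here, and the function name name_problems is hypothetical.

```python
import re
from pathlib import Path

# Assumed definition of kebab-case: lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def name_problems(name: str, skill_dir: Path) -> list[str]:
    """Return human-readable problems with a skill's `name` field (empty list if none)."""
    problems: list[str] = []
    if not name:
        # Name Required
        problems.append("name is required")
        return problems
    if not KEBAB_CASE.fullmatch(name):
        # Name Format (kebab-case)
        problems.append(f"name {name!r} is not kebab-case")
    if name != skill_dir.name:
        # Name Matches Directory
        problems.append(f"name {name!r} does not match directory {skill_dir.name!r}")
    return problems
```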

Description (3)

  • Description Required · Description Not Empty · Description Max Length

Content (13)

  • Body Not Empty · Has Examples · Description Actionable · Line Budget · Token Budget
  • Metadata Token Budget · Reference Depth · Asset Paths Exist · Script Paths Exist
  • Scripts Referenced · Compatibility Prerequisites · Broken Internal Links · Orphaned Files
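
The token budget checks distinguish between the metadata an agent reads up front and the full body it loads when the skill fires. The sketch below illustrates that split with a rough estimate. It assumes the frontmatter is delimited by --- lines and uses a crude 4-characters-per-token heuristic; Skill Lab's actual tokenizer and budget thresholds are not documented here, and both function names are hypothetical.

```python
import math


def split_frontmatter(text: str) -> tuple[str, str]:
    """Split SKILL.md text into (frontmatter, body); frontmatter is '' if absent."""
    if text.startswith("---\n"):
        end = text.find("\n---", 4)
        if end != -1:
            frontmatter = text[4:end]
            # Skip the closing '\n---' delimiter and any blank lines after it.
            body = text[end + 4:].lstrip("\n")
            return frontmatter, body
    return "", text


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the 4 chars/token ratio is a heuristic, not a tokenizer."""
    return math.ceil(len(text) / chars_per_token)


# Hypothetical usage:
#   frontmatter, body = split_frontmatter(open("my-skill/SKILL.md").read())
#   discovery_cost = estimate_tokens(frontmatter)          # read when skills are listed
#   activation_cost = estimate_tokens(frontmatter + body)  # read when the skill fires
```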

Security (5)

  • Prompt Injection & Jailbreak · Evaluator Manipulation · Unicode Obfuscation · YAML Anomalies · Suspicious Size & Structure
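
As an illustration of what a Unicode obfuscation check might look for, the sketch below flags invisible format characters (Unicode category Cf), such as zero-width spaces and bidirectional overrides, which can hide instructions from a human reviewer. This is a simplified stand-in, not Skill Lab's actual detection logic, and find_hidden_characters is a hypothetical name.

```python
import unicodedata


def find_hidden_characters(text: str) -> list[tuple[int, str]]:
    """Return (index, character name) for every Unicode format (Cf) character in text."""
    hits: list[tuple[int, str]] = []
    for index, char in enumerate(text):
        # Category "Cf" covers zero-width and bidirectional control characters.
        if unicodedata.category(char) == "Cf":
            # Fall back to the code point if the character has no name.
            hits.append((index, unicodedata.name(char, f"U+{ord(char):04X}")))
    return hits
```

A real scanner would likely also consider homoglyphs and tag characters; this sketch only covers the invisible-character case.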

Trigger Testing

Skill Lab generates ~13 test cases per skill across 4 types — explicit, implicit, contextual, and negative — then runs them against a live LLM via Claude CLI.

Requires Claude CLI: npm install -g @anthropic-ai/claude-code

# .sklab/tests/triggers.yaml
skill: my-skill
test_cases:
  # should fire
  - id: explicit-1
    type: explicit
    prompt: "$my-skill do the thing"
    expected: trigger
  # should NOT fire
  - id: negative-1
    type: negative
    prompt: "unrelated question"
    expected: no_trigger
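
If you edit the test file by hand, a quick structural check can catch typos before a live run. The sketch below validates an already-parsed suite (load the YAML with whichever parser you prefer). It assumes the only fields are those shown in the example above and that expected is either trigger or no_trigger; the real schema may allow more, and validate_trigger_suite is a hypothetical helper, not part of sklab.

```python
# The four test types come from the documentation above.
VALID_TYPES = {"explicit", "implicit", "contextual", "negative"}
# Assumed from the example file; the real schema may accept other values.
VALID_EXPECTED = {"trigger", "no_trigger"}


def validate_trigger_suite(suite: dict) -> list[str]:
    """Return a list of problems found in a parsed triggers.yaml (empty if none)."""
    errors: list[str] = []
    if not suite.get("skill"):
        errors.append("missing 'skill'")
    seen_ids: set[str] = set()
    for position, case in enumerate(suite.get("test_cases") or []):
        case_id = case.get("id")
        label = case_id or f"test_cases[{position}]"
        if not case_id:
            errors.append(f"{label}: missing 'id'")
        elif case_id in seen_ids:
            errors.append(f"{label}: duplicate id")
        else:
            seen_ids.add(case_id)
        if case.get("type") not in VALID_TYPES:
            errors.append(f"{label}: unknown type {case.get('type')!r}")
        if case.get("expected") not in VALID_EXPECTED:
            errors.append(f"{label}: unknown expected value {case.get('expected')!r}")
        if not case.get("prompt"):
            errors.append(f"{label}: missing 'prompt'")
    return errors
```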

Telemetry

sklab collects anonymous usage data (command names, duration, exit codes, scores, token counts). No skill content, file paths, or flag values are ever collected. To opt out:

sklab telemetry disable

See docs/PRIVACY.md for the full privacy policy.


Development

pip install -e ".[dev]"
pytest tests/ -v
mypy src/
ruff check src/
ruff format src/

License: Apache License 2.0
