
Skill Lab

Python 3.10+ · License: MIT

A Python CLI tool for evaluating agent skills through static analysis, trigger testing, and trace analysis.

Features

  • SKILL.md Parsing: Parse YAML frontmatter and markdown body from skill definitions
  • 19 Static Checks: Comprehensive checks across 4 dimensions
    • Structure: File existence, folder organization, frontmatter validation, standard fields
    • Naming: Format, directory matching
    • Description: Required, non-empty, max length
    • Content: Examples, line budget, reference depth
  • Trigger Testing: Test skill activation with 4 trigger types (explicit, implicit, contextual, negative)
  • Trigger Generation: LLM-powered test case generation via Anthropic API
  • Quality Scoring: Weighted 0-100 score based on check results
  • Multiple Output Formats: Console (rich formatting) and JSON
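To illustrate what SKILL.md parsing involves, a SKILL.md file is a markdown body preceded by a YAML frontmatter block fenced with `---`. The sketch below is not skill-lab's actual implementation (which presumably uses a full YAML parser); it handles only flat `key: value` frontmatter, enough to show the split:

```python
# Minimal sketch of SKILL.md parsing: split YAML frontmatter from the
# markdown body. Illustrative only -- handles flat "key: value" lines,
# not full YAML.

def parse_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Return (frontmatter, body) for a SKILL.md document."""
    if not text.startswith("---"):
        return {}, text  # no frontmatter block
    _, fm, body = text.split("---", 2)
    frontmatter = {}
    for line in fm.strip().splitlines():
        key, _, value = line.partition(":")
        frontmatter[key.strip()] = value.strip()
    return frontmatter, body.lstrip()

doc = """---
name: my-skill
description: Does something useful.
---
# My Skill

Instructions for the agent go here.
"""
meta, body = parse_skill_md(doc)
print(meta["name"])          # my-skill
print(body.splitlines()[0])  # # My Skill
```

Checks like frontmatter validation and description length then operate on the `meta` dict, while line-budget and example checks look at the body.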

Installation

# From PyPI
pip install skill-lab

# With LLM-based trigger generation (requires Anthropic API)
pip install skill-lab[generate]

# From source
pip install -e .

# With development dependencies
pip install -e ".[dev]"

Setup

API Key (required for sklab generate)

sklab generate uses the Anthropic API to generate trigger test cases. Set your API key:

export ANTHROPIC_API_KEY=sk-ant-...

Get your key at console.anthropic.com.

Model Configuration (optional)

The default model is claude-haiku-4-5-20251001. Override it per-command or globally:

# Per-command
sklab generate ./my-skill --model claude-sonnet-4-5-20250929

# Global default via environment variable
export SKLAB_MODEL=claude-sonnet-4-5-20250929

Quick Start

# Evaluate a skill (path defaults to current directory)
sklab evaluate ./my-skill
sklab evaluate                    # Uses current directory

# Quick validation (pass/fail)
sklab validate ./my-skill

# Generate trigger test cases (requires ANTHROPIC_API_KEY)
sklab generate ./my-skill

# Run trigger tests
sklab trigger ./my-skill

# List available checks
sklab list-checks

Usage

Evaluate a Skill

# Console output (default)
sklab evaluate ./my-skill

# JSON output
sklab evaluate ./my-skill --format json

# Save to file
sklab evaluate ./my-skill --output report.json

# Verbose (show all checks, not just failures)
sklab evaluate ./my-skill --verbose

# Spec-only (skip quality suggestions)
sklab evaluate ./my-skill --spec-only

Quick Validation

# Returns exit code 0 if valid, 1 if invalid
sklab validate ./my-skill
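Because validate reports pass/fail through its exit code, it slots directly into CI. A hypothetical GitHub Actions step (the workflow context and skill path are illustrative):

```yaml
# Hypothetical CI step: a nonzero exit code from sklab fails the job.
- name: Validate skill
  run: |
    pip install skill-lab
    sklab validate ./my-skill
```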

List Available Checks

# List all checks
sklab list-checks

# Filter by dimension
sklab list-checks --dimension structure

# Show only spec-required checks
sklab list-checks --spec-only

Generate Trigger Tests

Auto-generate trigger test cases from a SKILL.md using an LLM:

# Generate tests (writes to .skill-lab/tests/triggers.yaml)
sklab generate ./my-skill

# Use a specific model
sklab generate ./my-skill --model claude-sonnet-4-5-20250929

# Overwrite existing tests
sklab generate ./my-skill --force

Generates ~13 test cases across 4 trigger types:

  • explicit (3): Direct $skill-name invocation
  • implicit (3): Describes the need without naming the skill
  • contextual (3): Realistic prompts with project context
  • negative (4): Adjacent requests that should NOT trigger

Token usage and cost are displayed after each run.

Trigger Testing

Run the generated (or hand-written) trigger tests against a real LLM:

# Run trigger tests (path defaults to current directory)
sklab trigger ./my-skill
sklab trigger                     # Uses current directory

# Filter by trigger type
sklab trigger --type explicit
sklab trigger --type negative

Prerequisites

  • Claude CLI: install via npm install -g @anthropic-ai/claude-code

Test Definition (.skill-lab/tests/triggers.yaml):

skill: my-skill
test_cases:
  - id: explicit-1
    name: "Direct invocation to do something"
    type: explicit
    prompt: "$my-skill do something"
    expected: trigger
  - id: negative-1
    name: "Unrelated question (should not trigger)"
    type: negative
    prompt: "unrelated question"
    expected: no_trigger
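Each test case compares an expected outcome (trigger or no_trigger) against what the model actually did, so results are naturally summarized per trigger type. A sketch of such a tally (the result tuples here are a hypothetical shape, not sklab's internal format):

```python
from collections import defaultdict

# Hypothetical per-case results: (test id, trigger type, expected, observed).
results = [
    ("explicit-1", "explicit", "trigger", "trigger"),
    ("implicit-1", "implicit", "trigger", "trigger"),
    ("negative-1", "negative", "no_trigger", "trigger"),  # false positive
]

passed = defaultdict(int)
total = defaultdict(int)
for _, ttype, expected, observed in results:
    total[ttype] += 1
    passed[ttype] += expected == observed

for ttype in sorted(total):
    print(f"{ttype}: {passed[ttype]}/{total[ttype]} passed")
```

Negative cases are the interesting ones: a failure there means the skill fires on adjacent requests it should ignore.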

Output Format (JSON)

{
  "skill_path": "/path/to/skill",
  "skill_name": "my-skill",
  "timestamp": "2026-01-25T14:30:00Z",
  "duration_ms": 45.3,
  "quality_score": 87.5,
  "overall_pass": true,
  "checks_run": 19,
  "checks_passed": 17,
  "checks_failed": 2,
  "results": [...],
  "summary": {
    "by_severity": {...},
    "by_dimension": {...}
  }
}
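A report in this shape is easy to gate on in scripts. For example, reading a saved report and enforcing a minimum score (the 80-point threshold is arbitrary, chosen for this sketch):

```python
import json

# Sample report in the documented shape (values illustrative, fields abridged).
report_json = """{
  "skill_name": "my-skill",
  "quality_score": 87.5,
  "overall_pass": true,
  "checks_run": 19,
  "checks_passed": 17,
  "checks_failed": 2
}"""

report = json.loads(report_json)
threshold = 80  # arbitrary quality bar for this sketch
ok = report["overall_pass"] and report["quality_score"] >= threshold
print(f"{report['skill_name']}: score={report['quality_score']} "
      f"gate={'pass' if ok else 'fail'}")
```

In practice the input would come from sklab evaluate ./my-skill --output report.json rather than an inline string.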

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=skill_lab

# Type checking
mypy src/

# Linting
ruff check src/

# Format code
ruff format src/

License

MIT

Download files

Source Distribution

  • skill_lab-0.3.0.tar.gz (65.8 kB)

Built Distribution

  • skill_lab-0.3.0-py3-none-any.whl (70.0 kB)

File details: skill_lab-0.3.0.tar.gz

  • Size: 65.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

Hashes for skill_lab-0.3.0.tar.gz

  • SHA256: 0e3bafe287b6807ee92d0756a6078e7abcdc295087f292374e522c7bacd16ee3
  • MD5: d3c273fa847c477f93e4d53037c7d818
  • BLAKE2b-256: 170a0c38f244b87ad5770a29fea2ff1211a6f1b2f2d9606e06b805c3808fef78

Provenance

Publisher: publish.yml on 8ddieHu0314/Skill-Lab

File details: skill_lab-0.3.0-py3-none-any.whl

  • Size: 70.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

Hashes for skill_lab-0.3.0-py3-none-any.whl

  • SHA256: bc9be49a00831229a897639c3173be83ffe378dbadc65edf6a774d3b99d2c777
  • MD5: f1ef806afe01e96f8bada2a49739a25c
  • BLAKE2b-256: b062e382181d1ef2ce93cdd8e3d464821b9009c32b1ff6b5042cb0104c75b347

Provenance

Publisher: publish.yml on 8ddieHu0314/Skill-Lab
