Skill Lab

Python 3.10+ | License: MIT

A Python CLI tool for evaluating agent skills through static analysis, trigger testing, and trace analysis.

Features

  • SKILL.md Parsing: Parse YAML frontmatter and markdown body from skill definitions
  • 18 Static Checks: Comprehensive checks across 4 dimensions
    • Structure: File existence, folder organization, frontmatter validation
    • Naming: Format, directory matching
    • Description: Length, trigger information
    • Content: Examples, line budget, reference depth
  • Trigger Testing: Test skill activation with 4 trigger types (explicit, implicit, contextual, negative)
  • Trace Analysis: Validate execution traces with 5 check types (command presence, file creation, event sequence, loop detection, efficiency)
  • Quality Scoring: Weighted 0-100 score based on check results
  • Multiple Output Formats: Console (rich formatting) and JSON
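To make the SKILL.md parsing feature concrete, here is a minimal sketch of splitting a skill file into frontmatter and body. It is not Skill Lab's parser: `split_frontmatter` is a hypothetical helper, the `name` and `description` fields in the sample are assumed, and only flat `key: value` pairs are handled. A real implementation would pass the frontmatter block to a YAML library.

```python
from __future__ import annotations


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md document into (frontmatter, body).

    Simplification: only flat `key: value` lines are understood.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all

    # Locate the closing '---' delimiter.
    for end, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break
    else:
        raise ValueError("frontmatter opened with '---' but never closed")

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        meta[key.strip()] = value.strip().strip('"')

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


# Hypothetical skill file used only for illustration.
SAMPLE = """---
name: my-skill
description: "Formats changelogs when the user asks for release notes"
---
# My Skill

Step-by-step instructions go here.
"""

meta, body = split_frontmatter(SAMPLE)
print(meta["name"])
print(body.splitlines()[0])
```

Checks such as description length or name/directory matching could then operate on the returned `meta` dictionary.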

Installation

# From PyPI
pip install skill-lab

# From source
pip install -e .

# With development dependencies
pip install -e ".[dev]"

Quick Start

# Evaluate a skill (path defaults to current directory)
sklab evaluate ./my-skill
sklab evaluate                    # Uses current directory

# Quick validation (pass/fail)
sklab validate ./my-skill
sklab validate                    # Uses current directory

# List available checks
sklab list-checks

Usage

Evaluate a Skill

# Console output (default)
sklab evaluate ./my-skill

# JSON output
sklab evaluate ./my-skill --format json

# Save to file
sklab evaluate ./my-skill --output report.json

# Verbose (show all checks, not just failures)
sklab evaluate ./my-skill --verbose

# Spec-only (skip quality suggestions)
sklab evaluate ./my-skill --spec-only

Quick Validation

# Returns exit code 0 if valid, 1 if invalid
sklab validate ./my-skill
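
Because validation reports its result through the exit code, it can be wrapped in a script or CI step. The sketch below shows one way to do that from Python. `skill_is_valid` and `fake_run` are hypothetical names, and the runner is injectable so the example executes even where `sklab` is not installed.

```python
import subprocess
from typing import Callable


def skill_is_valid(
    skill_path: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True when `sklab validate <path>` exits with code 0."""
    result = run(["sklab", "validate", skill_path], check=False)
    return result.returncode == 0


# Stand-in runner that simulates an invalid skill (exit code 1).
def fake_run(cmd, check=False):
    return subprocess.CompletedProcess(cmd, returncode=1)


print(skill_is_valid("./my-skill", run=fake_run))
```

Dropping the `run=` argument makes the function call the real CLI through `subprocess.run`.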

List Available Checks

# List all checks
sklab list-checks

# Filter by dimension
sklab list-checks --dimension structure

# Show only spec-required checks
sklab list-checks --spec-only

Trigger Testing

Test whether skills activate correctly with real LLM execution:

# Run trigger tests (path defaults to current directory)
sklab trigger ./my-skill
sklab trigger                     # Uses current directory

# Filter by trigger type
sklab trigger --type explicit
sklab trigger --type negative

Prerequisites: Trigger testing requires the Claude CLI, which can be installed with:

npm install -g @anthropic-ai/claude-code

Note: Codex CLI support is coming in v0.3.0.

Test Definition (tests/triggers.yaml):

skill: my-skill
test_cases:
  - id: explicit-1
    name: "Direct invocation to do something"
    type: explicit
    prompt: "$my-skill do something"
    expected: trigger
  - id: negative-1
    name: "Unrelated question (should not trigger)"
    type: negative
    prompt: "unrelated question"
    expected: no_trigger
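
A definition like the one above can be sanity-checked before any LLM calls are spent on it. The following sketch works on the already-parsed structure (a plain dict mirroring the YAML). `lint_test_cases` is a hypothetical helper, not part of Skill Lab, and treating all five fields as required is an assumption based on the example.

```python
from __future__ import annotations

# Values taken from the README: four trigger types, two expected outcomes.
TRIGGER_TYPES = {"explicit", "implicit", "contextual", "negative"}
EXPECTED_VALUES = {"trigger", "no_trigger"}
# Assumption: every test case carries all five fields shown in the example.
REQUIRED_FIELDS = ("id", "name", "type", "prompt", "expected")


def lint_test_cases(spec: dict) -> list[str]:
    """Return human-readable problems found in a parsed triggers spec."""
    problems: list[str] = []
    seen_ids: set[str] = set()
    for index, case in enumerate(spec.get("test_cases", [])):
        label = str(case.get("id", f"#{index}"))
        for field in REQUIRED_FIELDS:
            if field not in case:
                problems.append(f"{label}: missing field '{field}'")
        if "type" in case and case["type"] not in TRIGGER_TYPES:
            problems.append(f"{label}: unknown type '{case['type']}'")
        if "expected" in case and case["expected"] not in EXPECTED_VALUES:
            problems.append(f"{label}: unknown expected '{case['expected']}'")
        if label in seen_ids:
            problems.append(f"{label}: duplicate id")
        seen_ids.add(label)
    return problems


# Dict equivalent of the YAML example.
SPEC = {
    "skill": "my-skill",
    "test_cases": [
        {"id": "explicit-1", "name": "Direct invocation to do something",
         "type": "explicit", "prompt": "$my-skill do something",
         "expected": "trigger"},
        {"id": "negative-1", "name": "Unrelated question (should not trigger)",
         "type": "negative", "prompt": "unrelated question",
         "expected": "no_trigger"},
    ],
}

print(lint_test_cases(SPEC))
```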

Trace Analysis

Analyze execution traces with custom checks:

sklab eval-trace ./my-skill --trace ./execution.jsonl

Check Definition (tests/trace_checks.yaml):

checks:
  - id: npm-install-ran
    type: command_presence
    pattern: "npm install"
  - id: package-json-created
    type: file_creation
    path: "package.json"
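
To illustrate how checks like these could be evaluated against a JSONL trace, here is a rough sketch. The event schema (`type`, `command`, `path` keys) is invented for this example because the trace format is not described here, and matching `pattern` as a plain substring is likewise an assumption. `load_trace` and `run_check` are hypothetical helpers.

```python
import json


def load_trace(jsonl_text: str) -> list:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def run_check(check: dict, events: list) -> bool:
    """Evaluate a single check definition against the trace events."""
    if check["type"] == "command_presence":
        # Assumption: `pattern` is matched as a substring of the command.
        return any(check["pattern"] in e.get("command", "") for e in events)
    if check["type"] == "file_creation":
        # Assumption: file writes appear as events with type "file_write".
        return any(
            e.get("type") == "file_write" and e.get("path") == check["path"]
            for e in events
        )
    raise ValueError(f"unsupported check type: {check['type']}")


# Hypothetical trace in the assumed schema.
TRACE = """
{"type": "command", "command": "npm install express"}
{"type": "file_write", "path": "package.json"}
"""

CHECKS = [
    {"id": "npm-install-ran", "type": "command_presence", "pattern": "npm install"},
    {"id": "package-json-created", "type": "file_creation", "path": "package.json"},
]

events = load_trace(TRACE)
results = {check["id"]: run_check(check, events) for check in CHECKS}
print(results)
```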

Output Format (JSON)

{
  "skill_path": "/path/to/skill",
  "skill_name": "my-skill",
  "timestamp": "2026-01-25T14:30:00Z",
  "duration_ms": 45.3,
  "quality_score": 87.5,
  "overall_pass": true,
  "checks_run": 18,
  "checks_passed": 16,
  "checks_failed": 2,
  "results": [...],
  "summary": {
    "by_severity": {...},
    "by_dimension": {...}
  }
}
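
A report in this shape is straightforward to consume from a script. The sketch below builds a one-line summary from the top-level fields. `summarize` is a hypothetical helper, the sample values are illustrative, and the consistency rule (passed plus failed equals run) is an assumption that would not hold if the tool also reports skipped checks.

```python
import json


def summarize(report: dict) -> str:
    """Build a one-line summary from the top-level report fields."""
    run = report["checks_run"]
    passed = report["checks_passed"]
    failed = report["checks_failed"]
    # Assumption: every check either passes or fails (no skipped state).
    if passed + failed != run:
        raise ValueError(f"inconsistent counts: {passed} + {failed} != {run}")
    status = "PASS" if report["overall_pass"] else "FAIL"
    return (
        f"{report['skill_name']}: {status}, "
        f"score {report['quality_score']:.1f}/100, "
        f"{passed}/{run} checks passed"
    )


# Illustrative report; a real one could be loaded from a saved JSON file.
SAMPLE_REPORT = json.loads("""
{
  "skill_name": "my-skill",
  "quality_score": 87.5,
  "overall_pass": true,
  "checks_run": 18,
  "checks_passed": 16,
  "checks_failed": 2
}
""")

line = summarize(SAMPLE_REPORT)
print(line)
```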

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=skill_lab

# Type checking
mypy src/

# Linting
ruff check src/

# Format code
ruff format src/

License

MIT
