The testing framework for skill engineering. Test tool descriptions, prompt templates, agent skills, and custom agents with real LLMs. AI analyzes results and tells you what to fix.

Maintainers

sbroenne

Project description

pytest-skill-engineering

Test-Driven Skill Engineering for GitHub Copilot

Test MCP servers, CLI tools, Agent Skills, and custom agents using the real GitHub Copilot coding agent. Write tests as prompts, run them against actual Copilot sessions, and get AI-powered insights on what to fix.

Why?

Your MCP server passes all unit tests. Then a user tries it in GitHub Copilot and:

Copilot picks the wrong tool
Passes garbage parameters
Can't recover from errors
Ignores your skill's instructions

Why? Because you tested the code, not the AI interface.

For LLMs, your API isn't functions and types — it's tool descriptions, Agent Skills, custom agent instructions, and schemas. These are what GitHub Copilot actually sees. Traditional tests can't validate them.

The key insight: your test is a prompt. You write what a user would say ("What's my checking balance?"), and Copilot figures out how to use your tools. If it can't, your AI interface needs work.

What This Tests

pytest-skill-engineering validates the full skill engineering stack that ships with your MCP server:

MCP Server Tools — Can Copilot discover and call your tools correctly?
Agent Skills (agentskills.io spec-compliant) — Does domain knowledge improve performance?
Custom Agents (.agent.md files) — Do your specialist instructions trigger proper subagent dispatch?
MCP Prompt Templates — Do server-side templates produce the right behavior?
CLI Tools — Can Copilot use command-line interfaces effectively?

Plus A/B testing, multi-turn sessions, clarification detection, and AI-powered reports that tell you exactly what to fix.

How It Works

Write tests as prompts. Run them with the real GitHub Copilot coding agent. Assert on what happened:

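from pytest_skill_engineering.copilot import CopilotEval

async def test_balance_query(copilot_eval):
    # Describe the agent under test: which skills it loads and how many turns it gets.
    agent = CopilotEval(
        skill_directories=["skills/banking-advisor"],
        max_turns=10,
    )
    # The prompt is the test: phrase it the way a user would.
    result = await copilot_eval(agent, "What's my checking balance?")

    # The run completed and Copilot chose the expected tool.
    assert result.success
    assert result.tool_was_called("get_balance")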

The workflow:

1. Write a test: a prompt that describes what a user would say
2. Run it: GitHub Copilot tries to use your tools
3. Fix the interface: improve tool descriptions, skills, or agent instructions until it passes
4. Review the AI analysis: it tells you what to optimize (cost, redundant calls, better prompts)

If a test fails, your AI interface needs work, not your code.
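
To make the assertion step concrete, here is a minimal sketch of what a check like `tool_was_called` amounts to over a recorded run. `ToolCall`, `RecordedRun`, and `call_count` are hypothetical stand-ins written for illustration; they are not the library's result class, whose real shape may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One recorded tool invocation (hypothetical shape, for illustration only)."""
    name: str
    arguments: dict


@dataclass
class RecordedRun:
    """Stand-in for an eval result; NOT the library's real result class."""
    success: bool
    tool_calls: list[ToolCall] = field(default_factory=list)

    def tool_was_called(self, name: str) -> bool:
        # True if any recorded call used the given tool name.
        return any(call.name == name for call in self.tool_calls)

    def call_count(self, name: str) -> int:
        # How often the tool was invoked; repeats hint at redundant calls.
        return sum(1 for call in self.tool_calls if call.name == name)


run = RecordedRun(
    success=True,
    tool_calls=[
        ToolCall("get_balance", {"account": "checking"}),
        ToolCall("get_balance", {"account": "checking"}),  # redundant repeat
    ],
)

assert run.success
assert run.tool_was_called("get_balance")
assert not run.tool_was_called("transfer_funds")
print(run.call_count("get_balance"))  # 2
```

A passing test can still hide waste: here the tool was called, but twice with identical arguments, which is the kind of redundant call the analysis step is meant to surface.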

Agent Skills — First-Class Support

pytest-skill-engineering provides full Agent Skills spec compliance:

Compatibility field — Mark required tools, models, or platforms
Metadata — Title, description, version, attribution
Allowed-tools — Restrict which tools the agent can use
Scripts & Assets — Package Python scripts, prompts, and resources
Eval Bridge — Import evals from evals/evals.json, export grading results

Agent Skills are loaded natively when testing with CopilotEval — exactly as users experience them.
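
As a rough illustration of the fields listed above, the sketch below checks the frontmatter of a skill file using only the standard library. It assumes a skill is described by a `SKILL.md` file with `---`-delimited frontmatter in which `name` and `description` are required; that reading of the Agent Skills spec, the sample skill text, and the helper names `read_frontmatter` and `check_skill` are all assumptions for illustration. The parser only handles top-level `key: value` lines and is not a YAML parser.

```python
REQUIRED_KEYS = {"name", "description"}  # assumed required by the spec
OPTIONAL_KEYS = {"license", "compatibility", "metadata", "allowed-tools"}  # assumed


def read_frontmatter(text: str) -> dict[str, str]:
    """Return top-level 'key: value' pairs found between the '---' markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---'")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if line.startswith((" ", "\t")) or ":" not in line:
            continue  # skip nested values, blank lines, and continuations
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    raise ValueError("missing closing '---'")


def check_skill(text: str) -> list[str]:
    """List problems with the frontmatter; an empty list means none were found."""
    found = set(read_frontmatter(text))
    problems = [f"missing required field: {key}" for key in sorted(REQUIRED_KEYS - found)]
    problems += [
        f"unknown field: {key}"
        for key in sorted(found - REQUIRED_KEYS - OPTIONAL_KEYS)
    ]
    return problems


# Hypothetical skill files used only to exercise the checker.
GOOD_SKILL = """---
name: banking-advisor
description: Answers questions about account balances.
allowed-tools: get_balance
metadata:
  version: "1.0"
---
# Banking advisor
"""

BAD_SKILL = """---
name: banking-advisor
---
# Banking advisor
"""

print(check_skill(GOOD_SKILL))  # []
print(check_skill(BAD_SKILL))   # ['missing required field: description']
```

A check like this only validates the file's shape. Whether the skill actually improves Copilot's behavior is what the prompt-based tests are for.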

AI-Powered Reports

AI analyzes your results and tells you what to fix: which configuration to deploy, how to improve tool descriptions, where to cut costs. The report includes a winner recommendation, metrics, and comparative analysis.

Quick Start

# Install
uv add pytest-skill-engineering

# Authenticate (one-time)
gh auth login

# Run tests
pytest tests/

Configure AI Analysis (optional but recommended)

The AI-powered report needs a model to generate insights. Configure it in pyproject.toml:

[tool.pytest.ini_options]
addopts = "--aitest-summary-model=copilot/gpt-5-mini"

You can also use Azure OpenAI or other providers if you prefer — see Configuration.

Features

MCP Server Testing — Test tools, prompt templates, and bundled skills with real Copilot sessions
Agent Skills — Full agentskills.io spec compliance (compatibility, metadata, allowed-tools, evals bridge)
Custom Agents — Test .agent.md files and validate subagent dispatch
CLI Tool Testing — Verify Copilot can use command-line interfaces
Plugin Testing — Load complete plugin directories (plugin.json, .github/, .claude/ layouts) with auto-discovery
A/B Testing — Compare instructions, skills, custom agent versions, or tool configurations
Eval Leaderboard — Auto-ranked by pass rate and cost
Multi-Turn Sessions — Test conversations that build on context
Clarification Detection — Catch agents that ask questions instead of acting
LLM Assertions — Semantic checks with llm_assert, multi-dimension scoring with llm_score, image evaluation with llm_assert_image
AI-Powered Reports — Actionable feedback on tool descriptions, prompts, and costs
Cost Tracking — Copilot premium request tracking + USD estimation via pricing.toml
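
The Eval Leaderboard entry above says evals are ranked by pass rate and cost. A minimal sketch of such an ordering follows, assuming that a higher pass rate wins and that ties are broken by lower cost. The exact ranking rule is not stated here, and `EvalSummary`, `rank`, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalSummary:
    """Aggregate outcome for one configuration (hypothetical shape)."""
    name: str
    passed: int
    total: int
    cost_usd: float

    @property
    def pass_rate(self) -> float:
        # Guard against division by zero when no tests ran.
        return self.passed / self.total if self.total else 0.0


def rank(summaries: list[EvalSummary]) -> list[EvalSummary]:
    # Highest pass rate first; among equal pass rates, the cheaper one wins.
    return sorted(summaries, key=lambda s: (-s.pass_rate, s.cost_usd))


# Made-up numbers, purely to show the ordering.
leaderboard = rank([
    EvalSummary("baseline", passed=8, total=10, cost_usd=0.40),
    EvalSummary("with-skill", passed=10, total=10, cost_usd=0.55),
    EvalSummary("terse-prompt", passed=10, total=10, cost_usd=0.30),
])

for position, entry in enumerate(leaderboard, start=1):
    print(position, entry.name, f"{entry.pass_rate:.0%}", f"${entry.cost_usd:.2f}")
```

With these inputs, `terse-prompt` ranks first: it ties `with-skill` on pass rate and costs less, while `baseline` trails on pass rate despite being cheaper than `with-skill`.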

Who This Is For

MCP server authors — Validate that GitHub Copilot can actually use your tools
Agent Skills authors — Test skills exactly as users experience them in Copilot
Custom agent builders — Validate .agent.md instructions and subagent dispatch
Plugin developers — Test complete GitHub Copilot CLI plugins end-to-end
Teams shipping Copilot integrations — Catch skill stack regressions in CI/CD

Documentation

📚 Full Documentation

Requirements

Python 3.11+
pytest 9.0+
GitHub Copilot subscription (required)

Acknowledgments

Inspired by agent-benchmark.

License

MIT

Release history

0.6.5 (Apr 20, 2026)
0.6.4 (Apr 19, 2026), this version
0.6.3 (Apr 18, 2026)
0.6.2 (Apr 18, 2026)
0.6.1 (Apr 11, 2026)
0.6.0 (Apr 11, 2026)
0.5.9 (Mar 31, 2026)
0.5.8 (Mar 23, 2026)
0.5.7 (Mar 22, 2026)
0.3.0 (Mar 22, 2026)
0.2.0 (Mar 21, 2026)
0.1.0 (Mar 1, 2026)
0.0.2 (Feb 28, 2026)
0.0.1 (Feb 23, 2026)

Download files

Source Distribution

pytest_skill_engineering-0.6.4.tar.gz (137.4 kB)

Uploaded Apr 19, 2026 Source

Built Distribution

pytest_skill_engineering-0.6.4-py3-none-any.whl (181.0 kB)

Uploaded Apr 19, 2026 Python 3

File details

Details for the file pytest_skill_engineering-0.6.4.tar.gz.

File metadata

Download URL: pytest_skill_engineering-0.6.4.tar.gz
Upload date: Apr 19, 2026
Size: 137.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pytest_skill_engineering-0.6.4.tar.gz
Algorithm	Hash digest
SHA256	`4b008f6758ee7882690c39bcca44a0b945394315b5594581f5ac3622621f6ac6`
MD5	`06132fe2519ee20de25fe364db6cfc37`
BLAKE2b-256	`e4d17843d974d40f44d5e649d14df960e3aa70551938f62bd9e1705fc6ce7ef8`
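
A downloaded file can be checked against the SHA256 digest listed above with the standard library. The sketch below assumes the source distribution was saved in the current directory under its published file name; `sha256_of` and `verify` are illustrative helper names.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest published above for pytest_skill_engineering-0.6.4.tar.gz.
EXPECTED_SHA256 = "4b008f6758ee7882690c39bcca44a0b945394315b5594581f5ac3622621f6ac6"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare the computed digest with the expected one, ignoring hex case."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())


if __name__ == "__main__":
    # Assumed local path: the sdist saved next to this script.
    sdist = Path("pytest_skill_engineering-0.6.4.tar.gz")
    if sdist.exists():
        print("OK" if verify(sdist, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{sdist} not found; download it first")
```

The same helper works for the wheel by swapping in its file name and the SHA256 digest listed for it.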

Provenance

The following attestation bundles were made for pytest_skill_engineering-0.6.4.tar.gz:

Publisher: release.yml on sbroenne/pytest-skill-engineering

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pytest_skill_engineering-0.6.4.tar.gz
- Subject digest: 4b008f6758ee7882690c39bcca44a0b945394315b5594581f5ac3622621f6ac6
- Sigstore transparency entry: 1340345975
- Sigstore integration time: Apr 19, 2026
Source repository:
- Permalink: sbroenne/pytest-skill-engineering@cd656559b9cfbc226aaed49fcf5290d1ef0d0621
- Branch / Tag: refs/heads/main
- Owner: https://github.com/sbroenne
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@cd656559b9cfbc226aaed49fcf5290d1ef0d0621
- Trigger Event: workflow_dispatch

File details

Details for the file pytest_skill_engineering-0.6.4-py3-none-any.whl.

File metadata

Download URL: pytest_skill_engineering-0.6.4-py3-none-any.whl
Upload date: Apr 19, 2026
Size: 181.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pytest_skill_engineering-0.6.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6d7b2346ab7051ce28a063bef3ca60b61728ea3a7903b4f36fe581dec3a7eafd`
MD5	`2c7d66d305ca9e87bc4fa94c2ee658ed`
BLAKE2b-256	`baf4afef4c9601ab4cd54f31f0478628abc997f6a6caf958b40298018eafd391`

Provenance

The following attestation bundles were made for pytest_skill_engineering-0.6.4-py3-none-any.whl:

Publisher: release.yml on sbroenne/pytest-skill-engineering

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pytest_skill_engineering-0.6.4-py3-none-any.whl
- Subject digest: 6d7b2346ab7051ce28a063bef3ca60b61728ea3a7903b4f36fe581dec3a7eafd
- Sigstore transparency entry: 1340345980
- Sigstore integration time: Apr 19, 2026
Source repository:
- Permalink: sbroenne/pytest-skill-engineering@cd656559b9cfbc226aaed49fcf5290d1ef0d0621
- Branch / Tag: refs/heads/main
- Owner: https://github.com/sbroenne
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@cd656559b9cfbc226aaed49fcf5290d1ef0d0621
- Trigger Event: workflow_dispatch
