The testing framework for skill engineering. Test tool descriptions, prompt templates, agent skills, and custom agents with real LLMs. AI analyzes results and tells you what to fix.
pytest-skill-engineering
Test-Driven Skill Engineering for GitHub Copilot
Test MCP servers, CLI tools, Agent Skills, and custom agents using the real GitHub Copilot coding agent. Write tests as prompts, run them against actual Copilot sessions, and get AI-powered insights on what to fix.
Why?
Your MCP server passes all unit tests. Then a user tries it in GitHub Copilot and:
- Copilot picks the wrong tool
- Passes garbage parameters
- Can't recover from errors
- Ignores your skill's instructions
Why? Because you tested the code, not the AI interface.
For LLMs, your API isn't functions and types — it's tool descriptions, Agent Skills, custom agent instructions, and schemas. These are what GitHub Copilot actually sees. Traditional tests can't validate them.
The key insight: your test is a prompt. You write what a user would say ("What's my checking balance?"), and Copilot figures out how to use your tools. If it can't, your AI interface needs work.
What This Tests
pytest-skill-engineering validates the full skill engineering stack that ships with your MCP server:
- MCP Server Tools — Can Copilot discover and call your tools correctly?
- Agent Skills (agentskills.io spec-compliant) — Does domain knowledge improve performance?
- Custom Agents (.agent.md files) — Do your specialist instructions trigger proper subagent dispatch?
- MCP Prompt Templates — Do server-side templates produce the right behavior?
- CLI Tools — Can Copilot use command-line interfaces effectively?
Plus A/B testing, multi-turn sessions, clarification detection, and AI-powered reports that tell you exactly what to fix.
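Clarification detection is about catching runs where the agent replies with a question instead of calling a tool. Below is a crude heuristic version of that check. It assumes a run can be summarized by its final assistant message and the number of tool calls made. `looks_like_clarification` and its phrase list are hypothetical and are not part of pytest-skill-engineering. The plugin's own detector may work quite differently.

```python
# Phrases that often signal the agent is asking the user instead of acting.
# This list is an illustrative assumption, not taken from pytest-skill-engineering.
_CLARIFYING_PHRASES = (
    "could you",
    "can you clarify",
    "which account",
    "do you want",
    "would you like",
    "please specify",
)


def looks_like_clarification(final_message: str, tool_calls: int) -> bool:
    """Hypothetical heuristic: flag runs that ended by asking instead of acting."""
    if tool_calls > 0:
        # The agent used at least one tool, so it did not simply stall.
        return False
    text = final_message.strip().lower()
    if text.endswith("?"):
        # A direct question with no tool use is the classic clarification pattern.
        return True
    # Polite requests often end with a period, so also check common phrasings.
    return any(phrase in text for phrase in _CLARIFYING_PHRASES)
```

The question-mark check catches direct questions. The phrase list covers requests such as "Please specify the account." that do not end with one.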
How It Works
Write tests as prompts. Run them with the real GitHub Copilot coding agent. Assert on what happened:
    from pytest_skill_engineering.copilot import CopilotEval

    async def test_balance_query(copilot_eval):
        # Load the skill under test and cap how long the session may run
        agent = CopilotEval(
            skill_directories=["skills/banking-advisor"],
            max_turns=10,
        )
        # The prompt is exactly what a real user would type
        result = await copilot_eval(agent, "What's my checking balance?")
        assert result.success
        assert result.tool_was_called("get_balance")
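The two assertions in the balance test need only two things: a success flag and a record of which tools ran. To reason about them offline without a Copilot session, here is a minimal stand-in. It assumes the result keeps a list of tool calls. `FakeResult`, `ToolCall`, and `check_balance_result` are hypothetical names, not part of pytest-skill-engineering, and the real result type may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One tool invocation recorded during a session (hypothetical shape)."""
    name: str
    arguments: dict


@dataclass
class FakeResult:
    """Hypothetical stand-in for the object returned by copilot_eval."""
    success: bool
    tool_calls: list[ToolCall] = field(default_factory=list)

    def tool_was_called(self, name: str) -> bool:
        # True if any recorded call used this tool name.
        return any(call.name == name for call in self.tool_calls)


def check_balance_result(result: FakeResult) -> None:
    # The same two assertions as test_balance_query, applied to a recorded result.
    assert result.success
    assert result.tool_was_called("get_balance")
```

A run where Copilot called `list_accounts` but never `get_balance` would fail the second assertion. That failure points at the tool descriptions, not at the server code.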
The workflow:
- Write a test — a prompt that describes what a user would say
- Run it — GitHub Copilot tries to use your tools
- Fix the interface — improve tool descriptions, skills, or agent instructions until it passes
- AI analysis tells you what to optimize — cost, redundant calls, better prompts
If a test fails, your AI interface needs work, not your code.
Agent Skills — First-Class Support
pytest-skill-engineering provides full Agent Skills spec compliance:
- Compatibility field — Mark required tools, models, or platforms
- Metadata — Title, description, version, attribution
- Allowed-tools — Restrict which tools the agent can use
- Scripts & Assets — Package Python scripts, prompts, and resources
- Eval Bridge — Import evals from evals/evals.json, export grading results
Agent Skills are loaded natively when testing with CopilotEval — exactly as users experience them.
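One way to picture an allowed-tools restriction is to compare the tools an agent actually called against the skill's declared list. The sketch below does that. It assumes the list lives in the `allowed-tools` field of the SKILL.md frontmatter as space-separated tool names. That is our reading of the agentskills.io spec, so verify it against the spec. `read_frontmatter` and `disallowed_calls` are hypothetical helpers, not the plugin's API. The parser deliberately handles only flat `key: value` lines, not full YAML.

```python
def read_frontmatter(skill_md: str) -> dict[str, str]:
    """Parse flat `key: value` pairs from the frontmatter block of a SKILL.md file.

    Deliberately minimal: indented (nested) YAML lines, such as entries under
    a `metadata:` map, are skipped.
    """
    lines = skill_md.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter block
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        if line.startswith((" ", "\t")) or ":" not in line:
            continue  # skip nested or malformed lines
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def disallowed_calls(skill_md: str, called_tools: list[str]) -> list[str]:
    """Return the tools that were called but are not listed in allowed-tools."""
    allowed_field = read_frontmatter(skill_md).get("allowed-tools")
    if allowed_field is None:
        return []  # the skill declares no restriction
    allowed = set(allowed_field.split())
    return [name for name in called_tools if name not in allowed]


# Hypothetical skill file used for illustration.
SKILL = """---
name: banking-advisor
description: Answers questions about account balances.
allowed-tools: get_balance list_accounts
---
# Banking advisor
"""
```

With this skill, a session that called `get_balance` and then `transfer_funds` would report `transfer_funds` as a violation.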
AI-Powered Reports
AI analyzes your results and tells you what to fix: which configuration to deploy, how to improve tool descriptions, where to cut costs. See a sample report →
Quick Start
    # Install
    uv add pytest-skill-engineering
    # Authenticate (one-time)
    gh auth login
    # Run tests
    pytest tests/
Configure AI Analysis (optional but recommended)
The AI-powered report needs a model to generate insights. Configure it in pyproject.toml:
    [tool.pytest.ini_options]
    addopts = "--aitest-summary-model=copilot/gpt-5-mini"
You can also use Azure OpenAI or other providers if you prefer — see Configuration.
Features
- MCP Server Testing — Test tools, prompt templates, and bundled skills with real Copilot sessions
- Agent Skills — Full agentskills.io spec compliance (compatibility, metadata, allowed-tools, evals bridge)
- Custom Agents — Test .agent.md files and validate subagent dispatch
- CLI Tool Testing — Verify Copilot can use command-line interfaces
- Plugin Testing — Load complete plugin directories (plugin.json, .github/, .claude/ layouts) with auto-discovery
- A/B Testing — Compare instructions, skills, custom agent versions, or tool configurations
- Eval Leaderboard — Auto-ranked by pass rate and cost
- Multi-Turn Sessions — Test conversations that build on context
- Clarification Detection — Catch agents that ask questions instead of acting
- LLM Assertions — Semantic checks with llm_assert, multi-dimension scoring with llm_score, image evaluation with llm_assert_image
- AI-Powered Reports — Actionable feedback on tool descriptions, prompts, and costs
- Cost Tracking — Copilot premium request tracking + USD estimation via pricing.toml
Who This Is For
- MCP server authors — Validate that GitHub Copilot can actually use your tools
- Agent Skills authors — Test skills exactly as users experience them in Copilot
- Custom agent builders — Validate .agent.md instructions and subagent dispatch
- Plugin developers — Test complete GitHub Copilot CLI plugins end-to-end
- Teams shipping Copilot integrations — Catch skill stack regressions in CI/CD
Requirements
- Python 3.11+
- pytest 9.0+
- GitHub Copilot subscription (required)
Acknowledgments
Inspired by agent-benchmark.
License
MIT
Download files
File details
Details for the file pytest_skill_engineering-0.6.4.tar.gz.
File metadata
- Download URL: pytest_skill_engineering-0.6.4.tar.gz
- Upload date:
- Size: 137.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b008f6758ee7882690c39bcca44a0b945394315b5594581f5ac3622621f6ac6 |
| MD5 | 06132fe2519ee20de25fe364db6cfc37 |
| BLAKE2b-256 | e4d17843d974d40f44d5e649d14df960e3aa70551938f62bd9e1705fc6ce7ef8 |
Provenance
The following attestation bundles were made for pytest_skill_engineering-0.6.4.tar.gz:
- Publisher: release.yml on sbroenne/pytest-skill-engineering
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pytest_skill_engineering-0.6.4.tar.gz
- Subject digest: 4b008f6758ee7882690c39bcca44a0b945394315b5594581f5ac3622621f6ac6
- Sigstore transparency entry: 1340345975
- Sigstore integration time:
- Permalink: sbroenne/pytest-skill-engineering@cd656559b9cfbc226aaed49fcf5290d1ef0d0621
- Branch / Tag: refs/heads/main
- Owner: https://github.com/sbroenne
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@cd656559b9cfbc226aaed49fcf5290d1ef0d0621
- Trigger Event: workflow_dispatch
File details
Details for the file pytest_skill_engineering-0.6.4-py3-none-any.whl.
File metadata
- Download URL: pytest_skill_engineering-0.6.4-py3-none-any.whl
- Upload date:
- Size: 181.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6d7b2346ab7051ce28a063bef3ca60b61728ea3a7903b4f36fe581dec3a7eafd |
| MD5 | 2c7d66d305ca9e87bc4fa94c2ee658ed |
| BLAKE2b-256 | baf4afef4c9601ab4cd54f31f0478628abc997f6a6caf958b40298018eafd391 |
Provenance
The following attestation bundles were made for pytest_skill_engineering-0.6.4-py3-none-any.whl:
- Publisher: release.yml on sbroenne/pytest-skill-engineering
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pytest_skill_engineering-0.6.4-py3-none-any.whl
- Subject digest: 6d7b2346ab7051ce28a063bef3ca60b61728ea3a7903b4f36fe581dec3a7eafd
- Sigstore transparency entry: 1340345980
- Sigstore integration time:
- Permalink: sbroenne/pytest-skill-engineering@cd656559b9cfbc226aaed49fcf5290d1ef0d0621
- Branch / Tag: refs/heads/main
- Owner: https://github.com/sbroenne
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@cd656559b9cfbc226aaed49fcf5290d1ef0d0621
- Trigger Event: workflow_dispatch