A comprehensive testing framework for validating LLM tool calling capabilities with MCP services
Test and benchmark LLMs with MCP tools in minutes.
A testing framework for validating how LLMs call tools via Model Context Protocol (MCP) - compare Claude, GPT-4, Llama, and other models' accuracy, cost, and performance.
Why testmcpy?
- Validate tool calling: Ensure LLMs call the right tools with correct parameters
- Compare models: Find the best price/performance balance for your use case
- Prevent regressions: Catch breaking changes in your MCP service with CI/CD
- Optimize costs: Track token usage and identify the most cost-effective models
Quick Start
# Install testmcpy
pip install testmcpy
# Run interactive setup
testmcpy setup
# Start testing
testmcpy chat # Interactive chat with MCP tools
testmcpy research # Test LLM tool-calling capabilities
testmcpy run tests/ # Run your test suite
That's it! No complex configuration needed to get started.
Key Features
Interactive TUI Dashboard (NEW!)
Beautiful terminal interface for MCP testing - no browser required:
testmcpy dash # Launch interactive dashboard
testmcpy dash --auto-refresh # Live connection monitoring
testmcpy dash --profile prod # Use specific MCP profile
TUI Features:
- Real-time MCP connection status
- Interactive tool exploration
- Live test execution with progress
- Configuration editor
- Global search across tools, tests, and settings
- Help system with keyboard shortcuts (press ?)
- Multiple themes (default, light, high contrast)
Quick CLI Commands (no TUI):
testmcpy profiles # List MCP profiles (table)
testmcpy status # Connection status check
testmcpy explore-cli # Browse tools (non-interactive)
Multi-Provider Support
Test with Claude, GPT-4, Llama, and other models. Works with both paid APIs and free local models via Ollama.
Built-in Evaluators
Comprehensive validation out of the box:
- Tool Selection: Did the LLM call the right tool?
- Parameter Validation: Were correct parameters passed?
- Execution Success: Did the tool call complete without errors?
- Performance: Response time and token usage tracking
- Cost Analysis: Monitor API costs across test runs
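As a rough illustration of what cost tracking involves, the sketch below turns token counts into a dollar estimate. The model names, the per-million-token prices, and the `Usage` type are placeholders invented for this example; they are not real provider rates and not testmcpy's internals.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; substitute your provider's real rates.
PRICES = {
    "model-a": {"input": 1.00, "output": 5.00},
    "model-b": {"input": 3.00, "output": 15.00},
}


@dataclass
class Usage:
    """Token counts reported for one test run (illustrative type)."""
    model: str
    input_tokens: int
    output_tokens: int


def estimate_cost(usage: Usage) -> float:
    """Return the estimated cost in USD for one run."""
    price = PRICES[usage.model]
    total = usage.input_tokens * price["input"] + usage.output_tokens * price["output"]
    return total / 1_000_000


runs = [Usage("model-a", 12_000, 800), Usage("model-b", 12_000, 650)]
for run in runs:
    print(f"{run.model}: ${estimate_cost(run):.4f}")
```

Summing these estimates per model across a test run gives a simple basis for comparing price against accuracy.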
Beautiful CLI & Web UI
- Rich terminal UI: Progress bars, colored output, formatted tables
- Optional web interface: Visual tool explorer and interactive chat
- Real-time feedback: Watch tests execute with live updates
When you start testmcpy, you're greeted with a beautiful terminal interface:
▀█▀ █▀▀ █▀ ▀█▀ █▀▄▀█ █▀▀ █▀█ █▄█
█ ██▄ ▄█ █ █ ▀ █ █▄▄ █▀▀ █
🧪 Test • 📊 Benchmark • ✓ Validate
MCP Testing Framework
YAML Test Definitions
Define test suites as code for repeatable, version-controlled testing:
version: "1.0"
name: "Chart Operations Test Suite"
tests:
  - name: "test_create_chart"
    prompt: "Create a bar chart showing sales by region"
    evaluators:
      - name: "was_mcp_tool_called"
        args:
          tool_name: "create_chart"
      - name: "execution_successful"
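To make the shape of a suite concrete, here is the same definition written as the Python structure a YAML parser would produce, with a small structural check. The set of required keys is inferred from the example above, not taken from testmcpy's schema, and `find_problems` is a hypothetical helper.

```python
# The suite above as nested dicts and lists.
suite = {
    "version": "1.0",
    "name": "Chart Operations Test Suite",
    "tests": [
        {
            "name": "test_create_chart",
            "prompt": "Create a bar chart showing sales by region",
            "evaluators": [
                {"name": "was_mcp_tool_called", "args": {"tool_name": "create_chart"}},
                {"name": "execution_successful"},
            ],
        }
    ],
}


def find_problems(suite: dict) -> list:
    """Return structural problems as strings; an empty list means the shape looks right."""
    problems = []
    for key in ("version", "name", "tests"):
        if key not in suite:
            problems.append(f"missing top-level key: {key}")
    for i, test in enumerate(suite.get("tests", [])):
        for key in ("name", "prompt", "evaluators"):
            if key not in test:
                problems.append(f"tests[{i}] missing key: {key}")
        for j, spec in enumerate(test.get("evaluators", [])):
            if "name" not in spec:
                problems.append(f"tests[{i}].evaluators[{j}] missing key: name")
    return problems


print(find_problems(suite))
```

A check like this can catch indentation mistakes in a suite file before any LLM call is made.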
Use Cases
Perfect for:
- LLM Benchmarking: Compare tool-calling accuracy across Claude, GPT-4, and Llama
- MCP Service Testing: Validate your MCP integrations work correctly
- Regression Prevention: Catch breaking changes in CI/CD pipelines
- Model Selection: Make data-driven decisions about which LLM to use
- Cost Optimization: Find the best price/performance balance for your workload
- Parameter Validation: Ensure LLMs pass correct parameters to your tools
Architecture
testmcpy connects your LLM provider to your MCP service and validates the interactions:
graph TB
subgraph "CLI Interface"
CLI[testmcpy CLI]
WebUI[Web UI - Optional]
end
subgraph "Core Framework"
TestRunner[Test Runner]
Evaluators[Evaluators]
Config[Configuration Manager]
end
subgraph "LLM Providers"
Anthropic[Anthropic API]
OpenAI[OpenAI API]
Ollama[Ollama Local]
end
subgraph "MCP Integration"
MCPClient[MCP Client]
MCPService[MCP Service<br/>HTTP/SSE]
end
CLI --> TestRunner
WebUI --> TestRunner
TestRunner --> Config
TestRunner --> Evaluators
TestRunner --> Anthropic
TestRunner --> OpenAI
TestRunner --> Ollama
Anthropic --> MCPClient
OpenAI --> MCPClient
Ollama --> MCPClient
MCPClient --> MCPService
style CLI fill:#4A90E2
style WebUI fill:#4A90E2
style TestRunner fill:#50E3C2
style MCPClient fill:#F5A623
style MCPService fill:#BD10E0
How it works:
- Define test cases in YAML with prompts and expected behavior
- testmcpy sends prompts to your chosen LLM (Claude, GPT-4, Llama, etc.)
- The LLM calls tools on your service via MCP
- Evaluators validate tool selection, parameters, execution, and performance
- Get detailed pass/fail results with metrics and cost analysis
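The steps above can be sketched end to end with a stub in place of a real provider. Everything here is illustrative: `fake_llm`, `ToolCall`, the `expected_tools` key, and the `chart_type` argument are assumptions made for the example, not testmcpy's API.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class LLMResult:
    answer: str
    tool_calls: list = field(default_factory=list)


def fake_llm(prompt: str) -> LLMResult:
    """Stand-in for a real provider: always 'calls' create_chart."""
    return LLMResult(
        answer="Chart created.",
        tool_calls=[ToolCall("create_chart", {"chart_type": "bar"})],
    )


def was_tool_called(result: LLMResult, tool_name: str) -> bool:
    return any(call.name == tool_name for call in result.tool_calls)


def run_test(test: dict, llm=fake_llm) -> dict:
    """Send the prompt, then apply each check to the recorded result."""
    result = llm(test["prompt"])
    checks = {name: was_tool_called(result, name) for name in test["expected_tools"]}
    return {"name": test["name"], "passed": all(checks.values()), "checks": checks}


report = run_test({
    "name": "test_create_chart",
    "prompt": "Create a bar chart showing sales by region",
    "expected_tools": ["create_chart"],
})
print(report)
```

Swapping `fake_llm` for a function that calls a real model is the only change needed to turn this sketch into a live check.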
Installation
# Install base package
pip install testmcpy
# With web UI support
pip install 'testmcpy[server]'
# All optional features
pip install 'testmcpy[all]'
Requirements: Python 3.9-3.12 (3.13+ not yet supported)
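If you want to fail fast in a script or CI job, a version guard along these lines can check the interpreter against the stated range. The helper name `is_supported` is made up for this sketch.

```python
import sys

SUPPORTED_MIN = (3, 9)
SUPPORTED_MAX = (3, 12)


def is_supported(version=None) -> bool:
    """Return True when (major, minor) falls inside the supported range."""
    if version is None:
        version = sys.version_info[:2]
    return SUPPORTED_MIN <= tuple(version) <= SUPPORTED_MAX


if not is_supported():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} "
        "is outside the supported 3.9-3.12 range"
    )
```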
Getting Started
1. Configuration
Run the interactive setup wizard to create configuration files:
testmcpy setup
This will guide you through:
- LLM Provider setup: Choose between Claude (Anthropic), GPT-4 (OpenAI), or local Ollama models
- MCP Service setup: Configure your MCP server URL and authentication
- API Key management: Detects keys from environment and saves them to .llm_providers.yaml
The setup command creates two files in your current directory:
.llm_providers.yaml - LLM configuration with API keys:
default: prod
profiles:
  prod:
    name: "Production"
    description: "High-quality models for production use"
    providers:
      - name: "Claude claude-sonnet-4-5"
        provider: "anthropic"
        model: "claude-sonnet-4-5"
        api_key: "your-anthropic-api-key-here" # API key stored directly
        timeout: 60
        default: true
.mcp_services.yaml - MCP server profiles:
default: prod
profiles:
  prod:
    name: "Production"
    description: "Production MCP service"
    mcps:
      - name: "Preset Superset"
        mcp_url: "https://your-workspace.preset.io/mcp"
        auth:
          auth_type: "jwt" # or "bearer" or "none"
          api_url: "https://api.app.preset.io/v1/auth/"
          api_token: "your-api-token"
          api_secret: "your-api-secret"
        timeout: 30
        rate_limit_rpm: 60
        default: true
Configuration priority: CLI options > LLM Profile (.llm_providers.yaml) > MCP Profile (.mcp_services.yaml) > .env > Environment variables
Note: The setup command is idempotent - it's safe to run multiple times. Use --force to overwrite existing files.
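The configuration priority described above amounts to a layered lookup in which the first source that defines a setting wins. The sketch below models that with `collections.ChainMap`. The flat key names (`model`, `timeout`, `mcp_url`) and the `resolve` helper are assumptions for illustration; testmcpy's real resolution logic may differ.

```python
from collections import ChainMap


def resolve(cli, llm_profile, mcp_profile, dotenv, environ) -> ChainMap:
    """Earlier mappings win, mirroring the stated priority order."""
    return ChainMap(cli, llm_profile, mcp_profile, dotenv, environ)


settings = resolve(
    cli={"model": "claude-haiku-4-5"},
    llm_profile={"model": "claude-sonnet-4-5", "timeout": 60},
    mcp_profile={"timeout": 30, "mcp_url": "https://your-workspace.preset.io/mcp"},
    dotenv={},
    environ={"mcp_url": "https://example.invalid/mcp"},
)

print(settings["model"])    # claude-haiku-4-5 (CLI beats the LLM profile)
print(settings["timeout"])  # 60 (LLM profile beats the MCP profile)
print(settings["mcp_url"])  # https://your-workspace.preset.io/mcp
```

In a real resolver, unset CLI options would need to be left out of the first mapping so that they do not shadow lower layers.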
2. Test Your MCP Service
# List available MCP tools
testmcpy tools
# Interactive chat to explore your tools
testmcpy chat
# Run automated research on tool-calling capabilities
testmcpy research --model claude-haiku-4-5
3. Create Test Suites
Define tests in YAML (tests/my_tests.yaml):
version: "1.0"
name: "My MCP Service Tests"
tests:
  - name: "test_tool_selection"
    prompt: "Create a bar chart showing sales by region"
    evaluators:
      - name: "was_mcp_tool_called"
        args:
          tool_name: "create_chart"
      - name: "execution_successful"
      - name: "within_time_limit"
        args:
          max_seconds: 30
Run your tests:
testmcpy run tests/ --model claude-haiku-4-5
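For intuition about what a time-limit check such as `within_time_limit` measures, here is a minimal timing sketch. It is a standalone illustration under the assumption that the check compares wall-clock time against `max_seconds`; it is not the evaluator's actual implementation.

```python
import time


def within_time_limit(action, max_seconds: float) -> dict:
    """Run `action` and report whether it finished within max_seconds."""
    start = time.perf_counter()
    action()
    elapsed = time.perf_counter() - start
    return {"passed": elapsed <= max_seconds, "elapsed_seconds": elapsed}


# A short sleep stands in for the LLM round trip.
outcome = within_time_limit(lambda: time.sleep(0.01), max_seconds=30)
print(outcome["passed"])
```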
Documentation
Core Guides
- Evaluator Reference - All available evaluators and usage examples
- Architecture - System design and data flow
- MCP Profiles - Managing multiple MCP service configurations
Examples
- Basic Tests - Simple test cases to get started
- CI/CD Integration - GitHub Actions and GitLab CI configurations
- Custom Evaluators - Building your own validation logic
Commands Reference
| Command | Description |
|---|---|
| testmcpy dash | Launch interactive TUI dashboard |
| testmcpy setup | Interactive configuration wizard |
| testmcpy profiles | List MCP profiles (table) |
| testmcpy status | Show MCP connection status |
| testmcpy explore-cli | Browse tools (non-interactive) |
| testmcpy explorer | Launch TUI tool explorer |
| testmcpy tools | List available MCP tools |
| testmcpy research | Test LLM tool-calling capabilities |
| testmcpy run <path> | Execute test suite |
| testmcpy chat | Interactive chat with MCP tools |
| testmcpy serve | Start web UI server |
| testmcpy report | Compare test results across models |
| testmcpy config-cmd | View current configuration |
| testmcpy doctor | Diagnose installation issues |
TUI Keyboard Shortcuts
Global Navigation:
- h - Home screen
- e - Explorer (MCP tools)
- 5 - Configuration
- ? - Help modal
- / - Global search
- q - Quit (with confirmation)
- F5 - Refresh
Home Screen:
- 1-5 - Quick actions (Tests, Explorer, Chat, Optimize, Config)
- p - Switch profile
- Space - Connect/disconnect
Explorer:
- ↑↓ or j/k - Navigate
- Enter - View details
- t - Create test
- o - Optimize docs
Configuration:
- Tab - Next field
- s - Save changes
- q - Quit without saving
LLM Providers
Configure LLM providers in .llm_providers.yaml. See .llm_providers.yaml.example for examples.
Anthropic (Recommended)
Best tool-calling accuracy, native MCP support:
# Set API key in .env or ~/.testmcpy
ANTHROPIC_API_KEY=sk-ant-your-key
# Configure in .llm_providers.yaml
prod:
  name: "Production"
  providers:
    - name: "Claude Sonnet 4.5"
      provider: "anthropic"
      model: "claude-sonnet-4-5"
      api_key_env: "ANTHROPIC_API_KEY"
      default: true
Available models: claude-haiku-4-5, claude-sonnet-4-5, claude-opus-4-1
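The `api_key_env` field names an environment variable instead of embedding the key in the file. A lookup along these lines shows the idea. Preferring an inline `api_key` when both fields are present is an assumption of this sketch, and `resolve_api_key` is a hypothetical helper.

```python
import os


def resolve_api_key(provider: dict, environ=os.environ) -> str:
    """Prefer an inline api_key; otherwise read the variable named by api_key_env."""
    if provider.get("api_key"):
        return provider["api_key"]
    var_name = provider.get("api_key_env")
    if not var_name:
        raise KeyError("provider has neither api_key nor api_key_env")
    if var_name not in environ:
        raise KeyError(f"environment variable {var_name} is not set")
    return environ[var_name]


provider = {
    "provider": "anthropic",
    "model": "claude-sonnet-4-5",
    "api_key_env": "ANTHROPIC_API_KEY",
}
# A plain dict stands in for the process environment here.
print(resolve_api_key(provider, environ={"ANTHROPIC_API_KEY": "sk-ant-your-key"}))
```

Keeping the key in the environment means the YAML file can be committed without secrets.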
Ollama (Free, Local)
Perfect for development without API costs:
# Install Ollama
brew install ollama # macOS
# or: curl -fsSL https://ollama.com/install.sh | sh
# Start Ollama and pull a model
ollama serve
ollama pull llama3.1:8b
# Configure in .llm_providers.yaml
local:
  name: "Local Only"
  providers:
    - name: "Ollama Llama"
      provider: "ollama"
      model: "llama3.1:8b"
      base_url: "http://localhost:11434"
      default: true
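Before running tests against a local model, it can help to confirm that the Ollama server at `base_url` is answering. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models, using only the standard library. The helper names are made up for this example.

```python
import urllib.error
import urllib.request


def tags_url(base_url: str) -> str:
    """Build the URL of Ollama's model-listing endpoint."""
    return base_url.rstrip("/") + "/api/tags"


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the server answers the model-listing request with HTTP 200."""
    try:
        with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        return False


print(tags_url("http://localhost:11434"))
```

Calling `ollama_is_up()` after `ollama serve` should return True once the server is listening.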
OpenAI
# Set API key in .env or ~/.testmcpy
OPENAI_API_KEY=sk-your-key
# Configure in .llm_providers.yaml
openai:
  name: "OpenAI"
  providers:
    - name: "GPT-4"
      provider: "openai"
      model: "gpt-4-turbo"
      api_key_env: "OPENAI_API_KEY"
      default: true
Built-in Evaluators
testmcpy includes comprehensive evaluators for validating LLM behavior:
Tool Calling
- was_mcp_tool_called - Verify specific tool was invoked
- tool_call_count - Validate number of tool calls
- tool_called_with_parameter - Check specific parameter was passed
- tool_called_with_parameters - Validate multiple parameters
- parameter_value_in_range - Ensure numeric parameters are valid
Execution
- execution_successful - Check for errors or failures
- within_time_limit - Performance validation
- final_answer_contains - Validate response content
Cost & Performance
- token_usage_reasonable - Cost efficiency validation
- Performance metrics automatically tracked
Extensible: Easily add custom evaluators for your domain-specific needs.
See Evaluator Reference for complete documentation.
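One common way to make evaluators pluggable is a name-to-function registry that dispatches on the `name` and `args` fields of a test definition. The sketch below shows that pattern in isolation. The decorator, the argument names (`expected_count`, `parameter`, `min_value`, `max_value`), and the recorded-call format are all assumptions for illustration; consult the Evaluator Reference for testmcpy's actual extension mechanism.

```python
EVALUATORS = {}


def evaluator(name):
    """Register a check function under the name used in test definitions."""
    def register(func):
        EVALUATORS[name] = func
        return func
    return register


@evaluator("tool_call_count")
def tool_call_count(calls, expected_count):
    return len(calls) == expected_count


@evaluator("parameter_value_in_range")
def parameter_value_in_range(calls, tool_name, parameter, min_value, max_value):
    values = [
        call["arguments"][parameter]
        for call in calls
        if call["tool"] == tool_name and parameter in call["arguments"]
    ]
    # An empty list means the parameter was never passed, which counts as a failure.
    return bool(values) and all(min_value <= value <= max_value for value in values)


def evaluate(calls, specs):
    """Apply each {'name': ..., 'args': {...}} spec to the recorded calls."""
    return {
        spec["name"]: EVALUATORS[spec["name"]](calls, **spec.get("args", {}))
        for spec in specs
    }


calls = [{"tool": "create_chart", "arguments": {"row_limit": 100}}]
results = evaluate(calls, [
    {"name": "tool_call_count", "args": {"expected_count": 1}},
    {"name": "parameter_value_in_range",
     "args": {"tool_name": "create_chart", "parameter": "row_limit",
              "min_value": 1, "max_value": 1000}},
])
print(results)
```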
For MCP Service Developers
Integrate testmcpy into your MCP service for automated testing:
# Install testmcpy in your project
pip install 'testmcpy[all]'
# Create tests for your MCP tools
cat > tests/my_service_tests.yaml <<EOF
version: "1.0"
name: "My MCP Service Tests"
tests:
  - name: "test_tool_selection"
    prompt: "List all items"
    evaluators:
      - name: "was_mcp_tool_called"
        args:
          tool_name: "list_items"
      - name: "execution_successful"
EOF
# Run tests in CI/CD
testmcpy run tests/ --model claude-haiku-4-5
Getting Started Guide - Complete integration guide for your MCP service
CI/CD Examples - GitHub Actions and GitLab CI configurations
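If your pipeline is driven from Python rather than a shell step, the run command shown above can be wrapped as follows. This sketch assumes that `testmcpy run` exits with a non-zero status when tests fail, which you should verify for your version; `build_command` and `run_suite` are hypothetical helper names.

```python
import shlex
import subprocess


def build_command(test_path: str, model: str) -> list:
    """Assemble the same invocation used in the shell example."""
    return ["testmcpy", "run", test_path, "--model", model]


def run_suite(test_path: str = "tests/", model: str = "claude-haiku-4-5") -> int:
    """Run the suite and return the process exit code (assumed non-zero on failure)."""
    completed = subprocess.run(build_command(test_path, model))
    return completed.returncode


print(shlex.join(build_command("tests/", "claude-haiku-4-5")))
```

A CI script would call `run_suite()` and pass its return value to `sys.exit` so that the job status follows the test outcome.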
Web Interface
Optional React-based UI for visual testing:
# Install with UI support
pip install 'testmcpy[server]'
# Start server
testmcpy serve
Features:
- Visual MCP tool explorer
- Interactive chat interface
- Test management and execution
- Real-time results display
Access at http://localhost:8000
Examples
Check out the examples/ directory for:
- Basic test suites - Simple examples to get started
- CI/CD integration - GitHub Actions and GitLab CI workflows
- Custom evaluators - Building domain-specific validation
- Multi-model comparison - Benchmarking different LLMs
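A multi-model comparison usually reduces to grouping per-test outcomes by model and summarizing them. The sketch below shows that aggregation. The model names and outcome records are made-up placeholders, not measured results, and the record fields are assumptions for this example.

```python
from collections import defaultdict

# Placeholder outcomes; in practice these would come from your own test runs.
outcomes = [
    {"model": "model-a", "test": "test_tool_selection", "passed": True, "seconds": 2.0},
    {"model": "model-a", "test": "test_parameters", "passed": True, "seconds": 4.0},
    {"model": "model-b", "test": "test_tool_selection", "passed": True, "seconds": 1.0},
    {"model": "model-b", "test": "test_parameters", "passed": False, "seconds": 3.0},
]


def summarize(outcomes: list) -> dict:
    """Group outcomes by model and compute pass rate and mean duration."""
    grouped = defaultdict(list)
    for outcome in outcomes:
        grouped[outcome["model"]].append(outcome)
    return {
        model: {
            "pass_rate": sum(row["passed"] for row in rows) / len(rows),
            "avg_seconds": sum(row["seconds"] for row in rows) / len(rows),
        }
        for model, rows in grouped.items()
    }


for model, stats in summarize(outcomes).items():
    print(f"{model}: pass rate {stats['pass_rate']:.0%}, avg {stats['avg_seconds']:.1f}s")
```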
Contributing
We welcome contributions! Whether it's bug reports, feature requests, documentation improvements, or code contributions.
Read the Contributing Guide to get started.
Quick guidelines:
- Follow Black code formatting (100 char line length)
- Add tests for new features
- Ensure multi-provider compatibility (test with Ollama, Claude, GPT)
- Document your changes
- Be respectful and collaborative
Contributors
Want to see your name here? Check out our Contributing Guide!
Community & Support
- Issues: Report bugs or request features
- Discussions: Ask questions and share ideas
- Documentation: Browse the context/ directory
- Examples: Explore examples/ for sample code
License
Apache License 2.0 - See LICENSE for details.
By contributing, you agree that your contributions will be licensed under Apache 2.0.
Acknowledgments
Built to enable better LLM testing and integration with Model Context Protocol services.
Special thanks to the MCP community and all our contributors!