A comprehensive testing framework for validating LLM tool calling capabilities with MCP services

Test and benchmark LLMs with MCP tools in minutes.

A testing framework for validating how LLMs call tools via the Model Context Protocol (MCP). Compare the accuracy, cost, and performance of Claude, GPT-4, Llama, and other models.

[Screenshot: CLI test runner with colorful progress bars and results]

[Screenshot: Web UI showing tool explorer and interactive chat]

[GIF: Running a test suite from command line with real-time progress]


Documentation • Examples • Contributing • Discussions


Why testmcpy?

  • Validate tool calling: Ensure LLMs call the right tools with correct parameters
  • Compare models: Find the best price/performance balance for your use case
  • Prevent regressions: Catch breaking changes in your MCP service with CI/CD
  • Optimize costs: Track token usage and identify the most cost-effective models

Quick Start

# Install testmcpy
pip install testmcpy

# Run interactive setup
testmcpy setup

# Start testing
testmcpy chat                     # Interactive chat with MCP tools
testmcpy research                 # Test LLM tool-calling capabilities
testmcpy run tests/              # Run your test suite

That's it! No complex configuration needed to get started.

Key Features

Interactive TUI Dashboard (NEW!)

Beautiful terminal interface for MCP testing - no browser required:

testmcpy dash                    # Launch interactive dashboard
testmcpy dash --auto-refresh     # Live connection monitoring
testmcpy dash --profile prod     # Use specific MCP profile

TUI Features:

  • Real-time MCP connection status
  • Interactive tool exploration
  • Live test execution with progress
  • Configuration editor
  • Global search across tools, tests, and settings
  • Help system with keyboard shortcuts (press ?)
  • Multiple themes (default, light, high contrast)

Quick CLI Commands (no TUI):

testmcpy profiles                # List MCP profiles (table)
testmcpy status                  # Connection status check
testmcpy explore-cli             # Browse tools (non-interactive)

[Screenshot: TUI dashboard showing profiles, quick actions, and keyboard shortcuts]

Multi-Provider Support

Test with Claude, GPT-4, Llama, and other models. Works with both paid APIs and free local models via Ollama.

[Screenshot: Model selector showing Claude, GPT-4, and Ollama options]

Built-in Evaluators

Comprehensive validation out of the box:

  • Tool Selection: Did the LLM call the right tool?
  • Parameter Validation: Were correct parameters passed?
  • Execution Success: Did the tool call complete without errors?
  • Performance: Response time and token usage tracking
  • Cost Analysis: Monitor API costs across test runs

[Screenshot: Test results showing pass/fail for different evaluators]

Beautiful CLI & Web UI

  • Rich terminal UI: Progress bars, colored output, formatted tables
  • Optional web interface: Visual tool explorer and interactive chat
  • Real-time feedback: Watch tests execute with live updates

When you start testmcpy, you're greeted with a beautiful terminal interface:

  ▀█▀ █▀▀ █▀ ▀█▀ █▀▄▀█ █▀▀ █▀█ █▄█
   █  ██▄ ▄█  █  █ ▀ █ █▄▄ █▀▀  █

  🧪 Test  •  📊 Benchmark  •  ✓ Validate
  MCP Testing Framework

[Screenshot: Split view of CLI and Web UI running the same test]

YAML Test Definitions

Define test suites as code for repeatable, version-controlled testing:

version: "1.0"
name: "Chart Operations Test Suite"

tests:
  - name: "test_create_chart"
    prompt: "Create a bar chart showing sales by region"
    evaluators:
      - name: "was_mcp_tool_called"
        args:
          tool_name: "create_chart"
      - name: "execution_successful"
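
A suite file like the one above can be sanity-checked for shape before it is run. The following is a minimal sketch, not testmcpy's own validation: it assumes the suite has already been loaded into a plain dict by a YAML parser, and it assumes that `version`, `name`, `tests`, and each test's `name`, `prompt`, and `evaluators` are required, which is inferred only from the example. The function name `validate_suite` is hypothetical.

```python
from typing import Any

# Keys assumed to be required, inferred from the example suite (not from testmcpy's schema).
REQUIRED_SUITE_KEYS = ("version", "name", "tests")
REQUIRED_TEST_KEYS = ("name", "prompt", "evaluators")


def validate_suite(suite: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means the shape looks right."""
    problems: list[str] = []
    for key in REQUIRED_SUITE_KEYS:
        if key not in suite:
            problems.append(f"suite: missing '{key}'")
    for index, test in enumerate(suite.get("tests", [])):
        label = test.get("name", f"tests[{index}]")
        for key in REQUIRED_TEST_KEYS:
            if key not in test:
                problems.append(f"{label}: missing '{key}'")
        for evaluator in test.get("evaluators", []):
            if "name" not in evaluator:
                problems.append(f"{label}: evaluator without 'name'")
    return problems


# The example suite, written as the dict a YAML parser would produce.
suite = {
    "version": "1.0",
    "name": "Chart Operations Test Suite",
    "tests": [
        {
            "name": "test_create_chart",
            "prompt": "Create a bar chart showing sales by region",
            "evaluators": [
                {"name": "was_mcp_tool_called", "args": {"tool_name": "create_chart"}},
                {"name": "execution_successful"},
            ],
        }
    ],
}

print(validate_suite(suite))  # []
```

A check like this is cheap to run in a pre-commit hook, so malformed suites are caught before any LLM calls are made.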

Use Cases

Perfect for:

  • LLM Benchmarking: Compare tool-calling accuracy across Claude, GPT-4, and Llama
  • MCP Service Testing: Validate your MCP integrations work correctly
  • Regression Prevention: Catch breaking changes in CI/CD pipelines
  • Model Selection: Make data-driven decisions about which LLM to use
  • Cost Optimization: Find the best price/performance balance for your workload
  • Parameter Validation: Ensure LLMs pass correct parameters to your tools

Architecture

testmcpy connects your LLM provider to your MCP service and validates the interactions:

graph TB
    subgraph "CLI Interface"
        CLI[testmcpy CLI]
        WebUI[Web UI - Optional]
    end

    subgraph "Core Framework"
        TestRunner[Test Runner]
        Evaluators[Evaluators]
        Config[Configuration Manager]
    end

    subgraph "LLM Providers"
        Anthropic[Anthropic API]
        OpenAI[OpenAI API]
        Ollama[Ollama Local]
    end

    subgraph "MCP Integration"
        MCPClient[MCP Client]
        MCPService[MCP Service<br/>HTTP/SSE]
    end

    CLI --> TestRunner
    WebUI --> TestRunner
    TestRunner --> Config
    TestRunner --> Evaluators
    TestRunner --> Anthropic
    TestRunner --> OpenAI
    TestRunner --> Ollama
    Anthropic --> MCPClient
    OpenAI --> MCPClient
    Ollama --> MCPClient
    MCPClient --> MCPService

    style CLI fill:#4A90E2
    style WebUI fill:#4A90E2
    style TestRunner fill:#50E3C2
    style MCPClient fill:#F5A623
    style MCPService fill:#BD10E0

How it works:

  1. Define test cases in YAML with prompts and expected behavior
  2. testmcpy sends prompts to your chosen LLM (Claude, GPT-4, Llama, etc.)
  3. The LLM calls tools on your service via MCP
  4. Evaluators validate tool selection, parameters, execution, and performance
  5. Get detailed pass/fail results with metrics and cost analysis
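
The evaluation step in this flow can be pictured as a set of small checks applied to a record of what the LLM did. The sketch below is illustrative only and does not reflect testmcpy's internals: the `ToolCall` and `RunRecord` classes and the `evaluate` function are hypothetical, and only the evaluator names are taken from the example suites.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolCall:
    """One tool invocation recorded during a run (hypothetical shape)."""
    tool_name: str
    arguments: dict = field(default_factory=dict)
    error: Optional[str] = None  # None means the call completed without error


@dataclass
class RunRecord:
    """What a single prompt produced: the tool calls made and the elapsed time."""
    tool_calls: list
    elapsed_seconds: float


def was_mcp_tool_called(record: RunRecord, tool_name: str) -> bool:
    return any(call.tool_name == tool_name for call in record.tool_calls)


def execution_successful(record: RunRecord) -> bool:
    return all(call.error is None for call in record.tool_calls)


def within_time_limit(record: RunRecord, max_seconds: float) -> bool:
    return record.elapsed_seconds <= max_seconds


def evaluate(record: RunRecord) -> dict:
    """Apply the three checks used in the example suites and collect pass/fail."""
    return {
        "was_mcp_tool_called": was_mcp_tool_called(record, "create_chart"),
        "execution_successful": execution_successful(record),
        "within_time_limit": within_time_limit(record, max_seconds=30),
    }


record = RunRecord(
    tool_calls=[ToolCall("create_chart", {"chart_type": "bar"})],
    elapsed_seconds=4.2,
)
print(evaluate(record))
# {'was_mcp_tool_called': True, 'execution_successful': True, 'within_time_limit': True}
```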

Installation

# Install base package
pip install testmcpy

# With web UI support
pip install 'testmcpy[server]'

# All optional features
pip install 'testmcpy[all]'

Requirements: Python 3.9-3.12 (3.13+ not yet supported)

Getting Started

1. Configuration

Run the interactive setup wizard to create configuration files:

testmcpy setup

This will guide you through:

  • LLM Provider setup: Choose between Claude (Anthropic), GPT-4 (OpenAI), or local Ollama models
  • MCP Service setup: Configure your MCP server URL and authentication
  • API Key management: Detects keys from environment and saves them to .llm_providers.yaml

The setup command creates two files in your current directory:

.llm_providers.yaml - LLM configuration with API keys:

default: prod

profiles:
  prod:
    name: "Production"
    description: "High-quality models for production use"
    providers:
      - name: "Claude claude-sonnet-4-5"
        provider: "anthropic"
        model: "claude-sonnet-4-5"
        api_key: "your-anthropic-api-key-here"  # API key stored directly
        timeout: 60
        default: true

.mcp_services.yaml - MCP server profiles:

default: prod

profiles:
  prod:
    name: "Production"
    description: "Production MCP service"
    mcps:
      - name: "Preset Superset"
        mcp_url: "https://your-workspace.preset.io/mcp"
        auth:
          auth_type: "jwt"  # or "bearer" or "none"
          api_url: "https://api.app.preset.io/v1/auth/"
          api_token: "your-api-token"
          api_secret: "your-api-secret"
        timeout: 30
        rate_limit_rpm: 60
        default: true

Configuration priority (highest first): CLI options > LLM Profile (.llm_providers.yaml) > MCP Profile (.mcp_services.yaml) > .env > Environment variables
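
One way to picture this ordering is as a layered lookup in which the first source that defines a setting wins. The sketch below uses Python's `collections.ChainMap` to model it. It is an illustration of the stated order, not testmcpy's implementation, and the function name `resolve_settings` and the sample keys and values are assumptions.

```python
from collections import ChainMap


def resolve_settings(cli, llm_profile, mcp_profile, dotenv, environ):
    """Layer the sources so that earlier mappings take precedence over later ones."""
    return ChainMap(cli, llm_profile, mcp_profile, dotenv, environ)


# Sample values are made up for illustration.
settings = resolve_settings(
    cli={"model": "claude-haiku-4-5"},
    llm_profile={"model": "claude-sonnet-4-5", "timeout": 60},
    mcp_profile={"timeout": 30},
    dotenv={"ANTHROPIC_API_KEY": "from-dotenv"},
    environ={"ANTHROPIC_API_KEY": "from-environment"},
)

print(settings["model"])              # claude-haiku-4-5 (CLI beats the LLM profile)
print(settings["timeout"])            # 60 (LLM profile beats the MCP profile)
print(settings["ANTHROPIC_API_KEY"])  # from-dotenv (.env beats environment variables)
```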

Note: The setup command is idempotent - it's safe to run multiple times. Use --force to overwrite existing files.

2. Test Your MCP Service

# List available MCP tools
testmcpy tools

# Interactive chat to explore your tools
testmcpy chat

# Run automated research on tool-calling capabilities
testmcpy research --model claude-haiku-4-5

3. Create Test Suites

Define tests in YAML (tests/my_tests.yaml):

version: "1.0"
name: "My MCP Service Tests"

tests:
  - name: "test_tool_selection"
    prompt: "Create a bar chart showing sales by region"
    evaluators:
      - name: "was_mcp_tool_called"
        args:
          tool_name: "create_chart"
      - name: "execution_successful"
      - name: "within_time_limit"
        args:
          max_seconds: 30

Run your tests:

testmcpy run tests/ --model claude-haiku-4-5
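
To run the same suite against several models, the command above can be generated per model. This is a small sketch that only builds the argument lists. It uses the `testmcpy run <path> --model <model>` form shown above, while the helper name `build_run_commands` is hypothetical.

```python
import shlex


def build_run_commands(test_path: str, models: list[str]) -> list[list[str]]:
    """Build one `testmcpy run` argument list per model."""
    return [["testmcpy", "run", test_path, "--model", model] for model in models]


commands = build_run_commands("tests/", ["claude-haiku-4-5", "claude-sonnet-4-5"])
for command in commands:
    print(shlex.join(command))
# testmcpy run tests/ --model claude-haiku-4-5
# testmcpy run tests/ --model claude-sonnet-4-5
```

Each argument list can be passed to `subprocess.run` as is. Keeping the arguments as a list avoids shell quoting problems if a path contains spaces.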

Documentation

Core Guides

Examples

Commands Reference

Command                 Description
testmcpy dash           Launch interactive TUI dashboard
testmcpy setup          Interactive configuration wizard
testmcpy profiles       List MCP profiles (table)
testmcpy status         Show MCP connection status
testmcpy explore-cli    Browse tools (non-interactive)
testmcpy explorer       Launch TUI tool explorer
testmcpy tools          List available MCP tools
testmcpy research       Test LLM tool-calling capabilities
testmcpy run <path>     Execute test suite
testmcpy chat           Interactive chat with MCP tools
testmcpy serve          Start web UI server
testmcpy report         Compare test results across models
testmcpy config-cmd     View current configuration
testmcpy doctor         Diagnose installation issues

TUI Keyboard Shortcuts

Global Navigation:

  • h - Home screen
  • e - Explorer (MCP tools)
  • 5 - Configuration
  • ? - Help modal
  • / - Global search
  • q - Quit (with confirmation)
  • F5 - Refresh

Home Screen:

  • 1-5 - Quick actions (Tests, Explorer, Chat, Optimize, Config)
  • p - Switch profile
  • Space - Connect/disconnect

Explorer:

  • ↑↓ or j/k - Navigate
  • Enter - View details
  • t - Create test
  • o - Optimize docs

Configuration:

  • Tab - Next field
  • s - Save changes
  • q - Quit without saving

LLM Providers

Configure LLM providers in .llm_providers.yaml. See .llm_providers.yaml.example for examples.

Anthropic (Recommended)

Best tool-calling accuracy, native MCP support:

# Set API key in .env or ~/.testmcpy
ANTHROPIC_API_KEY=sk-ant-your-key

# Configure in .llm_providers.yaml (as an entry under profiles:)
prod:
  name: "Production"
  providers:
    - name: "Claude Sonnet 4.5"
      provider: "anthropic"
      model: "claude-sonnet-4-5"
      api_key_env: "ANTHROPIC_API_KEY"
      default: true

Available models: claude-haiku-4-5, claude-sonnet-4-5, claude-opus-4-1
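
The provider entries in this document carry a key in one of two ways: inline via `api_key`, or indirectly via `api_key_env`, which names an environment variable. The sketch below shows one plausible way to resolve either form. The helper `resolve_api_key` is hypothetical, and the precedence (inline value first) is an assumption, not documented testmcpy behavior.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    provider: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the API key for a provider entry, or None if none is configured."""
    env = os.environ if environ is None else environ
    # Assumed precedence: an inline key wins over an environment reference.
    if provider.get("api_key"):
        return provider["api_key"]
    env_name = provider.get("api_key_env")
    if env_name:
        return env.get(env_name)
    return None


provider = {
    "name": "Claude Sonnet 4.5",
    "provider": "anthropic",
    "model": "claude-sonnet-4-5",
    "api_key_env": "ANTHROPIC_API_KEY",
}
print(resolve_api_key(provider, {"ANTHROPIC_API_KEY": "sk-ant-your-key"}))  # sk-ant-your-key
```

Referencing the key by environment variable name keeps the secret out of the YAML file, which matters if the file is committed to version control.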

Ollama (Free, Local)

Perfect for development without API costs:

# Install Ollama
brew install ollama  # macOS
# or: curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama and pull a model
ollama serve
ollama pull llama3.1:8b

# Configure in .llm_providers.yaml (as an entry under profiles:)
local:
  name: "Local Only"
  providers:
    - name: "Ollama Llama"
      provider: "ollama"
      model: "llama3.1:8b"
      base_url: "http://localhost:11434"
      default: true
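
Before pointing testmcpy at a local model, it can help to confirm that Ollama is reachable and that the model has been pulled. The sketch below queries Ollama's `/api/tags` endpoint, which lists local models, using only the standard library. The helper names are hypothetical, and the response shape (a `models` list whose entries have a `name` field) is stated here as an assumption about the Ollama API, not something this README specifies.

```python
import json
from urllib.request import urlopen


def model_is_pulled(tags_payload: dict, model: str) -> bool:
    """Check a decoded /api/tags response for a model with the given name."""
    return any(entry.get("name") == model for entry in tags_payload.get("models", []))


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch the list of local models from a running Ollama server."""
    with urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.load(response)


# Offline example with a hand-written payload in the assumed shape.
sample = {"models": [{"name": "llama3.1:8b"}]}
print(model_is_pulled(sample, "llama3.1:8b"))  # True
```

With `ollama serve` running, `model_is_pulled(fetch_tags(), "llama3.1:8b")` performs the same check against the live server.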

OpenAI

# Set API key in .env or ~/.testmcpy
OPENAI_API_KEY=sk-your-key

# Configure in .llm_providers.yaml (as an entry under profiles:)
openai:
  name: "OpenAI"
  providers:
    - name: "GPT-4"
      provider: "openai"
      model: "gpt-4-turbo"
      api_key_env: "OPENAI_API_KEY"
      default: true

Built-in Evaluators

testmcpy includes comprehensive evaluators for validating LLM behavior:

Tool Calling

  • was_mcp_tool_called - Verify specific tool was invoked
  • tool_call_count - Validate number of tool calls
  • tool_called_with_parameter - Check specific parameter was passed
  • tool_called_with_parameters - Validate multiple parameters
  • parameter_value_in_range - Ensure numeric parameters are valid

Execution

  • execution_successful - Check for errors or failures
  • within_time_limit - Performance validation
  • final_answer_contains - Validate response content

Cost & Performance

  • token_usage_reasonable - Cost efficiency validation
  • Performance metrics automatically tracked

Extensible: Easily add custom evaluators for your domain-specific needs.

See Evaluator Reference for complete documentation.
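
To make the parameter-oriented evaluators concrete, here is a sketch of the kind of logic the names `tool_called_with_parameters` and `parameter_value_in_range` suggest. These functions are written from the one-line descriptions above and are not testmcpy's code. The tool-call record shape (dicts with `tool_name` and `arguments`) and the `chart_type` and `row_limit` parameters are hypothetical.

```python
def tool_called_with_parameters(tool_calls, tool_name, expected):
    """True if some call to tool_name carries every expected key/value pair."""
    return any(
        call["tool_name"] == tool_name
        and all(call["arguments"].get(key) == value for key, value in expected.items())
        for call in tool_calls
    )


def parameter_value_in_range(tool_calls, tool_name, parameter, minimum, maximum):
    """True if the parameter was passed to tool_name and every value lies in [minimum, maximum]."""
    values = [
        call["arguments"][parameter]
        for call in tool_calls
        if call["tool_name"] == tool_name and parameter in call["arguments"]
    ]
    return bool(values) and all(minimum <= value <= maximum for value in values)


# Hypothetical record of what the LLM called during one test.
calls = [
    {"tool_name": "create_chart", "arguments": {"chart_type": "bar", "row_limit": 100}},
]

print(tool_called_with_parameters(calls, "create_chart", {"chart_type": "bar"}))  # True
print(parameter_value_in_range(calls, "create_chart", "row_limit", 1, 1000))      # True
```

A domain-specific evaluator would follow the same pattern: a pure function from the recorded tool calls to a pass/fail result, which keeps it easy to unit test without calling an LLM.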

For MCP Service Developers

Integrate testmcpy into your MCP service for automated testing:

# Install testmcpy in your project (quoted so the shell does not expand the brackets)
pip install 'testmcpy[all]'

# Create tests for your MCP tools
mkdir -p tests
cat > tests/my_service_tests.yaml <<'EOF'
version: "1.0"
name: "My MCP Service Tests"
tests:
  - name: "test_tool_selection"
    prompt: "List all items"
    evaluators:
      - name: "was_mcp_tool_called"
        args:
          tool_name: "list_items"
      - name: "execution_successful"
EOF

# Run tests in CI/CD
testmcpy run tests/ --model claude-haiku-4-5

Client Usage Guide - Complete integration guide for your MCP service

CI/CD Examples - GitHub Actions and GitLab CI configurations

Web Interface

Optional React-based UI for visual testing:

[Screenshot: Web UI dashboard with tool explorer]

# Install with UI support
pip install 'testmcpy[server]'

# Start server
testmcpy serve

Features:

  • Visual MCP tool explorer
  • Interactive chat interface
  • Test management and execution
  • Real-time results display

Access at http://localhost:8000

Examples

Check out the examples/ directory for:

  • Basic test suites - Simple examples to get started
  • CI/CD integration - GitHub Actions and GitLab CI workflows
  • Custom evaluators - Building domain-specific validation
  • Multi-model comparison - Benchmarking different LLMs

Contributing

We welcome contributions, whether bug reports, feature requests, documentation improvements, or code.

Read the Contributing Guide to get started.

Quick guidelines:

  • Follow Black code formatting (100 char line length)
  • Add tests for new features
  • Ensure multi-provider compatibility (test with Ollama, Claude, GPT)
  • Document your changes
  • Be respectful and collaborative

Contributors

Built with contributions from:

Want to see your name here? Check out our Contributing Guide!

Community & Support

License

Apache License 2.0 - See LICENSE for details.

By contributing, you agree that your contributions will be licensed under Apache 2.0.


Acknowledgments

Built to enable better LLM testing and integration with Model Context Protocol services.

Special thanks to the MCP community and all our contributors!
