A comprehensive testing framework for validating LLM tool calling capabilities with MCP services

testmcpy

A comprehensive testing framework for validating LLM tool-calling capabilities with MCP (Model Context Protocol) services.

Test and evaluate how different LLM models interact with MCP tools. Compare Claude, GPT-4, Llama, and other models' tool-calling accuracy, cost, and performance with any MCP service.

Features

  • Multi-Provider Support: Anthropic (Claude), OpenAI (GPT), Ollama (local models)
  • MCP Tool Testing: Validate LLM interactions with any MCP service
  • Built-in Evaluators: Test tool calling accuracy, response quality, performance, and cost
  • Beautiful CLI: Rich terminal UI with progress bars and formatted output
  • Web Interface: Optional React-based UI for visual testing and exploration
  • Test Suites: YAML/JSON test definitions with comprehensive evaluation
  • Model Comparison: Side-by-side benchmarking of different LLMs
  • Cost Tracking: Monitor token usage and API costs across test runs
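Cost tracking comes down to multiplying token counts by per-token prices. The sketch below shows only that arithmetic, under stated assumptions: the `Usage` record and `estimate_cost` function are hypothetical names rather than testmcpy's API, and the prices in the example are placeholders, not any provider's actual rates.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Usage:
    """Token counts for one LLM call (hypothetical record, not testmcpy's API)."""
    input_tokens: int
    output_tokens: int


def estimate_cost(usages: Iterable[Usage],
                  price_in_per_mtok: float,
                  price_out_per_mtok: float) -> float:
    """Return the total USD cost, given prices per million input/output tokens."""
    usages = list(usages)  # allow generators to be iterated twice
    total_in = sum(u.input_tokens for u in usages)
    total_out = sum(u.output_tokens for u in usages)
    return (total_in * price_in_per_mtok
            + total_out * price_out_per_mtok) / 1_000_000


# Two runs priced at placeholder rates of $1 (input) and $5 (output) per million tokens.
runs = [Usage(1200, 300), Usage(800, 200)]
print(f"${estimate_cost(runs, 1.0, 5.0):.4f}")  # prints $0.0045
```

Comparing models on cost is then a matter of summing each model's usage with that model's own prices.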

Architecture

graph TB
    subgraph "CLI Interface"
        CLI[testmcpy CLI]
        WebUI[Web UI - Optional]
    end

    subgraph "Core Framework"
        TestRunner[Test Runner]
        Evaluators[Evaluators]
        Config[Configuration Manager]
    end

    subgraph "LLM Providers"
        Anthropic[Anthropic API]
        OpenAI[OpenAI API]
        Ollama[Ollama Local]
    end

    subgraph "MCP Integration"
        MCPClient[MCP Client]
        MCPService[MCP Service<br/>HTTP/SSE]
    end

    CLI --> TestRunner
    WebUI --> TestRunner
    TestRunner --> Config
    TestRunner --> Evaluators
    TestRunner --> Anthropic
    TestRunner --> OpenAI
    TestRunner --> Ollama
    Anthropic --> MCPClient
    OpenAI --> MCPClient
    Ollama --> MCPClient
    MCPClient --> MCPService

    style CLI fill:#4A90E2
    style WebUI fill:#4A90E2
    style TestRunner fill:#50E3C2
    style MCPClient fill:#F5A623
    style MCPService fill:#BD10E0
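The diagram's central idea is that the Test Runner talks to every provider through one interface and hands the outcome to the evaluators. A minimal sketch of that shape follows, assuming a hypothetical `Provider` protocol and a toy `EchoProvider` that stands in for the real Anthropic, OpenAI, and Ollama adapters; none of these names come from testmcpy.

```python
from typing import Callable, Dict, List, Protocol


class Provider(Protocol):
    """Hypothetical provider interface: given a prompt and the MCP tool names,
    return the names of the tools the model chose to call."""

    def call(self, prompt: str, tools: List[str]) -> List[str]: ...


class EchoProvider:
    """Toy stand-in for a real LLM: 'calls' each tool named verbatim in the prompt."""

    def call(self, prompt: str, tools: List[str]) -> List[str]:
        return [tool for tool in tools if tool in prompt]


# Provider registry keyed by a DEFAULT_PROVIDER-style name.
PROVIDERS: Dict[str, Callable[[], Provider]] = {"echo": EchoProvider}

Evaluator = Callable[[List[str]], bool]


def run_test(provider_name: str, prompt: str, tools: List[str],
             evaluators: List[Evaluator]) -> bool:
    """Resolve the provider, collect its tool calls, and require every evaluator to pass."""
    provider = PROVIDERS[provider_name]()
    called = provider.call(prompt, tools)
    return all(check(called) for check in evaluators)


print(run_test("echo", "Please use create_chart here",
               ["create_chart", "list_charts"],
               [lambda called: "create_chart" in called]))  # True
```

With this layout, adding a provider means registering one more class with the same `call` method; the runner and the evaluators stay unchanged.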

Quick Start

Installation

# Install base package
pip install testmcpy

# With web UI support
pip install 'testmcpy[server]'

# All optional features
pip install 'testmcpy[all]'

First-Time Setup

# Interactive configuration wizard
testmcpy setup

# View current configuration
testmcpy config-cmd

Basic Usage

# List available MCP tools
testmcpy tools

# Test LLM tool-calling capabilities
testmcpy research --model claude-haiku-4-5

# Run test suite
testmcpy run tests/ --model claude-haiku-4-5

# Interactive chat with MCP tools
testmcpy chat

# Start web UI
testmcpy serve
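MCP is built on JSON-RPC 2.0, and listing or invoking tools maps to the protocol's `tools/list` and `tools/call` methods. The sketch below only builds those request bodies. It is not testmcpy's client: it omits the `initialize` handshake and the HTTP/SSE transport details a real session needs, and the `create_chart` arguments are hypothetical.

```python
import itertools
import json
from typing import Any, Dict, Optional

# JSON-RPC request ids only need to be unique within a session.
_request_ids = itertools.count(1)


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a JSON-RPC 2.0 request body."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Ask the server which tools it exposes.
list_body = jsonrpc_request("tools/list")

# Invoke one tool by name with keyword arguments (argument names are hypothetical).
call_body = jsonrpc_request(
    "tools/call",
    {"name": "create_chart", "arguments": {"chart_type": "bar", "group_by": "region"}},
)

print(list_body)  # {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
```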

Configuration

testmcpy uses a layered configuration system: when the same setting appears in more than one source, the value from the higher-priority source wins.

Priority Order (highest to lowest):

  1. Command-line options
  2. .env in current directory
  3. ~/.testmcpy user config
  4. Environment variables
  5. Built-in defaults
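One straightforward way to express this precedence in Python is `collections.ChainMap`, which looks a key up in each mapping from left to right and returns the first hit. The sketch below uses plain dicts for each layer and a hypothetical built-in default; it illustrates the stated priority order and is not testmcpy's configuration code.

```python
from collections import ChainMap

# Each layer is a plain dict here; in practice the values would come from the
# command line, ./.env, ~/.testmcpy, os.environ, and hard-coded defaults.
cli_options = {"DEFAULT_MODEL": "claude-sonnet-4-5"}
dotenv_cwd: dict = {}
user_config = {"DEFAULT_MODEL": "claude-haiku-4-5",
               "MCP_URL": "http://localhost:5008/mcp/"}
environment = {"MCP_URL": "http://localhost:9000/mcp/"}
defaults = {"DEFAULT_PROVIDER": "anthropic"}  # hypothetical default value

# ChainMap searches its maps left to right, so list them highest priority first.
config = ChainMap(cli_options, dotenv_cwd, user_config, environment, defaults)

print(config["DEFAULT_MODEL"])     # claude-sonnet-4-5 (command line wins)
print(config["MCP_URL"])           # http://localhost:5008/mcp/ (~/.testmcpy beats env)
print(config["DEFAULT_PROVIDER"])  # anthropic (falls through to the defaults)
```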

Example Configuration (~/.testmcpy)

# MCP Service
MCP_URL=http://localhost:5008/mcp/
MCP_AUTH_TOKEN=your_bearer_token

# LLM Provider
DEFAULT_PROVIDER=anthropic
DEFAULT_MODEL=claude-haiku-4-5
ANTHROPIC_API_KEY=sk-ant-...

# Optional: Dynamic JWT for Preset/Superset
# MCP_AUTH_API_URL=https://api.app.preset.io/v1/auth/
# MCP_AUTH_API_TOKEN=your_api_token
# MCP_AUTH_API_SECRET=your_api_secret
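The file uses simple `KEY=VALUE` lines with `#` comments, including trailing comments such as the one after `DEFAULT_MODEL` in the Anthropic example. A minimal parser for that format might look like the following. It is a sketch, not necessarily how testmcpy reads the file, and it deliberately ignores quoting, `export` prefixes, and variable expansion.

```python
from typing import Dict


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or full-line comment (including commented-out settings)
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        # Only ' #' starts an inline comment, so a '#' inside a URL is kept.
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


sample = """\
# MCP Service
MCP_URL=http://localhost:5008/mcp/
DEFAULT_MODEL=claude-haiku-4-5  # Fast & cost-effective
# MCP_AUTH_API_URL=https://api.app.preset.io/v1/auth/
"""
print(parse_env_file(sample))
# {'MCP_URL': 'http://localhost:5008/mcp/', 'DEFAULT_MODEL': 'claude-haiku-4-5'}
```

Splitting on the first `=` only keeps values that themselves contain `=`, such as base64-encoded tokens, intact.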

Test Cases

Define test cases in YAML:

version: "1.0"
name: "Chart Operations Test Suite"

tests:
  - name: "test_create_chart"
    prompt: "Create a bar chart showing sales by region"
    evaluators:
      - name: "was_mcp_tool_called"
        args:
          tool_name: "create_chart"
      - name: "execution_successful"
      - name: "within_time_limit"
        args:
          max_seconds: 30

Run with:

testmcpy run tests/chart_tests.yaml --model claude-haiku-4-5
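Test suites can also be written in JSON. Assuming a JSON suite uses the same keys as the YAML above, the sketch below loads one with the standard library and checks that each test has the fields the example shows. `load_suite` is a hypothetical helper, and treating `name`, `prompt`, and `evaluators` as required is an assumption.

```python
import json
from typing import Any, Dict

SUITE_JSON = """
{
  "version": "1.0",
  "name": "Chart Operations Test Suite",
  "tests": [
    {
      "name": "test_create_chart",
      "prompt": "Create a bar chart showing sales by region",
      "evaluators": [
        {"name": "was_mcp_tool_called", "args": {"tool_name": "create_chart"}},
        {"name": "execution_successful"},
        {"name": "within_time_limit", "args": {"max_seconds": 30}}
      ]
    }
  ]
}
"""

REQUIRED_TEST_KEYS = ("name", "prompt", "evaluators")  # assumed, not from testmcpy


def load_suite(text: str) -> Dict[str, Any]:
    """Parse a suite, validate each test, and default missing evaluator args to {}."""
    suite = json.loads(text)
    for test in suite["tests"]:
        missing = [key for key in REQUIRED_TEST_KEYS if key not in test]
        if missing:
            raise ValueError(f"test {test.get('name', '?')!r} is missing {missing}")
        for evaluator in test["evaluators"]:
            evaluator.setdefault("args", {})
    return suite


suite = load_suite(SUITE_JSON)
for test in suite["tests"]:
    print(test["name"], [e["name"] for e in test["evaluators"]])
```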

LLM Providers

Anthropic (Recommended)

Best tool-calling accuracy, supports HTTP MCP services:

# Add to ~/.testmcpy
ANTHROPIC_API_KEY=sk-ant-your-key
DEFAULT_PROVIDER=anthropic
DEFAULT_MODEL=claude-haiku-4-5  # Fast & cost-effective

Models: claude-haiku-4-5, claude-sonnet-4-5, claude-opus-4-1

Ollama (Free, Local)

For development without API costs:

# Install and start Ollama
brew install ollama  # or: curl -fsSL https://ollama.com/install.sh | sh
ollama serve
ollama pull llama3.1:8b

# Configure testmcpy
echo "DEFAULT_PROVIDER=ollama" >> ~/.testmcpy
echo "DEFAULT_MODEL=llama3.1:8b" >> ~/.testmcpy

OpenAI

# Add to ~/.testmcpy
OPENAI_API_KEY=sk-your-key
DEFAULT_PROVIDER=openai
DEFAULT_MODEL=gpt-4-turbo
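Each provider needs different settings: the Anthropic and OpenAI examples set an API key, while the Ollama example sets none because the model runs locally. A small pre-flight check along those lines might look like this. It is a hypothetical helper built only from the variable names shown above, not the logic behind `testmcpy setup` or `testmcpy doctor`.

```python
from typing import Dict, List, Mapping

# API-key variables each provider needs, taken from the examples above.
# Ollama runs locally, so it needs no key.
REQUIRED_KEYS: Dict[str, List[str]] = {
    "anthropic": ["ANTHROPIC_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "ollama": [],
}


def missing_settings(config: Mapping[str, str]) -> List[str]:
    """Return the names of settings that still need a value for the chosen provider."""
    provider = config.get("DEFAULT_PROVIDER", "")
    if provider not in REQUIRED_KEYS:
        return ["DEFAULT_PROVIDER"]  # unset or unrecognized provider
    return [key for key in REQUIRED_KEYS[provider] if not config.get(key)]


print(missing_settings({"DEFAULT_PROVIDER": "openai", "DEFAULT_MODEL": "gpt-4-turbo"}))
# ['OPENAI_API_KEY']
```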

Built-in Evaluators

Generic Evaluators

  • was_mcp_tool_called - Verify specific MCP tool was invoked
  • execution_successful - Check for errors or failures
  • final_answer_contains - Validate response content
  • within_time_limit - Performance testing
  • token_usage_reasonable - Cost efficiency validation

Superset/Preset Evaluators

  • was_superset_chart_created - Verify chart creation
  • sql_query_valid - Validate SQL syntax

Extensible: Add custom evaluators for your MCP service.
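The generic evaluators are simple predicates over the outcome of a run. The sketch below assumes a hypothetical `RunResult` record and plain functions named after the evaluators above; the `tool_name` and `max_seconds` arguments match the YAML example, while `text` and `max_tokens` are made-up parameter names. testmcpy's real evaluators live in `evals/base_evaluators.py` and may have different signatures.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class RunResult:
    """Hypothetical record of one test run."""
    tool_calls: List[str] = field(default_factory=list)
    final_answer: str = ""
    error: Optional[str] = None
    duration_seconds: float = 0.0
    total_tokens: int = 0


def was_mcp_tool_called(result: RunResult, tool_name: str) -> bool:
    return tool_name in result.tool_calls


def execution_successful(result: RunResult) -> bool:
    return result.error is None


def final_answer_contains(result: RunResult, text: str) -> bool:
    return text.lower() in result.final_answer.lower()


def within_time_limit(result: RunResult, max_seconds: float) -> bool:
    return result.duration_seconds <= max_seconds


def token_usage_reasonable(result: RunResult, max_tokens: int) -> bool:
    return result.total_tokens <= max_tokens


# Registry keyed by the names used in test files; a custom evaluator is one more entry.
EVALUATORS: Dict[str, Callable[..., bool]] = {
    fn.__name__: fn
    for fn in (was_mcp_tool_called, execution_successful, final_answer_contains,
               within_time_limit, token_usage_reasonable)
}


def evaluate(result: RunResult, specs: List[Dict[str, Any]]) -> Dict[str, bool]:
    """Apply YAML-style specs such as {"name": ..., "args": {...}} to one result."""
    return {spec["name"]: EVALUATORS[spec["name"]](result, **spec.get("args", {}))
            for spec in specs}


result = RunResult(tool_calls=["create_chart"], duration_seconds=12.5)
print(evaluate(result, [
    {"name": "was_mcp_tool_called", "args": {"tool_name": "create_chart"}},
    {"name": "execution_successful"},
    {"name": "within_time_limit", "args": {"max_seconds": 30}},
]))
```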

Commands

Command               Description
testmcpy setup        Interactive configuration wizard
testmcpy tools        List available MCP tools
testmcpy research     Test LLM tool-calling capabilities
testmcpy run          Execute test suite
testmcpy chat         Interactive chat with MCP tools
testmcpy serve        Start web UI server
testmcpy report       Compare test results across models
testmcpy config-cmd   View current configuration
testmcpy doctor       Diagnose installation issues
testmcpy --version    Show version
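Because `testmcpy run` takes a `--model` option, a side-by-side comparison can be scripted by running the same suite once per model. The sketch below uses only the command and flag shown in this README; the model list is an example, and how the resulting reports are then compared with `testmcpy report` depends on that command's options, which are not covered here.

```python
import shlex
import subprocess
from typing import Dict, List


def build_commands(suite: str, models: List[str]) -> List[List[str]]:
    """One `testmcpy run <suite> --model <model>` invocation per model."""
    return [["testmcpy", "run", suite, "--model", model] for model in models]


def run_all(suite: str, models: List[str]) -> Dict[str, int]:
    """Run the suite for each model in turn and record each process's exit code."""
    exit_codes: Dict[str, int] = {}
    for command in build_commands(suite, models):
        print("$", shlex.join(command))
        exit_codes[command[-1]] = subprocess.run(command, check=False).returncode
    return exit_codes


print(build_commands("tests/chart_tests.yaml", ["claude-haiku-4-5", "claude-sonnet-4-5"]))
```

Calling `run_all("tests/chart_tests.yaml", [...])` actually launches each run, so it requires testmcpy to be installed and configured.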

Web Interface

The optional web UI provides:

  • Visual MCP tool explorer
  • Interactive chat interface
  • Test management and execution
  • Real-time results display

# Install web UI dependencies
pip install 'testmcpy[server]'

# Start server
testmcpy serve

Access at http://localhost:8000

Use Cases

  • LLM Benchmarking: Compare Claude, GPT-4, Llama tool-calling accuracy
  • MCP Service Testing: Validate your MCP integrations
  • Cost Optimization: Find the best price/performance balance
  • Regression Testing: Ensure MCP tools work across updates
  • Model Selection: Make data-driven decisions about which LLM to use

Requirements

  • Python: 3.9 - 3.12 (3.13+ not yet supported)
  • Virtual Environment: Recommended
  • Operating Systems: macOS, Linux, Windows (WSL recommended)

Optional Dependencies

pip install 'testmcpy[server]'  # Web UI (FastAPI, uvicorn)
pip install 'testmcpy[sdk]'     # Claude Agent SDK
pip install 'testmcpy[dev]'     # Development tools
pip install 'testmcpy[all]'     # Everything

Project Structure

testmcpy/
├── testmcpy/
│   ├── cli.py              # CLI interface
│   ├── config.py           # Configuration management
│   ├── src/                # Core modules
│   │   ├── mcp_client.py   # MCP protocol client
│   │   ├── llm_integration.py  # LLM provider abstraction
│   │   └── test_runner.py  # Test execution engine
│   ├── evals/              # Evaluation functions
│   │   └── base_evaluators.py
│   ├── server/             # Web UI backend (optional)
│   │   ├── api.py
│   │   └── websocket.py
│   └── ui/                 # React web UI (optional)
│       ├── src/
│       └── dist/
├── tests/                  # Test case definitions
├── reports/                # Test results
└── README.md

Development

# Clone repository
git clone https://github.com/preset-io/testmcpy.git
cd testmcpy

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install in development mode
pip install -e '.[dev]'

# Run tests
pytest

# Format code
black .

# Type checking
mypy testmcpy

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

When contributing:

  • Use type hints and async/await patterns
  • Follow Black code formatting
  • Add tests for new features
  • Document changes in README
  • Ensure multi-provider compatibility

License

Apache License 2.0 - See LICENSE for details.

Acknowledgments

Built by the team at Preset for testing LLM integrations with Apache Superset and beyond.


Made with ❤️ by Preset

Download files

Source Distribution

testmcpy-0.2.0.tar.gz (215.9 kB)

Built Distribution

testmcpy-0.2.0-py3-none-any.whl (219.0 kB)

File details

Details for the file testmcpy-0.2.0.tar.gz.

File metadata

  • Download URL: testmcpy-0.2.0.tar.gz
  • Upload date:
  • Size: 215.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for testmcpy-0.2.0.tar.gz
Algorithm     Hash digest
SHA256        fccd4633fc7257fdb21e3a9721eddde292267eee7f10885ceec51c48e9275078
MD5           779f58fc8812fba6a3636e1fedefa857
BLAKE2b-256   612f37054299a907ee99f991e6250a02ec10b7cc05503d608fbd6a9523f20e2f

File details

Details for the file testmcpy-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: testmcpy-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 219.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for testmcpy-0.2.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        3c411df992149601f78e7be4a8bde6476262e689f0190b6d6f82623f59797ed7
MD5           7a82dd50663071a06c242d3be3ee8593
BLAKE2b-256   fac114c301326b7cdd32803ccb54303b886671adf78d7337a816d45d29f03044
