testmcpy

A comprehensive testing framework for validating LLM tool-calling capabilities with MCP (Model Context Protocol) services.

Test and evaluate how different LLMs interact with MCP tools. Compare the tool-calling accuracy, cost, and performance of Claude, GPT-4, Llama, and other models against any MCP service.

Features

Multi-Provider Support: Anthropic (Claude), OpenAI (GPT), Ollama (local models)
MCP Tool Testing: Validate LLM interactions with any MCP service
Built-in Evaluators: Test tool-calling accuracy, response quality, performance, and cost
Beautiful CLI: Rich terminal UI with progress bars and formatted output
Web Interface: Optional React-based UI for visual testing and exploration
Test Suites: YAML/JSON test definitions with comprehensive evaluation
Model Comparison: Side-by-side benchmarking of different LLMs
Cost Tracking: Monitor token usage and API costs across test runs

Architecture

graph TB
    subgraph "CLI Interface"
        CLI[testmcpy CLI]
        WebUI[Web UI - Optional]
    end

    subgraph "Core Framework"
        TestRunner[Test Runner]
        Evaluators[Evaluators]
        Config[Configuration Manager]
    end

    subgraph "LLM Providers"
        Anthropic[Anthropic API]
        OpenAI[OpenAI API]
        Ollama[Ollama Local]
    end

    subgraph "MCP Integration"
        MCPClient[MCP Client]
        MCPService[MCP Service<br/>HTTP/SSE]
    end

    CLI --> TestRunner
    WebUI --> TestRunner
    TestRunner --> Config
    TestRunner --> Evaluators
    TestRunner --> Anthropic
    TestRunner --> OpenAI
    TestRunner --> Ollama
    Anthropic --> MCPClient
    OpenAI --> MCPClient
    Ollama --> MCPClient
    MCPClient --> MCPService

    style CLI fill:#4A90E2
    style WebUI fill:#4A90E2
    style TestRunner fill:#50E3C2
    style MCPClient fill:#F5A623
    style MCPService fill:#BD10E0
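
The flow in the diagram can be pictured with a few lines of Python. This is an illustrative sketch only, not testmcpy's actual code: `ToolCall`, `FakeMCPClient`, `FakeProvider`, and `run_test` are hypothetical names standing in for the MCP client, LLM provider, and test runner boxes, and the fake provider always requests the same tool instead of consulting a model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToolCall:
    """One tool invocation requested by the model (hypothetical shape)."""
    name: str
    arguments: Dict[str, Any] = field(default_factory=dict)


class FakeMCPClient:
    """Stands in for the MCP client; records calls instead of using HTTP/SSE."""

    def __init__(self) -> None:
        self.calls: List[ToolCall] = []

    def call_tool(self, call: ToolCall) -> str:
        self.calls.append(call)
        return f"ok:{call.name}"


class FakeProvider:
    """Stands in for an LLM provider; always asks for one fixed tool."""

    def __init__(self, mcp: FakeMCPClient) -> None:
        self.mcp = mcp

    def complete(self, prompt: str) -> str:
        # A real provider would let the model choose the tool from the prompt.
        return self.mcp.call_tool(ToolCall("create_chart", {"prompt": prompt}))


Evaluator = Callable[[List[ToolCall], str], bool]


def run_test(prompt: str, provider: FakeProvider, evaluators: List[Evaluator]) -> bool:
    """Test runner: send the prompt, then apply every evaluator to the outcome."""
    answer = provider.complete(prompt)
    return all(check(provider.mcp.calls, answer) for check in evaluators)


mcp = FakeMCPClient()
passed = run_test(
    "Create a bar chart showing sales by region",
    FakeProvider(mcp),
    [lambda calls, answer: any(c.name == "create_chart" for c in calls)],
)
print(passed)  # True
```

The point of the sketch is the direction of the arrows: the runner talks to a provider, the provider reaches tools only through the MCP client, and evaluators look at what was recorded afterwards.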

Quick Start

Installation

# Install base package
pip install testmcpy

# With web UI support
pip install 'testmcpy[server]'

# All optional features
pip install 'testmcpy[all]'

First-Time Setup

# Interactive configuration wizard
testmcpy setup

# View current configuration
testmcpy config-cmd

Basic Usage

# List available MCP tools
testmcpy tools

# Test LLM tool-calling capabilities
testmcpy research --model claude-haiku-4-5

# Run test suite
testmcpy run tests/ --model claude-haiku-4-5

# Interactive chat with MCP tools
testmcpy chat

# Start web UI
testmcpy serve
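
To run the same suite against several models, the `run` command can be driven from a script. The sketch below builds argument lists only from the `testmcpy run <path> --model <name>` form shown above. The helper names `build_run_command` and `run_for_models` are hypothetical, and the sketch assumes the configured provider can serve every model passed in.

```python
import shutil
import subprocess
from typing import Dict, List


def build_run_command(test_path: str, model: str) -> List[str]:
    """Return the argv for `testmcpy run <test_path> --model <model>`."""
    return ["testmcpy", "run", test_path, "--model", model]


def run_for_models(test_path: str, models: List[str]) -> Dict[str, int]:
    """Run the suite once per model and map each model to its exit code."""
    if shutil.which("testmcpy") is None:
        raise RuntimeError("testmcpy is not on PATH; install it first")
    return {
        model: subprocess.run(build_run_command(test_path, model)).returncode
        for model in models
    }


command = build_run_command("tests/", "claude-haiku-4-5")
print(" ".join(command))  # testmcpy run tests/ --model claude-haiku-4-5
```

Whether the exit code reflects failed tests is not stated in this README, so treat the returned codes as a rough signal rather than a pass/fail verdict.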

Configuration

testmcpy uses a layered configuration system with clear priorities:

Priority Order (highest to lowest):

1. Command-line options
2. .env in current directory
3. ~/.testmcpy user config
4. Environment variables
5. Built-in defaults
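
One way to picture the layering is a chain of dictionaries searched from highest to lowest priority. The sketch below uses Python's `collections.ChainMap` for that lookup. It is not testmcpy's implementation: the `resolve` helper, the sample values, the placeholder default URL, and the idea that `--model` maps to `DEFAULT_MODEL` are all assumptions made for illustration.

```python
from collections import ChainMap
from typing import Mapping


def resolve(
    cli: Mapping[str, str],
    dotenv: Mapping[str, str],
    user_config: Mapping[str, str],
    environ: Mapping[str, str],
    defaults: Mapping[str, str],
) -> ChainMap:
    """Combine the five layers; earlier arguments win over later ones."""
    return ChainMap(
        dict(cli), dict(dotenv), dict(user_config), dict(environ), dict(defaults)
    )


settings = resolve(
    cli={"DEFAULT_MODEL": "claude-sonnet-4-5"},            # command-line option
    dotenv={"MCP_URL": "http://localhost:5008/mcp/"},      # .env in current directory
    user_config={
        "DEFAULT_MODEL": "claude-haiku-4-5",               # ~/.testmcpy
        "DEFAULT_PROVIDER": "anthropic",
    },
    environ={"DEFAULT_PROVIDER": "ollama"},                # environment variable
    defaults={"MCP_URL": "http://example.invalid/mcp/"},   # placeholder default
)

print(settings["DEFAULT_MODEL"])     # claude-sonnet-4-5 (command line wins)
print(settings["DEFAULT_PROVIDER"])  # anthropic (user config beats environment)
print(settings["MCP_URL"])           # http://localhost:5008/mcp/ (.env beats default)
```

Note the consequence of the documented order: a value in `~/.testmcpy` overrides an environment variable of the same name, which is the reverse of what many tools do.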

Example Configuration (`~/.testmcpy`)

# MCP Service
MCP_URL=http://localhost:5008/mcp/
MCP_AUTH_TOKEN=your_bearer_token

# LLM Provider
DEFAULT_PROVIDER=anthropic
DEFAULT_MODEL=claude-haiku-4-5
ANTHROPIC_API_KEY=sk-ant-...

# Optional: Dynamic JWT for Preset/Superset
# MCP_AUTH_API_URL=https://api.app.preset.io/v1/auth/
# MCP_AUTH_API_TOKEN=your_api_token
# MCP_AUTH_API_SECRET=your_api_secret
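
The file is a plain list of `KEY=VALUE` lines with `#` comments. As a rough sketch of how such a file can be read, the hypothetical `parse_env_text` helper below handles blank lines, full-line comments, and trailing comments preceded by a space. testmcpy's own parser may treat quoting and other edge cases differently.

```python
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comment lines."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing comment such as "value  # note" (needs a space before '#').
        value = value.split(" #", 1)[0]
        values[key.strip()] = value.strip()
    return values


sample = """\
# MCP Service
MCP_URL=http://localhost:5008/mcp/

DEFAULT_PROVIDER=anthropic
DEFAULT_MODEL=claude-haiku-4-5  # Fast & cost-effective
# MCP_AUTH_API_TOKEN=your_api_token
"""

config = parse_env_text(sample)
print(config["DEFAULT_MODEL"])          # claude-haiku-4-5
print("MCP_AUTH_API_TOKEN" in config)   # False (commented out)
```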

Test Cases

Define test cases in YAML:

version: "1.0"
name: "Chart Operations Test Suite"

tests:
  - name: "test_create_chart"
    prompt: "Create a bar chart showing sales by region"
    evaluators:
      - name: "was_mcp_tool_called"
        args:
          tool_name: "create_chart"
      - name: "execution_successful"
      - name: "within_time_limit"
        args:
          max_seconds: 30

Run with:

testmcpy run tests/chart_tests.yaml --model claude-haiku-4-5
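
Before handing a suite to the runner, it can help to sanity-check its shape. The sketch below mirrors the YAML above as a Python dictionary and checks the fields that example uses (`name`, `prompt`, `evaluators`, and each evaluator's `name`). The `validate_suite` helper is hypothetical, and the set of required fields is inferred from the example rather than taken from testmcpy's schema.

```python
from typing import Any, Dict, List


def validate_suite(suite: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the shape looks fine."""
    problems: List[str] = []
    tests = suite.get("tests")
    if not isinstance(tests, list) or not tests:
        return ["suite has no 'tests' list"]
    for index, test in enumerate(tests):
        label = test.get("name", f"tests[{index}]")
        for key in ("name", "prompt", "evaluators"):
            if key not in test:
                problems.append(f"{label}: missing '{key}'")
        for evaluator in test.get("evaluators", []):
            if "name" not in evaluator:
                problems.append(f"{label}: evaluator without 'name'")
    return problems


suite = {
    "version": "1.0",
    "name": "Chart Operations Test Suite",
    "tests": [
        {
            "name": "test_create_chart",
            "prompt": "Create a bar chart showing sales by region",
            "evaluators": [
                {"name": "was_mcp_tool_called", "args": {"tool_name": "create_chart"}},
                {"name": "execution_successful"},
                {"name": "within_time_limit", "args": {"max_seconds": 30}},
            ],
        }
    ],
}

print(validate_suite(suite))                          # []
print(validate_suite({"tests": [{"prompt": "hi"}]}))  # two problems reported
```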

LLM Providers

Anthropic (Recommended)

Recommended by the project for its tool-calling accuracy; supports HTTP MCP services:

# Add to ~/.testmcpy
ANTHROPIC_API_KEY=sk-ant-your-key
DEFAULT_PROVIDER=anthropic
DEFAULT_MODEL=claude-haiku-4-5  # Fast & cost-effective

Models: claude-haiku-4-5, claude-sonnet-4-5, claude-opus-4-1

Ollama (Free, Local)

For development without API costs:

# Install and start Ollama
brew install ollama  # or: curl -fsSL https://ollama.com/install.sh | sh
ollama serve &  # runs in the background; alternatively, use a separate terminal
ollama pull llama3.1:8b

# Configure testmcpy
echo "DEFAULT_PROVIDER=ollama" >> ~/.testmcpy
echo "DEFAULT_MODEL=llama3.1:8b" >> ~/.testmcpy

OpenAI

OPENAI_API_KEY=sk-your-key
DEFAULT_PROVIDER=openai
DEFAULT_MODEL=gpt-4-turbo

Built-in Evaluators

Generic Evaluators

was_mcp_tool_called - Verify specific MCP tool was invoked
execution_successful - Check for errors or failures
final_answer_contains - Validate response content
within_time_limit - Performance testing
token_usage_reasonable - Cost efficiency validation

Superset/Preset Evaluators

was_superset_chart_created - Verify chart creation
sql_query_valid - Validate SQL syntax

Extensible: Add custom evaluators for your MCP service.
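
Conceptually, an evaluator is a check that takes the outcome of a test run and returns pass or fail. The sketch below re-creates two of the generic checks in plain Python and adds one service-specific check to show the idea. The `RunRecord` structure, the function signatures, the `EVALUATORS` registry, and the `final_answer_mentions_chart_id` check are hypothetical and do not reflect testmcpy's real evaluator interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RunRecord:
    """Outcome of one prompt (hypothetical shape)."""
    tools_called: List[str] = field(default_factory=list)
    final_answer: str = ""
    elapsed_seconds: float = 0.0


def was_mcp_tool_called(run: RunRecord, tool_name: str) -> bool:
    """Pass when the named tool appears among the tools the model invoked."""
    return tool_name in run.tools_called


def within_time_limit(run: RunRecord, max_seconds: float) -> bool:
    """Pass when the run finished within the allowed number of seconds."""
    return run.elapsed_seconds <= max_seconds


def final_answer_mentions_chart_id(run: RunRecord) -> bool:
    """A custom, service-specific check: the answer must mention a chart id."""
    return "chart_id" in run.final_answer


# Name-to-function registry, so a test definition can refer to checks by name.
EVALUATORS: Dict[str, Callable[..., bool]] = {
    "was_mcp_tool_called": was_mcp_tool_called,
    "within_time_limit": within_time_limit,
    "final_answer_mentions_chart_id": final_answer_mentions_chart_id,
}

run = RunRecord(["create_chart"], "Created chart_id=42", elapsed_seconds=3.2)
results = {
    "was_mcp_tool_called": EVALUATORS["was_mcp_tool_called"](run, tool_name="create_chart"),
    "within_time_limit": EVALUATORS["within_time_limit"](run, max_seconds=30),
    "final_answer_mentions_chart_id": EVALUATORS["final_answer_mentions_chart_id"](run),
}
print(all(results.values()))  # True
```

Keeping each check as a small function of the run record plus its `args` is what lets a test definition mix generic and service-specific evaluators freely.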

Commands

| Command | Description |
| --- | --- |
| `testmcpy setup` | Interactive configuration wizard |
| `testmcpy tools` | List available MCP tools |
| `testmcpy research` | Test LLM tool-calling capabilities |
| `testmcpy run` | Execute test suite |
| `testmcpy chat` | Interactive chat with MCP tools |
| `testmcpy serve` | Start web UI server |
| `testmcpy report` | Compare test results across models |
| `testmcpy config-cmd` | View current configuration |
| `testmcpy doctor` | Diagnose installation issues |
| `testmcpy --version` | Show version |

Web Interface

The optional web UI provides:

Visual MCP tool explorer
Interactive chat interface
Test management and execution
Real-time results display

# Install web UI dependencies
pip install 'testmcpy[server]'

# Start server
testmcpy serve

Access at http://localhost:8000

Use Cases

LLM Benchmarking: Compare the tool-calling accuracy of Claude, GPT-4, and Llama
MCP Service Testing: Validate your MCP integrations
Cost Optimization: Find the best price/performance balance
Regression Testing: Ensure MCP tools work across updates
Model Selection: Make data-driven decisions about which LLM to use

Requirements

Python: 3.9 - 3.12 (3.13+ not yet supported)
Virtual Environment: Recommended
Operating Systems: macOS, Linux, Windows (WSL recommended)
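
A quick way to confirm that an interpreter falls in the documented range is to compare `sys.version_info` against the bounds. The sketch below hard-codes the 3.9 to 3.12 range stated above; the `is_supported` helper and the constant names are hypothetical, and other releases of the package may support a different range.

```python
import sys
from typing import Tuple

MIN_VERSION = (3, 9)   # lowest supported (major, minor), per the list above
MAX_VERSION = (3, 12)  # highest supported (major, minor), per the list above


def is_supported(version: Tuple[int, int]) -> bool:
    """Return True when (major, minor) falls within the documented range."""
    return MIN_VERSION <= version <= MAX_VERSION


current = sys.version_info[:2]
status = "supported" if is_supported(current) else "not supported"
print(f"Python {current[0]}.{current[1]} is {status}")
```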

Optional Dependencies

pip install 'testmcpy[server]'  # Web UI (FastAPI, uvicorn)
pip install 'testmcpy[sdk]'     # Claude Agent SDK
pip install 'testmcpy[dev]'     # Development tools
pip install 'testmcpy[all]'     # Everything

Project Structure

testmcpy/
├── testmcpy/
│   ├── cli.py              # CLI interface
│   ├── config.py           # Configuration management
│   ├── src/                # Core modules
│   │   ├── mcp_client.py   # MCP protocol client
│   │   ├── llm_integration.py  # LLM provider abstraction
│   │   └── test_runner.py  # Test execution engine
│   ├── evals/              # Evaluation functions
│   │   └── base_evaluators.py
│   ├── server/             # Web UI backend (optional)
│   │   ├── api.py
│   │   └── websocket.py
│   └── ui/                 # React web UI (optional)
│       ├── src/
│       └── dist/
├── tests/                  # Test case definitions
├── reports/                # Test results
└── README.md

Development

# Clone repository
git clone https://github.com/preset-io/testmcpy.git
cd testmcpy

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install in development mode
pip install -e '.[dev]'

# Run tests
pytest

# Format code
black .

# Type checking
mypy testmcpy

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

When contributing:

Use type hints and async/await patterns
Follow Black code formatting
Add tests for new features
Document changes in README
Ensure multi-provider compatibility

License

Apache License 2.0 - See LICENSE for details.

Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Documentation: docs/

Acknowledgments

Built by the team at Preset for testing LLM integrations with Apache Superset and beyond.

Made with ❤️ by Preset
