A comprehensive testing framework for validating LLM tool calling capabilities with MCP services

testmcpy

Test and benchmark LLMs with MCP tools in minutes.

A testing framework for validating how LLMs call tools via the Model Context Protocol (MCP). Compare Claude, GPT-4, Llama, and other models on accuracy, cost, and performance.

[Screenshot: CLI test runner with colorful progress bars and results]

[Screenshot: Web UI showing tool explorer and interactive chat]

[GIF: Running a test suite from command line with real-time progress]

Documentation • Examples • Contributing • Discussions

Why testmcpy?

Validate tool calling: Ensure LLMs call the right tools with correct parameters
Compare models: Find the best price/performance balance for your use case
Prevent regressions: Catch breaking changes in your MCP service with CI/CD
Optimize costs: Track token usage and identify the most cost-effective models

Quick Start

# Install testmcpy
pip install testmcpy

# Run interactive setup
testmcpy setup

# Start testing
testmcpy chat                     # Interactive chat with MCP tools
testmcpy research                 # Test LLM tool-calling capabilities
testmcpy run tests/              # Run your test suite

That's it! No complex configuration needed to get started.

Key Features

Multi-Provider Support

Test with Claude, GPT-4, Llama, and other models. Works with both paid APIs and free local models via Ollama.

[Screenshot: Model selector showing Claude, GPT-4, and Ollama options]

Built-in Evaluators

Comprehensive validation out of the box:

Tool Selection: Did the LLM call the right tool?
Parameter Validation: Were correct parameters passed?
Execution Success: Did the tool call complete without errors?
Performance: Response time and token usage tracking
Cost Analysis: Monitor API costs across test runs

[Screenshot: Test results showing pass/fail for different evaluators]

Beautiful CLI & Web UI

Rich terminal UI: Progress bars, colored output, formatted tables
Optional web interface: Visual tool explorer and interactive chat
Real-time feedback: Watch tests execute with live updates

[Screenshot: Split view of CLI and Web UI running the same test]

YAML Test Definitions

Define test suites as code for repeatable, version-controlled testing:

version: "1.0"
name: "Chart Operations Test Suite"

tests:
  - name: "test_create_chart"
    prompt: "Create a bar chart showing sales by region"
    evaluators:
      - name: "was_mcp_tool_called"
        args:
          tool_name: "create_chart"
      - name: "execution_successful"

Use Cases

Perfect for:

LLM Benchmarking: Compare tool-calling accuracy across Claude, GPT-4, and Llama
MCP Service Testing: Validate your MCP integrations work correctly
Regression Prevention: Catch breaking changes in CI/CD pipelines
Model Selection: Make data-driven decisions about which LLM to use
Cost Optimization: Find the best price/performance balance for your workload
Parameter Validation: Ensure LLMs pass correct parameters to your tools

Architecture

testmcpy connects your LLM provider to your MCP service and validates the interactions:

graph TB
    subgraph "CLI Interface"
        CLI[testmcpy CLI]
        WebUI[Web UI - Optional]
    end

    subgraph "Core Framework"
        TestRunner[Test Runner]
        Evaluators[Evaluators]
        Config[Configuration Manager]
    end

    subgraph "LLM Providers"
        Anthropic[Anthropic API]
        OpenAI[OpenAI API]
        Ollama[Ollama Local]
    end

    subgraph "MCP Integration"
        MCPClient[MCP Client]
        MCPService[MCP Service<br/>HTTP/SSE]
    end

    CLI --> TestRunner
    WebUI --> TestRunner
    TestRunner --> Config
    TestRunner --> Evaluators
    TestRunner --> Anthropic
    TestRunner --> OpenAI
    TestRunner --> Ollama
    Anthropic --> MCPClient
    OpenAI --> MCPClient
    Ollama --> MCPClient
    MCPClient --> MCPService

    style CLI fill:#4A90E2
    style WebUI fill:#4A90E2
    style TestRunner fill:#50E3C2
    style MCPClient fill:#F5A623
    style MCPService fill:#BD10E0

How it works:

1. Define test cases in YAML with prompts and expected behavior.
2. testmcpy sends each prompt to your chosen LLM (Claude, GPT-4, Llama, etc.).
3. The LLM calls tools on your service via MCP.
4. Evaluators validate tool selection, parameters, execution, and performance.
5. testmcpy reports detailed pass/fail results with metrics and cost analysis.
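The flow above can be pictured with a small, self-contained Python sketch. It is a conceptual model only and not testmcpy's implementation: the `ToolCall` record, the `fake_llm` stand-in, the `run_test` helper, and the tool arguments `chart_type` and `group_by` are all hypothetical. Only the evaluator names and the test layout come from the YAML examples in this README.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToolCall:
    """One recorded tool invocation (hypothetical structure, not testmcpy's)."""
    name: str
    arguments: Dict[str, Any]
    error: str = ""  # empty string means the call completed without error


def fake_llm(prompt: str) -> List[ToolCall]:
    """Stand-in for a real model: always decides to call create_chart."""
    # The argument names below are invented for illustration.
    return [ToolCall("create_chart", {"chart_type": "bar", "group_by": "region"})]


def was_mcp_tool_called(calls: List[ToolCall], tool_name: str) -> bool:
    """True if any recorded call targeted the named tool."""
    return any(call.name == tool_name for call in calls)


def execution_successful(calls: List[ToolCall]) -> bool:
    """True if no recorded call reported an error."""
    return all(not call.error for call in calls)


EVALUATORS = {
    "was_mcp_tool_called": was_mcp_tool_called,
    "execution_successful": execution_successful,
}


def run_test(test: Dict[str, Any], llm: Callable[[str], List[ToolCall]]) -> Dict[str, bool]:
    """Send the prompt to the model, then apply each evaluator listed in the test."""
    calls = llm(test["prompt"])
    results = {}
    for evaluator in test["evaluators"]:
        check = EVALUATORS[evaluator["name"]]
        results[evaluator["name"]] = check(calls, **evaluator.get("args", {}))
    return results


# Same shape as the YAML test definitions in this README, written as a dict.
test_case = {
    "name": "test_create_chart",
    "prompt": "Create a bar chart showing sales by region",
    "evaluators": [
        {"name": "was_mcp_tool_called", "args": {"tool_name": "create_chart"}},
        {"name": "execution_successful"},
    ],
}

results = run_test(test_case, fake_llm)
print(results)
```

Swapping `fake_llm` for a function that returns a different tool name or a non-empty `error` makes the corresponding evaluator report `False`, which is the kind of regression the real evaluators are meant to catch.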

Installation

# Install base package
pip install testmcpy

# With web UI support
pip install 'testmcpy[server]'

# All optional features
pip install 'testmcpy[all]'

Requirements: Python 3.9-3.12 (3.13+ not yet supported)
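A quick way to confirm that an interpreter falls inside the stated range is a check like the one below. This is an illustrative sketch; the `is_supported` helper is hypothetical and not part of testmcpy.

```python
import sys

# Range taken from the requirement stated above.
SUPPORTED_MIN = (3, 9)
SUPPORTED_MAX = (3, 12)


def is_supported(version=None) -> bool:
    """Return True if the (major, minor) version lies within 3.9 to 3.12 inclusive."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


print(f"Python {sys.version_info.major}.{sys.version_info.minor} supported: {is_supported()}")
```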

Getting Started

1. Configuration

Run the interactive setup wizard:

testmcpy setup

Or manually create ~/.testmcpy:

# MCP Service
MCP_URL=http://localhost:5008/mcp/
MCP_AUTH_TOKEN=your_bearer_token

# LLM Provider (choose one)
DEFAULT_PROVIDER=anthropic
DEFAULT_MODEL=claude-haiku-4-5
ANTHROPIC_API_KEY=sk-ant-...

Configuration priority: CLI options > .env > ~/.testmcpy > Environment variables > Defaults
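The priority order above can be modeled with Python's `collections.ChainMap`, which returns the value from the first mapping that contains a key. The sketch below only illustrates the lookup order; `parse_env_text` and `resolve_config` are hypothetical helpers, the sample values are placeholders, and testmcpy's actual loader may behave differently (for example, in how it handles quoting or inline comments).

```python
from collections import ChainMap
from typing import Dict, Mapping


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_config(
    cli: Mapping[str, str],
    dotenv: Mapping[str, str],
    user_file: Mapping[str, str],
    environ: Mapping[str, str],
    defaults: Mapping[str, str],
) -> ChainMap:
    """Earlier mappings win: CLI > .env > ~/.testmcpy > environment > defaults."""
    return ChainMap(cli, dotenv, user_file, environ, defaults)


# Contents of a ~/.testmcpy file, in the format shown above.
user_file = parse_env_text(
    """
    # MCP Service
    MCP_URL=http://localhost:5008/mcp/
    DEFAULT_PROVIDER=anthropic
    """
)

config = resolve_config(
    cli={"DEFAULT_MODEL": "claude-sonnet-4-5"},      # e.g. passed with --model
    dotenv={},
    user_file=user_file,
    environ={"DEFAULT_PROVIDER": "ollama"},          # loses to ~/.testmcpy
    defaults={"DEFAULT_MODEL": "claude-haiku-4-5"},  # placeholder default
)

print(config["DEFAULT_MODEL"])
print(config["DEFAULT_PROVIDER"])
print(config["MCP_URL"])
```

Note that, under the stated order, a value in `~/.testmcpy` takes precedence over an environment variable of the same name, which is the opposite of what many tools do.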

2. Test Your MCP Service

# List available MCP tools
testmcpy tools

# Interactive chat to explore your tools
testmcpy chat

# Run automated research on tool-calling capabilities
testmcpy research --model claude-haiku-4-5

3. Create Test Suites

Define tests in YAML (tests/my_tests.yaml):

version: "1.0"
name: "My MCP Service Tests"

tests:
  - name: "test_tool_selection"
    prompt: "Create a bar chart showing sales by region"
    evaluators:
      - name: "was_mcp_tool_called"
        args:
          tool_name: "create_chart"
      - name: "execution_successful"
      - name: "within_time_limit"
        args:
          max_seconds: 30

Run your tests:

testmcpy run tests/ --model claude-haiku-4-5
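Before running a suite, it can help to sanity-check its structure. The sketch below validates a suite that has already been loaded into a Python dict (for instance with PyYAML's `yaml.safe_load`, which is a separate install). The `validate_suite` helper is hypothetical, and treating every field from the example as required is an assumption inferred from the examples in this README rather than a documented schema.

```python
from typing import Any, Dict, List


def validate_suite(suite: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the suite looks fine."""
    problems = []
    for key in ("version", "name", "tests"):
        if key not in suite:
            problems.append(f"suite: missing '{key}'")
    for index, test in enumerate(suite.get("tests", [])):
        label = test.get("name") or f"tests[{index}]"
        for key in ("name", "prompt"):
            if not test.get(key):
                problems.append(f"{label}: missing '{key}'")
        evaluators = test.get("evaluators") or []
        if not evaluators:
            problems.append(f"{label}: no evaluators listed")
        for evaluator in evaluators:
            if not evaluator.get("name"):
                problems.append(f"{label}: evaluator without a 'name'")
    return problems


# Dict form of the YAML suite shown above.
suite = {
    "version": "1.0",
    "name": "My MCP Service Tests",
    "tests": [
        {
            "name": "test_tool_selection",
            "prompt": "Create a bar chart showing sales by region",
            "evaluators": [
                {"name": "was_mcp_tool_called", "args": {"tool_name": "create_chart"}},
                {"name": "execution_successful"},
                {"name": "within_time_limit", "args": {"max_seconds": 30}},
            ],
        }
    ],
}

print(validate_suite(suite))
```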

Documentation

Core Guides

Evaluator Reference - All available evaluators and usage examples
Client Usage Guide - Complete guide for testing your MCP service
MCP Profiles - Managing multiple MCP service configurations

Examples

Basic Tests - Simple test cases to get started
CI/CD Integration - GitHub Actions and GitLab CI configurations
Custom Evaluators - Building your own validation logic

Commands Reference

Command	Description
`testmcpy setup`	Interactive configuration wizard
`testmcpy tools`	List available MCP tools
`testmcpy research`	Test LLM tool-calling capabilities
`testmcpy run <path>`	Execute test suite
`testmcpy chat`	Interactive chat with MCP tools
`testmcpy serve`	Start web UI server
`testmcpy report`	Compare test results across models
`testmcpy config-cmd`	View current configuration
`testmcpy doctor`	Diagnose installation issues

LLM Providers

Anthropic (Recommended)

Best tool-calling accuracy, native MCP support:

ANTHROPIC_API_KEY=sk-ant-your-key
DEFAULT_MODEL=claude-haiku-4-5  # Fast & cost-effective

Available models: claude-haiku-4-5, claude-sonnet-4-5, claude-opus-4-1

Ollama (Free, Local)

Perfect for development without API costs:

# Install Ollama
brew install ollama  # macOS
# or: curl -fsSL https://ollama.com/install.sh | sh

# Start the Ollama server (leave it running), then pull a model from a second terminal
ollama serve
ollama pull llama3.1:8b

# Configure testmcpy (add these lines to ~/.testmcpy)
DEFAULT_PROVIDER=ollama
DEFAULT_MODEL=llama3.1:8b

OpenAI

OPENAI_API_KEY=sk-your-key
DEFAULT_MODEL=gpt-4-turbo

Built-in Evaluators

testmcpy includes comprehensive evaluators for validating LLM behavior:

Tool Calling

was_mcp_tool_called - Verify specific tool was invoked
tool_call_count - Validate number of tool calls
tool_called_with_parameter - Check specific parameter was passed
tool_called_with_parameters - Validate multiple parameters
parameter_value_in_range - Ensure numeric parameters are valid

Execution

execution_successful - Check for errors or failures
within_time_limit - Performance validation
final_answer_contains - Validate response content

Cost & Performance

token_usage_reasonable - Cost efficiency validation
Performance metrics are tracked automatically

Extensible: Easily add custom evaluators for your domain-specific needs.

See Evaluator Reference for complete documentation.
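To make the parameter-oriented evaluators concrete, here is a rough sketch of the kind of check each name suggests. These functions are illustrative reimplementations, not testmcpy's code: the call record layout and the argument names `expected_count`, `parameter_name`, `parameter_value`, `min_value`, and `max_value` are assumptions, so consult the Evaluator Reference for the real arguments.

```python
from typing import Any, Dict, List

# Hypothetical record of one tool call: {"name": str, "arguments": dict}
ToolCall = Dict[str, Any]


def tool_call_count(calls: List[ToolCall], expected_count: int) -> bool:
    """True if exactly the expected number of tool calls was made."""
    return len(calls) == expected_count


def tool_called_with_parameter(
    calls: List[ToolCall], tool_name: str, parameter_name: str, parameter_value: Any
) -> bool:
    """True if the named tool was called with the given parameter value at least once."""
    return any(
        call["name"] == tool_name
        and call["arguments"].get(parameter_name) == parameter_value
        for call in calls
    )


def parameter_value_in_range(
    calls: List[ToolCall],
    tool_name: str,
    parameter_name: str,
    min_value: float,
    max_value: float,
) -> bool:
    """True if the parameter was passed and every passed value lies in [min_value, max_value]."""
    values = [
        call["arguments"][parameter_name]
        for call in calls
        if call["name"] == tool_name and parameter_name in call["arguments"]
    ]
    return bool(values) and all(min_value <= value <= max_value for value in values)


# Invented example data: a single call with made-up argument names.
calls = [{"name": "create_chart", "arguments": {"chart_type": "bar", "limit": 10}}]

print(tool_call_count(calls, 1))
print(tool_called_with_parameter(calls, "create_chart", "chart_type", "bar"))
print(parameter_value_in_range(calls, "create_chart", "limit", 1, 100))
```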

For MCP Service Developers

Integrate testmcpy into your MCP service for automated testing:

# Install testmcpy in your project
pip install 'testmcpy[all]'

# Create tests for your MCP tools
mkdir -p tests
cat > tests/my_service_tests.yaml <<'EOF'
version: "1.0"
name: "My MCP Service Tests"
tests:
  - name: "test_tool_selection"
    prompt: "List all items"
    evaluators:
      - name: "was_mcp_tool_called"
        args:
          tool_name: "list_items"
      - name: "execution_successful"
EOF

# Run tests in CI/CD
testmcpy run tests/ --model claude-haiku-4-5
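If your CI pipeline is driven from Python, the run command above can be assembled programmatically. The sketch below only builds and prints the command using the `run` subcommand and `--model` option shown in this README. The `build_run_command` helper is hypothetical, and whether `testmcpy run` exits with a non-zero status when tests fail is an assumption that should be confirmed before relying on it to fail a build.

```python
import shlex
from typing import List, Optional


def build_run_command(test_path: str, model: Optional[str] = None) -> List[str]:
    """Build the argument list for `testmcpy run`, optionally pinning a model."""
    command = ["testmcpy", "run", test_path]
    if model:
        command += ["--model", model]
    return command


command = build_run_command("tests/", "claude-haiku-4-5")
print(shlex.join(command))

# To execute it in a CI step (requires testmcpy to be installed and configured):
#     import subprocess
#     subprocess.run(command, check=True)  # raises CalledProcessError on a non-zero exit
```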

Client Usage Guide - Complete integration guide for your MCP service

CI/CD Examples - GitHub Actions and GitLab CI configurations

Web Interface

Optional React-based UI for visual testing:

[Screenshot: Web UI dashboard with tool explorer]

# Install with UI support
pip install 'testmcpy[server]'

# Start server
testmcpy serve

Features:

Visual MCP tool explorer
Interactive chat interface
Test management and execution
Real-time results display

Access at http://localhost:8000

Examples

Check out the examples/ directory for:

Basic test suites - Simple examples to get started
CI/CD integration - GitHub Actions and GitLab CI workflows
Custom evaluators - Building domain-specific validation
Multi-model comparison - Benchmarking different LLMs

Contributing

We welcome contributions, whether bug reports, feature requests, documentation improvements, or code.

Read the Contributing Guide to get started.

Quick guidelines:

Follow Black code formatting (100 char line length)
Add tests for new features
Ensure multi-provider compatibility (test with Ollama, Claude, GPT)
Document your changes
Be respectful and collaborative

Contributors

Built with contributions from:

Want to see your name here? Check out our Contributing Guide!

Community & Support

Issues: Report bugs or request features
Discussions: Ask questions and share ideas
Documentation: Browse the docs/ directory
Examples: Explore examples/ for sample code

License

Apache License 2.0 - See LICENSE for details.

By contributing, you agree that your contributions will be licensed under Apache 2.0.

Acknowledgments

Built by the team at Preset to enable better LLM testing and integration with Apache Superset and beyond.

Special thanks to the MCP community and all our contributors!

Release history

Version	Release date
0.7.4	May 21, 2026
0.7.3	May 8, 2026
0.7.2	May 6, 2026
0.7.1	May 6, 2026
0.7.0	May 5, 2026
0.6.1	May 5, 2026
0.5.1	May 5, 2026
0.5.0	May 4, 2026
0.4.0	May 2, 2026
0.3.2	Apr 23, 2026
0.3.1	Apr 22, 2026
0.3.0	Apr 17, 2026
0.2.17	Dec 19, 2025
0.2.16	Dec 19, 2025
0.2.15	Dec 19, 2025
0.2.14	Dec 19, 2025
0.2.13	Dec 19, 2025
0.2.12	Dec 19, 2025
0.2.11	Dec 18, 2025
0.2.10	Dec 18, 2025
0.2.9	Dec 18, 2025
0.2.8	Dec 18, 2025
0.2.7	Dec 18, 2025
0.2.6	Dec 18, 2025
0.2.4	Nov 4, 2025
0.2.3 (this version)	Nov 1, 2025
0.2.2	Nov 1, 2025
0.2.1	Nov 1, 2025
0.2.0	Oct 18, 2025
0.1.15	Oct 17, 2025
0.1.13	Oct 17, 2025
0.1.12	Oct 17, 2025
0.1.11	Oct 17, 2025
0.1.10	Oct 17, 2025
0.1.9	Oct 17, 2025
0.1.8	Oct 17, 2025
0.1.7	Oct 17, 2025
0.1.6	Oct 17, 2025
0.1.5	Oct 17, 2025
0.1.4	Oct 17, 2025
0.1.3	Oct 16, 2025
0.1.2	Oct 16, 2025
0.1.1	Oct 16, 2025
0.1.0	Oct 16, 2025

Download files

Source Distribution

testmcpy-0.2.3.tar.gz (127.8 kB view details)

Uploaded Nov 1, 2025 Source

Built Distribution

testmcpy-0.2.3-py3-none-any.whl (133.6 kB view details)

Uploaded Nov 1, 2025 Python 3

File details

Details for the file testmcpy-0.2.3.tar.gz.

File metadata

Download URL: testmcpy-0.2.3.tar.gz
Upload date: Nov 1, 2025
Size: 127.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for testmcpy-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`bc6a84f41dbc7e564f532e847cbb7a02f5c7855e1d3dc0fb75384fc1c9a5ce77`
MD5	`cde0e3326c64b3289e05dc98147d81a9`
BLAKE2b-256	`8c8b8578c5437337d2d0d03e1f8d5663b0721663f6efca20d52bb67a2470cbcb`

File details

Details for the file testmcpy-0.2.3-py3-none-any.whl.

File metadata

Download URL: testmcpy-0.2.3-py3-none-any.whl
Upload date: Nov 1, 2025
Size: 133.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for testmcpy-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b8ce861fd43e4667f840228e835686e74cfc597de72c7ceec26c04128f4afd86`
MD5	`fe01b1540de118d0698a934856f7fadb`
BLAKE2b-256	`651def48a4cc7136f4f2766046285aa82c0e633dc51e1b9127b8fbf453c9bf77`
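The SHA256 digests listed above can be checked against a downloaded file with Python's standard `hashlib` module. This is a minimal sketch: the `sha256_of` and `verify_download` helpers are hypothetical names, and the expected digests are copied from the tables on this page.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digests as published on this page for the 0.2.3 release files.
EXPECTED_SHA256 = {
    "testmcpy-0.2.3.tar.gz": "bc6a84f41dbc7e564f532e847cbb7a02f5c7855e1d3dc0fb75384fc1c9a5ce77",
    "testmcpy-0.2.3-py3-none-any.whl": "b8ce861fd43e4667f840228e835686e74cfc597de72c7ceec26c04128f4afd86",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Return True only if the file name is known and its SHA256 matches the published value."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        return False
    return hmac.compare_digest(sha256_of(path), expected)


# Example usage, assuming the file has been downloaded to the current directory:
#     print(verify_download(Path("testmcpy-0.2.3.tar.gz")))
```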
