A comprehensive testing framework for validating LLM tool calling capabilities with MCP services

testmcpy - MCP Testing Framework

A comprehensive testing framework for validating LLM tool calling capabilities with MCP (Model Context Protocol) services, specifically designed for testing Superset operations.

Quick Start

Installation

From source (development):

git clone https://github.com/preset-io/testmcpy.git
cd testmcpy
pip install -e .

From PyPI:

pip install testmcpy

Via Homebrew (once published to PyPI):

brew tap preset-io/testmcpy
brew install testmcpy

See INSTALLATION.md for detailed installation instructions and distribution options.

Quick Usage

# List MCP tools
testmcpy tools
testmcpy tools --detail --filter chart

# Research LLM capabilities
testmcpy research --model claude-sonnet-4.5-20250929 --provider anthropic

# Run test suites
testmcpy run tests/ --model claude-3-5-haiku-20241022 --provider anthropic

# Interactive chat
testmcpy chat --provider anthropic --model claude-sonnet-4.5-20250929

# Compare test results
testmcpy report reports/model1.yaml reports/model2.yaml

# Initialize new project
testmcpy init my_project

Framework Structure

mcp_testing/
├── research/               # Research scripts for testing LLM capabilities
│   └── test_ollama_tools.py
├── src/                    # Core framework modules
│   ├── mcp_client.py      # MCP protocol client
│   ├── llm_integration.py # LLM provider abstraction
│   └── test_runner.py     # Test execution engine
├── evals/                  # Evaluation functions
│   └── base_evaluators.py # Standard evaluators
├── tests/                  # Test cases (YAML/JSON)
│   ├── basic_test.yaml
│   └── example_mcp_tests.yaml
├── reports/                # Test reports and comparisons
└── cli.py                  # CLI interface

Writing Test Cases

Test cases are defined in YAML files:

version: "1.0"
name: "My Test Suite"

tests:
  - name: "test_chart_creation"
    prompt: "Create a bar chart showing sales by region"
    expected_tools:
      - "create_chart"
    evaluators:
      - name: "was_mcp_tool_called"
        args:
          tool_name: "create_chart"
      - name: "execution_successful"
      - name: "final_answer_contains"
        args:
          expected_content: ["chart", "created"]
      - name: "within_time_limit"
        args:
          max_seconds: 30
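
The suite above maps directly onto plain Python data. The sketch below mirrors it as the dict a YAML parser would produce and checks the fields the example uses. The `validate_suite` helper and its rules are hypothetical and are not part of testmcpy; the real loader may require more or fewer fields.

```python
# Hypothetical sketch: the YAML suite above as a Python dict.
# validate_suite is an illustrative helper, not a testmcpy API.
SUITE = {
    "version": "1.0",
    "name": "My Test Suite",
    "tests": [
        {
            "name": "test_chart_creation",
            "prompt": "Create a bar chart showing sales by region",
            "expected_tools": ["create_chart"],
            "evaluators": [
                {"name": "was_mcp_tool_called", "args": {"tool_name": "create_chart"}},
                {"name": "execution_successful"},
                {"name": "final_answer_contains",
                 "args": {"expected_content": ["chart", "created"]}},
                {"name": "within_time_limit", "args": {"max_seconds": 30}},
            ],
        }
    ],
}


def validate_suite(suite: dict) -> list:
    """Return a list of problems; an empty list means the suite looks well-formed."""
    problems = []
    # Top-level keys used by the example suite (assumed to be required).
    for key in ("version", "name", "tests"):
        if key not in suite:
            problems.append(f"suite is missing '{key}'")
    for i, test in enumerate(suite.get("tests", [])):
        label = test.get("name", f"tests[{i}]")
        if not test.get("prompt"):
            problems.append(f"{label}: missing prompt")
        for evaluator in test.get("evaluators", []):
            if "name" not in evaluator:
                problems.append(f"{label}: evaluator without a name")
    return problems


print(validate_suite(SUITE))
```

A check like this can catch a misspelled key before any model call is made, which matters when each test run costs API tokens.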

Available Evaluators

Generic Evaluators

was_mcp_tool_called - Verify MCP tool was called
execution_successful - Check for successful execution
final_answer_contains - Validate response content
answer_contains_link - Check for links in response
within_time_limit - Verify performance
token_usage_reasonable - Check token/cost efficiency

Superset-Specific Evaluators

was_superset_chart_created - Verify chart creation
sql_query_valid - Validate SQL syntax
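
To make the evaluator names concrete, here is a minimal sketch of how four of the generic evaluators could be written as plain functions. The `RunResult` record and its field names are assumptions made for illustration, as are the matching rules (case-insensitive, all terms required); the actual implementations in testmcpy may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RunResult:
    """Hypothetical record of one test run; field names are assumptions."""
    tool_calls: List[str] = field(default_factory=list)  # names of MCP tools invoked
    final_answer: str = ""
    duration_seconds: float = 0.0
    error: Optional[str] = None


def was_mcp_tool_called(result: RunResult, tool_name: str) -> bool:
    # Passes if the named tool appears among the recorded tool calls.
    return tool_name in result.tool_calls


def execution_successful(result: RunResult) -> bool:
    # Passes if the run recorded no error.
    return result.error is None


def final_answer_contains(result: RunResult, expected_content: List[str]) -> bool:
    # Assumption: every expected term must appear, ignoring case.
    answer = result.final_answer.lower()
    return all(term.lower() in answer for term in expected_content)


def within_time_limit(result: RunResult, max_seconds: float) -> bool:
    return result.duration_seconds <= max_seconds


result = RunResult(
    tool_calls=["create_chart"],
    final_answer="Chart created: sales by region",
    duration_seconds=4.2,
)
checks = {
    "was_mcp_tool_called": was_mcp_tool_called(result, "create_chart"),
    "execution_successful": execution_successful(result),
    "final_answer_contains": final_answer_contains(result, ["chart", "created"]),
    "within_time_limit": within_time_limit(result, 30),
}
print(checks)
```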

Supported LLM Providers

- Claude Agent SDK (claude-sdk) - Official Anthropic SDK ⚠️ Limited MCP Support
  - claude-sonnet-4.5-20250929 (newest, most capable)
  - claude-sonnet-4-20250514
  - claude-3-5-sonnet-20241022
  - claude-3-5-haiku-20241022
  - All Claude models
  - Requires: ANTHROPIC_API_KEY environment variable
  - Features: Native tool calling, streaming, hooks
  - Note: Designed for stdio-based MCP servers, not HTTP-based services
  - For HTTP MCP (like Superset): Use anthropic provider instead
- Anthropic API (anthropic) - Direct API integration ✅ Recommended for HTTP MCP
  - claude-sonnet-4.5-20250929 (newest, recommended)
  - claude-sonnet-4-20250514
  - claude-3-5-sonnet-20241022
  - claude-3-5-haiku-20241022 (fast, cost-effective)
  - claude-3-opus-20240229
  - All Claude models via API
  - Requires: ANTHROPIC_API_KEY environment variable
  - Full support for HTTP-based MCP services (like Superset MCP)
  - Best choice for production testing with MCP tools
- Ollama (ollama) - Local models with tool calling support
  - llama3.1:8b (recommended)
  - mistral-nemo
  - qwen2.5:7b
- OpenAI (openai) - GPT models via API
  - Requires: OPENAI_API_KEY environment variable
- Local (local) - Transformers-based local models
- Claude CLI (claude-cli) - Claude Code CLI interface
  - Uses Claude Code binary
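
The choice between the two Claude providers comes down to the MCP transport: HTTP services go to anthropic, stdio servers to claude-sdk. A minimal sketch of that rule follows. The `pick_claude_provider` helper is hypothetical and not part of testmcpy, and treating any non-HTTP target as a stdio command is an assumption.

```python
from urllib.parse import urlparse


def pick_claude_provider(target: str) -> str:
    """Map an MCP target to the Claude provider recommended for it.

    Hypothetical helper: an http(s) URL is treated as an HTTP MCP service,
    anything else as a command that launches a stdio MCP server.
    """
    scheme = urlparse(target).scheme.lower()
    if scheme in ("http", "https"):
        return "anthropic"   # full support for HTTP-based MCP services
    return "claude-sdk"      # designed for stdio-based MCP servers


print(pick_claude_provider("http://localhost:5008/mcp/"))
print(pick_claude_provider("python my_server.py"))
```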

Configuration

Environment Variables

# For Claude providers (claude-sdk, anthropic)
export ANTHROPIC_API_KEY="sk-ant-..."

# For OpenAI provider
export OPENAI_API_KEY="sk-..."

# MCP service URL (optional, defaults to http://localhost:5008/mcp/)
export MCP_URL="http://localhost:5008/mcp/"

# Default model and provider (optional)
export DEFAULT_MODEL="claude-sonnet-4.5-20250929"
export DEFAULT_PROVIDER="anthropic"
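
As an illustration of how these variables might be consumed, the sketch below reads them with the documented default for `MCP_URL`. The `load_settings` function and the keys of the dict it returns are hypothetical; testmcpy's own settings loading may work differently.

```python
import os

# Default stated above for the MCP service URL.
DEFAULT_MCP_URL = "http://localhost:5008/mcp/"


def load_settings(env=None) -> dict:
    """Collect settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "mcp_url": env.get("MCP_URL", DEFAULT_MCP_URL),
        "model": env.get("DEFAULT_MODEL"),        # None when unset
        "provider": env.get("DEFAULT_PROVIDER"),  # None when unset
        # Only record whether keys are present; never log the key itself.
        "has_anthropic_key": bool(env.get("ANTHROPIC_API_KEY")),
        "has_openai_key": bool(env.get("OPENAI_API_KEY")),
    }


settings = load_settings({"DEFAULT_PROVIDER": "anthropic", "ANTHROPIC_API_KEY": "sk-ant-..."})
print(settings)
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.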

Configuration File

Create mcp_test_config.yaml:

mcp_url: "http://localhost:5008/mcp/"
default_model: "claude-sonnet-4.5-20250929"
default_provider: "anthropic"
evaluators:
  timeout: 30
  max_tokens: 2000
  max_cost: 0.10
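
The `evaluators` block reads as a set of thresholds. The sketch below shows one way such limits could be applied to a finished run. It assumes `timeout` is in seconds and `max_cost` is in US dollars, which the configuration file does not state; `check_limits` is a hypothetical helper, and the mapping of limits to evaluator names is a guess.

```python
# Thresholds copied from the evaluators block of the example configuration.
EVALUATOR_LIMITS = {"timeout": 30, "max_tokens": 2000, "max_cost": 0.10}


def check_limits(duration_seconds, tokens_used, cost_usd, limits=EVALUATOR_LIMITS):
    """Compare one run's measurements against the configured limits.

    Hypothetical helper; units (seconds, USD) are assumptions.
    """
    return {
        "within_time_limit": duration_seconds <= limits["timeout"],
        "token_usage_reasonable": (
            tokens_used <= limits["max_tokens"] and cost_usd <= limits["max_cost"]
        ),
    }


print(check_limits(duration_seconds=12.5, tokens_used=1500, cost_usd=0.04))
```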

Development Status

Phase 0: Research & Prototype ✅

Research local LLM options with tool calling
Build minimal Python script for LLM+MCP integration
Validate tool calling with selected LLM
Create basic framework structure

Phase 1: Foundation (In Progress)

CLI framework with typer + rich
Basic test execution engine
MCP protocol client
LLM provider abstraction
Core evaluation functions
Integration with existing Superset tests

Phase 2: Core Features (Planned)

Multi-model comparison support
Advanced reporting with charts
Test suite versioning
Parallel test execution

Phase 3: Advanced Capabilities (Future)

CI/CD integration
Interactive test development mode
Performance profiling
Cost optimization insights

Known Limitations

- Claude SDK provider: Only supports stdio-based MCP servers (command-line tools). It is not compatible with HTTP-based MCP services such as Superset MCP.
- HTTP MCP services: Use the anthropic provider, which fully supports them.
- Ollama models: Require specific formatting for reliable tool calling.
- CPU-only execution: May be slow for larger local models.
- Tool calling accuracy: Varies by model; Claude models are generally the most reliable.
- Cost: Claude API providers (such as anthropic) incur API costs; consider using Ollama for development.

Contributing

This framework follows the patterns established by promptimize and superset-sup. When contributing:

Use modern Python practices (type hints, async/await)
Follow the existing code style
Add tests for new evaluators
Document new features in this README

License

Same as the parent promptimize project.

Release history

Version	Release date
0.7.4	May 21, 2026
0.7.3	May 8, 2026
0.7.2	May 6, 2026
0.7.1	May 6, 2026
0.7.0	May 5, 2026
0.6.1	May 5, 2026
0.5.1	May 5, 2026
0.5.0	May 4, 2026
0.4.0	May 2, 2026
0.3.2	Apr 23, 2026
0.3.1	Apr 22, 2026
0.3.0	Apr 17, 2026
0.2.17	Dec 19, 2025
0.2.16	Dec 19, 2025
0.2.15	Dec 19, 2025
0.2.14	Dec 19, 2025
0.2.13	Dec 19, 2025
0.2.12	Dec 19, 2025
0.2.11	Dec 18, 2025
0.2.10	Dec 18, 2025
0.2.9	Dec 18, 2025
0.2.8	Dec 18, 2025
0.2.7	Dec 18, 2025
0.2.6	Dec 18, 2025
0.2.4	Nov 4, 2025
0.2.3	Nov 1, 2025
0.2.2	Nov 1, 2025
0.2.1	Nov 1, 2025
0.2.0	Oct 18, 2025
0.1.15	Oct 17, 2025
0.1.13	Oct 17, 2025
0.1.12	Oct 17, 2025
0.1.11	Oct 17, 2025
0.1.10	Oct 17, 2025
0.1.9	Oct 17, 2025
0.1.8	Oct 17, 2025
0.1.7	Oct 17, 2025
0.1.6	Oct 17, 2025
0.1.5	Oct 17, 2025
0.1.4	Oct 17, 2025
0.1.3	Oct 16, 2025
0.1.2	Oct 16, 2025
0.1.1	Oct 16, 2025
0.1.0 (this version)	Oct 16, 2025

Download files

Source Distribution

testmcpy-0.1.0.tar.gz (49.6 kB)

Uploaded Oct 16, 2025 Source

Built Distribution

testmcpy-0.1.0-py3-none-any.whl (49.9 kB)

Uploaded Oct 16, 2025 Python 3

File details

Details for the file testmcpy-0.1.0.tar.gz.

File metadata

Download URL: testmcpy-0.1.0.tar.gz
Upload date: Oct 16, 2025
Size: 49.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for testmcpy-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`015e53ca6632b3a673a4ec867a8e4d15de3fa56eb9068f009ec09a8d742716fb`
MD5	`5f83143fac1a76b6aaa1f4ebb4e97b8e`
BLAKE2b-256	`4f63fb765886f52bf3caa98a6dac4404b2d97523de5713fbf59022e516198294`

File details

Details for the file testmcpy-0.1.0-py3-none-any.whl.

File metadata

Download URL: testmcpy-0.1.0-py3-none-any.whl
Upload date: Oct 16, 2025
Size: 49.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for testmcpy-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ed40b350e97b11c3067aecdacdb7b774f46304a6699a3b9dfb0fc64cd4758b4f`
MD5	`8d3b76176a64e89404d04485092cd124`
BLAKE2b-256	`01403c0dff88856091c2bdf099ee82fc1cec599c96febdb1b645a183d1c3eadf`
