
Open-ended tool-use evaluation framework

Project description

mcpx-eval

A framework for evaluating open-ended tool use across various large language models.

mcpx-eval can be used to compare the output of different LLMs given the same prompt for a task using mcp.run tools. This means we're interested not only in the quality of the output, but also in how helpful various models are when presented with real-world tools.

Test configs

The tests/ directory contains pre-defined evals.

Installation

uv tool install mcpx-eval

Or from git:

uv tool install git+https://github.com/dylibso/mcpx-eval

Usage

Run the my-test test for 10 iterations:

mcpx-eval test --model ... --model ... --config my-test.toml --iter 10

Or run a task directly from mcp.run:

mcpx-eval test --model ... --model ... --task my-task --iter 10

Generate an HTML scoreboard for all evals:

mcpx-eval gen --html results.html --show

Test file

A test file is a TOML file containing the following fields:

  • name - name of the test
  • task - optional, the name of the mcp.run task to use
  • prompt - the prompt sent to the LLM under test; may be left blank if task is set
  • check - the prompt given to the judge, used to assess the quality of the test output
  • expected-tools - list of tool names that may be used
  • ignore-tools - optional, list of tools to ignore; they will not be available to the LLM
  • import - optional, includes fields from another test TOML file
  • vars - optional, a dict of variables used to format the prompt
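
Putting these fields together, a minimal test file might look like the following sketch. The tool names, prompts, and variable names here are illustrative, not taken from a real eval:

```toml
# my-test.toml -- illustrative example; tool and variable names are hypothetical
name = "my-test"

# {city} is substituted from [vars] before the prompt is sent to the model
prompt = "Fetch the weather for {city} and summarize it in one sentence."

# Instructions for the judge that scores the model's output
check = "The response should state the current weather for the requested city in a single sentence."

expected-tools = ["fetch-weather"]
ignore-tools = ["send-email"]

[vars]
city = "Tokyo"
```

The file would then be run with `mcpx-eval test --model ... --config my-test.toml --iter 10`.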

Download files


Source Distribution

mcpx_eval-0.1.1.tar.gz (23.6 kB)


Built Distribution


mcpx_eval-0.1.1-py3-none-any.whl (22.8 kB)


File details

Details for the file mcpx_eval-0.1.1.tar.gz.

File metadata

  • Download URL: mcpx_eval-0.1.1.tar.gz
  • Upload date:
  • Size: 23.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.6.8

File hashes

Hashes for mcpx_eval-0.1.1.tar.gz
  • SHA256: 2797d019da6fc914e11a6bf4e417e470f70ed0066eefdd2a83f11cc90813657e
  • MD5: 5e6492437fb9e17ea7c544cd45d72953
  • BLAKE2b-256: 5a4d89c4aaec7dd8aa18c72a742f18f55373b98813674d32dcdc9912f76fe921

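To check a downloaded file against a published digest, one option is sha256sum. This is a sketch that assumes the sdist above has already been downloaded to the current directory:

```shell
# Verify the downloaded sdist against its published SHA256 digest.
# "--check" reads "<digest>  <filename>" lines and reports OK or FAILED.
echo "2797d019da6fc914e11a6bf4e417e470f70ed0066eefdd2a83f11cc90813657e  mcpx_eval-0.1.1.tar.gz" \
  | sha256sum --check
```

A mismatch causes sha256sum to exit non-zero, so this also works as a guard step in scripts.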

File details

Details for the file mcpx_eval-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: mcpx_eval-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 22.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.6.8

File hashes

Hashes for mcpx_eval-0.1.1-py3-none-any.whl
  • SHA256: 3797c5ced86d0c8acac573600e8570faa7c78e1aa2ae1a1ffc2e80d00f8b6f26
  • MD5: 4fe7c9cbc323e67995e91e7166af5178
  • BLAKE2b-256: a5f4f6abaa2987b131bbdaeae3d268b0c5ae740d24c8533c8ca6f3a450d3929d

