iflow-mcp_icip-cas-livemcpbench

LiveMCPBench is a benchmark for evaluating the ability of agents to navigate and utilize a large-scale MCP toolset. It provides a comprehensive set of tasks that challenge agents to effectively use various tools in daily scenarios.

Project description

LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?

Benchmarking the agent in real-world tasks within a large-scale MCP toolset.

Overview

News

[8/18/2025] We released Docker images and added evaluation results to the leaderboard for three new models: GLM 4.5, GPT-5-Mini, and Kimi-K2.
[8/3/2025] We released LiveMCPBench.

Getting Started

Prerequisites

We recommend using our docker image, but if you want to run the code locally, you will need to install the following tools:

Installation

Pull the docker image

docker pull hysdhlx/livemcpbench:latest

Clone the repo and run the Docker image

git clone https://github.com/icip-cas/LiveMCPBench.git
cd LiveMCPBench

docker run -itd \
-v "$(pwd):/outside" \
--gpus all \
--ipc=host \
--net=host \
--name LiveMCPBench_container \
hysdhlx/livemcpbench:latest \
bash

Prepare the .env file

cp .env_template .env

You can modify the .env file to set your own environment variables.

# MCP Copilot Agent Configuration
BASE_URL=
OPENAI_API_KEY=
MODEL=

# Tool Retrieval Configuration
EMBEDDING_MODEL=
EMBEDDING_BASE_URL=
EMBEDDING_API_KEY=
EMBEDDING_DIMENSIONS=1024
TOP_SERVERS=5
TOP_TOOLS=3

# Abstract API Configuration (optional)
ABSTRACT_MODEL=
ABSTRACT_API_KEY=
ABSTRACT_BASE_URL=

# Proxy Configuration (optional)
http_proxy=
https_proxy=
no_proxy=127.0.0.1,localhost
HTTP_PROXY=
HTTPS_PROXY=
NO_PROXY=127.0.0.1,localhost

# lark report (optional)
LARK_WEBHOOK_URL=
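
Before starting a run, it can help to confirm that the .env file is filled in. The following is a minimal sketch, not part of the repository: `parse_env`, `missing_keys`, and the `ASSUMED_REQUIRED` list are hypothetical names, and which keys are actually mandatory is an assumption, since the template only marks some sections as optional.

```python
from typing import Dict, List

# Assumption: these keys must be non-empty for a baseline run. The template
# does not state which keys are mandatory, so adjust this list to your setup.
ASSUMED_REQUIRED = [
    "BASE_URL",
    "OPENAI_API_KEY",
    "MODEL",
    "EMBEDDING_MODEL",
    "EMBEDDING_BASE_URL",
    "EMBEDDING_API_KEY",
]


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: Dict[str, str], required: List[str] = ASSUMED_REQUIRED) -> List[str]:
    """Return the required keys that are absent or left empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(parse_env(open(".env").read()))` returns the names that still need a value, or an empty list when everything in the assumed list is set.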

Enter the container & Reset the environment

As we have mounted the code repo to /outside, you can access the code repo in the container at /outside/.
```
docker exec -it LiveMCPBench_container bash
```
Because the agent may change the environment, we recommend resetting the environment before running the agent. To reset the environment, you can run the following command:
```
cd /LiveMCPBench/
bash scripts/env_reset.sh 
```
This will copy the repo code in /outside to /LiveMCPBench and link the annotated_data to /root/.

Check the MCP tools
```
bash ./tools/scripts/tool_check.sh
```
After running this command, you can check ./tools/test/tools.json to see the tools.

You can run this script multiple times if some tools are not working.
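
To get a quick look at the generated file without opening it by hand, a small schema-agnostic helper can be used. This is only a sketch: the structure of tools.json is not described here, so the code makes no assumption about its fields and only reports the top-level shape. `describe` and `describe_file` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any


def describe(data: Any) -> str:
    """Give a one-line summary of parsed JSON without assuming a schema."""
    if isinstance(data, dict):
        return f"object with {len(data)} top-level keys"
    if isinstance(data, list):
        return f"array with {len(data)} entries"
    return f"scalar of type {type(data).__name__}"


def describe_file(path: str) -> str:
    """Load a JSON file and summarize its top-level shape."""
    return describe(json.loads(Path(path).read_text(encoding="utf-8")))
```

Calling `describe_file("./tools/test/tools.json")` after the check script finishes gives a rough size to compare between repeated runs.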

Index the servers

The MCP Copilot Agent requires the servers to be indexed before it runs. Run the following command to warm up the agent:
```
uv run -m baseline.mcp_copilot.arg_generation
```
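
For intuition about what settings such as EMBEDDING_MODEL, TOP_SERVERS, and TOP_TOOLS suggest, here is a generic sketch of embedding-based top-k selection using cosine similarity. It is an assumption about the general technique, not the agent's actual implementation: `cosine`, `top_k`, and the toy vectors are hypothetical, and the real index may be built and queried differently.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Vector, items: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Rank named vectors by similarity to the query and keep the best k."""
    scored = [(name, cosine(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional example; real embeddings would have EMBEDDING_DIMENSIONS entries.
servers = {"files": [1.0, 0.0], "weather": [0.0, 1.0], "search": [1.0, 1.0]}
best = top_k([1.0, 0.0], servers, k=2)
```

Under this reading, a first pass with `k` set to TOP_SERVERS would narrow the server list, and a second pass with `k` set to TOP_TOOLS would pick tools from those servers.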

Quick Start

MCP Copilot Agent

Example Run

bash ./baseline/scripts/run_example.sh

This will run the agent with a simple example and save the results in ./baseline/output/.

Full Run

By default, we use the /root directory to store the data that the agent accesses. If you run locally, make sure the files are in the right paths.

Run the MCP Copilot Agent

Be sure you have set the environment variables in the .env file.
```
bash ./baseline/scripts/run_baselines.sh
```

Check the results

After running the agent, you can check the trajectories in ./baseline/output.

Evaluation using the LiveMCPEval

Modify MODEL in .env to change the evaluation model.

Run the evaluation script

bash ./evaluator/scripts/run_baseline.sh

Check the results

After running the evaluation, you can check the results in ./evaluator/output.

Calculate the success rate

uv run ./evaluator/stat_success_rate.py --result_path /path/to/evaluation/
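
As a reference for the arithmetic, a success rate is the number of successful tasks divided by the number of evaluated tasks. The sketch below assumes one boolean outcome per task; the result file format and the exact definition used by stat_success_rate.py are not described here, and `success_rate` is a hypothetical helper.

```python
from typing import Iterable


def success_rate(outcomes: Iterable[bool]) -> float:
    """Fraction of tasks judged successful; rejects an empty result set."""
    results = list(outcomes)
    if not results:
        raise ValueError("no outcomes to score")
    return sum(results) / len(results)


# Three successes out of four tasks.
rate = success_rate([True, False, True, True])
```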

Project Structure

LiveMCPBench/
├── annotated_data/      # Tasks and task files
├── baseline/            # MCP Copilot Agent
│   ├── scripts/         # Scripts for running the agent
│   ├── output/          # Output for the agent
│   └── mcp_copilot/     # Source code for the agent
├── evaluator/           # LiveMCPEval
│   ├── scripts/         # Scripts for evaluation
│   └── output/          # Output for evaluation
├── tools/               # LiveMCPTool
│   ├── LiveMCPTool/     # Tool data
│   └── scripts/         # Scripts for the tools
├── scripts/             # Path prepare scripts
├── utils/               # Utility functions
└── .env_template        # Template for environment

Citation

If you find this project helpful, please use the following to cite it:

@misc{mo2025livemcpbenchagentsnavigateocean,
      title={LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?}, 
      author={Guozhao Mo and Wenliang Zhong and Jiawei Chen and Xuanang Chen and Yaojie Lu and Hongyu Lin and Ben He and Xianpei Han and Le Sun},
      year={2025},
      eprint={2508.01780},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.01780}, 
}

Project details

Release history

0.1.1: Feb 8, 2026
0.1.0: Feb 8, 2026 (this version)

Download files

Source Distribution

iflow_mcp_icip_cas_livemcpbench-0.1.0.tar.gz (29.7 MB)

Uploaded Feb 8, 2026 Source

Built Distribution

iflow_mcp_icip_cas_livemcpbench-0.1.0-py3-none-any.whl (13.0 MB)

Uploaded Feb 8, 2026 Python 3

File details

Details for the file iflow_mcp_icip_cas_livemcpbench-0.1.0.tar.gz.

File metadata

Download URL: iflow_mcp_icip_cas_livemcpbench-0.1.0.tar.gz
Upload date: Feb 8, 2026
Size: 29.7 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.0

File hashes

Hashes for iflow_mcp_icip_cas_livemcpbench-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`d1086976f029f282503fa2a4f5f19b242d35c64b9a229e30fc91170b784c7230`
MD5	`97ae9c319e6aca2120e3420fb360c085`
BLAKE2b-256	`92e9884cfb17ff150a5d0d3fbd1d8737456aedb9d440615af2c6e93dba79109c`

File details

Details for the file iflow_mcp_icip_cas_livemcpbench-0.1.0-py3-none-any.whl.

File metadata

Download URL: iflow_mcp_icip_cas_livemcpbench-0.1.0-py3-none-any.whl
Upload date: Feb 8, 2026
Size: 13.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.0

File hashes

Hashes for iflow_mcp_icip_cas_livemcpbench-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`29b0c6150e97608a113bcc6e6f62a970252f9f82db53c1d2b5a527d4877a654b`
MD5	`4da51c6ada1870a54233c4ec80093a4e`
BLAKE2b-256	`413d819243b77d710577eeb63c677189c7537ca54023b8c0b96a7d26581deb58`

