LiveMCPBench is a benchmark for evaluating the ability of agents to navigate and utilize a large-scale MCP toolset. It provides a comprehensive set of tasks that challenge agents to effectively use various tools in daily scenarios.
LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?
Benchmarking the agent in real-world tasks within a large-scale MCP toolset.
News
- [8/18/2025] We release Docker images and add evaluation results to the leaderboard for three new models: GLM 4.5, GPT-5-Mini, and Kimi-K2.
- [8/3/2025] We release LiveMCPBench.
Getting Started
Prerequisites
We recommend using our Docker image. If you want to run the code locally instead, you will need to install the following tools:
- npm
- uv
Installation
- Pull the Docker image

      docker pull hysdhlx/livemcpbench:latest
- Clone the repo and run the Docker image

      git clone https://github.com/icip-cas/LiveMCPBench.git
      cd LiveMCPBench
      docker run -itd \
        -v "$(pwd):/outside" \
        --gpus all \
        --ipc=host \
        --net=host \
        --name LiveMCPBench_container \
        hysdhlx/livemcpbench:latest \
        bash
- Prepare the .env file

      cp .env_template .env

  You can modify the .env file to set your own environment variables.

      # MCP Copilot Agent Configuration
      BASE_URL=
      OPENAI_API_KEY=
      MODEL=
      # Tool Retrieval Configuration
      EMBEDDING_MODEL=
      EMBEDDING_BASE_URL=
      EMBEDDING_API_KEY=
      EMBEDDING_DIMENSIONS=1024
      TOP_SERVERS=5
      TOP_TOOLS=3
      # Abstract API Configuration (optional)
      ABSTRACT_MODEL=
      ABSTRACT_API_KEY=
      ABSTRACT_BASE_URL=
      # Proxy Configuration (optional)
      http_proxy=
      https_proxy=
      no_proxy=127.0.0.1,localhost
      HTTP_PROXY=
      HTTPS_PROXY=
      NO_PROXY=127.0.0.1,localhost
      # lark report (optional)
      LARK_WEBHOOK_URL=
- Enter the container and reset the environment

  Because the code repo is mounted at /outside, you can access it inside the container at /outside/.

      docker exec -it LiveMCPBench_container bash

  Because the agent may change the environment, we recommend resetting the environment before running the agent:

      cd /LiveMCPBench/
      bash scripts/env_reset.sh

  This copies the repo code from /outside to /LiveMCPBench and links annotated_data to /root/.

- Check the MCP tools

      bash ./tools/scripts/tool_check.sh

  After running this command, you can check ./tools/test/tools.json to see the tools. You can run this script multiple times if you find that some tools are not working.
- Index the servers

  The MCP Copilot Agent requires the servers to be indexed before it runs. You can run the following command to warm up the agent:

      uv run -m baseline.mcp_copilot.arg_generation
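
The agent and the indexing step read their settings from .env, so it can help to check for empty values before starting a run. The following is a minimal sketch, not part of the repository: the helper names `parse_env` and `missing_keys` are hypothetical, and the choice of required keys is an assumption based only on which groups the template does not mark as optional.

```python
from pathlib import Path

# Assumption: these keys must be non-empty for a run. The template only marks
# the Abstract API, proxy, and Lark settings as optional.
REQUIRED_KEYS = [
    "BASE_URL", "OPENAI_API_KEY", "MODEL",
    "EMBEDDING_MODEL", "EMBEDDING_BASE_URL", "EMBEDDING_API_KEY",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env found; run `cp .env_template .env` first.")
    else:
        missing = missing_keys(parse_env(env_path.read_text()))
        print("Missing:", ", ".join(missing) if missing else "none")
```

Run it from the repository root, next to the .env file. It only reports empty values and does not verify that the endpoints or keys actually work.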
Quick Start
MCP Copilot Agent
Example Run
bash ./baseline/scripts/run_example.sh
This will run the agent with a simple example and save the results in ./baseline/output/.
Full Run
By default, the data that the agent accesses is stored in the /root directory. If you run locally, make sure the files are at the right paths.
- Run the MCP Copilot Agent

  Be sure you have set the environment variables in the .env file.

      bash ./baseline/scripts/run_baselines.sh

- Check the results

  After running the agent, you can check the trajectories in ./baseline/output.
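
A quick way to confirm that a run produced trajectories is to count the files under the output directory. The sketch below assumes only that results are written as files somewhere under ./baseline/output. The trajectory file format is not described here, so the hypothetical helper `summarize_output` groups files by extension and does not parse them.

```python
from collections import Counter
from pathlib import Path


def summarize_output(root: Path) -> Counter:
    """Count regular files under root, grouped by file extension."""
    counts = Counter()
    if not root.is_dir():
        return counts  # Nothing to report if the directory does not exist.
    for path in root.rglob("*"):
        if path.is_file():
            counts[path.suffix or "(no extension)"] += 1
    return counts


if __name__ == "__main__":
    summary = summarize_output(Path("./baseline/output"))
    for suffix, count in sorted(summary.items()):
        print(f"{suffix}: {count}")
```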
Evaluation using the LiveMCPEval
- Modify MODEL in .env to change the evaluation model

- Run the evaluation script

      bash ./evaluator/scripts/run_baseline.sh

- Check the results

  After running the evaluation, you can check the results in ./evaluator/output.

- Calculate the success rate

      uv run ./evaluator/stat_success_rate.py --result_path /path/to/evaluation/
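
For reference, a success rate is the number of tasks judged successful divided by the total number of tasks. The sketch below shows only that arithmetic. It is not the logic of stat_success_rate.py, whose input format is not described here, and the function name `success_rate` and the example verdicts are hypothetical.

```python
def success_rate(verdicts: list[bool]) -> float:
    """Fraction of tasks judged successful; 0.0 for an empty list."""
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)


if __name__ == "__main__":
    # Hypothetical verdicts for four tasks, not real benchmark results.
    example = [True, False, True, True]
    print(f"{success_rate(example):.2%}")
```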
Project Structure
LiveMCPBench/
├── annotated_data/ # Tasks and task files
├── baseline/ # MCP Copilot Agent
│ ├── scripts/ # Scripts for running the agent
│ ├── output/ # Output for the agent
│ └── mcp_copilot/ # Source code for the agent
├── evaluator/ # LiveMCPEval
│ ├── scripts/ # Scripts for evaluation
│ └── output/ # Output for evaluation
├── tools/ # LiveMCPTool
│ ├── LiveMCPTool/ # Tool data
│ └── scripts/ # Scripts for the tools
├── scripts/ # Path prepare scripts
├── utils/ # Utility functions
└── .env_template # Template for environment
Citation
If you find this project helpful, please use the following to cite it:
@misc{mo2025livemcpbenchagentsnavigateocean,
title={LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?},
author={Guozhao Mo and Wenliang Zhong and Jiawei Chen and Xuanang Chen and Yaojie Lu and Hongyu Lin and Ben He and Xianpei Han and Le Sun},
year={2025},
eprint={2508.01780},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2508.01780},
}