mcp-curate

CI Python License: MIT

Turn an OpenAPI spec into a curated MCP server an LLM can actually use — and prove it with an eval.

A naive OpenAPI→MCP generator dumps one tool per endpoint. Point it at GitHub's API and the model drowns in 1190 tools and picks the wrong one. mcp-curate consolidates those endpoints into a small set of clear, well-described meta-tools — and ships an eval harness that measures whether the model picks the right tool, raw vs curated, on your own spec.

Before / after

Spec               Raw tools   Curated tools   Reduction
Swagger Petstore          19               3         84%
Stripe API               587              40         93%
GitHub REST API         1190              40         97%

$ mcp-curate curate examples/github.json
raw tools:     1190
curated tools: 40  (budget 40)
reduction:     97%

Curated tools (actions consolidated):
  - repos: 202 actions  [repos]
  - actions: 187 actions  [actions]
  - orgs: 108 actions  [orgs]
  - issues: 55 actions  [issues]
  ...

Each curated tool exposes an action argument that selects the underlying operation, so 1190 flat choices become 40 namespaced ones.
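
A minimal sketch of the action-selector idea, assuming a simplified data model. The `Operation` and `MetaTool` classes and their methods are hypothetical names for illustration, not mcp-curate's actual API; the two GitHub operations are only sample data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operation:
    """One flattened OpenAPI operation (hypothetical shape)."""
    operation_id: str
    method: str
    path: str


@dataclass
class MetaTool:
    """A curated tool: one name exposed to the model, many actions behind it."""
    name: str
    actions: dict = field(default_factory=dict)

    def input_schema(self) -> dict:
        # The model sees a single tool whose `action` is an enum of operations.
        return {
            "type": "object",
            "properties": {
                "action": {"type": "string", "enum": sorted(self.actions)},
                "arguments": {"type": "object"},
            },
            "required": ["action"],
        }

    def resolve(self, action: str) -> Operation:
        # Map the chosen action back to the underlying operation.
        try:
            return self.actions[action]
        except KeyError:
            raise ValueError(
                f"unknown action {action!r} for tool {self.name!r}"
            ) from None


issues = MetaTool(
    name="issues",
    actions={
        "issues/create": Operation(
            "issues/create", "POST", "/repos/{owner}/{repo}/issues"
        ),
        "issues/get": Operation(
            "issues/get", "GET", "/repos/{owner}/{repo}/issues/{issue_number}"
        ),
    },
)

op = issues.resolve("issues/get")
print(op.method, op.path)
```

The model first chooses among a few tool names, then among the actions of one tool, instead of scanning every operation at once.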

Oversized tags get split, not stuffed. When the tool budget has headroom, a giant tag is broken into focused sub-tools by path instead of one bloated tool. With more budget, GitHub's 202-operation repos tag splits cleanly:

$ mcp-curate curate examples/github.json --max-tools 120 --max-actions 30
  - repos: ...            repos_branches, repos_commits, repos_collaborators,
  - repos_branches: 36    repos_comments, repos_compare, ... (focused sub-tools)

At a tight budget (the default 40), curation keeps tags whole and clean rather than forcing unrelated tags together; raise --max-tools to trade tool count for smaller, more focused tools.
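
Splitting by path can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's real algorithm: `split_by_path` is a hypothetical helper, it keys sub-tools on the first literal path segment after the root, and it ignores the overall tool budget that the real curation step also has to respect.

```python
from collections import defaultdict


def split_by_path(tag: str, paths: list, max_actions: int) -> dict:
    """Split one oversized group into sub-groups keyed by a path segment."""
    if len(paths) <= max_actions:
        # Small enough already: keep the tag whole.
        return {tag: list(paths)}

    groups = defaultdict(list)
    for path in paths:
        # Keep literal segments only:
        # "/repos/{owner}/{repo}/branches" -> ["repos", "branches"]
        literal = [s for s in path.split("/") if s and not s.startswith("{")]
        # The first literal segment after the root names the sub-tool;
        # paths with no such segment stay under the tag itself.
        key = f"{tag}_{literal[1]}" if len(literal) > 1 else tag
        groups[key].append(path)
    return dict(groups)


paths = [
    "/repos/{owner}/{repo}",
    "/repos/{owner}/{repo}/branches",
    "/repos/{owner}/{repo}/branches/{branch}",
    "/repos/{owner}/{repo}/commits",
]
for name, members in split_by_path("repos", paths, max_actions=2).items():
    print(name, len(members))
```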

Does curation actually help? (the eval)

mcp-curate eval runs natural-language requests against both the raw and the curated tool set using your LLM key, and reports how often the model routes to the correct tool.

$ export ANTHROPIC_API_KEY=...
$ mcp-curate eval examples/stripe.json --cases examples/eval_cases/stripe.yaml

Eval: raw vs curated tool selection
cases: 11   raw tools: 587   curated tools: 40

raw     correct-tool selection: <run it>%
curated correct-tool selection: <run it>%
  -> improvement: <run it> points

The harness uses your key on your spec, so the numbers aren't hard-coded — run the command above to reproduce them. Golden sets ship for Petstore and Stripe (examples/eval_cases/); add your own as a small YAML file.

The eval is deliberately honest. Beyond correct-tool selection it also reports:

  • curated tool + action accuracy — so curation can't "win" just by offering fewer, broader tools (it must still route to the right operation);
  • argument construction accuracy (raw vs curated) — for cases that declare expected arguments, whether the model filled the right parameters (e.g. petId: 42 from "look up pet 42").
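
The three metrics can be sketched like this. The `Case` and `Prediction` shapes and the `score` function are assumptions made for illustration; the harness's real data model and its exact scoring rules (for example, whether argument accuracy requires the right tool first) may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Case:
    """One golden request (hypothetical shape)."""
    expected_tool: str
    expected_action: Optional[str] = None
    expected_args: dict = field(default_factory=dict)


@dataclass
class Prediction:
    """What the model chose for that request (hypothetical shape)."""
    tool: str
    action: Optional[str] = None
    args: dict = field(default_factory=dict)


def pct(hits: int, total: int) -> float:
    return 100.0 * hits / total if total else 0.0


def score(cases: list, predictions: list) -> dict:
    tool_hits = action_hits = arg_hits = arg_total = 0
    for case, pred in zip(cases, predictions):
        tool_ok = pred.tool == case.expected_tool
        tool_hits += tool_ok
        # Tool + action: the right tool alone is not enough.
        action_hits += tool_ok and pred.action == case.expected_action
        # Argument accuracy only counts cases that declare expected arguments.
        if case.expected_args:
            arg_total += 1
            arg_hits += all(
                pred.args.get(k) == v for k, v in case.expected_args.items()
            )
    return {
        "tool": pct(tool_hits, len(cases)),
        "tool_action": pct(action_hits, len(cases)),
        "args": pct(arg_hits, arg_total),
    }


cases = [
    Case("pet", "getPetById", {"petId": 42}),
    Case("pet", "addPet"),
]
predictions = [
    Prediction("pet", "getPetById", {"petId": 42}),
    Prediction("pet", "updatePet"),  # right tool, wrong action
]
print(score(cases, predictions))
```

In this toy run the second prediction picks the right tool but the wrong action, so tool selection scores higher than tool + action accuracy. That gap is what the second metric is meant to expose.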

Forking this repo? The status badges above point to tarundattagondi/mcp-curate. Replace that with your-username/mcp-curate in the three badge URLs at the top so they track your own fork's CI.

Install

git clone https://github.com/tarundattagondi/mcp-curate && cd mcp-curate
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,llm]"
./examples/fetch_specs.sh        # petstore is committed; this also grabs GitHub + Stripe

Usage

# Inspect a spec's raw tool count.
mcp-curate parse examples/petstore.json

# See the before/after curation report.
mcp-curate curate examples/github.json --max-tools 40

# Serve the curated MCP server over stdio (bring-your-own auth header).
mcp-curate serve examples/petstore.json --curated \
  --header "Authorization: Bearer $TOKEN"

# A/B the tool selection with your LLM key.
mcp-curate eval examples/petstore.json --cases examples/eval_cases/petstore.yaml

Add --llm-descriptions to curate/serve/eval to let the LLM polish the curated tool names and descriptions (otherwise they're generated deterministically, with no API key required).

How it works

  1. Parse — load OpenAPI 3.x (JSON/YAML), resolve $ref with cycle cutting, flatten each operation into a spec-agnostic model.
  2. Curate — group operations by tag (path-segment fallback), merge the smallest related groups to fit a tool budget, split any oversized group into focused sub-tools using leftover headroom, and collapse each group into one meta-tool with an action selector.
  3. Serve — expose either tool set over the MCP stdio transport; tool calls become real HTTP requests against the spec's server URL.
  4. Eval — force the model to pick a tool for each golden request and score raw vs curated routing.
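
As an illustration of step 1, here is one way `$ref` resolution with cycle cutting might look. It is a sketch, not the project's parser: `resolve_refs` and the `x-circular-ref` marker are hypothetical, and only local references (`#/...`) are followed.

```python
from typing import Any


def resolve_refs(node: Any, root: dict, stack: tuple = ()) -> Any:
    """Inline local $ref pointers, cutting any reference already being expanded."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            if ref in stack:
                # Cycle: stop here instead of recursing forever.
                return {"x-circular-ref": ref}
            target: Any = root
            for part in ref[2:].split("/"):
                # JSON Pointer unescaping: "~1" -> "/", then "~0" -> "~".
                part = part.replace("~1", "/").replace("~0", "~")
                target = target[part]
            return resolve_refs(target, root, stack + (ref,))
        return {k: resolve_refs(v, root, stack) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_refs(v, root, stack) for v in node]
    return node


spec = {
    "components": {
        "schemas": {
            "Node": {
                "type": "object",
                "properties": {
                    "child": {"$ref": "#/components/schemas/Node"},
                },
            }
        }
    }
}
resolved = resolve_refs(spec, spec)
print(resolved["components"]["schemas"]["Node"]["properties"]["child"])
```

Tracking the chain of references currently being expanded, rather than every reference ever seen, lets the same schema be inlined in several unrelated places while still terminating on self-referential ones.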

Security

Runs fully local: nothing leaves your machine except LLM calls (eval, with your key) and the API calls your served spec makes. An SSRF guard is on by default. Tool calls to loopback, private, and link-local hosts are blocked, and the cloud-metadata address 169.254.169.254 is always blocked, so a malicious spec can't exfiltrate your auth headers. Use --allow-local-network to serve a localhost/private API. See SECURITY.md.
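
A guard of this kind could be sketched with the standard library as below. This is not the code in mcp-curate: `is_blocked_ip` and `check_url` are hypothetical names, and treating the metadata address as blocked even when the local network is allowed is an assumed reading of "always".

```python
import ipaddress
import socket
from urllib.parse import urlsplit

METADATA_IP = ipaddress.ip_address("169.254.169.254")


def is_blocked_ip(ip, allow_local_network: bool = False) -> bool:
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it is judged as IPv4.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    if ip == METADATA_IP:
        return True  # assumed: blocked regardless of the override
    if allow_local_network:
        return False
    return ip.is_loopback or ip.is_private or ip.is_link_local


def check_url(url: str, allow_local_network: bool = False) -> None:
    """Raise PermissionError if the URL's host resolves to a blocked address."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    # Resolve first, so a hostname that points at 127.0.0.1 is caught too.
    for info in socket.getaddrinfo(host, None):
        address = info[4][0].split("%")[0]  # drop any IPv6 scope id
        if is_blocked_ip(ipaddress.ip_address(address), allow_local_network):
            raise PermissionError(f"blocked host {host!r} ({address})")


print(is_blocked_ip(ipaddress.ip_address("10.0.0.5")))
print(is_blocked_ip(ipaddress.ip_address("10.0.0.5"), allow_local_network=True))
```

A production guard would also need to connect to the address it validated; otherwise a second DNS lookup at request time could return a different host.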

Development

python -m pytest        # 35 tests: parser, curation, server roundtrip, eval

Tests are offline: the parser/curation suites need no network, and the eval suite uses a scripted LLM client (no API key).

License

MIT
