Skip to main content

Turn an OpenAPI spec into a high-quality, curated MCP server — with an eval harness that proves curation works.

Project description

mcp-curate

PyPI CI Python License: MIT

Turn an OpenAPI spec into a curated MCP server an LLM can actually use — and prove it with an eval.

A naive OpenAPI→MCP generator dumps one tool per endpoint. Point it at GitHub's API and the model drowns in 1190 tools and picks the wrong one. mcp-curate consolidates those endpoints into a small set of clear, well-described meta-tools — and ships an eval harness that measures whether the model picks the right tool, raw vs curated, on your own spec.

Before / after

Spec Raw tools Curated tools Reduction
Swagger Petstore 19 3 84%
Stripe API 587 40 93%
GitHub REST API 1190 40 97%
$ mcp-curate curate examples/github.json
raw tools:     1190
curated tools: 40  (budget 40)
reduction:     97%

Curated tools (actions consolidated):
  - repos: 202 actions  [repos]
  - actions: 187 actions  [actions]
  - orgs: 108 actions  [orgs]
  - issues: 55 actions  [issues]
  ...

Each curated tool exposes an action argument that selects the underlying operation, so 1190 flat choices become 40 namespaced ones.

Oversized tags get split, not stuffed. When the tool budget has headroom, a giant tag is broken into focused sub-tools by path instead of one bloated tool. With more budget, GitHub's 202-operation repos tag splits cleanly:

$ mcp-curate curate examples/github.json --max-tools 120 --max-actions 30
  - repos: ...            repos_branches, repos_commits, repos_collaborators,
  - repos_branches: 36    repos_comments, repos_compare, ... (focused sub-tools)

At a tight budget (the default 40), curation keeps tags whole and clean rather than forcing unrelated tags together; raise --max-tools to trade tool count for smaller, more focused tools.

Why this saves money, latency, and context

Tool definitions (names + descriptions + schemas) are sent to the model as input tokens on every request. Fewer tools means fewer tokens every call — so it's cheaper, faster, and actually fits in the context window.

Spec Raw tool-defs Curated Reduction
Stripe ~444,900 tokens ~24,300 tokens 95%
GitHub ~318,400 tokens ~49,500 tokens 84%

At Sonnet input pricing ($3 / 1M tokens), the Stripe tool definitions alone cost ~$1.33 per request raw vs ~$0.07 curated — about $1,260 saved per 1,000 requests, before the model even answers. (Prompt caching narrows the gap to ~18× on cache hits; output tokens are unchanged.)

For a large API the bigger win is feasibility, not cost: Stripe's raw 445K tokens of tool definitions exceed most context windows, so the raw server won't load at all — curated, it fits.

Does curation actually help? (the eval)

mcp-curate eval runs natural-language requests against both the raw and the curated tool set using your LLM key, and reports how often the model routes to the correct tool.

$ export ANTHROPIC_API_KEY=...
$ mcp-curate eval examples/petstore.json --cases examples/eval_cases/petstore.yaml

Eval: raw vs curated tool selection
cases: 14   raw tools: 19   curated tools: 3

raw     correct-tool selection:    93%
curated correct-tool selection:   100%
  -> improvement: +7 points
curated tool+action correct:      100%

argument construction (5 cases with expected args):
  raw     correct args:   100%
  curated correct args:   100%

Petstore is deliberately tiny (19 tools), so even the raw server does well — yet curated still reaches 100%, fixing the one case where the raw model returned no tool at all. The gap widens sharply as the API grows: tool-selection accuracy is known to degrade past ~100 tools, and a raw server with hundreds of tools (Stripe's 587, GitHub's 1190) often won't load at all (see the token table above). The harness uses your key on your spec — run it on a bigger spec to see the real spread. Golden sets ship for Petstore and Stripe (examples/eval_cases/); add your own as a small YAML file.

The eval is deliberately honest. Beyond correct-tool selection it also reports:

  • curated tool + action accuracy — so curation can't "win" just by offering fewer, broader tools (it must still route to the right operation);
  • argument construction accuracy (raw vs curated) — for cases that declare expected arguments, whether the model filled the right parameters (e.g. petId: 42 from "look up pet 42").

What costs money, what's free

Almost everything is free and offline — only the eval makes LLM calls.

Command LLM calls? Cost
parse / curate / serve No Free — run as often as you like, no API key
eval Yes (~28 per Petstore run: raw and curated, all cases) A few cents per run (e.g. ~$0.23 on Petstore with Sonnet) — not a one-time fee
curate --llm-descriptions (optional) Yes (one per tool) A few cents — and with --export it's truly one-time: pay once, then serve the prebuilt file free forever

The eval costs money each time you run it because it makes real API calls to measure raw vs curated — including the expensive raw side on purpose. You only run it to get a number, not as part of normal use. The savings happen at runtime, when your AI agent calls the curated server you deployed — every such call uses ~95% fewer tool-definition tokens than the raw equivalent.

Tip: don't run eval on Stripe/GitHub just to see a big number — 14 raw requests of ~445K tokens each would cost $18+ and may exceed the context window. The token-reduction table above already proves the large-API case.

Forking this repo? The status badges above point to tarundattagondi/mcp-curate. Replace that with your-username/mcp-curate in the three badge URLs at the top so they track your own fork's CI.

Quickstart

pip install mcp-curate
mcp-curate demo          # curates a bundled Petstore spec — see 19 -> 3 instantly, no setup

Install

pip install mcp-curate        # core CLI
pip install "mcp-curate[llm]" # + eval harness / --llm-descriptions (Anthropic)

Or from source (for development, or to run the example specs):

git clone https://github.com/tarundattagondi/mcp-curate && cd mcp-curate
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,llm]"
./examples/fetch_specs.sh        # petstore is committed; this also grabs GitHub + Stripe

Usage

# Zero-setup demo on the bundled Petstore spec.
mcp-curate demo

# Inspect a spec's raw tool count.
mcp-curate parse examples/petstore.json

# See the before/after curation report.
mcp-curate curate examples/github.json --max-tools 40

# Serve the curated MCP server over stdio (bring-your-own auth header).
mcp-curate serve examples/petstore.json --curated \
  --header "Authorization: Bearer $TOKEN"

# A/B the tool selection with your LLM key.
mcp-curate eval examples/petstore.json --cases examples/eval_cases/petstore.yaml

Bake the curation once, serve it free forever

serve --curated re-curates on every launch — instant and free for the default deterministic curation. But if you use --llm-descriptions (which calls the LLM), you don't want to pay on every restart. Export the curated tool set once, then serve the prebuilt file with no further curation or API calls:

# Pay the LLM once, write a reusable file:
mcp-curate curate api.json --llm-descriptions --export curated.json

# Serve it forever, free — no re-curation, no LLM:
mcp-curate serve curated.json --header "Authorization: Bearer $TOKEN"

Add --llm-descriptions to curate/serve/eval to let the LLM polish the curated tool names and descriptions (otherwise they're generated deterministically, with no API key required).

How it works

  1. Parse — load OpenAPI 3.x (JSON/YAML), resolve $ref with cycle cutting, flatten each operation into a spec-agnostic model.
  2. Curate — group operations by tag (path-segment fallback), merge the smallest related groups to fit a tool budget, split any oversized group into focused sub-tools using leftover headroom, and collapse each group into one meta-tool with an action selector.
  3. Serve — expose either tool set over the MCP stdio transport; tool calls become real HTTP requests against the spec's server URL.
  4. Eval — force the model to pick a tool for each golden request and score raw vs curated routing.

Security

Runs fully local; nothing leaves your machine except LLM calls (eval, with your key) and the API calls your served spec makes. Defenses on by default:

  • SSRF guard — tool calls to loopback/private/link-local hosts are blocked (cloud-metadata 169.254.169.254 always); --allow-local-network to opt in.
  • Tool-poisoning defense — descriptions are scrubbed of hidden unicode and scanned for prompt-injection ("ignore previous instructions", <system> tags, secret-exfil), warning you which tools look suspicious.
  • No redirects, TLS verified, 64 MB spec cap, no secrets in the package.

Only serve specs you trust. See SECURITY.md.

Development

python -m pytest        # 54 tests: parser, curation, server, eval, demo, export, security

Tests are offline: the parser/curation suites need no network, and the eval suite uses a scripted LLM client (no API key).

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mcp_curate-0.1.4.tar.gz (43.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mcp_curate-0.1.4-py3-none-any.whl (39.4 kB view details)

Uploaded Python 3

File details

Details for the file mcp_curate-0.1.4.tar.gz.

File metadata

  • Download URL: mcp_curate-0.1.4.tar.gz
  • Upload date:
  • Size: 43.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for mcp_curate-0.1.4.tar.gz
Algorithm Hash digest
SHA256 0b8d507984910afe347dbb224a9e781f34e23d592005957fdf7ffda319d107fb
MD5 ea0c3cc7bc7fde1cf329e2edc3b079b3
BLAKE2b-256 31a718d9872e449f4cc52d949473aeeaa94eb7c8c3ed9d6b0759f84a5316ab44

See more details on using hashes here.

File details

Details for the file mcp_curate-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: mcp_curate-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 39.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for mcp_curate-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 5d1fad37fab914349006d45c787ca8400de33b228d70e8d3cd13a0bd1b9a156a
MD5 25a863468fd9fd845c90e844cdc417fd
BLAKE2b-256 a69b05a00b448091c1f4d3e50edd7a950087aa0d009ab3c7c4e646529bf78411

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page