


mcp-schema-normalize

Bridge MCP tool schemas to llama.cpp's grammar-compatible subset.


Normalize MCP / OpenAI-format tool JSON schemas into the narrower subset llama.cpp's grammar converter accepts. Bridges the standards gap between MCP-mandated JSON Schema 2020-12 (SEP-1613) and what local grammar-constrained sampling backends actually compile.

If your MCP tool calls work fine against the Anthropic / OpenAI hosted APIs but die with "Unable to generate parser for this template" or "Error resolving ref … anyOf not in {…}" when routed through llama.cpp (llama-server, llama-swap, Ollama, etc.), this library is for you.


What it fixes

These are documented, permanent limitations of llama.cpp's json-schema-to-grammar.cpp, listed in the grammars README (grammars/README.md) maintained by the converter's implementer. The cited issues are closed, not because they were fixed, but because they were accepted as won't-fix or fell out of triage. This library is the gateway-side workaround for that documented gap.

| Failure mode | Upstream status | What this library does |
|---|---|---|
| anyOf (or oneOf) beside properties / type / required / additionalProperties | Documented limitation (#7703, closed; covered by grammars/README.md) | Distribute siblings into each union branch, producing self-contained objects |
| {"not": {}} sentinel from zod-to-json-schema | Closed with a LibreChat-side patch as the resolution (#17574) | Drop empty-not keywords; preserve non-empty not schemas |
| Nested $refs into anyOf nodes | Documented limitation (#8073, closed; still active in current builds) | Inline non-cyclic refs; preserve cyclic refs (llama.cpp handles cycles natively) |
| Schemas that expand past MAX_REPETITION_THRESHOLD = 2000 | Closed without fix (#21228; user-side workaround posted) | Coarsen inlines that would blow the budget |
| llama-server silently falls back to unconstrained generation when grammar build fails | Closed as stale by bot (#19051; still observable) | Pre-flight size budget + telemetry to make the silent fallback visible |
| Dangling $ref (paths that don't exist), a common zod-to-json-schema artifact when singleton unions collapse | Upstream schema-generator bug | Replace with permissive {} so the request still completes. See the load-bearing caveat below. |
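The first row (anyOf beside siblings) is the least obvious transform, so here is a minimal sketch of what "distribute siblings into each union branch" can look like. It is an illustration, not the package's implementation: distribute_union_siblings is a hypothetical name, refs are assumed to be inlined already, every branch is assumed to be a dict schema, and the merge rules (property maps unioned, required lists unioned, branch wins on any other conflict) are simplifications of JSON Schema's intersection semantics.

```python
import copy
from typing import Any

# Document-level keys stay at the top instead of being copied into branches.
_TOP_LEVEL_ONLY = {"$schema", "$defs", "definitions"}


def distribute_union_siblings(node: dict[str, Any]) -> dict[str, Any]:
    """Push keywords that sit beside anyOf/oneOf down into every branch.

    Hypothetical helper; assumes dict branches and already-inlined refs.
    """
    if "anyOf" in node and "oneOf" in node:
        return node  # left untouched, like union_coexistence_skipped
    key = "anyOf" if "anyOf" in node else "oneOf" if "oneOf" in node else None
    if key is None or not isinstance(node[key], list):
        return node
    siblings = {k: v for k, v in node.items() if k != key and k not in _TOP_LEVEL_ONLY}
    if not siblings:
        return node  # nothing beside the union, nothing to distribute

    new_branches = []
    for branch in node[key]:
        merged = copy.deepcopy(siblings)  # never mutate the caller's schema
        for k, v in branch.items():
            if k == "properties":
                # Union the property maps; on a name clash the branch wins.
                merged.setdefault("properties", {}).update(v)
            elif k == "required":
                merged["required"] = sorted(set(merged.get("required", [])) | set(v))
            else:
                merged[k] = v  # simplification: branch overrides the sibling
        new_branches.append(merged)

    result = {k: v for k, v in node.items() if k in _TOP_LEVEL_ONLY}
    result[key] = new_branches
    return result
```

Each resulting branch is a self-contained object schema, so a converter that rejects keywords beside anyOf only ever sees a bare union.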

Install

This package is pure Python, with zero runtime dependencies for the core. The LiteLLM proxy hook lives behind an optional extra, so consumers who only need the schema transforms don't pull in LiteLLM.

```shell
# Pure-core: just the schema transforms (normalize_schema, normalize_tools,
# resolve_pointer, build_ref_graph, find_ref_cycles). No third-party deps.
pip install mcp-schema-normalize

# Add the LiteLLM CustomLogger pre-call hook. Pulls litellm>=1.0.
# Quote the extras so shells such as zsh don't glob the brackets.
pip install 'mcp-schema-normalize[litellm]'

# Development (pytest, ruff).
pip install 'mcp-schema-normalize[dev]'
```

Equivalent uv invocations:

```shell
uv add mcp-schema-normalize                     # pure core
uv add 'mcp-schema-normalize[litellm]'          # + LiteLLM hook
```

Import the public API from the top-level package; integrations live under their own submodule path:

```python
# Pure-core API: always available
from mcp_schema_normalize import normalize_schema, normalize_tools

# LiteLLM hook: only available with the [litellm] extra installed
from mcp_schema_normalize.integrations.litellm import normalize_tool_schemas_handler
```

Quick start

Direct use (any framework, any backend)

```python
from mcp_schema_normalize import normalize_tools

# Your OpenAI-format tool list as received from an MCP server
tools = [
    {
        "type": "function",
        "function": {
            "name": "paperclipUpdateIssue",
            "parameters": {
                # ... a JSON Schema 2020-12 tool definition with $ref, anyOf,
                # not:{} sentinels, etc. (whatever zod-to-json-schema emits)
            },
        },
    },
]

normalized, telemetry = normalize_tools(tools)
# `normalized` is safe to forward to llama.cpp
# `telemetry` is a dict of counters you should log / alert on
```

LiteLLM proxy

Two steps: install the package into the proxy's Python environment, then register the hook in config.yaml.

Build a custom image that includes the package:

```dockerfile
FROM ghcr.io/berriai/litellm:main-latest
RUN pip install --no-cache-dir 'mcp-schema-normalize[litellm]'
```

Register the hook in your config.yaml:

```yaml
litellm_settings:
  callbacks:
    - "mcp_schema_normalize.integrations.litellm.normalize_tool_schemas_handler"
    # ... any other callbacks (after this one)
```

The hook rewrites every tool's function.parameters in flight on chat-completion, responses, and other tool-carrying calls. It emits one INFO-level summary log per modified request, escalated to WARN if any lossy transform fires. All telemetry counters are attached as structured extra= fields so log aggregators (Loki, Datadog, etc.) can index them.

See docs/litellm.md for:

  • Running on a read-only / hardened LiteLLM container (volume-mount pattern)
  • Callback ordering against strip_invalid_tools, OTel, and other common callbacks
  • Troubleshooting (logs not appearing, hook not firing, etc.)

⚠️ When NOT to use this — load-bearing assumption

This library will make your request go through even when your MCP server emits broken schemas. The cost is that affected fields lose their type spec and the model may emit structurally wrong values (e.g. a number where the schema said string-or-null).

The most common case: zod-to-json-schema's singleton-union-collapse bug, where z.union([X, ...]) collapses to its sole concrete variant but the generated $ref strings still expect the pre-collapse anyOf envelope. The library detects these dangling refs and replaces them with {} (match-anything) so the request completes; the original schema is malformed and gets silently loosened.

Telemetry surfaces every event, but you have to be watching for it. The library emits:

  • refs_unresolved counter — incremented per dangling ref
  • WARN-level per-ref log line — unresolvable $ref replaced with permissive {} fallback
  • WARN-level per-request summary log — escalated whenever any lossy counter is non-zero
  • Per-schema WARN-line rate limiting (default 10 per schema) so a runaway broken server can't flood logs; aggregate counter still reflects every event

If your observability stack doesn't alert on either the counter or the WARN log, you will not notice that schemas are silently degrading. In that case, set STRICT_UNRESOLVED_REFS = True to opt out of the fallback: dangling refs are then left in place, llama.cpp's grammar converter rejects the tool, and the failure surfaces as a 400 instead of a degraded response.

```python
import mcp_schema_normalize
mcp_schema_normalize.STRICT_UNRESOLVED_REFS = True  # fail loudly
```

Other lossy events the library also surfaces:

  • empty_union_drops: a union such as anyOf: [{"not": {}}] collapsed to empty; siblings retained (strict loosening)
  • union_coexistence_skipped: anyOf and oneOf at the same level; the node is not rewritten (correct handling needs allOf-wrapping, not yet implemented)
  • size_coarsenings: an inline would exceed SIZE_BUDGET = 1500; the deepest inline is coarsened to {"type": "object"}
  • max_inline_depth_reached: a $ref chain exceeded MAX_INLINE_DEPTH = 5; the tail is coarsened to {"type": "object"}
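How ref inlining, cycle preservation, and the depth cap interact can be sketched as one recursive pass. This is a minimal sketch, not the package's implementation: inline_refs is a hypothetical name, only refs of the exact form #/$defs/Name are looked up, keywords beside a $ref are dropped, and the SIZE_BUDGET check is omitted. The counter names mirror the telemetry keys documented in this README.

```python
from typing import Any

_DEFS_PREFIX = "#/$defs/"  # the sketch only resolves refs of this form


def inline_refs(schema: dict, max_depth: int = 5) -> tuple[dict, dict[str, int]]:
    """Inline $defs refs, keep cyclic refs, coarsen chains deeper than max_depth."""
    defs = schema.get("$defs", {})
    counts = {"refs_inlined": 0, "cycles_preserved": 0, "max_inline_depth_reached": 0}

    def walk(node: Any, stack: tuple[str, ...]) -> Any:
        if isinstance(node, list):
            return [walk(v, stack) for v in node]
        if not isinstance(node, dict):
            return node
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith(_DEFS_PREFIX):
            name = ref[len(_DEFS_PREFIX):]
            if ref in stack:
                counts["cycles_preserved"] += 1
                return node  # recursive type: leave the $ref for llama.cpp
            if name not in defs:
                return node  # dangling ref: handled by a separate pass
            if len(stack) >= max_depth:
                counts["max_inline_depth_reached"] += 1
                return {"type": "object"}  # coarsen the tail of a long chain
            counts["refs_inlined"] += 1
            # Simplification: any keywords beside the $ref are dropped here.
            return walk(defs[name], stack + (ref,))
        return {k: walk(v, stack) for k, v in node.items()}

    body = {k: v for k, v in schema.items() if k != "$defs"}
    out = walk(body, ())
    if counts["cycles_preserved"]:
        out["$defs"] = defs  # preserved refs still need their targets
    return out, counts
```

The stack of refs currently being expanded does double duty: membership in it signals a cycle, and its length is the inline depth that max_depth caps.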

Telemetry reference

normalize_schema() and normalize_tools() return (new_schema, telemetry) and (new_tools, telemetry) respectively. The telemetry dict's keys, what they mean, and when to alert:

| Counter | Meaning | Lossy? | Typically non-zero for |
|---|---|---|---|
| refs_inlined | Number of $refs successfully inlined | no | Schemas with shared types |
| cycles_preserved | Cyclic $refs left in place for llama.cpp to handle | no | Recursive types (TreeNode-style) |
| refs_unresolved | Dangling $refs replaced with {} | yes | Broken MCP servers |
| size_coarsenings | Inlines coarsened due to size budget | yes | Pathologically large schemas |
| max_inline_depth_reached | Inline chains that hit the depth cap | yes | Deeply nested ref graphs |
| anyof_rewrites | anyOf-beside-siblings distributions performed | no | Well-typed MCP schemas |
| oneof_rewrites | oneOf-beside-siblings distributions performed | no | Well-typed MCP schemas |
| not_drops | {"not": {}} sentinels removed | no | zod-emitted schemas |
| empty_union_drops | Unions that became empty after not:{} filtering | yes | zod bugs |
| union_coexistence_skipped | Nodes skipped because they had both anyOf and oneOf | yes | Unusual schemas |
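If you call normalize_tools() directly instead of going through the LiteLLM hook, you need your own INFO/WARN split. A minimal sketch, assuming telemetry is a flat dict of integer counters keyed by the names in the table above; lossy_events and log_telemetry are hypothetical helpers, and the schema_telemetry key passed via extra= is an arbitrary choice.

```python
import logging
from typing import Mapping

# Counters marked "yes" in the Lossy? column of the telemetry table.
LOSSY_COUNTERS = frozenset({
    "refs_unresolved",
    "size_coarsenings",
    "max_inline_depth_reached",
    "empty_union_drops",
    "union_coexistence_skipped",
})


def lossy_events(telemetry: Mapping[str, int]) -> dict[str, int]:
    """Return only the lossy counters that actually fired."""
    return {k: v for k, v in telemetry.items() if k in LOSSY_COUNTERS and v > 0}


def log_telemetry(logger: logging.Logger, telemetry: Mapping[str, int]) -> int:
    """Log one summary line: WARNING if anything lossy fired, else INFO.

    All counters go under a single extra= key so they cannot clash with
    built-in LogRecord attributes. Returns the level that was used.
    """
    lossy = lossy_events(telemetry)
    level = logging.WARNING if lossy else logging.INFO
    logger.log(
        level,
        "tool schema normalization: %s",
        lossy or "no lossy events",
        extra={"schema_telemetry": dict(telemetry)},
    )
    return level
```

Keeping the full counter dict on every line, not just the lossy subset, lets a log aggregator chart the routine counters too.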

A reasonable Grafana alert, assuming you export refs_unresolved as a Prometheus counter labeled by model: sum by (model) (rate(refs_unresolved[5m])) > 0 pages whenever any tool schema starts emitting dangling refs.


Configuration

All knobs are module-level constants you can monkey-patch before use:

```python
import mcp_schema_normalize

mcp_schema_normalize.SIZE_BUDGET = 1500              # llama.cpp threshold proxy
mcp_schema_normalize.MAX_INLINE_DEPTH = 5            # ref-chain depth cap
mcp_schema_normalize.MAX_PER_SCHEMA_REF_WARNINGS = 10  # per-schema log rate limit
mcp_schema_normalize.STRICT_UNRESOLVED_REFS = False  # True = no permissive fallback
```
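Because these knobs are process-wide module globals, a test that flips one should restore it afterwards. A minimal sketch using contextlib; patched_constants is a hypothetical helper, not part of the package, and it only has an effect if the library reads these constants from the top-level module at call time, which is an assumption.

```python
import contextlib
import types
from typing import Any, Iterator


@contextlib.contextmanager
def patched_constants(module: types.ModuleType, **overrides: Any) -> Iterator[None]:
    """Temporarily set module-level constants, restoring them on exit.

    Hypothetical test helper. It mutates process-wide state, so it is not
    safe to use from concurrent threads.
    """
    missing = [name for name in overrides if not hasattr(module, name)]
    if missing:
        # Catch typos like SIZE_BUDGT instead of silently creating a new global.
        raise AttributeError(f"{module.__name__} has no constant(s): {', '.join(missing)}")
    saved = {name: getattr(module, name) for name in overrides}
    for name, value in overrides.items():
        setattr(module, name, value)
    try:
        yield
    finally:
        # Restore even if the body raised.
        for name, value in saved.items():
            setattr(module, name, value)


# Usage (assumes the package reads these globals at call time):
#   import mcp_schema_normalize
#   with patched_constants(mcp_schema_normalize, STRICT_UNRESOLVED_REFS=True):
#       normalized, telemetry = mcp_schema_normalize.normalize_tools(tools)
```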

Backends and frameworks

The library is backend-agnostic: it operates purely on JSON Schema. It has been tested with:

  • LiteLLM proxy → llama-swap → llama.cpp server (primary use case; first-class integration shipped)
  • Direct llama-server via OpenAI-compatible API (use the pure-core normalize_tools() in your own client)
  • Ollama (same llama.cpp grammar converter underneath; pure-core API applies)

Adding integrations for vLLM, TabbyAPI, or other proxies is a matter of writing a thin adapter that calls normalize_tools(). PRs welcome.
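Most of such an adapter is finding the tools array in the outgoing request and swapping it. A minimal sketch, assuming an OpenAI-style chat-completion body with a top-level tools list; normalize_request_body is a hypothetical name, and the normalizer is injectable so the adapter can be tested without the package installed.

```python
from typing import Any, Callable, Optional

# Shape documented for normalize_tools(): tools -> (new_tools, telemetry).
Normalizer = Callable[[list], tuple[list, dict]]


def normalize_request_body(
    body: dict[str, Any], normalize: Optional[Normalizer] = None
) -> tuple[dict[str, Any], dict]:
    """Return a copy of an OpenAI-style request body with its tools normalized.

    Hypothetical adapter shape for a proxy hook. By default the package's
    normalize_tools() is imported lazily.
    """
    tools = body.get("tools")
    if not tools:
        return body, {}  # nothing to rewrite; skip the import entirely
    if normalize is None:
        from mcp_schema_normalize import normalize_tools as normalize
    new_tools, telemetry = normalize(tools)
    # Shallow copy: the caller's body and tools list are left untouched.
    return {**body, "tools": new_tools}, telemetry
```

A real integration would then forward the returned body to the backend and hand the telemetry to its logger.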


Status

0.1.0, alpha. API may change before 1.0. The pipeline and telemetry surface are stable in intent; specific field names and module constants may move based on user feedback.

Originating incident

This library was extracted from a real production incident — a paperclip MCP server emitting schemas that crashed Qwen3-Coder and Nemotron-Nano local backends with Unable to generate parser for this template. The investigation post-mortem (including "what we should have done differently") is in the LiteLLM repo it was extracted from; if you want the long-form story, ping me and I'll publish it as a blog post.

Contributing

See CONTRIBUTING.md. Bug reports especially welcome — the more broken MCP schemas we see in the wild, the better this library gets at handling them.

License

MIT.



Download files


Source Distribution

mcp_schema_normalize-0.1.1.tar.gz (16.4 kB)

Built Distribution


mcp_schema_normalize-0.1.1-py3-none-any.whl (18.9 kB)

File details

Details for the file mcp_schema_normalize-0.1.1.tar.gz.

File metadata

  • Download URL: mcp_schema_normalize-0.1.1.tar.gz
  • Size: 16.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mcp_schema_normalize-0.1.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | b869ca959540fd871f49033ce99592cdfdb5932af28000c4550c2bc87045de15 |
| MD5 | c847dc3e406ac2a41a375bd7da7b3b3f |
| BLAKE2b-256 | 1dce7420f2ccd6fdb053fcc17ddf9be323a65f653e076fe64020c431a82520c4 |


Provenance

The following attestation bundles were made for mcp_schema_normalize-0.1.1.tar.gz:

Publisher: publish.yml on rsclafani/mcp-schema-normalize

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mcp_schema_normalize-0.1.1-py3-none-any.whl.


File hashes

Hashes for mcp_schema_normalize-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6ae38a62444e601984bbfdf97fc4cbd99983974ebfc1a7f57def79587ae71d42 |
| MD5 | 7f7e09a41facacace58059c5ffaa1528 |
| BLAKE2b-256 | 3cbd313669b6080f3b4ebea734ea67a89b4ad39c345327b8968f192d69f141ff |


Provenance

The following attestation bundles were made for mcp_schema_normalize-0.1.1-py3-none-any.whl:

Publisher: publish.yml on rsclafani/mcp-schema-normalize

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
