Normalize MCP / OpenAI-format tool JSON schemas into the narrower subset llama.cpp's grammar converter accepts. Bridges the standards gap between MCP-mandated JSON Schema 2020-12 and what local grammar-constrained sampling backends actually compile.
Project description
mcp-schema-normalize
Bridge MCP tool schemas to llama.cpp's grammar-compatible subset.
Normalize MCP / OpenAI-format tool JSON schemas into the narrower subset llama.cpp's grammar converter accepts. Bridges the standards gap between MCP-mandated JSON Schema 2020-12 (SEP-1613) and what local grammar-constrained sampling backends actually compile.
If your MCP tool calls work fine against Anthropic / OpenAI hosted APIs but die with Unable to generate parser for this template or Error resolving ref … anyOf not in {…} when routed through llama.cpp (llama-server, llama-swap, Ollama, etc.) — this library is for you.
What it fixes
These are documented permanent limitations of llama.cpp's json-schema-to-grammar.cpp, authoritatively listed in the grammars README maintained by the converter's implementer. The cited issues are closed — not because they were fixed, but because they were accepted as won't-fix or fell out of triage. This library is the gateway-side workaround for that documented gap.
| Failure mode | Upstream status | What this library does |
|---|---|---|
anyOf (or oneOf) beside properties / type / required / additionalProperties |
Documented limitation (#7703 — closed, covered by grammars/README.md) | Distribute siblings into each union branch, producing self-contained objects |
{"not": {}} sentinel from zod-to-json-schema |
Closed with a LibreChat-side patch as the resolution (#17574) | Drop empty-not keywords; preserve non-empty not schemas |
Nested $refs into anyOf nodes |
Documented limitation (#8073 — closed, still active in current builds) | Inline non-cyclic refs; preserve cyclic refs (llama.cpp handles cycles natively) |
Schemas that expand past MAX_REPETITION_THRESHOLD = 2000 |
Closed without fix (#21228, user-side workaround posted) | Coarsen inlines that would blow the budget |
| llama-server silently falls back to unconstrained generation when grammar build fails | Closed as stale by bot (#19051 — still observable) | Pre-flight size budget + telemetry to make the silent fallback visible |
Dangling $ref (paths that don't exist) — common zod-to-json-schema artifact when singleton unions collapse |
Upstream schema-generator bug | Replace with permissive {} so the request still completes. See the load-bearing caveat below. |
Install
This package is pure Python, zero runtime dependencies for the core. The LiteLLM proxy hook lives behind an optional extra so consumers who only need the schema transforms don't pull in LiteLLM.
# Pure-core: just the schema transforms (normalize_schema, normalize_tools,
# resolve_pointer, build_ref_graph, find_ref_cycles). No third-party deps.
pip install mcp-schema-normalize
# Add the LiteLLM CustomLogger pre-call hook. Pulls litellm>=1.0.
pip install mcp-schema-normalize[litellm]
# Development (pytest, ruff).
pip install mcp-schema-normalize[dev]
Equivalent uv invocations:
uv add mcp-schema-normalize # pure core
uv add 'mcp-schema-normalize[litellm]' # + LiteLLM hook
Import the public API from the top-level package; integrations live under their own submodule path:
# Pure-core API — always available
from mcp_schema_normalize import normalize_schema, normalize_tools
# LiteLLM hook — only available with [litellm] extra installed
from mcp_schema_normalize.integrations.litellm import normalize_tool_schemas_handler
Quick start
Direct use (any framework, any backend)
from mcp_schema_normalize import normalize_tools
# Your OpenAI-format tool list as received from an MCP server
tools = [
{
"type": "function",
"function": {
"name": "paperclipUpdateIssue",
"parameters": {
# ... a JSON Schema 2020-12 tool definition with $ref, anyOf,
# not:{} sentinels, etc. — whatever zod-to-json-schema emits
},
},
},
]
normalized, telemetry = normalize_tools(tools)
# `normalized` is safe to forward to llama.cpp
# `telemetry` is a dict of counters you should log / alert on
LiteLLM proxy
Two steps: install the package into the proxy's Python environment, then register the hook in config.yaml.
Build a custom image that includes the package:
FROM ghcr.io/berriai/litellm:main-latest
RUN pip install --no-cache-dir 'mcp-schema-normalize[litellm]'
Register the hook in your config.yaml:
litellm_settings:
callbacks:
- "mcp_schema_normalize.integrations.litellm.normalize_tool_schemas_handler"
# ... any other callbacks (after this one)
The hook will rewrite every tool's function.parameters in-flight on chat-completion, responses, and other tool-carrying calls. One INFO-level summary log per modified request, escalated to WARN if anything lossy fires. All telemetry counters land as structured extra= fields for log aggregators (Loki, Datadog, etc.) to index.
See docs/litellm.md for:
- Running on a read-only / hardened LiteLLM container (volume-mount pattern)
- Callback ordering against
strip_invalid_tools, OTel, and other common callbacks - Troubleshooting (logs not appearing, hook not firing, etc.)
⚠️ When NOT to use this — load-bearing assumption
This library will make your request go through even when your MCP server emits broken schemas. The cost is that affected fields lose their type spec and the model may emit structurally wrong values (e.g. a number where the schema said string-or-null).
The most common case: zod-to-json-schema's singleton-union-collapse bug, where z.union([X, ...]) collapses to its sole concrete variant but the generated $ref strings still expect the pre-collapse anyOf envelope. The library detects these dangling refs and replaces them with {} (match-anything) so the request completes; the original schema is malformed and gets silently loosened.
Telemetry surfaces every event but you must be watching for it. The library emits:
refs_unresolvedcounter — incremented per dangling ref- WARN-level per-ref log line —
unresolvable $ref replaced with permissive {} fallback - WARN-level per-request summary log — escalated whenever any lossy counter is non-zero
- Per-schema WARN-line rate limiting (default 10 per schema) so a runaway broken server can't flood logs; aggregate counter still reflects every event
If your observability stack doesn't alert on either the counter or the WARN log, you will not notice schemas are degrading silently. In that case set STRICT_UNRESOLVED_REFS = True to opt out of the fallback — dangling refs are then left in place, llama.cpp's grammar converter rejects the tool, and the failure surfaces as a 400 instead of a degraded response.
import mcp_schema_normalize
mcp_schema_normalize.STRICT_UNRESOLVED_REFS = True # fail loudly
Other lossy events the library also surfaces:
empty_union_drops—anyOf: [{"not": {}}]collapsed; siblings retained (strict loosening)union_coexistence_skipped—anyOfandoneOfat the same level; we refuse to rewrite (correct handling needs allOf-wrapping; not yet implemented)size_coarsenings— inline would blowSIZE_BUDGET = 1500; deepest inline coarsened to{"type": "object"}max_inline_depth_reached—$refchain exceededMAX_INLINE_DEPTH = 5; tail coarsened to{"type": "object"}
Telemetry reference
normalize_schema() and normalize_tools() return (new_schema, telemetry) and (new_tools, telemetry) respectively. The telemetry dict's keys, what they mean, and when to alert:
| Counter | Meaning | Lossy? | Routine on… |
|---|---|---|---|
refs_inlined |
Number of $refs successfully inlined |
no | Schemas with shared types |
cycles_preserved |
Cyclic $refs left in place for llama.cpp to handle |
no | Recursive types (TreeNode-style) |
refs_unresolved |
Dangling $refs replaced with {} |
yes | Broken MCP servers |
size_coarsenings |
Inlines coarsened due to size budget | yes | Pathologically large schemas |
max_inline_depth_reached |
Inline chains hit the depth cap | yes | Deeply nested ref graphs |
anyof_rewrites |
anyOf-beside-siblings distributions performed |
no | Well-typed MCP schemas |
oneof_rewrites |
oneOf-beside-siblings distributions performed |
no | Same |
not_drops |
{"not": {}} sentinels removed |
no | zod-emitted schemas |
empty_union_drops |
Unions that became empty after not:{} filtering |
yes | zod bugs |
union_coexistence_skipped |
Skipped node had both anyOf and oneOf |
yes | Unusual schemas |
A reasonable Grafana alert: sum(rate(refs_unresolved[5m])) by model > 0 pages whenever any tool schema starts emitting dangling refs.
Configuration
All knobs are module-level constants you can monkey-patch before use:
import mcp_schema_normalize
mcp_schema_normalize.SIZE_BUDGET = 1500 # llama.cpp threshold proxy
mcp_schema_normalize.MAX_INLINE_DEPTH = 5 # ref-chain depth cap
mcp_schema_normalize.MAX_PER_SCHEMA_REF_WARNINGS = 10 # per-schema log rate limit
mcp_schema_normalize.STRICT_UNRESOLVED_REFS = False # True = no permissive fallback
Backends and frameworks
The library is structurally agnostic — it operates on JSON Schema. It's been tested with:
- LiteLLM proxy → llama-swap → llama.cpp server (primary use case; first-class integration shipped)
- Direct llama-server via OpenAI-compatible API (use the pure-core
normalize_tools()in your own client) - Ollama (same llama.cpp grammar converter underneath; pure-core API applies)
Adding integrations for vLLM, TabbyAPI, or other proxies is a matter of writing a thin adapter that calls normalize_tools(). PRs welcome.
Status
0.1.0, alpha. API may change before 1.0. The pipeline and telemetry surface are stable in intent; specific field names and module constants may move based on user feedback.
Originating incident
This library was extracted from a real production incident — a paperclip MCP server emitting schemas that crashed Qwen3-Coder and Nemotron-Nano local backends with Unable to generate parser for this template. The investigation post-mortem (including "what we should have done differently") is in the LiteLLM repo it was extracted from; if you want the long-form story, ping me and I'll publish it as a blog post.
Contributing
See CONTRIBUTING.md. Bug reports especially welcome — the more broken MCP schemas we see in the wild, the better this library gets at handling them.
License
MIT.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file mcp_schema_normalize-0.1.1.tar.gz.
File metadata
- Download URL: mcp_schema_normalize-0.1.1.tar.gz
- Upload date:
- Size: 16.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b869ca959540fd871f49033ce99592cdfdb5932af28000c4550c2bc87045de15
|
|
| MD5 |
c847dc3e406ac2a41a375bd7da7b3b3f
|
|
| BLAKE2b-256 |
1dce7420f2ccd6fdb053fcc17ddf9be323a65f653e076fe64020c431a82520c4
|
Provenance
The following attestation bundles were made for mcp_schema_normalize-0.1.1.tar.gz:
Publisher:
publish.yml on rsclafani/mcp-schema-normalize
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
mcp_schema_normalize-0.1.1.tar.gz -
Subject digest:
b869ca959540fd871f49033ce99592cdfdb5932af28000c4550c2bc87045de15 - Sigstore transparency entry: 1660305625
- Sigstore integration time:
-
Permalink:
rsclafani/mcp-schema-normalize@07b562251ec27141fe4645dec2e034280c3d7e33 -
Branch / Tag:
refs/tags/v0.1.1 - Owner: https://github.com/rsclafani
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@07b562251ec27141fe4645dec2e034280c3d7e33 -
Trigger Event:
push
-
Statement type:
File details
Details for the file mcp_schema_normalize-0.1.1-py3-none-any.whl.
File metadata
- Download URL: mcp_schema_normalize-0.1.1-py3-none-any.whl
- Upload date:
- Size: 18.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6ae38a62444e601984bbfdf97fc4cbd99983974ebfc1a7f57def79587ae71d42
|
|
| MD5 |
7f7e09a41facacace58059c5ffaa1528
|
|
| BLAKE2b-256 |
3cbd313669b6080f3b4ebea734ea67a89b4ad39c345327b8968f192d69f141ff
|
Provenance
The following attestation bundles were made for mcp_schema_normalize-0.1.1-py3-none-any.whl:
Publisher:
publish.yml on rsclafani/mcp-schema-normalize
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
mcp_schema_normalize-0.1.1-py3-none-any.whl -
Subject digest:
6ae38a62444e601984bbfdf97fc4cbd99983974ebfc1a7f57def79587ae71d42 - Sigstore transparency entry: 1660305705
- Sigstore integration time:
-
Permalink:
rsclafani/mcp-schema-normalize@07b562251ec27141fe4645dec2e034280c3d7e33 -
Branch / Tag:
refs/tags/v0.1.1 - Owner: https://github.com/rsclafani
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@07b562251ec27141fe4645dec2e034280c3d7e33 -
Trigger Event:
push
-
Statement type: