AI gateway with capability-based routing across providers + bring-your-own-key passthrough. OpenAI- and Anthropic-compatible. Drop-in for Claude Code, Cursor, Aider, Cline, Continue.
Project description
ModelMeld
AI gateway with capability-based routing across providers and per-request bring-your-own-key passthrough. Speaks OpenAI Chat Completions AND Anthropic Messages natively. Drop-in for Claude Code, Cursor, Aider, Cline, Continue.
What problem this solves
You're paying frontier-model prices on every request — including the ones where a coding-tuned 7B model would produce identical output. Most gateways force a global choice ("use Anthropic" / "use OpenAI" / "use local"). ModelMeld picks per request, with three policies you control:
anthropic/modelmeld-saver— OSS-only. Never escalates to frontier. Predictable cost ceiling — you pay OSS-tier rates regardless of request complexity.anthropic/modelmeld-auto— OSS by default; escalates to frontier (Sonnet/Opus) when the user's prompt contains 2+ reasoning markers ("think step by step", "explain your reasoning", etc.). Mirrors LiteLLM's Complexity Router trigger.anthropic/modelmeld-quality— Frontier-first. Downgrades to OSS only on detected trivial work (autocomplete-shape, background calls).
Frontier-tier routing uses BYOK — your Anthropic/OpenAI key is sent
as a per-request header (x-modelmeld-byok-anthropic: sk-ant-…), used
to make the upstream call, then forgotten. Never stored at rest, never
logged. Same pattern as competitor gateways but without their
per-request BYOK markup or the key-custody burden.
Quickstart
pip install modelmeld
modelmeld setup --tool claude-code
The setup CLI prompts for your ModelMeld API key (and optionally your
Anthropic key for BYOK), writes a sourceable shell script, pre-configures
Claude Code's /model picker with the three aliases above, and
smoke-tests the whole routing pipeline before declaring success.
Then in your shell:
source ~/.modelmeld/setup-claude-code.sh
claude
In the Claude Code TUI, type /model → pick ModelMeld — Saver (or
Auto / Quality). That's it.
Self-host
modelmeld setup configures your tool against the hosted gateway at
modelmeld-enterprise.onrender.com. If you'd rather run the gateway
yourself:
pip install 'modelmeld[anthropic,openai]'
export ANTHROPIC_API_KEY=sk-ant-… # your real Anthropic key
export OPENAI_API_KEY=sk-… # your real OpenAI key (optional)
uvicorn modelmeld.api.server:app --host 0.0.0.0 --port 8080
Then point your tool at http://localhost:8080. The gateway's
behavior is identical to the hosted endpoint; you just supply the
upstream keys directly via env vars instead of BYOK headers.
For routing across local vLLM + cloud providers, see
docs/backends.md.
What's in the package
- Two API surfaces, one routing pipeline:
- OpenAI-compatible at
/v1/chat/completions— drop-in for any client that speaks OpenAI's wire format (Cursor, Aider, Continue, Cline, OpenAI SDK, Codex CLI, etc.). Plus/v1/modelslisting. - Anthropic-compatible at
/v1/messages— drop-in for any client that speaks Anthropic's wire format (Claude Code viaANTHROPIC_BASE_URL,anthropic-sdk-python,@anthropic-ai/sdk). - Both surfaces stream via SSE, share the same scout/router/memory/
cache pipeline, and emit identical
x-modelmeld-*response headers.
- OpenAI-compatible at
- Provider adapters — OpenAI, Anthropic (with full schema translation in both directions), vLLM, TensorRT-LLM. Each adapter retries transient errors (429 / 5xx / network blip) with exponential backoff before surfacing to the router.
- Capability-based routing —
CapabilityScoutpicks the cheapest model that meets a quality threshold for the prompt's task category, driven by theModelRegistry. - Active tiered memory — L0 raw log, L1 facts, L2 evolving summary, L3 hot-zone — for cross-model conversation continuity.
- Completion cache — exact-match (in-memory or Redis) + semantic (Qdrant); cache key pools across users routed to the same served model.
- PII scrubbing — runs on every egress path before cloud upload.
- Framework integration headers — declare task category + agent role from AutoGen / CrewAI / LangGraph / OpenClaw to bypass the classifier.
- Production-tuned defaults —
DEFAULT_HEURISTIC_WEIGHTS,DEFAULT_QUALITY_THRESHOLD = 0.70, full dev-tool detection catalog, and a currentdefault_registry.jsonsnapshot ship as the defaults. All tunable via constructor args.
Licensing — code, data, and the live feed
This package is licensed under three different sets of terms.
| Component | License | What you can do |
|---|---|---|
| Python code | AGPL-3.0-or-later (LICENSE) | Use, modify, redistribute. If you offer a modified gateway as a network service to third parties, your modifications must also be AGPL. Calling the gateway over HTTP from unmodified clients (Cursor, Aider, Claude Code, etc.) does NOT make those clients AGPL. |
Bundled snapshot data (scout/data/default_registry.json) |
CC-BY-4.0 (with attribution) | Use the snapshot scores in your own routing decisions |
Live curated registry feed (feed.modelmeld.ai) |
Subscription terms | Only with an active ModelMeld subscription |
Why AGPL and not Apache-2.0? ModelMeld is network-service software in
the spirit of Sentry, Grafana Loki, MinIO, and GitLab CE. AGPL preserves
the OSS adoption flywheel for individual users, dev tools, and self-host
deployments — everyone running the gateway for themselves, or having their
tools call it over HTTP, is fully unaffected by the copyleft clause. AGPL
does prevent a competitor from forking modelmeld, layering on
proprietary tweaks, and offering a competing managed service without
contributing back. Commercial licensing for cases that need to be
exempt from AGPL: contact hello@modelmeld.ai. The full reasoning is in
docs/license-rationale.md.
The curated registry feed is the subscription product — it's updated continuously, editorially weighted across multiple benchmark sources, and represents the actual ongoing work that keeps routing decisions sharp as new models ship.
If you pip install modelmeld and never subscribe, everything works
— you just route on a snapshot of benchmark data taken at OSS release
date. Over ~6 months the foundation-model market shifts enough that
your snapshot stales relative to the current best-cost frontier. See
docs/registry-feed.md and
docs/open-core-boundary.md for the
boundary contract.
Supported backends + integrations
Two complementary surfaces:
- Backends — the inference providers ModelMeld routes to: OpenAI, Anthropic, vLLM (self-hosted open-weights), TensorRT-LLM + Triton, with Google Gemini planned. Includes setup snippets per backend and the explicit "not supported" list.
- Integrations — the frameworks
and dev tools ModelMeld routes for:
- Validated today: Claude Code (via
/v1/messages), Aider, OpenAI SDK, anthropic SDK, AutoGen, CrewAI, LangGraph, OpenClaw. Anything built on OpenAI or Anthropic SDKs works as a drop-in. - Should work, not yet live-tested: Cursor, Continue, Cline,
Codex CLI. All speak OpenAI's
/v1/chat/completionswhich is our native dialect. - Routing-hint headers (
x-modelmeld-task-category,x-modelmeld-agent-role, etc.) let frameworks declare task category and agent role explicitly instead of relying on the classifier.
- Validated today: Claude Code (via
Integration scope — v1 commitments
Wire formats we speak: OpenAI Chat Completions API + Anthropic Messages API. Streaming (SSE) on both, tool-use on both, multi-turn conversations on both. The Anthropic surface translates at the HTTP boundary; internally everything runs through the same routing pipeline.
On the roadmap, not v1:
- OpenAI Responses API (
/v1/responses) — for clients that adopt OpenAI's newer surface. Current Codex CLI still uses/v1/chat/completionsand works today. - Anthropic image content blocks (vision input). Claude Code doesn't use vision; documented as deferred.
Already shipped (was on roadmap, now live):
- Anthropic prompt caching —
cache_controlbreakpoints are forwarded verbatim to the upstream Anthropic call via native-shape passthrough on/v1/messages. Your prompt cache hits work through the gateway. (Many competitor gateways strip this; we don't.)
Status
Pre-1.0. The OSS API surface is stable in spirit but not yet under
SemVer guarantees — see docs/api-stability.md
for which symbols carry compatibility commitments.
Contributing
PRs welcome. See CONTRIBUTING.md for the dev workflow,
code style (ruff format + ruff check + pyright), and DCO commit-signoff requirement.
Good first issues are labeled accordingly.
We do not accept PRs that modify the bundled snapshot data files
(scout/data/) — those are curated centrally for the live feed. File
issues against bad routing decisions you observe and we'll evaluate
adjustments for the next feed release.
Community
- GitHub Issues — bugs + feature requests (after reading
CONTRIBUTING.md) - GitHub Discussions — questions, ideas, integration help
- Security — see
SECURITY.md; report tosecurity@modelmeld.ai(90-day disclosure window)
Enterprise tier
For production deployments needing API-key auth, RBAC, OIDC SSO,
Postgres-backed SOC2-grade audit logs, encryption-at-rest, per-tenant
rate limiting, FinOps dashboards, multi-tenant Qdrant cache, or the
managed hosted tier — contact us at hello@modelmeld.ai.
License
- Code: AGPL-3.0-or-later (see LICENSE, NOTICE,
docs/license-rationale.md) - Data files: CC-BY-4.0 (see scout/data/LICENSE.md)
- Live feed: subscription terms (see NOTICE and docs/registry-feed.md)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file modelmeld-0.1.2.tar.gz.
File metadata
- Download URL: modelmeld-0.1.2.tar.gz
- Upload date:
- Size: 153.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
338a14099a00e0b222b2e523274f46c78f171f65b6db20379f554c76996f4f3d
|
|
| MD5 |
61b47d027f986af60226295bb29ba9d3
|
|
| BLAKE2b-256 |
b7e1340773c37bd89e927b40f5c4bdf85bdc60f8dcc6689d0095ebb3fd1caee8
|
File details
Details for the file modelmeld-0.1.2-py3-none-any.whl.
File metadata
- Download URL: modelmeld-0.1.2-py3-none-any.whl
- Upload date:
- Size: 198.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8cf80f0591066269e06c0d6f8ecb5f3cd0896194d1ac43f480e366874b637ada
|
|
| MD5 |
f1a2bee112a40c69ac32fbf08ac15769
|
|
| BLAKE2b-256 |
53fbb37735311d65023e8c848f555919bac9102fee93b862b30cbbb702a4e8d2
|