Multi-LLM router MCP server for Claude Code — smart complexity routing, Claude subscription monitoring, Codex integration, 20+ providers
Route every AI call to the cheapest model that can do the job well. 48 tools · 20+ providers · personal routing memory · budget caps, dashboards, traces.
Average savings: 60–80% vs running everything on Claude Opus.
Install
pipx install claude-code-llm-router && llm-router install
| Host | Command |
|---|---|
| Claude Code | llm-router install |
| VS Code | llm-router install --host vscode |
| Cursor | llm-router install --host cursor |
| Codex CLI | llm-router install --host codex |
What It Does
Intercepts prompts and routes them to the cheapest model that can handle the task. Most AI sessions are full of low-value work: file lookups, small edits, quick questions. Those burn through expensive models unnecessarily.
llm-router keeps cheap work on cheap/free models, escalates to premium models only when needed. No micromanagement required.
- Works in: Claude Code, Cursor, VS Code, Codex, Windsurf, Zed, claw-code, Agno
- Free-first: Ollama (local) → Codex → Gemini Flash → OpenAI → Claude (subscription)
Mental Model
Think of llm-router as a smart task dispatcher. When you ask a question:
- Analyze — What kind of task is this? (simple lookup vs. complex reasoning)
- Choose — Which model can handle this best and cheapest?
- Check Constraints — Are we over budget? Is this model degraded?
- Execute — Send to that model
The dispatcher learns over time: if a model starts performing poorly (judge scores drop), it gets demoted in future decisions. If you're running low on quota (budget pressure), it automatically uses cheaper models. You don't manage any of this—it just happens behind the scenes.
Example: "Explain this error message" → Simple task → Route to Haiku (fast, cheap) → Done. vs. "Refactor this complex architecture" → Complex task → Route to Opus (expensive but thorough) → Done.
The savings come from not using Opus for every question.
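The analyze → choose → check → execute loop can be sketched as a tiny dispatcher. This is a minimal illustration only; the tier names, the keyword heuristic, and the 85% budget threshold are assumptions for the sketch, not llm-router's actual internals:

```python
# Minimal sketch of the analyze -> choose -> check -> execute loop.
# Tier names and the complexity heuristic are illustrative assumptions.
TIERS = ["haiku", "sonnet", "opus"]  # cheapest to most expensive

def classify(prompt: str) -> int:
    """Crude complexity heuristic: complexity keywords and length rank higher."""
    text = prompt.lower()
    score = sum(1 for w in ("refactor", "architecture", "design") if w in text)
    score += len(prompt) > 200
    return min(score, len(TIERS) - 1)

def route(prompt: str, budget_used: float) -> str:
    tier = classify(prompt)
    if budget_used > 0.85:       # budget pressure: downshift one tier
        tier = max(tier - 1, 0)
    return TIERS[tier]

print(route("Explain this error message", budget_used=0.2))          # -> haiku
print(route("Refactor this complex architecture", budget_used=0.2))  # -> opus
```

Note how the same prompt can land on a cheaper tier under budget pressure without any user intervention, which is the "it just happens behind the scenes" behavior described above.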
New in v6.4 — Quality Guard
- Judge-based quality feedback integrated into routing decisions
- Quality reordering — models demoted if scores drop below threshold
- Hard floor enforcement — poor-performing models automatically escalated to better tier
See CHANGELOG.md for all changes.
New in v6.3 — Three-Layer Compression
- RTK command compression — bash output filtered (60–90% reduction)
- Model-based routing — existing cost reduction (70–90%)
- Response compression — LLM outputs condensed (60–75% reduction)
- Unified dashboard — `llm_gain` shows all layers
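Back-of-envelope arithmetic shows why layered reductions compound. If, purely for illustration, each layer were applied to what the previous layer left over, mid-range figures from the ranges above would multiply like this (in reality the layers act on different parts of the token stream, so treat this as a rough intuition, not a measured total):

```python
# Compound effect of three reduction layers, using illustrative
# mid-range percentages from the ranges above (not measured values).
command_reduction = 0.75    # RTK command compression, 60-90% range
routing_reduction = 0.80    # model-based routing,     70-90% range
response_reduction = 0.675  # response compression,    60-75% range

# Each layer applies to what the previous layer left over.
remaining = (1 - command_reduction) * (1 - routing_reduction) * (1 - response_reduction)
print(f"overall reduction: {1 - remaining:.1%}")  # -> overall reduction: 98.4%
```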
How It Works
User Prompt
↓
[Complexity Classifier] — Haiku/Sonnet/Opus?
↓
[Free-First Router] — Ollama → Codex → Gemini Flash → OpenAI → Claude
↓
[Budget Pressure Check] — Downshift if over 85% budget
↓
[Quality Guard] — Demote if judge score < 0.6
↓
Selected Model → Execute
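The free-first stage of the pipeline above can be sketched as a fallback chain with a quality floor. Availability sets and judge scores are hard-coded stand-ins here; the real router's health checks and scoring are internal:

```python
# Sketch of the free-first chain with a quality floor (judge score >= 0.6).
# Availability and judge scores are stand-ins for real health checks.
CHAIN = ["ollama", "codex", "gemini-flash", "openai", "claude"]

def pick_provider(judge_scores: dict, available: set) -> str:
    for provider in CHAIN:                            # cheapest/free first
        if provider not in available:
            continue
        if judge_scores.get(provider, 1.0) < 0.6:     # quality guard: demote
            continue
        return provider
    return "claude"                                   # last-resort premium fallback

# Ollama is up but scoring poorly, Codex is down -> Gemini Flash wins.
print(pick_provider({"ollama": 0.4}, {"ollama", "gemini-flash", "claude"}))  # -> gemini-flash
```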
Configuration
Zero-config by default if you use Claude Code Pro/Max (subscription mode).
Optional env vars:
OPENAI_API_KEY=sk-... # GPT-4o, o3
GEMINI_API_KEY=AIza... # Gemini Flash (free tier)
OLLAMA_BASE_URL=http://localhost:11434 # Local Ollama (free)
LLM_ROUTER_PROFILE=balanced # budget|balanced|premium
LLM_ROUTER_COMPRESS_RESPONSE=true # Enable response compression
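In practice these go in your shell profile. A typical opt-in setup might look like the fragment below; the key values are placeholders, and only the variables you actually set take effect:

```shell
# Example ~/.bashrc / ~/.zshrc fragment; key values are placeholders.
export OPENAI_API_KEY="sk-..."                   # enables GPT-4o / o3
export GEMINI_API_KEY="AIza..."                  # Gemini Flash (free tier)
export OLLAMA_BASE_URL="http://localhost:11434"  # local Ollama (free)
export LLM_ROUTER_PROFILE="balanced"             # budget|balanced|premium
export LLM_ROUTER_COMPRESS_RESPONSE="true"       # enable response compression
```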
For full setup guide, see docs/SETUP.md.
MCP Tools (48 total)
Routing:
- `llm_route` — Route task to optimal model
- `llm_classify` — Classify task complexity
- `llm_quality_guard` — Monitor model health
Text:
`llm_query`, `llm_research`, `llm_generate`, `llm_analyze`, `llm_code`
Media:
`llm_image`, `llm_video`, `llm_audio`
Admin:
`llm_usage`, `llm_savings`, `llm_budget`, `llm_health`, `llm_providers`
Advanced:
- `llm_orchestrate` — Multi-step pipelines
- `llm_setup` — Configure provider keys
- `llm_policy` — Routing policy management
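Since the server speaks MCP, every tool is invoked through a standard `tools/call` request. A hypothetical payload for `llm_route` might look like this; the argument names `prompt` and `max_cost_usd` are illustrative, not the tool's documented schema:

```python
import json

# Hypothetical MCP tools/call payload for llm_route. The "arguments"
# keys are illustrative assumptions, not the documented tool schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "llm_route",
        "arguments": {
            "prompt": "Explain this stack trace",
            "max_cost_usd": 0.01,
        },
    },
}
print(json.dumps(request, indent=2))
```

Your MCP host (Claude Code, Cursor, etc.) builds requests like this for you; you normally never write them by hand.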
Full tool reference — Complete documentation for all 48 tools
Architecture
See CLAUDE.md for:
- Design decisions
- Module organization
- Development workflow
- Release process
See docs/ARCHITECTURE.md for:
- Three-layer compression pipeline
- Judge scoring system
- Quality trend tracking
- Budget pressure algorithm
Development
uv run pytest tests/ -q # Run tests
uv run ruff check src/ tests/ # Lint
uv run llm-router --version # Check version
License
MIT — See LICENSE
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Releases: PyPI
File details
Details for the file claude_code_llm_router-6.8.0.tar.gz.
- Download URL: claude_code_llm_router-6.8.0.tar.gz
- Size: 612.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.3

| Algorithm | Hash digest |
|---|---|
| SHA256 | `aeb153ff4843343245ae86c01f0c245f097ccb924b0b3153cbc1e5ae513ae8d5` |
| MD5 | `48fded590a0f751eb3850c951733a8c7` |
| BLAKE2b-256 | `01f89d80f10b63ca4c0b9f49e75c4a9f42fb94501b3f7f2509fb69aeb2a184a1` |
File details
Details for the file claude_code_llm_router-6.8.0-py3-none-any.whl.
- Download URL: claude_code_llm_router-6.8.0-py3-none-any.whl
- Size: 453.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.3

| Algorithm | Hash digest |
|---|---|
| SHA256 | `89eb3b219b35ebccbaec5d0e67481e53c0aa764c1a6c42c1ddaa2c87f6872b1a` |
| MD5 | `f589e915f070e5813c7e0515c3631950` |
| BLAKE2b-256 | `b60f218feccc0ced25f2f1021d01ccd8c3e4a8ddd4126e737e640b96cfe491ba` |