
LLM Router

Route every AI call to the cheapest model that can do the job well. 48 tools · 20+ providers · personal routing memory · budget caps, dashboards, traces.


Average savings: 60–80% vs running everything on Claude Opus.

Install

pipx install claude-code-llm-router && llm-router install
Host         Command
Claude Code  llm-router install
VS Code      llm-router install --host vscode
Cursor       llm-router install --host cursor
Codex CLI    llm-router install --host codex
Gemini CLI   llm-router install --host gemini-cli

Supported Development Tools

llm-router works as an MCP server inside any tool that supports MCP, providing unified routing across your entire development environment.

Tool Status What You Get
Claude Code ✅ Full Auto-routing hooks + session tracking + quota display
Gemini CLI ✅ Full Auto-routing hooks + session tracking + quota display
Codex CLI ✅ Full Auto-routing hooks + savings tracking
VS Code + Copilot ✅ MCP llm-router tools available (routing is model-voluntary)
Cursor ✅ MCP llm-router tools available (routing is model-voluntary)
OpenCode ✅ MCP llm-router tools available (routing is model-voluntary)
Windsurf ✅ MCP llm-router tools available (routing is model-voluntary)
Any MCP-compatible tool ⚡ Manual Add llm-router to your tool's MCP config

Full Support vs MCP Support

Full support = auto-routing hooks fire before the model answers, enforcing your routing policy. MCP support = tools are available, but the model chooses whether to use them.
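The distinction can be sketched in a few lines of Python. The function names and the toy router below are hypothetical, for illustration only: in full-support hosts a hook runs before the model answers and its decision is enforced, while in MCP-only hosts the tool merely returns a suggestion the model may ignore.

```python
# Hypothetical sketch of "full support" vs. "MCP support".

def auto_route_hook(prompt: str, route) -> str:
    """Full support: fires before the model answers; the decision is enforced."""
    target = route(prompt)  # e.g. "haiku", "sonnet", "opus"
    return f"[routed to {target}] {prompt}"

def voluntary_tool(prompt: str, route) -> dict:
    """MCP support: exposed as a tool; the host model chooses whether to call it."""
    return {"suggested_model": route(prompt)}

# Trivial stand-in router, not the project's actual classifier:
route = lambda p: "haiku" if len(p) < 80 else "opus"

print(auto_route_hook("Explain this error message", route))
```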

Quick Setup by Tool

Claude Code

pipx install claude-code-llm-router
llm-router install

Then, in Claude Code, llm_route and the other router tools appear as built-in tools. Your settings control the profile (budget/balanced/premium).

Gemini CLI

pipx install claude-code-llm-router
llm-router install --host gemini-cli

Gemini CLI users get the full routing experience: auto-routing suggestions, quota display, and free-first chaining (Ollama → Codex → Gemini CLI → paid).

Codex CLI

pipx install claude-code-llm-router
llm-router install --host codex

Codex integrates deeply into the routing chain, serving as a free fallback whenever your OpenAI subscription is available.

VS Code / Cursor / Others

pipx install claude-code-llm-router
llm-router install --host vscode  # or --host cursor

The MCP server loads automatically. Tools appear in your IDE's model UI.

What It Does

Intercepts prompts and routes them to the cheapest model that can handle the task. Most AI sessions are full of low-value work: file lookups, small edits, quick questions. Those burn through expensive models unnecessarily.

llm-router keeps cheap work on cheap or free models and escalates to premium models only when needed. No micromanagement required.

  • Works in: Claude Code, Cursor, VS Code, Codex, Windsurf, Zed, claw-code, Agno
  • Free-first: Ollama (local) → Codex → Gemini Flash → OpenAI → Claude (subscription)
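The free-first chain above can be sketched as a simple ordered fallback: try free providers first, and reach for paid ones only when nothing free is available. The availability-set interface below is an assumption for illustration, not the project's actual API.

```python
# Minimal sketch of free-first provider selection (hypothetical interface).

FREE_FIRST_CHAIN = ["ollama", "codex", "gemini-flash", "openai", "claude"]

def pick_provider(available: set) -> str:
    """Return the first provider in the chain that is currently available."""
    for provider in FREE_FIRST_CHAIN:
        if provider in available:
            return provider
    raise RuntimeError("no provider available")

# Example: local Ollama is down, but a Codex subscription is active.
print(pick_provider({"codex", "openai", "claude"}))  # codex
```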

Mental Model

Think of llm-router as a smart task dispatcher. When you ask a question:

  1. Analyze — What kind of task is this? (simple lookup vs. complex reasoning)
  2. Choose — Which model can handle this best and cheapest?
  3. Check Constraints — Are we over budget? Is this model degraded?
  4. Execute — Send to that model

The dispatcher learns over time: if a model starts performing poorly (judge scores drop), it gets demoted in future decisions. If you're running low on quota (budget pressure), it automatically uses cheaper models. You don't manage any of this—it just happens behind the scenes.

Example: "Explain this error message" is a simple task, so it routes to Haiku (fast, cheap). "Refactor this complex architecture" is a complex task, so it routes to Opus (expensive but thorough).

The savings come from not using Opus for every question.
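The four dispatcher steps above can be condensed into a toy function. The keyword classifier, cost table, and downshift rule here are illustrative stand-ins, not the project's actual heuristics.

```python
# Toy dispatcher: analyze -> choose -> check constraints -> execute.

COST = {"haiku": 1, "sonnet": 5, "opus": 25}  # relative cost units (assumed)

def classify(prompt: str) -> str:
    """Step 1: crude complexity classifier (stand-in)."""
    complex_words = ("refactor", "architecture", "design", "prove")
    return "complex" if any(w in prompt.lower() for w in complex_words) else "simple"

def dispatch(prompt: str, budget_pressure: float, degraded: set) -> str:
    # Step 2: choose the cheapest model that fits the task class.
    candidate = "opus" if classify(prompt) == "complex" else "haiku"
    # Step 3: check constraints -- high budget pressure or a degraded
    # model forces a downshift to a cheaper option.
    if budget_pressure >= 0.85 or candidate in degraded:
        candidate = "sonnet" if candidate == "opus" else "haiku"
    # Step 4: execute (here, just return the chosen model).
    return candidate

print(dispatch("Explain this error message", 0.2, set()))          # haiku
print(dispatch("Refactor this complex architecture", 0.9, set()))  # sonnet
```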

New in v6.10 — Automatic Model Evaluator & Kimi-K2.6 Integration

  • Model evaluator system — Benchmarks all available models (Ollama, Codex, APIs) weekly
    • Tests on reasoning + code tasks, scores by quality + latency
    • 7-day cache TTL, auto-runs during session-end hook
    • Manual evaluation via llm_model_eval MCP tool
  • Kimi-K2.6:cloud integrated as primary code specialist
    • 256K context window (2x qwen3.5), specialized for autonomous execution
    • Auto-selected for code-heavy tasks (refactor, debug, implement, test)
    • Fallback chain: qwen3-coder-next → qwen3.5
  • Profile-aware dynamic routing
    • Auto-detect available services (Ollama, API keys, subscriptions)
    • Token-wise tier organization (free local → free subscriptions → cheap APIs → expensive)
    • Quota pressure awareness with real-time deprioritization (≥85% usage)
    • Periodic service scanning (1-hour TTL)

This enables per-user routing that respects each user's unique setup, plus automatic performance optimization via weekly model benchmarking.
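The quota-pressure rule above (real-time deprioritization at ≥ 85% usage) can be sketched as a re-ordering step. The usage-fraction data shape is an assumption for illustration; pressured providers are pushed to the back of the tier rather than removed outright.

```python
# Sketch of quota-pressure deprioritization (assumed data shape).

def order_by_pressure(providers: dict) -> list:
    """providers maps name -> fraction of daily quota used (0.0-1.0).
    Healthy providers keep their order; pressured ones (>= 0.85) go last."""
    healthy = [p for p, used in providers.items() if used < 0.85]
    pressured = [p for p, used in providers.items() if used >= 0.85]
    return healthy + pressured

usage = {"ollama": 0.0, "codex": 0.92, "gemini-cli": 0.40}
print(order_by_pressure(usage))  # ['ollama', 'gemini-cli', 'codex']
```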

New in v6.9 — Gemini CLI Integration

  • Gemini CLI as free routing provider — 1,500 requests/day via Google One AI Pro
  • Smart insertion into free-first chain — Ollama → Codex → Gemini CLI → paid APIs
  • Context-aware routing — Prioritizes Gemini CLI on high budget pressure, code tasks
  • New llm_gemini MCP tool for direct Gemini CLI invocation
  • Session tracking & quota display — Daily usage meter, savings summary at session-end
  • Auto-route hook for Gemini CLI with complexity classification

See CHANGELOG.md for full version history.

How It Works

User Prompt
    ↓
[Complexity Classifier] — Haiku/Sonnet/Opus?
    ↓
[Free-First Router] — Ollama → Codex → Gemini Flash → OpenAI → Claude
    ↓
[Budget Pressure Check] — Downshift if over 85% budget
    ↓
[Quality Guard] — Demote if judge score < 0.6
    ↓
Selected Model → Execute
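The [Quality Guard] stage in the pipeline above can be sketched as a rolling average of judge scores: a model whose recent average falls below 0.6 is demoted before the next routing decision. The window size and class interface are assumptions for illustration.

```python
# Sketch of the quality-guard demotion rule (assumed window and interface).
from collections import deque

class QualityGuard:
    def __init__(self, threshold: float = 0.6, window: int = 5):
        self.threshold = threshold
        self.window = window
        self.scores = {}  # model name -> deque of recent judge scores

    def record(self, model: str, judge_score: float) -> None:
        self.scores.setdefault(model, deque(maxlen=self.window)).append(judge_score)

    def demoted(self, model: str) -> bool:
        hist = self.scores.get(model)
        if not hist:
            return False
        return sum(hist) / len(hist) < self.threshold

guard = QualityGuard()
for s in (0.9, 0.5, 0.4, 0.5):
    guard.record("sonnet", s)
print(guard.demoted("sonnet"))  # True: mean 0.575 < 0.6
```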

Configuration

Zero-config by default if you use Claude Code Pro/Max (subscription mode).

Optional env vars:

OPENAI_API_KEY=sk-...                   # GPT-4o, o3
GEMINI_API_KEY=AIza...                  # Gemini Flash (free tier)
OLLAMA_BASE_URL=http://localhost:11434  # Local Ollama (free)
LLM_ROUTER_PROFILE=balanced             # budget|balanced|premium
LLM_ROUTER_COMPRESS_RESPONSE=true       # Enable response compression

For full setup guide, see docs/SETUP.md.
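A sketch of how the optional env vars above might be read. The variable names mirror the table; the default values for the profile and compression flag are assumptions, not confirmed defaults.

```python
# Hypothetical config loader for the env vars listed above.
import os

def load_config() -> dict:
    return {
        "openai_key": os.environ.get("OPENAI_API_KEY"),  # None -> provider skipped
        "gemini_key": os.environ.get("GEMINI_API_KEY"),
        "ollama_url": os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        "profile": os.environ.get("LLM_ROUTER_PROFILE", "balanced"),
        "compress": os.environ.get("LLM_ROUTER_COMPRESS_RESPONSE", "false").lower() == "true",
    }

cfg = load_config()
print(cfg["profile"], cfg["ollama_url"])
```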

MCP Tools (48 total)

Routing:

  • llm_route — Route task to optimal model
  • llm_classify — Classify task complexity
  • llm_quality_guard — Monitor model health

Text:

  • llm_query, llm_research, llm_generate, llm_analyze, llm_code

Media:

  • llm_image, llm_video, llm_audio

Admin:

  • llm_usage, llm_savings, llm_budget, llm_health, llm_providers

Advanced:

  • llm_orchestrate — Multi-step pipelines
  • llm_setup — Configure provider keys
  • llm_policy — Routing policy management

Full tool reference — Complete documentation for all 48 tools

Architecture

See CLAUDE.md for:

  • Design decisions
  • Module organization
  • Development workflow
  • Release process

See docs/ARCHITECTURE.md for:

  • Three-layer compression pipeline
  • Judge scoring system
  • Quality trend tracking
  • Budget pressure algorithm

Development

uv run pytest tests/ -q          # Run tests
uv run ruff check src/ tests/    # Lint
uv run llm-router --version      # Check version

License

MIT — See LICENSE

Support
