
Multi-LLM router MCP server for Claude Code — smart complexity routing, Claude subscription monitoring, Codex integration, 20+ providers


LLM Router

Route every AI call to the cheapest model that can do the job well. 48 tools · 20+ providers · personal routing memory · budget caps, dashboards, traces.


Average savings: 60–80% vs running everything on Claude Opus.

Install

pipx install claude-code-llm-router && llm-router install

Host         Command
Claude Code  llm-router install
VS Code      llm-router install --host vscode
Cursor       llm-router install --host cursor
Codex CLI    llm-router install --host codex

What It Does

Intercepts prompts and routes them to the cheapest model that can handle the task. Most AI sessions are full of low-value work: file lookups, small edits, quick questions. Those burn through expensive models unnecessarily.

llm-router keeps cheap work on cheap or free models and escalates to premium models only when needed. No micromanagement required.

  • Works in: Claude Code, Cursor, VS Code, Codex, Windsurf, Zed, claw-code, Agno
  • Free-first: Ollama (local) → Codex → Gemini Flash → OpenAI → Claude (subscription)

Mental Model

Think of llm-router as a smart task dispatcher. When you ask a question:

  1. Analyze — What kind of task is this? (simple lookup vs. complex reasoning)
  2. Choose — Which model can handle this best and cheapest?
  3. Check Constraints — Are we over budget? Is this model degraded?
  4. Execute — Send to that model

The dispatcher learns over time: if a model starts performing poorly (judge scores drop), it gets demoted in future decisions. If you're running low on quota (budget pressure), it automatically uses cheaper models. You don't manage any of this—it just happens behind the scenes.

Example:

  • "Explain this error message" → simple task → route to Haiku (fast, cheap)
  • "Refactor this complex architecture" → complex task → route to Opus (expensive but thorough)

The savings come from not using Opus for every question.
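The "analyze, then choose" steps can be sketched in Python. This is an illustrative sketch only, not llm-router's actual classifier: the keyword heuristic, the `classify` and `choose_model` names, and the tier table are all assumptions made for the example.

```python
# Hypothetical sketch of "analyze, then choose"; not llm-router's real classifier.
COMPLEX_HINTS = ("refactor", "architecture", "design", "migrate")

# Assumed mapping from complexity tier to a Claude model family name.
TIER_TO_MODEL = {"simple": "haiku", "complex": "opus"}


def classify(prompt: str) -> str:
    """Return 'complex' if the prompt mentions a heavyweight task, else 'simple'."""
    text = prompt.lower()
    return "complex" if any(hint in text for hint in COMPLEX_HINTS) else "simple"


def choose_model(prompt: str) -> str:
    """Pick the model for the tier that the classifier assigns to the prompt."""
    return TIER_TO_MODEL[classify(prompt)]


if __name__ == "__main__":
    for prompt in ("Explain this error message", "Refactor this complex architecture"):
        print(f"{prompt!r} -> {choose_model(prompt)}")
```

A real classifier would likely weigh more signals than keywords (prompt length, attached context, past outcomes), but the shape is the same: a cheap decision up front determines which tier pays for the work.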

New in v6.4 — Quality Guard

  • Judge-based quality feedback integrated into routing decisions
  • Quality reordering: models are demoted if their judge scores drop below the threshold
  • Hard floor enforcement: requests bound for poor-performing models are automatically escalated to a better tier
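One way to picture quality reordering is as a stable partition of the candidate list. The sketch below illustrates that general technique and is not the project's implementation; `reorder_by_quality`, the score dictionary, and the treatment of unscored models as healthy are hypothetical. The default threshold of 0.6 mirrors the figure shown in the How It Works diagram.

```python
def reorder_by_quality(candidates, judge_scores, threshold=0.6):
    """Move models whose judge score is below `threshold` to the back.

    Relative order is otherwise preserved. Models with no recorded score are
    treated as healthy (an assumption for this sketch).
    """
    healthy = [m for m in candidates if judge_scores.get(m, 1.0) >= threshold]
    demoted = [m for m in candidates if judge_scores.get(m, 1.0) < threshold]
    return healthy + demoted


if __name__ == "__main__":
    chain = ["ollama", "codex", "gemini-flash", "openai", "claude"]
    scores = {"ollama": 0.45, "codex": 0.8, "gemini-flash": 0.72}  # made-up scores
    print(reorder_by_quality(chain, scores))
```

Demoting rather than removing keeps a low-scoring model available as a last resort if every healthier option fails.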

See CHANGELOG.md for all changes.

New in v6.3 — Three-Layer Compression

  • RTK command compression — bash output filtered (60–90% reduction)
  • Model-based routing — existing cost reduction (70–90%)
  • Response compression — LLM outputs condensed (60–75% reduction)
  • Unified dashboard (llm_gain) shows all layers

How It Works

User Prompt
    ↓
[Complexity Classifier] — Haiku/Sonnet/Opus?
    ↓
[Free-First Router] — Ollama → Codex → Gemini Flash → OpenAI → Claude
    ↓
[Budget Pressure Check] — Downshift if over 85% budget
    ↓
[Quality Guard] — Demote if judge score < 0.6
    ↓
Selected Model → Execute
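
The budget pressure stage in the diagram can be read as a simple threshold test. The following is a minimal sketch under stated assumptions: the `apply_budget_pressure` function, the tier ordering, and the single-step downshift are hypothetical; only the 85% trigger comes from the diagram.

```python
TIERS = ["haiku", "sonnet", "opus"]  # cheapest to most expensive (assumed ordering)


def apply_budget_pressure(tier, spent, budget, trigger=0.85):
    """Downshift one tier when spend exceeds `trigger` of the budget."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    if spent / budget <= trigger:
        return tier  # under the trigger: keep the requested tier
    index = TIERS.index(tier)
    return TIERS[max(index - 1, 0)]  # already-cheapest tier stays put


if __name__ == "__main__":
    print(apply_budget_pressure("opus", spent=40.0, budget=100.0))  # under the trigger
    print(apply_budget_pressure("opus", spent=90.0, budget=100.0))  # over the trigger
```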

Configuration

Zero-config by default if you use Claude Code Pro/Max (subscription mode).

Optional env vars:

OPENAI_API_KEY=sk-...                   # GPT-4o, o3
GEMINI_API_KEY=AIza...                  # Gemini Flash (free tier)
OLLAMA_BASE_URL=http://localhost:11434  # Local Ollama (free)
LLM_ROUTER_PROFILE=balanced             # budget|balanced|premium
LLM_ROUTER_COMPRESS_RESPONSE=true       # Enable response compression
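
For scripts that need to inspect the same settings, the variables above can be read with the standard library. This is a sketch, not the router's own loader: `load_settings`, the fallback to `balanced` when the profile is unset, and the set of accepted truthy strings are assumptions.

```python
import os

VALID_PROFILES = {"budget", "balanced", "premium"}


def load_settings(env=None):
    """Read llm-router related variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    profile = env.get("LLM_ROUTER_PROFILE", "balanced")  # default is an assumption
    if profile not in VALID_PROFILES:
        raise ValueError(f"LLM_ROUTER_PROFILE must be one of {sorted(VALID_PROFILES)}")
    return {
        "profile": profile,
        "ollama_base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        # Accepted truthy spellings are an assumption for this sketch.
        "compress_response": env.get("LLM_ROUTER_COMPRESS_RESPONSE", "").lower()
        in {"1", "true", "yes"},
        # Report only whether keys are present; never echo secrets.
        "has_openai_key": bool(env.get("OPENAI_API_KEY")),
        "has_gemini_key": bool(env.get("GEMINI_API_KEY")),
    }


if __name__ == "__main__":
    print(load_settings({"LLM_ROUTER_PROFILE": "budget", "LLM_ROUTER_COMPRESS_RESPONSE": "true"}))
```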

For the full setup guide, see docs/SETUP.md.

MCP Tools (48 total)

Routing:

  • llm_route — Route task to optimal model
  • llm_classify — Classify task complexity
  • llm_quality_guard — Monitor model health

Text:

  • llm_query, llm_research, llm_generate, llm_analyze, llm_code

Media:

  • llm_image, llm_video, llm_audio

Admin:

  • llm_usage, llm_savings, llm_budget, llm_health, llm_providers

Advanced:

  • llm_orchestrate — Multi-step pipelines
  • llm_setup — Configure provider keys
  • llm_policy — Routing policy management

Full tool reference — Complete documentation for all 48 tools
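
MCP clients invoke tools like these through a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request with the standard library to show the message shape; the `prompt` argument name is an assumption, since the actual input schema for `llm_route` is defined in the tool reference.

```python
import json


def build_tool_call(name, arguments, request_id=1):
    """Return an MCP `tools/call` request as a JSON string."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


if __name__ == "__main__":
    # "prompt" is a hypothetical argument name used for illustration.
    print(build_tool_call("llm_route", {"prompt": "Explain this error message"}))
```

In practice a host such as Claude Code constructs and sends these messages itself, so this is mainly useful for understanding traces or writing a custom client.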

Architecture

See CLAUDE.md for:

  • Design decisions
  • Module organization
  • Development workflow
  • Release process

See docs/ARCHITECTURE.md for:

  • Three-layer compression pipeline
  • Judge scoring system
  • Quality trend tracking
  • Budget pressure algorithm

Development

uv run pytest tests/ -q          # Run tests
uv run ruff check src/ tests/    # Lint
uv run llm-router --version      # Check version

License

MIT — See LICENSE


