Multi-LLM router MCP server for Claude Code — smart complexity routing, Claude subscription monitoring, Codex integration, 20+ providers
LLM Router
One MCP server. Every AI model. Smart routing.
Route text, code, image, video, and audio tasks to 20+ providers — automatically picking the right model based on task complexity and your budget. Works in Claude Code, Cursor, Windsurf, Zed, claw-code, and Agno.
Why
Not every task needs the same model. Without a router, everything goes to the same expensive frontier model — like hiring a surgeon to change a lightbulb.
| Task | Without router | With router | Savings |
|---|---|---|---|
| Simple queries (60% of work) | Opus — $0.015 | Haiku / Gemini Flash — $0.0001 | 99% |
| Moderate tasks (30% of work) | Opus — $0.015 | Sonnet — $0.003 | 80% |
| Complex tasks (10% of work) | Opus — $0.015 | Opus — $0.015 | 0% |
| Blended monthly | ~$50/mo | ~$8–15/mo | 70–85% |
With Ollama: simple tasks route to a free local model — those 60% of queries cost $0.
Quick Start
pipx install claude-code-llm-router && llm-router install
That's it. The installer registers the MCP server and installs hooks into ~/.claude/ so every prompt is evaluated automatically.
Zero API keys required if you have a Claude Code Pro/Max subscription. Add GEMINI_API_KEY for a free external fallback (1M tokens/day free tier).
# Optional: add providers in .env
GEMINI_API_KEY=AIza... # free tier
OPENAI_API_KEY=sk-proj-...
PERPLEXITY_API_KEY=pplx-...
# If you use Claude Code subscription
LLM_ROUTER_CLAUDE_SUBSCRIPTION=true
Enforcement modes — control how strictly routing is applied:
| Mode | Behaviour | Set via |
|---|---|---|
| smart (default) | Hard block for Q&A tasks (query/research/generate/analyze), soft for code tasks | LLM_ROUTER_ENFORCE=smart |
| soft | Route hints in context, never blocks — lowest friction | LLM_ROUTER_ENFORCE=soft |
| hard | Blocks all Bash/Edit/Write until an llm_* tool is called — maximum savings | LLM_ROUTER_ENFORCE=hard |
| off | Enforcement disabled entirely | LLM_ROUTER_ENFORCE=off |
Switch mode instantly with the CLI:
llm-router set-enforce smart # (default) smart balance
llm-router set-enforce hard # maximum cost savings
llm-router set-enforce soft # no blocking
Set permanently in your .env or ~/.llm-router/routing.yaml:
# ~/.llm-router/routing.yaml
enforce: smart # smart | soft | hard | off
How It Works
Every prompt is intercepted by a UserPromptSubmit hook before your top-tier model sees it:
0. Context inherit — instant, free — "yes/ok/go ahead" reuses the prior turn's route
1. Heuristic scoring — instant, free — high-confidence patterns route immediately
2. Ollama local LLM — free, ~1s — catches what heuristics miss
3. Cheap API — ~$0.0001 — Gemini Flash / GPT-4o-mini fallback
| Prompt | Classified as | Routed to |
|---|---|---|
| "What does os.path.join do?" | query/simple | Gemini Flash ($0.000001) |
| "Fix the bug in auth.py" | code/moderate | Haiku / Sonnet |
| "Design the full auth system" | code/complex | Sonnet / Opus |
| "Research latest AI funding" | research | Perplexity Sonar Pro |
| "Generate a hero image" | image | Flux Pro via fal.ai |
Free-first chain (subscription mode): Ollama → Codex (free via OpenAI sub) → paid API
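The tiered pipeline above can be sketched as a chain of classifiers, each consulted only when the previous tier is not confident enough. This is an illustrative sketch, not the router's actual API: the function names, patterns, and confidence threshold are all assumptions.

```python
import re

def heuristic_classify(prompt: str):
    """Tier 0/1: instant and free. Returns (category, confidence)."""
    text = prompt.strip()
    if re.match(r"^(yes|ok|go ahead)\b", text, re.IGNORECASE):
        return "inherit", 1.0          # tier 0: reuse the prior turn's route
    if re.search(r"\b(what does|what is|how do i)\b", text, re.IGNORECASE):
        return "query/simple", 0.9
    if re.search(r"\b(design|architecture)\b", text, re.IGNORECASE):
        return "code/complex", 0.85
    return "unknown", 0.0

def classify(prompt: str, local_llm=None, cheap_api=None, threshold: float = 0.8):
    """Fall through the tiers until one is confident enough."""
    category, confidence = heuristic_classify(prompt)
    if confidence >= threshold:
        return category                # tiers 0-1: no model call at all
    if local_llm is not None:
        return local_llm(prompt)       # tier 2: free local Ollama model
    if cheap_api is not None:
        return cheap_api(prompt)       # tier 3: ~$0.0001 API fallback
    return "moderate"                  # safe default when nothing is available
```

The point of the design is that the expensive tiers only run for the minority of prompts the cheap tiers cannot place; `classify("What does os.path.join do?")` never leaves tier 1.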
MCP Tools
34 tools across 6 categories:
Smart Routing
| Tool | What it does |
|---|---|
| llm_route | Auto-classify prompt → route to best model |
| llm_classify | Classify complexity + recommend model |
| llm_select_agent | Pick agent CLI (claude_code / codex) + model for a session |
| llm_stream | Stream LLM response for long-running tasks |
Text & Code
| Tool | What it does |
|---|---|
| llm_query | General questions — routed to cheapest capable model |
| llm_research | Web-grounded answers via Perplexity Sonar |
| llm_generate | Creative writing, summaries, brainstorming |
| llm_analyze | Deep reasoning — analysis, debugging, design review |
| llm_code | Code generation, refactoring, algorithms |
| llm_edit | Route edit reasoning to cheap model → returns {file, old, new} patch pairs |
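A caller is responsible for applying the {file, old, new} patch pairs that llm_edit returns. A minimal sketch of what that looks like, where the pair shape comes from the description above and the helper itself is hypothetical:

```python
def apply_patches(patches):
    """Apply a list of {file, old, new} pairs to files on disk."""
    for patch in patches:
        path = patch["file"]
        with open(path, encoding="utf-8") as f:
            text = f.read()
        if patch["old"] not in text:
            # The file changed since the patch was generated; refuse to guess.
            raise ValueError(f"stale patch: snippet not found in {path}")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text.replace(patch["old"], patch["new"], 1))
```

Replacing only the first occurrence keeps a patch from silently rewriting unrelated matches elsewhere in the file.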
Filesystem
| Tool | What it does |
|---|---|
| llm_fs_find | Describe files to find → cheap model returns glob/grep commands |
| llm_fs_rename | Describe a rename → returns mv/git mv commands (dry_run by default) |
| llm_fs_edit_many | Bulk edits across files → returns all patch pairs |
Media
| Tool | What it does |
|---|---|
| llm_image | Image generation — Flux, DALL-E, Gemini Imagen |
| llm_video | Video generation — Runway, Kling, Veo 2 |
| llm_audio | TTS/voice — ElevenLabs, OpenAI |
Orchestration
| Tool | What it does |
|---|---|
| llm_orchestrate | Multi-step pipeline across multiple models |
| llm_pipeline_templates | List available pipeline templates |
Monitoring & Admin
| Tool | What it does |
|---|---|
| llm_usage | Unified dashboard — Claude sub, Codex, APIs, savings |
| llm_check_usage | Live Claude subscription usage (session %, weekly %) |
| llm_health | Provider availability + circuit breaker status |
| llm_providers | List all configured providers and models |
| llm_set_profile | Switch profile: budget / balanced / premium |
| llm_setup | Interactive provider wizard — add keys, validate, install hooks |
| llm_quality_report | Routing accuracy, savings metrics, classifier stats |
| llm_rate | Rate last response 👍/👎 — logged for quality tracking |
| llm_codex | Route task to local Codex desktop agent (free) |
| llm_save_session | Persist session summary for cross-session context |
| llm_cache_stats | Cache hit rate, entries, evictions |
| llm_cache_clear | Clear classification cache |
| llm_refresh_claude_usage | Force-refresh subscription data via OAuth |
| llm_update_usage | Feed usage data from claude.ai into the router |
| llm_track_usage | Report Claude Code token usage for budget tracking |
| llm_dashboard | Open web dashboard at localhost:7337 |
Routing Profiles
Three profiles — switch anytime with llm_set_profile:
| Profile | Use case | Chain |
|---|---|---|
| budget | Dev, drafts, exploration | Ollama → Haiku → Gemini Flash |
| balanced | Production work (default) | Codex → Sonnet → GPT-4o |
| premium | Critical tasks, max quality | Codex → Opus → o3 |
Profile is overridden by complexity: simple prompts always use the budget chain, complex ones escalate to premium, regardless of the active profile setting.
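The override rule amounts to checking complexity before consulting the profile. A sketch, with chain contents taken from the table above and the function itself illustrative:

```python
CHAINS = {
    "budget":   ["ollama", "haiku", "gemini-flash"],
    "balanced": ["codex", "sonnet", "gpt-4o"],
    "premium":  ["codex", "opus", "o3"],
}

def pick_chain(active_profile: str, complexity: str):
    """Complexity wins over the active profile at the extremes."""
    if complexity == "simple":
        return CHAINS["budget"]       # never pay frontier prices for trivia
    if complexity == "complex":
        return CHAINS["premium"]      # never under-serve hard tasks
    return CHAINS[active_profile]     # moderate tasks follow the profile
```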
Providers
| Provider | Models | Free tier | Best for |
|---|---|---|---|
| Ollama | Any local model | Yes (forever) | Privacy, zero cost, offline |
| Google Gemini | 2.5 Flash, 2.5 Pro | Yes (1M tokens/day) | Generation, long context |
| Groq | Llama 3.3, Mixtral | Yes | Ultra-fast inference |
| OpenAI | GPT-4o, o3, DALL-E | No | Code, reasoning, images |
| Perplexity | Sonar, Sonar Pro | No | Research, current events |
| Anthropic | Haiku, Sonnet, Opus | No | Writing, analysis, safety |
| DeepSeek | V3, Reasoner | Limited | Cost-effective reasoning |
| Mistral | Large, Small | Limited | Multilingual |
| fal.ai | Flux, Kling, Veo | No | Images, video, audio |
| ElevenLabs | Voice models | Limited | High-quality TTS |
| Runway | Gen-3 | No | Professional video |
Full setup guides: docs/PROVIDERS.md
Works With
Claude Code
Auto-installed by llm-router install. Hooks intercept every prompt — you never need to call tools manually unless you want explicit control.
pipx install claude-code-llm-router && llm-router install
Live status bar shows routing stats before every prompt and in the persistent bottom statusline:
📊 CC 13%s · 24%w │ sub:0 · free:305 · paid:27 │ $1.59 saved (35%)
claw-code
Add to ~/.claw-code/mcp.json:
{
"mcpServers": {
"llm-router": { "command": "llm-router", "args": [] }
}
}
Every API call in claw-code is paid — the free-first chain (Ollama → Codex → Gemini Flash) saves more here than in Claude Code.
Cursor / Windsurf / Zed
Add to your IDE's MCP config:
{
"mcpServers": {
"llm-router": { "command": "llm-router", "args": [] }
}
}
Agno (multi-agent)
Two integration modes:
Option 1 — RouteredModel (v2.0+): use llm-router as a first-class Agno model. Every agent call is automatically routed to the cheapest capable provider.
pip install "claude-code-llm-router[agno]"
from agno.agent import Agent
from llm_router.integrations.agno import RouteredModel, RouteredTeam
# Single agent — routes each call intelligently
coder = Agent(
model=RouteredModel(task_type="code", profile="balanced"),
instructions="You are a coding assistant.",
)
coder.print_response("Write a Python quicksort.")
# Multi-agent team with shared $20/month budget cap
# Automatically downshifts to 'budget' profile at 80% spend
team = RouteredTeam(
members=[coder, researcher],
monthly_budget_usd=20.0,
downshift_at=0.80,
)
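The downshift behaviour described above reduces to a spend check before each call. A sketch under the assumption that the team tracks month-to-date spend; the function and its signature are illustrative, not RouteredTeam's internals:

```python
def effective_profile(requested: str, spent_usd: float,
                      monthly_budget_usd: float, downshift_at: float = 0.80):
    """Force the cheap chain once spend crosses the downshift threshold."""
    if monthly_budget_usd <= 0:
        return requested                      # 0 = unlimited
    if spent_usd >= monthly_budget_usd:
        raise RuntimeError("monthly budget exhausted")
    if spent_usd / monthly_budget_usd >= downshift_at:
        return "budget"                       # past the threshold: downshift
    return requested
```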
Option 2 — MCP tools: use llm-router's 34 tools in any Agno agent:
from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.tools.mcp import MCPTools
agent = Agent(
model=Claude(id="claude-sonnet-4-6"),
tools=[MCPTools(command="llm-router")],
instructions="Use llm_research for web searches, llm_code for coding tasks.",
)
Docker / CI
RUN pip install claude-code-llm-router && llm-router install --headless
# Pass keys at runtime: docker run -e GEMINI_API_KEY=... your-image
Configuration
# API keys — at least one required
GEMINI_API_KEY=AIza... # free tier at aistudio.google.com
OPENAI_API_KEY=sk-proj-...
PERPLEXITY_API_KEY=pplx-...
ANTHROPIC_API_KEY=sk-ant-... # skip if using Claude Code subscription
DEEPSEEK_API_KEY=...
GROQ_API_KEY=gsk_...
FAL_KEY=... # images, video, audio via fal.ai
ELEVENLABS_API_KEY=...
# Router
LLM_ROUTER_PROFILE=balanced # budget | balanced | premium
LLM_ROUTER_MONTHLY_BUDGET=0 # USD, 0 = unlimited
LLM_ROUTER_CLAUDE_SUBSCRIPTION=false # true = Claude Code Pro/Max
LLM_ROUTER_ENFORCE=smart # smart | soft | hard | off (default: smart)
# Ollama (local models)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_BUDGET_MODELS=gemma4:latest,qwen3.5:latest
# Spend limits
LLM_ROUTER_DAILY_SPEND_LIMIT=5.00 # USD, 0 = disabled
Repo-level config (.llm-router.yml)
Commit a routing policy alongside your code — no env vars required:
profile: balanced
enforce: soft # smart | soft | hard | off
block_providers:
- openai # never use OpenAI in this repo
routing:
code:
model: ollama/qwen3.5:latest # always use local model for code tasks
research:
provider: perplexity # always use Perplexity for research
daily_caps:
_total: 2.00 # global $2/day cap
code: 0.50 # code tasks capped at $0.50/day
User-level overrides live in ~/.llm-router/routing.yaml (same schema). Repo config wins.
Full reference: .env.example
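Resolution of the layered config can be sketched as a shallow merge plus a cap check. A sketch only: it assumes the two YAML files have been parsed into dicts, and the helper names are hypothetical; the key names come from the example above.

```python
def resolve_config(repo_cfg: dict, user_cfg: dict) -> dict:
    """Repo-level .llm-router.yml wins over ~/.llm-router/routing.yaml."""
    return {**user_cfg, **repo_cfg}

def check_daily_cap(cfg: dict, task_type: str, spent_today: dict) -> bool:
    """True if another call for task_type is allowed under daily_caps.

    spent_today maps task type -> USD spent so far today.
    """
    caps = cfg.get("daily_caps", {})
    total_cap = caps.get("_total")
    if total_cap is not None and sum(spent_today.values()) >= total_cap:
        return False                          # global daily cap reached
    task_cap = caps.get(task_type)
    if task_cap is not None and spent_today.get(task_type, 0.0) >= task_cap:
        return False                          # per-task cap reached
    return True
```

With the example caps above (`_total: 2.00`, `code: 0.50`), code tasks stop at $0.50/day while research can continue until the $2 global cap.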
Budget Control
LLM_ROUTER_MONTHLY_BUDGET=50 # raises BudgetExceededError when exceeded
llm_usage("month")
→ Calls: 142 | Tokens: 320k | Cost: $3.42 | Budget: 6.8% of $50
The router tracks spend in SQLite across all providers and blocks calls when the monthly cap is reached.
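The spend ledger described above can be approximated with a single SQLite table. The schema and function names here are illustrative, not the router's actual storage layout:

```python
import sqlite3

def open_ledger(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS spend (
        ts TEXT DEFAULT CURRENT_TIMESTAMP,
        provider TEXT, model TEXT, cost_usd REAL)""")
    return db

def record_call(db, provider, model, cost_usd, monthly_budget=0.0):
    """Insert one call; raise once the monthly cap would be exceeded."""
    month_total = db.execute(
        "SELECT COALESCE(SUM(cost_usd), 0) FROM spend "
        "WHERE ts >= date('now', 'start of month')").fetchone()[0]
    if monthly_budget > 0 and month_total + cost_usd > monthly_budget:
        raise RuntimeError("BudgetExceededError: monthly cap reached")
    db.execute("INSERT INTO spend (provider, model, cost_usd) VALUES (?, ?, ?)",
               (provider, model, cost_usd))
    db.commit()
```

Summing in SQL before inserting means the cap holds across every provider that writes to the same ledger.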
Dashboard
llm-router dashboard # opens localhost:7337
Live view of routing decisions, cost trends, model distribution, and subscription pressure. Auto-refreshes every 30s.
Session Summary
At session end the router prints a breakdown:
Free models 305 calls · $0.52 saved (Ollama / Codex)
External 27 calls · $0.006 (Gemini Flash, GPT-4o)
💡 Saved ~$0.53 this session
Share your savings:
llm-router share # copies savings card to clipboard + opens tweet
Roadmap
Positioning: Claude Code's cost autopilot. Stop paying Opus prices for Haiku work.
Phase 1 — Trust & Proof (Apr–Jun 2026)
| Version | Headline | Status |
|---|---|---|
| v1.3–v2.0 | Foundation, dashboard, enforcement, Agno adapter | ✅ Done |
| v2.1 | Route Simulator — llm-router test "<prompt>" dry-run + llm_savings dashboard | ✅ Done |
| v2.2 | Explainable Routing — LLM_ROUTER_EXPLAIN=1, "why not Opus?", per-decision reasoning | ✅ Done |
| v2.3 | Zero-Friction Activation — onboarding wizard, shadow/suggest/enforce modes, yearly savings projection | ✅ Done |
Phase 2 — Smarter Routing (Jun–Aug 2026)
| Version | Headline | Status |
|---|---|---|
| v2.4 | Repo-Aware YAML Config — .llm-router.yml committed with the codebase, block_providers, model pins | ✅ Done |
| v2.5 | Context-Aware Routing — "yes/ok/go ahead" inherits prior turn's route, zero classifier latency | ✅ Done |
| v2.6 | Latency + Personalized Routing — p95 latency scoring, per-user acceptance signals | 📅 Aug 2026 |
Phase 3 — Team Infrastructure (Sep–Nov 2026)
| Version | Headline | Status |
|---|---|---|
| v3.0 | Team Dashboard — shared savings across the whole team | 📅 Sep 2026 |
| v3.1 | Policy Engine — org/project/user routing policy, spend caps, audit log | 📅 Oct 2026 |
| v3.2 | Slack Digests — weekly savings summary, spend-spike alerts | 📅 Nov 2026 |
Phase 4 — Category Leadership (Jan–Apr 2027)
| Version | Headline | Status |
|---|---|---|
| v3.3 | Community Benchmarks — opt-in anonymous routing quality leaderboard | 📅 Jan 2027 |
| v3.5 | Claude Desktop + Co-Work — tool-based delegation, per-user savings attribution | 📅 Mar 2027 |
| v4.0 | VS Code + Cursor GA — cross-editor routing, shared config and analytics | 📅 Apr 2027 |
Full details: ROADMAP.md
Development
uv sync --extra dev
uv run pytest tests/ -q --ignore=tests/test_integration.py
uv run ruff check src/ tests/
See CLAUDE.md for architecture and module layout.
Contributing
See CONTRIBUTING.md. Key areas: new provider integrations, routing intelligence, MCP client testing.
License
File details
Details for the file claude_code_llm_router-2.6.0.tar.gz (source distribution, 400.3 kB, uploaded via uv/0.8.3, Trusted Publishing: no).

| Algorithm | Hash digest |
|---|---|
| SHA256 | 45f7c69dd143620634b932a2ca82535bc69e82e51ff09292a6c2233f7ce3bbc0 |
| MD5 | a6dd86b4e15c4cf6b0a7311179dad7c9 |
| BLAKE2b-256 | f23c8b2a91151fa82cd98261946bb9d8f77658c0970591de0fa3c03a243d272f |
File details
Details for the file claude_code_llm_router-2.6.0-py3-none-any.whl (built distribution, Python 3, 258.0 kB, uploaded via uv/0.8.3, Trusted Publishing: no).

| Algorithm | Hash digest |
|---|---|
| SHA256 | 0855d261da586d3318dd267ea6d2abc7d4969eaaa73a0ff1ace9614f8ee44949 |
| MD5 | 75a5e36c9495a7d660e04d4737eaf891 |
| BLAKE2b-256 | 103ec72338ec66bd651a1b418bab972bf0a11617084f8e6199154e159df47cad |