Multi-LLM router MCP server for Claude Code — smart complexity routing, Claude subscription monitoring, Codex integration, 20+ providers

LLM Router

Route every AI call to the cheapest model that can do the job well. 48 tools · 20+ providers · personal routing memory · budget caps, dashboards, traces.

Average savings: 60–80% vs running everything on Claude Opus.

Install

pipx install claude-code-llm-router && llm-router install

Host          Command
Claude Code   llm-router install
VS Code       llm-router install --host vscode
Cursor        llm-router install --host cursor
Codex CLI     llm-router install --host codex

What It Does

llm-router intercepts prompts and routes each one to the cheapest model that can handle the task. Most AI sessions are full of low-value work: file lookups, small edits, quick questions. Running those through premium models burns quota and money unnecessarily.

llm-router keeps cheap work on cheap or free models and escalates to premium models only when needed. No micromanagement required.

  • Works in: Claude Code, Cursor, VS Code, Codex, Windsurf, Zed, claw-code, Agno
  • Free-first: Ollama (local) → Codex → Gemini Flash → OpenAI → Claude (subscription)

Mental Model

Think of llm-router as a smart task dispatcher. When you ask a question:

  1. Analyze — What kind of task is this? (simple lookup vs. complex reasoning)
  2. Choose — Which model can handle this best and cheapest?
  3. Check Constraints — Are we over budget? Is this model degraded?
  4. Execute — Send to that model

The dispatcher learns over time: if a model starts performing poorly (judge scores drop), it gets demoted in future decisions. If you're running low on quota (budget pressure), it automatically uses cheaper models. You don't manage any of this—it just happens behind the scenes.

Example: "Explain this error message" → Simple task → Route to Haiku (fast, cheap) → Done. vs. "Refactor this complex architecture" → Complex task → Route to Opus (expensive but thorough) → Done.

The savings come from not using Opus for every question.
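
In code, the dispatch loop boils down to something like the sketch below. This is a minimal illustration of the idea, not llm-router's actual classifier; the hint lists, tier names, and candidate models are all assumptions.

# Illustrative heuristic only: classify by rough complexity signals,
# then pick the cheapest capable candidate (free-first order).
SIMPLE_HINTS = ("explain", "what is", "look up", "typo")
COMPLEX_HINTS = ("refactor", "architecture", "design", "migrate")

def classify(prompt: str) -> str:
    p = prompt.lower()
    if any(h in p for h in COMPLEX_HINTS) or len(p) > 2000:
        return "opus"      # complex reasoning
    if any(h in p for h in SIMPLE_HINTS):
        return "haiku"     # quick lookup or small edit
    return "sonnet"        # everything in between

CANDIDATES = {  # cheapest first, mirroring the free-first chain
    "haiku":  ["ollama/llama3", "gemini-flash", "claude-haiku"],
    "sonnet": ["codex", "gpt-4o", "claude-sonnet"],
    "opus":   ["claude-opus"],
}

def choose(prompt: str) -> str:
    return CANDIDATES[classify(prompt)][0]

print(choose("Explain this error message"))          # -> ollama/llama3
print(choose("Refactor this complex architecture"))  # -> claude-opus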

New in v6.4 — Quality Guard

  • Judge-based quality feedback integrated into routing decisions
  • Quality reordering — models demoted if scores drop below a threshold
  • Hard floor enforcement — requests automatically escalate past poorly performing models to a better tier
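
A minimal sketch of that idea, assuming a rolling window of judge scores per model; the class and method names are hypothetical, and only the 0.6 floor comes from the routing pipeline shown below.

from collections import defaultdict, deque

JUDGE_FLOOR = 0.6  # the hard floor from the routing pipeline

class QualityGuard:
    """Hypothetical sketch: rolling judge scores demote weak models."""

    def __init__(self, window: int = 20):
        self._scores = defaultdict(lambda: deque(maxlen=window))

    def record(self, model: str, score: float) -> None:
        self._scores[model].append(score)

    def avg(self, model: str) -> float:
        s = self._scores[model]
        return sum(s) / len(s) if s else 1.0  # unseen models start trusted

    def reorder(self, candidates: list[str]) -> list[str]:
        # Drop models under the hard floor (requests escalate past them),
        # then rank the survivors best-scoring first.
        ok = [m for m in candidates if self.avg(m) >= JUDGE_FLOOR]
        return sorted(ok, key=self.avg, reverse=True) or candidates[-1:]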

See CHANGELOG.md for all changes.

New in v6.3 — Three-Layer Compression

  • RTK command compression — bash output filtered (60–90% reduction)
  • Model-based routing — existing cost reduction (70–90%)
  • Response compression — LLM outputs condensed (60–75% reduction)
  • Unified dashboard — llm_gain shows all layers
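
As a rough worked example of how the layers compose over a session (the token counts and the mid-range reduction rates below are illustrative assumptions, not measurements):

# Illustrative arithmetic only. Layers 1 and 3 shrink token volume;
# layer 2 (routing) then cuts the *cost* of whatever tokens remain.
bash_tokens, response_tokens = 40_000, 20_000

kept_bash = bash_tokens * (1 - 0.75)      # RTK filtering, ~60-90% cut
kept_resp = response_tokens * (1 - 0.70)  # response compression, ~60-75% cut
kept = kept_bash + kept_resp              # 16,000 of 60,000 tokens survive

cost_factor = 1 - 0.80                    # routing, ~70-90% cheaper models
effective = (kept / (bash_tokens + response_tokens)) * cost_factor
print(f"effective spend: {effective:.1%} of the uncompressed baseline")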

How It Works

User Prompt
    ↓
[Complexity Classifier] — Haiku/Sonnet/Opus?
    ↓
[Free-First Router] — Ollama → Codex → Gemini Flash → OpenAI → Claude
    ↓
[Budget Pressure Check] — Downshift if over 85% budget
    ↓
[Quality Guard] — Demote if judge score < 0.6
    ↓
Selected Model → Execute
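
The budget-pressure step, for instance, could be as simple as the sketch below; the tier list and the one-step downshift are assumptions, while the 85% threshold comes from the diagram above.

TIERS = ["opus", "sonnet", "haiku"]  # expensive -> cheap
PRESSURE_THRESHOLD = 0.85            # downshift point from the diagram

def apply_budget_pressure(tier: str, spent: float, budget: float) -> str:
    """Sketch: shift one tier cheaper once spend crosses the threshold."""
    if budget > 0 and spent / budget > PRESSURE_THRESHOLD:
        i = TIERS.index(tier)
        return TIERS[min(i + 1, len(TIERS) - 1)]
    return tier

assert apply_budget_pressure("opus", spent=90, budget=100) == "sonnet"
assert apply_budget_pressure("opus", spent=50, budget=100) == "opus"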

Configuration

Zero-config by default if you use Claude Code Pro/Max (subscription mode).

Optional env vars:

OPENAI_API_KEY=sk-...                   # GPT-4o, o3
GEMINI_API_KEY=AIza...                  # Gemini Flash (free tier)
OLLAMA_BASE_URL=http://localhost:11434  # Local Ollama (free)
LLM_ROUTER_PROFILE=balanced             # budget|balanced|premium
LLM_ROUTER_COMPRESS_RESPONSE=true       # Enable response compression

For full setup guide, see docs/SETUP.md.
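
For reference, a consuming process might read these settings as in the sketch below; the fallback defaults here are assumptions, not documented behavior.

import os

# Hypothetical reader for the variables above; defaults are assumptions.
profile = os.environ.get("LLM_ROUTER_PROFILE", "balanced")
compress = os.environ.get("LLM_ROUTER_COMPRESS_RESPONSE", "false") == "true"
ollama_url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")

assert profile in {"budget", "balanced", "premium"}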

MCP Tools (48 total)

Routing:

  • llm_route — Route task to optimal model
  • llm_classify — Classify task complexity
  • llm_quality_guard — Monitor model health

Text:

  • llm_query, llm_research, llm_generate, llm_analyze, llm_code

Media:

  • llm_image, llm_video, llm_audio

Admin:

  • llm_usage, llm_savings, llm_budget, llm_health, llm_providers

Advanced:

  • llm_orchestrate — Multi-step pipelines
  • llm_setup — Configure provider keys
  • llm_policy — Routing policy management

Full tool reference — Complete documentation for all 48 tools
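
Outside a host integration, any MCP client can call these tools. Below is a minimal sketch using the official mcp Python SDK over stdio; the serve subcommand and the prompt argument name are assumptions, so check the tool reference for the actual invocation and schema.

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Assumed launch command; see the setup docs for the real one.
    params = StdioServerParameters(command="llm-router", args=["serve"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "llm_route", {"prompt": "Explain this error message"}
            )
            print(result)

asyncio.run(main())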

Architecture

See CLAUDE.md for:

  • Design decisions
  • Module organization
  • Development workflow
  • Release process

See docs/ARCHITECTURE.md for:

  • Three-layer compression pipeline
  • Judge scoring system
  • Quality trend tracking
  • Budget pressure algorithm

Development

uv run pytest tests/ -q          # Run tests
uv run ruff check src/ tests/    # Lint
uv run llm-router --version      # Check version

License

MIT — See LICENSE
