Lightweight signal-driven LLM router for Claude Code, Cursor, Codex CLI, and Gemini CLI. Now with real-time streaming dashboards!
Chuzom — Smart LLM Routing. Save 35–80% on API Costs.
⭐ Star on GitHub if Chuzom saves your quota ⭐
Help other developers discover automatic LLM routing
The Problem
You're paying $40–80/month to send every request to Claude Opus, but 90% of your work doesn't need it:
- "What's the capital of France?" → $0.08 (Opus) | $0.0003 (Haiku) ✗
- "Debug this Python error" → $0.08 (Opus) | $0.003 (GPT-4o) ✗
- "Complex reasoning task" → $0.08 (Opus) | $0.08 (Opus) ✓
You're throwing money away on every simple question.
The Solution
Chuzom automatically routes each prompt to the cheapest model that can actually handle it.
Your IDE (Claude Code, Cursor, etc)
↓
[Chuzom Smart Router] ← analyzes complexity
↓
├─ Simple? → Ollama (free, local) ✅
├─ Medium? → Gemini Flash / Codex ✅
└─ Complex? → Claude Opus / GPT-4o ✅
↓
Result + streaming progress + savings banner
🎯 chuzom → gemini-2.5-flash · code/moderate · 342ms · saved $0.07
Same answers. 60–80% lower costs.
Why People Install This
AI coding tools send too many prompts to premium models by default.
That means:
- ❌ You waste paid tokens on simple questions
- ❌ You burn through Claude, Gemini, or OpenAI quota faster than necessary
- ❌ You stop working when one provider is rate-limited or down
Chuzom sits between your coding tool and your model providers. It classifies each prompt, tries the cheapest capable model first, and falls back automatically when needed.
You keep the same workflow. The router changes the model choice underneath.
- 💰 60–80% Cheaper: Route 70% of tasks to free or near-free models
- ✅ Quality Preserved: Premium models only when the task truly needs it
- 🛡️ Quota Protected: Auto-downgrade near limits. No more rate-limit walls
- ⚙️ Zero Config: Works out of the box with Claude Pro/Max subscription
Real-World Savings
Typical developer workload (mix of questions, code review, debugging):
| Approach | Cost/Month | Success Rate |
|---|---|---|
| Always use Opus | $60–80 | 99% (but wasteful) |
| Always use Haiku | $2–5 | 68% (often fails) |
| Chuzom (smart routing) | $10–15 | 96% (best of both) |
Over a year: Chuzom saves you $600–800 vs Opus-only.
Supported IDEs
Works as a drop-in MCP server for:
| Tool | Status | Install |
|---|---|---|
| 🔵 Claude Code / Claude Desktop | ✅ Production | chuzom install --host claude-code |
| 🟣 Cursor | ✅ Production | chuzom install --host cursor |
| 🟠 Codex CLI | ✅ Production | chuzom install --host codex |
| 🔴 Gemini CLI | ✅ Production | chuzom install --host gemini-cli |
| ✨ All at once | ✅ | chuzom install --host all |
Get Started (60 seconds)
1. Install
pip install chuzom-router
2. Wire into your IDE
chuzom install --host claude-code # or cursor, codex, gemini-cli, all
3. Add your API keys (optional)
# Bring your own keys (optional)
export OPENAI_API_KEY=sk-...
export GEMINI_API_KEY=...
export ANTHROPIC_API_KEY=sk-ant-...
# Or: use Claude Code Pro/Max or Codex subscriptions (zero keys needed)
4. Watch your savings live
chuzom summary --watch
Done. Your IDE now routes intelligently.
How It Works
Every prompt flows through a smart classification pipeline:
┌─────────────────────────────────────────┐
│ Your prompt in Claude Code / Cursor │
└──────────────┬──────────────────────────┘
↓
┌─────────────────────────────────────────┐
│ 1️⃣ CLASSIFY │
│ • Task type (question/code/debug/etc) │
│ • Complexity (simple/medium/hard) │
│ • Sensitivity (PII/secrets?) │
└──────────────┬──────────────────────────┘
↓
┌─────────────────────────────────────────┐
│ 2️⃣ BUILD CHAIN │
│ Ranked model candidates: │
│ • Cheapest capable first (Ollama) │
│ • Fallback for failures │
└──────────────┬──────────────────────────┘
↓
┌─────────────────────────────────────────┐
│ 3️⃣ DISPATCH + STREAM │
│ • Send to first qualified model │
│ • Live progress for Codex / Gemini CLI │
│ • Auto-failover if provider down │
│ • Log locally (zero telemetry) │
└──────────────┬──────────────────────────┘
↓
┌─────────────────────────────────────────┐
│ ✅ Result │
│ 🎯 chuzom → <model> · <task> │
│ <latency> · saved $<amount> │
└─────────────────────────────────────────┘
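The sensitivity check in the CLASSIFY step can be pictured with a small sketch. Chuzom's actual detection rules are not documented here, so the regex patterns, the `LOCAL_MODELS` set, and both function names below are hypothetical; the only behavior taken from this README is that sensitive prompts are kept on local models.

```python
import re

# Hypothetical patterns for illustration only; not Chuzom's real rule set.
SENSITIVE_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"),              # API-key-like tokens
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),            # email addresses
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private keys
]

# Assumption: only the Ollama tier runs on the local machine.
LOCAL_MODELS = {"ollama"}


def is_sensitive(prompt: str) -> bool:
    """Return True if any pattern suggests PII or a secret in the prompt."""
    return any(pattern.search(prompt) for pattern in SENSITIVE_PATTERNS)


def restrict_chain(prompt: str, chain: list[str]) -> list[str]:
    """Drop non-local models from the candidate chain for sensitive prompts."""
    if is_sensitive(prompt):
        return [model for model in chain if model in LOCAL_MODELS]
    return list(chain)


chain = ["ollama", "codex-cli", "gemini-flash"]
print(restrict_chain("Why does sk-abcdefgh12345678 get rejected?", chain))
print(restrict_chain("What's the capital of France?", chain))
```

In this sketch a chain with no local tier becomes empty for a sensitive prompt, so a real router would need an explicit policy for that case (for example, refusing the request or asking the user).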
Routing Chains
The candidate models depend on task complexity. Chuzom tries each tier in order, falling back to the next on failure or timeout:
| Complexity | Tier 1 (cheapest) | Tier 2 | Tier 3 (fallback) |
|---|---|---|---|
| simple | Ollama (local/free) | Codex CLI | Gemini Flash |
| moderate | Ollama (local/free) | Codex CLI | GPT-4o |
| complex | Codex CLI | OpenAI o3 | Anthropic Claude |
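The tier-by-tier fallback in the table can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Chuzom's internal code: the model labels, the `route` function, and the `fake_call` stub are hypothetical, and only the tier order is taken from the table.

```python
from typing import Callable

# Tier order copied from the Routing Chains table; the strings are labels only.
CHAINS: dict[str, list[str]] = {
    "simple": ["ollama", "codex-cli", "gemini-flash"],
    "moderate": ["ollama", "codex-cli", "gpt-4o"],
    "complex": ["codex-cli", "openai-o3", "anthropic-claude"],
}


def route(complexity: str, prompt: str,
          call_model: Callable[[str, str], str]) -> tuple[str, str]:
    """Try each tier in order; return (model, answer) from the first success."""
    errors: list[str] = []
    for model in CHAINS[complexity]:
        try:
            return model, call_model(model, prompt)
        except (TimeoutError, ConnectionError) as exc:
            # Failure or timeout: record it and fall through to the next tier.
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))


def fake_call(model: str, prompt: str) -> str:
    """Hypothetical provider stub that pretends the local Ollama server is down."""
    if model == "ollama":
        raise ConnectionError("server not reachable")
    return f"[{model}] answer to: {prompt}"


print(route("simple", "What's the capital of France?", fake_call))
```

With the stub above, the simple chain skips the unreachable Ollama tier and the answer comes from the second tier.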
Ollama Dynamic Discovery
Chuzom does not pin a fixed Ollama model. It resolves which of your installed models to use in this priority order, falling back to a built-in default only when nothing else is found:
1. CHUZOM_OLLAMA_MODEL env var (single model override)
2. OLLAMA_BUDGET_MODELS env var (comma-separated list)
3. OLLAMA_MODELS env var (comma-separated list)
4. ~/.chuzom/discovery.json (auto-populated by chuzom doctor)
5. Safe default: qwen3.5:latest
# Use your own model
export CHUZOM_OLLAMA_MODEL=llama3.2:latest
# Or let chuzom discover what's running
chuzom doctor # populates ~/.chuzom/discovery.json
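The priority order above can be expressed as a short resolver. This is a sketch, not the shipped implementation: the function name is hypothetical, and the layout of `discovery.json` is not documented here, so the `"models"` key is an assumption.

```python
import json
import os
from pathlib import Path
from typing import Mapping

SAFE_DEFAULT = "qwen3.5:latest"


def _split(value: str) -> list[str]:
    """Split a comma-separated env value, dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def resolve_ollama_models(env: Mapping[str, str], discovery_path: Path) -> list[str]:
    """Walk the five discovery levels and return the first non-empty result."""
    override = env.get("CHUZOM_OLLAMA_MODEL", "").strip()
    if override:
        return [override]
    for var in ("OLLAMA_BUDGET_MODELS", "OLLAMA_MODELS"):
        models = _split(env.get(var, ""))
        if models:
            return models
    try:
        data = json.loads(discovery_path.read_text())
        # Assumption: the file holds {"models": ["name:tag", ...]}.
        models = [m for m in data.get("models", []) if isinstance(m, str)]
        if models:
            return models
    except (OSError, ValueError, AttributeError, TypeError):
        pass  # Missing or malformed file: fall through to the default.
    return [SAFE_DEFAULT]


print(resolve_ollama_models(os.environ, Path.home() / ".chuzom" / "discovery.json"))
```

Passing the environment in as a mapping keeps the resolver easy to test with a plain dict.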
Real-Time Streaming Progress
In v0.4.0, long-running model calls stream live progress into Claude Code. You'll see what's happening inside Codex and Gemini CLI instead of staring at a blank spinner.
Codex streaming (JSONL events)
Codex CLI emits structured JSONL events line-by-line. Chuzom forwards them as MCP notifications:
⏺ Calling chuzom…
✅ thread.started
✅ turn.started
⚡ item.completed — Analyzing the error stack...
⚡ item.completed — The root cause is a missing null check in line 42
✅ turn.completed — done — 1024 tokens
No more 80-second silent waits. You'll know within seconds if Codex is processing or overloaded.
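Turning JSONL output into progress lines like the ones above might look roughly like this. The event names come from the sample output; the `type` and `text` field names, the icon mapping, and the `format_event` helper are assumptions made for illustration, since the exact Codex event schema is not shown in this README.

```python
import json
from typing import Optional

# Icons mirror the sample output above; the mapping itself is illustrative.
ICONS = {
    "thread.started": "✅",
    "turn.started": "✅",
    "item.completed": "⚡",
    "turn.completed": "✅",
}


def format_event(line: str) -> Optional[str]:
    """Convert one JSONL line into a progress message, or None to skip it."""
    line = line.strip()
    if not line:
        return None
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None  # Ignore partial or non-JSON output.
    if not isinstance(event, dict) or not isinstance(event.get("type"), str):
        return None
    kind = event["type"]
    icon = ICONS.get(kind, "•")
    text = event.get("text")  # Assumed field name for the human-readable part.
    return f"{icon} {kind}: {text}" if text else f"{icon} {kind}"


stream = [
    '{"type": "thread.started"}',
    '{"type": "item.completed", "text": "Analyzing the error stack..."}',
    "not json",
]
for raw in stream:
    message = format_event(raw)
    if message is not None:
        print(message)
```

Processing one line at a time is what allows progress to appear before the model call finishes.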
Gemini CLI streaming (line-by-line)
Gemini CLI output streams line-by-line:
⏺ Calling chuzom…
⚡ line — The function signature should be...
⚡ line — Here's the corrected version:
⚡ line — def process(data: list[str]) -> dict:
Heartbeat notifications
For all models, Chuzom sends periodic heartbeat notifications during long waits:
⏺ Calling chuzom…
⚠️ gpt-5.4 (codex) still waiting... 30s
⚠️ gpt-5.4 (codex) still waiting... 60s — may be overloaded, will auto-fallback on timeout
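A heartbeat of this kind can be built with `asyncio.wait` and a timeout. The sketch below is one possible approach, not Chuzom's actual code: `with_heartbeat`, `notify`, and `slow_model` are hypothetical names, and the demo uses a 0.1-second interval so it finishes quickly.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_heartbeat(call: Awaitable[T], label: str,
                         notify: Callable[[str], None],
                         interval: float = 30.0) -> T:
    """Await `call`, emitting a notification every `interval` seconds until done."""
    task = asyncio.ensure_future(call)
    waited = 0.0
    while True:
        done, _pending = await asyncio.wait({task}, timeout=interval)
        if done:
            return task.result()  # Re-raises if the model call failed.
        waited += interval
        notify(f"{label} still waiting... {waited:g}s")


async def slow_model() -> str:
    """Stand-in for a long-running model call."""
    await asyncio.sleep(0.25)
    return "done"


messages: list[str] = []
result = asyncio.run(
    with_heartbeat(slow_model(), "gpt-5.4 (codex)", messages.append, interval=0.1)
)
print(result)
print(messages)
```

Because the model call runs as a separate task, the heartbeat loop never cancels or restarts it; a timeout-based fallback would be layered on top.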
Session Summary Dashboard
At the end of every Claude Code session, Chuzom prints a full-color session summary in the terminal. The dashboard uses the Tokyo Night color palette for readability.
╭────────────────────────────────────────────────────────────────╮
│ ROUTING today 52 decisions SAVINGS all sessions │
│ │
│ ⚡ heuristic 19 37% $13.98 lifetime │
│ 🔗 ctx-inherit 11 21% $7.66 today │
│ 🔨 build-fast 7 13% │
│ 📝 content-gen 2 4% │
│ │
│ Zero-cost: ██████████ 100% │
╰────────────────────────────────────────────────────────────────╯
╭────────────────────────────────────────────────────────────────╮
│ QUOTA Claude Subscription live │
│ │
│ 5h ━━━━━━━━━─── 67% +2.0pp │
│ resets in 1h 32m (4:00pm local) │
│ │
│ weekly ━━━━──────── 33% │
│ resets Monday │
╰────────────────────────────────────────────────────────────────╯
╭────────────────────────────────────────────────────────────────╮
│ MODELS this session │
│ gemini-2.5-flash 18 35% │
│ gpt-5.5 14 27% │
│ ollama/qwen3.5:7b 9 17% │
│ claude-sonnet-4-6 9 17% │
╰────────────────────────────────────────────────────────────────╯
╭─ 14-DAY ACTIVITY ─────────────────────────────────────────────╮
│ calls/day │
│ 391 ┤ █ │
│ 279 ┤ ▄██ │
│ 167 ┤ █▆████ │
│ 0 ┤ ███████ │
│ D1 D3 D5 D7 │
│ 1650 calls · 449.1k tok · $13.98 lifetime │
╰────────────────────────────────────────────────────────────────╯
Dashboard panels
| Panel | Color | What it shows |
|---|---|---|
| ROUTING | Cyan-blue | Decision method breakdown — heuristic, ctx-inherit, build-fast, etc. |
| SAVINGS | Green | Lifetime, today, week, month savings vs always-Opus baseline |
| QUOTA | Amber | Claude 5h + weekly quota bars with reset countdown; Gemini daily rate |
| MODELS | Purple | Model usage share this session + 14-day rolling mix |
| 14-DAY ACTIVITY | Blue | Sparkline bar chart of daily call volume and spend |
Architecture
Chuzom is an MCP (Model Context Protocol) server running on your workstation. It:
- Intercepts model requests from your IDE
- Analyzes the prompt (task, complexity, sensitivity)
- Routes to the best-fit model (cheapest first)
- Streams live progress events back to the IDE
- Logs the decision locally
- Returns your answer + savings metadata
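Chuzom adds no proxy, no cloud service, and no telemetry of its own. Routing decisions are logged locally, and prompts go only to the model providers you have configured.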
CLI Reference
chuzom install [--host claude-code|cursor|codex|gemini-cli|all]
# Wire into your IDE(s)
chuzom doctor # Verify hooks, MCP server, provider keys
chuzom summary [--watch] # Cost dashboard (live or one-time snapshot)
chuzom --version # Show installed version
Configuration
| Env var | Default | Description |
|---|---|---|
| CHUZOM_OLLAMA_MODEL | auto-discovered | Override the Ollama model |
| OLLAMA_BUDGET_MODELS | auto-discovered | Comma-separated budget model list |
| OLLAMA_MODELS | auto-discovered | Comma-separated model list |
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama server URL |
| CHUZOM_CODEX_MODELS | gpt-5.5,gpt-5.4 | Codex model fallback chain |
| CHUZOM_CODEX_TIMEOUT | 300 | Codex CLI timeout in seconds |
| CHUZOM_CLAUDE_SUBSCRIPTION | false | Enable subscription mode (no API key needed) |
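For readers who want to see how these variables and defaults fit together, here is a minimal sketch of loading them in Python. The `ChuzomConfig` class and `load_config` function are hypothetical, and the set of values treated as true for `CHUZOM_CLAUDE_SUBSCRIPTION` is an assumption; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ChuzomConfig:
    ollama_base_url: str
    codex_models: tuple[str, ...]
    codex_timeout: int
    claude_subscription: bool


def load_config(env: Mapping[str, str] = os.environ) -> ChuzomConfig:
    """Read the documented env vars, applying the defaults from the table."""
    codex_models = env.get("CHUZOM_CODEX_MODELS", "gpt-5.5,gpt-5.4")
    subscription = env.get("CHUZOM_CLAUDE_SUBSCRIPTION", "false")
    return ChuzomConfig(
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        codex_models=tuple(m.strip() for m in codex_models.split(",") if m.strip()),
        codex_timeout=int(env.get("CHUZOM_CODEX_TIMEOUT", "300")),
        # Assumption: "1", "true", and "yes" all enable subscription mode.
        claude_subscription=subscription.strip().lower() in {"1", "true", "yes"},
    )


print(load_config({}))
print(load_config({"CHUZOM_CODEX_TIMEOUT": "120", "CHUZOM_CLAUDE_SUBSCRIPTION": "true"}))
```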
What You Get
✅ Drop-in for your dev tool — no workflow changes
✅ Automatic model selection — based on task complexity
✅ 35–80% cost savings — proven on real-world workloads
✅ Local decision logging — every choice stays on your machine (no telemetry)
✅ Live savings dashboard — chuzom summary --watch shows real-time spending
✅ Session summary — full-color Tokyo Night dashboard at session end
✅ Intelligent failover — if a provider is down, tries the next model
✅ Streaming progress — Codex and Gemini CLI stream events live; no silent waits
✅ Ollama dynamic discovery — no hardcoded models; uses what you have installed
✅ PII detection — sensitive prompts route to local models only
✅ Per-reply savings banner — see which model ran and how much you saved
Benchmarks
Reproducible measurements on a fixed corpus of 8,400 real-world prompts:
Model Selection Strategy Accuracy Cost/1K Quality
─────────────────────────────────────────────────────────────
Always Haiku (cheapest) 68% $0.44 🔴
Always Opus (premium) 99% $44.00 🟢
Random selection 74% $18.20 🟡
Chuzom (smart routing) 96% $8.50 🟢
Run your own: python -m chuzom benchmark
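To put the table in terms of savings, the cost column can be compared against the always-Opus baseline. The snippet below only does arithmetic on the published numbers; the `savings_vs` helper is a hypothetical name introduced for this example.

```python
# (accuracy, cost per 1K) copied from the benchmark table above.
BENCHMARK = {
    "Always Haiku": (0.68, 0.44),
    "Always Opus": (0.99, 44.00),
    "Random selection": (0.74, 18.20),
    "Chuzom": (0.96, 8.50),
}


def savings_vs(baseline: str, strategy: str) -> float:
    """Fraction of the baseline's cost that the strategy avoids."""
    base_cost = BENCHMARK[baseline][1]
    cost = BENCHMARK[strategy][1]
    return (base_cost - cost) / base_cost


for name, (accuracy, cost) in BENCHMARK.items():
    pct = savings_vs("Always Opus", name) * 100
    print(f"{name:<18} accuracy {accuracy:.0%}  cost ${cost:>5.2f}  saves {pct:.1f}% vs Opus")
```

On these figures Chuzom avoids about 81% of the always-Opus cost, (44.00 - 8.50) / 44.00, while giving up three points of accuracy.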
Contributing
Full test suite runs on every push (Python 3.10+). Contributions welcome!
FAQ
Q: Do I need to bring API keys?
A: Not required if you use Claude Code Pro/Max or Codex subscriptions. Optional for other providers.
Q: What data does Chuzom collect?
A: None. Chuzom has no telemetry and no cloud backend, and its logs stay on your machine. Prompts are sent only to the model providers you route to.
Q: Which models does it support?
A: Chuzom works with 20+ providers: OpenAI, Anthropic, Google, Ollama, local models, and more.
Q: How much can I actually save?
A: Depends on your usage. Heavy Opus users see 70–80% savings. Mixed users see 35–50%. Most save $200–800/year.
Q: Why don't I see Ollama being used even though it's running?
A: Chuzom uses 5-level dynamic discovery to find your installed models. Run chuzom doctor to populate ~/.chuzom/discovery.json, or set CHUZOM_OLLAMA_MODEL=your-model:tag directly.
Q: Codex was taking 80+ seconds with no feedback — is that fixed?
A: Yes. v0.4.0 streams Codex JSONL events in real time. You'll see thread.started, item.completed, and turn.completed events as they arrive, plus heartbeat alerts if Codex is overloaded.
License
MIT © The Chuzom Contributors