Drop-in LLM cost router: classify each prompt by task class and route it to the cheapest sufficient model. Shows the dollars saved on every call.
routecut
A drop-in LLM cost router. It classifies each prompt by task class and routes it to the cheapest model that's still sufficient, then shows you the dollars saved on every call. Cut the "route everything to one premium model" bill by roughly 50% or more without hand-building a router, and prove the savings.
Most LLM traffic gets sent to one frontier model "to be safe," but most of that
traffic is routine (drafts, formatting, simple tool-call filling). routecut
classifies each prompt, picks the cheapest sufficient model under a policy you
can read and edit, falls back or escalates when it's unsure, and accounts for
the cost saved versus a premium baseline on every call.
Why
OpenRouter and LiteLLM are the transport (call many models through one API);
you still pick the model. routecut is the decision: classify the prompt,
choose the cheapest-sufficient model, and make the savings visible. It can even
ride on top of a unified API. Local-first, BYO keys, transparent policy.
Install
pip install -e . # core, zero required deps
pip install -e ".[extras]" # + tiktoken (accurate token counts) + rich
Python 3.11+.
Quick start
from routecut import Router
router = Router.from_config() # routes.toml + pricing.toml, BYO keys via env
resp = router.chat(messages=[{"role": "user", "content": "translate hello to French"}])
print(resp.text)
print(resp.routing.model) # e.g. "qwen-turbo" — the cheap model it chose
print(resp.routing.saved_usd) # $ saved vs the premium baseline, this call
It speaks the OpenAI chat shape and routes to OpenAI, DeepSeek, Qwen (DashScope), Moonshot, MiniMax, and any OpenAI-compatible endpoint by provider prefix — set the keys for whichever providers your routes use.
See it save money (no API key needed)
python examples/demo_savings.py
Routes a realistic mix of prompts (drafts, a tool call, hard reasoning) through a fake provider and prints which model each went to and the running savings vs always-premium. Then:
routecut savings # total spent vs baseline, savings %, by class + provider
routecut calls # recent routed calls with cost + $ saved
How it routes
- Classify the prompt: draft | tool_use | reasoning | code_explore | code_plan | code_execute, each with a confidence (transparent heuristics; no model in the hot path).
- Decide via routes.toml: pick the cheapest model the policy allows for that class. If confidence is below min_confidence, escalate one tier up (bias errors toward "spent a bit more", never "worse answer").
- Call the chosen model. On error or a cheap quality signal (empty output, bare refusal), fall back down the route's chain.
- Account: saved_usd = baseline_cost - actual_cost, using the same token counts and pricing table for both. Reported honestly: it can be negative if a call was escalated to or above the baseline.
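The first three steps above can be sketched in a few lines. This is a minimal illustration, not routecut's internals: the keyword lists, the `classify`, `choose_model`, and `route` helpers, and the `fake_call` provider are all hypothetical, and the sketch assumes a route without `escalate_to` simply stays on its own model when confidence is low. The route table mirrors the sample routes.toml in the Configure section.

```python
from typing import Callable

MIN_CONFIDENCE = 0.6
ROUTES = {
    "draft": {"model": "qwen-turbo", "fallback": "gpt-4o-mini"},
    "reasoning": {"model": "claude-sonnet", "escalate_to": "claude-opus-4.1"},
}


def classify(prompt: str) -> tuple[str, float]:
    """Toy keyword heuristic (hypothetical): returns (task_class, confidence)."""
    text = prompt.lower()
    if any(word in text for word in ("prove", "why", "step by step")):
        return "reasoning", 0.8
    if any(word in text for word in ("translate", "rewrite", "summarize")):
        return "draft", 0.9
    # Nothing matched: report low confidence so the decision step escalates.
    return "reasoning", 0.3


def choose_model(task_class: str, confidence: float) -> str:
    """Use the route's model, or its escalation target when confidence is low."""
    route = ROUTES[task_class]
    if confidence < MIN_CONFIDENCE and "escalate_to" in route:
        return route["escalate_to"]
    return route["model"]


def route(prompt: str, call: Callable[[str], str]) -> tuple[str, str]:
    """Classify, decide, call, and fall back on an error or empty output."""
    task_class, confidence = classify(prompt)
    candidates = [choose_model(task_class, confidence)]
    fallback = ROUTES[task_class].get("fallback")
    if fallback:
        candidates.append(fallback)
    for model in candidates:
        try:
            text = call(model)
        except Exception:
            continue  # provider error: try the next model in the chain
        if text.strip():  # an empty output counts as a cheap quality failure
            return model, text
    raise RuntimeError("every model in the route failed")


def fake_call(model: str) -> str:
    """Stand-in provider: the cheapest model returns nothing, others answer."""
    return "" if model == "qwen-turbo" else f"answer from {model}"


print(route("translate hello to French", fake_call))
print(route("what should we do about this?", fake_call))
```

In this sketch the first prompt is classified as a draft, the cheap model returns an empty string, and the call lands on the route's fallback. The second prompt matches no keyword, so its low confidence sends it straight to the escalation target.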
Configure (data, not code)
routes.toml — the routing policy. Edit freely; no release needed:
baseline_model = "claude-opus-4.1" # what savings are measured against
[policy]
min_confidence = 0.6 # below this, escalate one tier up
[route.draft]
model = "qwen-turbo"
fallback = "gpt-4o-mini"
[route.reasoning]
model = "claude-sonnet"
escalate_to = "claude-opus-4.1" # used when classifier confidence is low
pricing.toml: per-model $/1M tokens (a custom file can also be supplied via
ROUTECUT_PRICING=/path). Unknown models fall back to a high default so savings
never look better than reality.
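To see why a high default keeps the numbers conservative, here is a small worked sketch of the savings arithmetic. Everything in it is assumed for illustration: the model names, the prices, the default, and the `(input, output)` price layout are placeholders, not routecut's actual pricing schema or real price data.

```python
# Placeholder prices in USD per 1M tokens: (input, output). Not real price data.
PRICING = {
    "cheap-model": (0.10, 0.40),
    "premium-model": (10.00, 40.00),
}
# Assumed high default applied to any model missing from the table.
DEFAULT_PRICE = (30.00, 90.00)


def cost_usd(model: str, tokens_in: int, tokens_out: int) -> float:
    """Cost of one call, falling back to the high default for unknown models."""
    price_in, price_out = PRICING.get(model, DEFAULT_PRICE)
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def saved_usd(model: str, tokens_in: int, tokens_out: int,
              baseline: str = "premium-model") -> float:
    """baseline_cost - actual_cost, using the same token counts for both."""
    return cost_usd(baseline, tokens_in, tokens_out) - cost_usd(model, tokens_in, tokens_out)


known = saved_usd("cheap-model", 2000, 1000)
unknown = saved_usd("not-in-table", 2000, 1000)
print(f"known model saved:   {known:+.4f} USD")
print(f"unknown model saved: {unknown:+.4f} USD")
```

With these placeholder numbers the priced cheap model shows a positive saving, while the unpriced model is charged at the default and shows a negative one, so a missing price entry can only understate savings.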
CLI
| Command | What it does |
|---|---|
| routecut savings | total spent vs baseline, savings %, by class + provider |
| routecut calls | recent routed calls (class, model, cost, $ saved) |
| routecut policy | print routes + pricing; validate every routed model is priced |
| routecut doctor | check provider keys present + pricing coverage |
Honest limits (status: MVP)
- Classification is heuristic v1. It emits a confidence and escalates when unsure, but it can mis-route. The fallback/escalation path is the safety net; tune routes.toml and min_confidence to your workload. A tiny-model classifier for ambiguous cases is planned, not shipped.
- "Sufficient" is not yet a measured quality score. v1 optimizes cost under a policy you control and catches gross failures (empty/refusal) for fallback. A real quality eval is a separate, later scope; we don't overclaim it.
- Not built yet: the LangGraph ConditionalEdge preset, the coding-agent gateway preset, and the HTTP proxy for non-Python stacks. The SDK ships first. See ../pain-radar/specs/05-llm-cost-router/spec.md.
License
MIT.