Drop-in LLM cost router: classify each prompt by task class and route it to the cheapest sufficient model. Shows the dollars saved on every call.


routecut

A drop-in LLM cost router. It classifies each prompt by task class and routes it to the cheapest model that's still sufficient, then shows you the dollars saved on every call. The goal is to cut the "route everything to one premium model" bill by roughly 50% or more without hand-building a router, and to prove the savings.

Most LLM traffic gets sent to one frontier model "to be safe," but most of that traffic is routine (drafts, formatting, simple tool-call filling). routecut classifies each prompt, picks the cheapest sufficient model under a policy you can read and edit, escalates when the classifier is unsure, falls back when a call fails or returns an obviously bad answer, and records the cost saved versus a premium baseline on every call.

Why

OpenRouter and LiteLLM are the transport (call many models through one API); you still pick the model. routecut is the decision: classify the prompt, choose the cheapest-sufficient model, and make the savings visible. It can even ride on top of a unified API. Local-first, BYO keys, transparent policy.

Install

pip install -e .            # core, zero required deps
pip install -e ".[extras]"  # + tiktoken (accurate token counts) + rich

Requires Python 3.11+.

Quick start

from routecut import Router

router = Router.from_config()          # routes.toml + pricing.toml, BYO keys via env
resp = router.chat(messages=[{"role": "user", "content": "translate hello to French"}])

print(resp.text)
print(resp.routing.model)        # e.g. "qwen-turbo" — the cheap model it chose
print(resp.routing.saved_usd)    # $ saved vs the premium baseline, this call

It speaks the OpenAI chat shape and routes to OpenAI, DeepSeek, Qwen (DashScope), Moonshot, MiniMax, and any OpenAI-compatible endpoint by provider prefix — set the keys for whichever providers your routes use.

See it save money (no API key needed)

python examples/demo_savings.py

Routes a realistic mix of prompts (drafts, a tool call, hard reasoning) through a fake provider and prints which model each went to and the running savings vs always-premium. Then:

routecut savings     # total spent vs baseline, savings %, by class + provider
routecut calls       # recent routed calls with cost + $ saved

How it routes

1. Classify the prompt: draft | tool_use | reasoning | code_explore | code_plan | code_execute, each with a confidence (transparent heuristics; no model in the hot path).
2. Decide via routes.toml: pick the cheapest model the policy allows for that class. If confidence is below min_confidence, escalate one tier up (bias errors toward "spent a bit more", never "worse answer").
3. Call the chosen model. On error or a cheap quality signal (empty output, bare refusal), fall back down the route's chain.
4. Account: saved_usd = baseline_cost - actual_cost, using the same token counts and pricing table for both. Reported honestly: it can be negative if a call was escalated to or above the baseline.
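The classify and decide steps above can be sketched in a few lines. This is an illustration only, not routecut's actual code: the keyword lists, the `TIERS` ordering, and the helper names `classify`, `decide`, and `looks_bad` are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical tier order, cheapest first. The real tiers would come from routes.toml.
TIERS = ["qwen-turbo", "claude-sonnet", "claude-opus-4.1"]

# Hypothetical class -> default model mapping, mirroring the routes.toml idea.
ROUTES = {"draft": "qwen-turbo", "reasoning": "claude-sonnet"}

MIN_CONFIDENCE = 0.6


@dataclass
class Classification:
    task_class: str
    confidence: float


def classify(prompt: str) -> Classification:
    """Toy keyword heuristic standing in for the real classifier."""
    text = prompt.lower()
    if any(word in text for word in ("prove", "why", "derive")):
        return Classification("reasoning", 0.8)
    if any(word in text for word in ("translate", "rewrite", "summarize")):
        return Classification("draft", 0.9)
    # Nothing matched: guess the cheap class but report low confidence.
    return Classification("draft", 0.3)


def decide(c: Classification) -> str:
    """Pick the route's model; escalate one tier up when confidence is low."""
    model = ROUTES[c.task_class]
    if c.confidence < MIN_CONFIDENCE:
        idx = TIERS.index(model)
        model = TIERS[min(idx + 1, len(TIERS) - 1)]
    return model


def looks_bad(output: str) -> bool:
    """Cheap quality signal: empty output or a bare refusal."""
    stripped = output.strip().lower()
    return stripped == "" or stripped in {
        "i can't help with that.",
        "i cannot help with that.",
    }


print(decide(classify("translate hello to French")))  # confident draft
print(decide(classify("hmm")))                         # low confidence, escalated
```

In this sketch a low-confidence prompt costs one tier more instead of risking a weaker answer, which is the bias the policy describes. A caller would check `looks_bad` on the response and move to the route's fallback model when it returns true.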

Configure (data, not code)

routes.toml — the routing policy. Edit freely; no release needed:

baseline_model = "claude-opus-4.1"   # what savings are measured against

[policy]
min_confidence = 0.6                  # below this, escalate one tier up

[route.draft]
model = "qwen-turbo"
fallback = "gpt-4o-mini"

[route.reasoning]
model = "claude-sonnet"
escalate_to = "claude-opus-4.1"       # used when classifier confidence is low

pricing.toml holds per-model prices in $ per 1M tokens (the file can also be pointed to with ROUTECUT_PRICING=/path). Unknown models fall back to a high default price, so savings never look better than reality.
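As a rough illustration of the accounting, the sketch below prices one call against a per-1M-token table and computes saved_usd with the same token counts for baseline and actual. The model names, the prices, and the `DEFAULT_PRICE` value are placeholders chosen for easy arithmetic, not real quotes and not routecut's defaults.

```python
# Placeholder prices in USD per 1M tokens as (input, output). Not real quotes.
PRICING = {
    "premium-model": (10.0, 40.0),
    "cheap-model": (0.1, 0.4),
}

# Assumed "high default" applied to any model missing from the table.
DEFAULT_PRICE = (30.0, 120.0)


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Price one call from the per-1M-token table."""
    in_price, out_price = PRICING.get(model, DEFAULT_PRICE)
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def saved_usd(baseline: str, actual: str, input_tokens: int, output_tokens: int) -> float:
    """baseline_cost - actual_cost, using the same token counts for both."""
    return cost_usd(baseline, input_tokens, output_tokens) - cost_usd(
        actual, input_tokens, output_tokens
    )


# 1,000 input tokens and 500 output tokens routed to the cheap model.
print(f"{saved_usd('premium-model', 'cheap-model', 1000, 500):.4f}")

# An unpriced model gets the high default, so the reported saving goes negative.
print(f"{saved_usd('premium-model', 'not-in-table', 1000, 500):.4f}")
```

Pricing unknown models high means a missing table entry can only make the reported savings smaller, never larger.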

CLI

| Command | What it does |
| --- | --- |
| `routecut savings` | total spent vs baseline, savings %, by class + provider |
| `routecut calls` | recent routed calls (class, model, cost, $ saved) |
| `routecut policy` | print routes + pricing; validate every routed model is priced |
| `routecut doctor` | check provider keys present + pricing coverage |

Honest limits (status: MVP)

Classification is heuristic v1. It emits a confidence and escalates when unsure, but it can mis-route. The fallback/escalation path is the safety net; tune routes.toml and min_confidence to your workload. A tiny-model classifier for ambiguous cases is planned, not shipped.
"Sufficient" is not yet a measured quality score. v1 optimizes cost under a policy you control and catches gross failures (empty/refusal) for fallback. A real quality eval is a separate, later scope — we don't overclaim it.
Not built yet: the LangGraph ConditionalEdge preset, the coding-agent gateway preset, and the HTTP proxy for non-Python stacks. The SDK ships first. See ../pain-radar/specs/05-llm-cost-router/spec.md.

License

MIT.


Version 0.1.0 (Jun 9, 2026)


