Drop-in LLM cost router: classify each prompt by task class and route it to the cheapest sufficient model. Shows the dollars saved on every call.

routecut

A drop-in LLM cost router. It classifies each prompt by task class, routes it to the cheapest model that is still sufficient, and shows you the dollars saved on every call. The aim is to cut the "route everything to one premium model" bill by roughly 50% or more without hand-building a router, and to prove the savings.

Most LLM traffic gets sent to one frontier model "to be safe," but most of that traffic is routine (drafts, formatting, simple tool-call filling). routecut classifies each prompt, picks the cheapest sufficient model under a policy you can read and edit, falls back or escalates when it's unsure, and accounts for the cost saved versus a premium baseline on every call.

Why

OpenRouter and LiteLLM are the transport (call many models through one API); you still pick the model. routecut is the decision: classify the prompt, choose the cheapest-sufficient model, and make the savings visible. It can even ride on top of a unified API. Local-first, BYO keys, transparent policy.

Install

pip install -e .            # core, zero required deps
pip install -e ".[extras]"  # + tiktoken (accurate token counts) + rich

Requires Python 3.11+.

Quick start

from routecut import Router

router = Router.from_config()          # routes.toml + pricing.toml, BYO keys via env
resp = router.chat(messages=[{"role": "user", "content": "translate hello to French"}])

print(resp.text)
print(resp.routing.model)        # e.g. "qwen-turbo" — the cheap model it chose
print(resp.routing.saved_usd)    # $ saved vs the premium baseline, this call

It speaks the OpenAI chat shape and routes to OpenAI, DeepSeek, Qwen (DashScope), Moonshot, MiniMax, and any OpenAI-compatible endpoint by provider prefix — set the keys for whichever providers your routes use.

See it save money (no API key needed)

python examples/demo_savings.py

Routes a realistic mix of prompts (drafts, a tool call, hard reasoning) through a fake provider and prints which model each went to and the running savings vs always-premium. Then:

routecut savings     # total spent vs baseline, savings %, by class + provider
routecut calls       # recent routed calls with cost + $ saved

How it routes

  1. Classify the prompt: draft | tool_use | reasoning | code_explore | code_plan | code_execute, each with a confidence (transparent heuristics; no model in the hot path).
  2. Decide via routes.toml: pick the cheapest model the policy allows for that class. If confidence is below min_confidence, escalate one tier up (bias errors toward "spent a bit more", never "worse answer").
  3. Call the chosen model. On error or a cheap quality signal (empty output, bare refusal), fall back down the route's chain.
  4. Account: saved_usd = baseline_cost - actual_cost, using the same token counts and pricing table for both. Reported honestly — it can be negative if a call was escalated to/above the baseline.

Configure (data, not code)

routes.toml — the routing policy. Edit freely; no release needed:

baseline_model = "claude-opus-4.1"   # what savings are measured against

[policy]
min_confidence = 0.6                  # below this, escalate one tier up

[route.draft]
model = "qwen-turbo"
fallback = "gpt-4o-mini"

[route.reasoning]
model = "claude-sonnet"
escalate_to = "claude-opus-4.1"       # used when classifier confidence is low

pricing.toml holds per-model prices in $ per 1M tokens (the file path can also be set with ROUTECUT_PRICING=/path). Unknown models fall back to a high default price, so savings never look better than reality.
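
The savings arithmetic can be sketched as below. Everything here is an assumption for illustration: the prices are placeholder numbers rather than real price data, the split into separate input and output prices is a guess at the pricing.toml layout, and `cost_usd`, `saved_usd`, and `DEFAULT_PRICE` are hypothetical names.

```python
# Placeholder prices in USD per 1M tokens as (input, output). Not real price data.
PRICING = {
    "claude-opus-4.1": (10.0, 30.0),
    "qwen-turbo": (0.1, 0.4),
}

# Deliberately high default so an unpriced model can never inflate the savings.
DEFAULT_PRICE = (100.0, 100.0)


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call, using the default price for unknown models."""
    in_price, out_price = PRICING.get(model, DEFAULT_PRICE)
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def saved_usd(model: str, baseline: str, input_tokens: int, output_tokens: int) -> float:
    """baseline_cost - actual_cost, with the same token counts on both sides."""
    baseline_cost = cost_usd(baseline, input_tokens, output_tokens)
    actual_cost = cost_usd(model, input_tokens, output_tokens)
    return baseline_cost - actual_cost


# 1,000 input tokens and 500 output tokens:
print(saved_usd("qwen-turbo", "claude-opus-4.1", 1000, 500))     # about 0.0247
print(saved_usd("unpriced-model", "claude-opus-4.1", 1000, 500)) # negative
```

With these placeholder prices the baseline call costs $0.025 and the cheap call $0.0003, so the saving is about $0.0247. The unpriced model is charged at the high default and reports a negative saving, which errs on the pessimistic side.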

CLI

Command            What it does
routecut savings   total spent vs baseline, savings %, by class + provider
routecut calls     recent routed calls (class, model, cost, $ saved)
routecut policy    print routes + pricing; validate every routed model is priced
routecut doctor    check provider keys present + pricing coverage
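
The "every routed model is priced" check is simple to reason about. The sketch below shows one way such a check could work on already-parsed config dicts. The `unpriced_models` function and the dict shapes are assumptions for illustration, not the code behind `routecut policy`.

```python
def unpriced_models(routes: dict[str, dict], pricing: dict[str, object]) -> set[str]:
    """Return every model named in any route that has no pricing entry."""
    routed: set[str] = set()
    for route in routes.values():
        for key in ("model", "fallback", "escalate_to"):
            if key in route:
                routed.add(route[key])
    return routed - set(pricing)


routes = {
    "draft": {"model": "qwen-turbo", "fallback": "gpt-4o-mini"},
    "reasoning": {"model": "claude-sonnet", "escalate_to": "claude-opus-4.1"},
}
# Placeholder pricing table that is missing one routed model.
pricing = {"qwen-turbo": 1, "gpt-4o-mini": 1, "claude-opus-4.1": 1}

print(unpriced_models(routes, pricing))  # {'claude-sonnet'}
```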

Honest limits (status: MVP)

  • Classification is heuristic v1. It emits a confidence and escalates when unsure, but it can mis-route. The fallback/escalation path is the safety net; tune routes.toml and min_confidence to your workload. A tiny-model classifier for ambiguous cases is planned, not shipped.
  • "Sufficient" is not yet a measured quality score. v1 optimizes cost under a policy you control and catches gross failures (empty/refusal) for fallback. A real quality eval is a separate, later scope — we don't overclaim it.
  • Not built yet: the LangGraph ConditionalEdge preset, the coding-agent gateway preset, and the HTTP proxy for non-Python stacks. The SDK ships first. See ../pain-radar/specs/05-llm-cost-router/spec.md.
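
To make the "gross failures (empty/refusal)" check concrete, here is a rough sketch of a cheap quality signal combined with a fallback loop. The refusal phrases, the 80-character cutoff, and the `looks_bad` and `call_with_fallback` functions are all hypothetical; the `call` argument stands in for whatever function actually contacts a provider.

```python
from typing import Callable

# Hypothetical refusal openers; a real list would need tuning per provider.
REFUSAL_PREFIXES = ("i can't", "i cannot", "i'm sorry", "sorry, i")


def looks_bad(text: str) -> bool:
    """Cheap signal: empty output, or a short reply that opens with a refusal."""
    stripped = text.strip()
    if not stripped:
        return True
    return len(stripped) < 80 and stripped.lower().startswith(REFUSAL_PREFIXES)


def call_with_fallback(chain: list[str], call: Callable[[str], str]) -> tuple[str, str]:
    """Try each model in order; return the first (model, text) that looks usable."""
    last_error: Exception | None = None
    for model in chain:
        try:
            text = call(model)
        except Exception as exc:  # provider error: move on to the next model
            last_error = exc
            continue
        if not looks_bad(text):
            return model, text
    raise RuntimeError(f"no usable answer from {chain}") from last_error


def fake_call(model: str) -> str:
    """Stand-in provider: the first model returns empty output."""
    return "" if model == "qwen-turbo" else "Bonjour"


print(call_with_fallback(["qwen-turbo", "gpt-4o-mini"], fake_call))
# ('gpt-4o-mini', 'Bonjour')
```

A check like this only catches obvious failures. It says nothing about whether a non-empty answer is actually good, which is the gap the second bullet above describes.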

License

MIT.
