Offline benchmark and shadow-mode control plane for LLM routing policies.
ModelSwitchboard
You're not paying for models. You're paying for bad decisions.
Every company says they're "multi-model."
What they usually mean is:
- one expensive default model
- two cheaper backups nobody trusts
- routing logic hidden in application code
- no cost discipline
- no measurable policy
- no way to know what should have happened instead
So spend climbs. Latency drifts. Quality becomes anecdotal. And every meeting ends with the same sentence:
"We should probably test a smarter router."
Probably?
You're already late.
ModelSwitchboard is what disciplined teams install before routing becomes expensive chaos.
It gives you a way to benchmark, train, package, replay, and roll out routing policies with evidence.
Not guesses. Not taste. Not whoever spoke last in the meeting.
You decide routing policy the way serious operators decide anything:
- measured offline
- validated in shadow mode
- deployed gradually
- reversible instantly
- auditable always
That is how grown systems run.
The real problem nobody says out loud
Most LLM routing stacks fail for one reason:
They optimize model selection after money is already being burned.
They experiment live. They compare dashboards no one trusts. They chase prompt tweaks while architecture waste remains untouched.
That's backwards.
Before you optimize prompts… Before you fine-tune… Before you add another provider…
You should know:
- Which requests deserve premium models
- Which requests are over-served
- Where latency is wasted
- Where escalation pays for itself
- Where quality gains are fiction
That's what ModelSwitchboard exists to answer.
What it does
Benchmark routing strategies before production traffic pays tuition
Run deterministic evaluations across policies such as:
- cheapest acceptable route
- quality-max route
- learned router
- escalation chain
- guarded frontier route
See cost, quality, latency, and failure tradeoffs side by side.
Not in theory. In numbers.
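As a rough illustration of what a "cheapest acceptable route" policy could look like, here is a minimal sketch. The `Route` class, the route names, the cost figures, and the quality scores are all hypothetical and are not taken from ModelSwitchboard's actual API or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical description of one candidate route."""
    name: str
    cost_usd: float           # estimated cost of serving one request
    predicted_quality: float  # offline quality estimate in [0, 1]


def cheapest_acceptable(routes: list[Route], min_quality: float) -> Route:
    """Return the cheapest route whose predicted quality meets the bar.

    If no route meets the bar, fall back to the highest-quality route.
    """
    acceptable = [r for r in routes if r.predicted_quality >= min_quality]
    if acceptable:
        return min(acceptable, key=lambda r: r.cost_usd)
    return max(routes, key=lambda r: r.predicted_quality)


# Illustrative numbers only.
routes = [
    Route("small", cost_usd=0.0004, predicted_quality=0.71),
    Route("medium", cost_usd=0.0030, predicted_quality=0.86),
    Route("frontier", cost_usd=0.0200, predicted_quality=0.93),
]

print(cheapest_acceptable(routes, min_quality=0.80).name)  # medium
print(cheapest_acceptable(routes, min_quality=0.99).name)  # frontier (fallback)
```

Because the function is pure and takes its inputs explicitly, the same request set always produces the same route choices, which is what makes side-by-side comparison across policies meaningful.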
Turn routing logic into a governed artifact
Policies ship as explicit assets:
- policy.json
- feature_schema.json
- model weights
Versioned. Portable. Reviewable.
No mystery branch. No "small hotfix." No folklore.
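One way a team might make such a bundle easy to review is to fingerprint it, so that any change to a file shows up as a new identifier. The sketch below is an assumption about how that could be done, not a documented ModelSwitchboard feature; `bundle_fingerprint` and the file contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def bundle_fingerprint(bundle_dir: Path) -> str:
    """Hash every file in the bundle (relative path + bytes) in sorted order."""
    digest = hashlib.sha256()
    files = sorted(p for p in bundle_dir.rglob("*") if p.is_file())
    for path in files:
        rel = path.relative_to(bundle_dir).as_posix()
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")  # separator between path and content
        digest.update(path.read_bytes())
        digest.update(b"\0")  # separator between files
    return digest.hexdigest()


# Demo with placeholder contents; the real file schemas are not shown here.
with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp)
    (bundle / "policy.json").write_text('{"version": 1}', encoding="utf-8")
    (bundle / "feature_schema.json").write_text('{"features": []}', encoding="utf-8")
    print(bundle_fingerprint(bundle))
```

Recording the fingerprint next to each benchmark result or rollout change would tie a decision to the exact artifact that produced it.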
Replay production patterns safely
Use shadow mode to compare what your candidate policy would have done against your current baseline.
No blast radius. No customer-facing roulette. No blind launches dressed up as innovation.
Every decision is a structured record. Auditable. Reviewable. Defensible.
Roll out like an operator, not a gambler
You get controls that matter:
- kill switch
- canary percentage
- tenant pinning
- health state gating
- compare records against baseline
If something breaks, you don't write a postmortem first. You turn the knob back.
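A minimal sketch of how three of these controls (kill switch, canary percentage, tenant pinning) might combine into a single decision. The `RolloutConfig` fields and the `choose_policy` function are assumptions made for illustration, not the project's actual configuration; health state gating is left out for brevity.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class RolloutConfig:
    """Hypothetical rollout settings."""
    kill_switch: bool = False
    canary_percent: float = 0.0  # share of traffic sent to the candidate, 0..100
    pinned_tenants: dict[str, str] = field(default_factory=dict)  # tenant -> policy


def bucket(request_id: str) -> float:
    """Map a request id to a stable value in [0, 100) in steps of 0.01."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") % 10_000) / 100


def choose_policy(cfg: RolloutConfig, tenant: str, request_id: str) -> str:
    """Return "baseline" or "candidate" for one request."""
    if cfg.kill_switch:               # highest priority: everything goes back
        return "baseline"
    if tenant in cfg.pinned_tenants:  # explicit per-tenant override
        return cfg.pinned_tenants[tenant]
    if bucket(request_id) < cfg.canary_percent:
        return "candidate"
    return "baseline"


cfg = RolloutConfig(canary_percent=10.0, pinned_tenants={"acme": "baseline"})
print(choose_policy(cfg, "acme", "req-1"))     # pinned tenant stays on baseline
print(choose_policy(cfg, "other", "req-1"))    # depends on the hash bucket
cfg.kill_switch = True
print(choose_policy(cfg, "other", "req-1"))    # baseline
```

Hashing the request id rather than drawing a random number keeps the assignment deterministic, so the same request replays into the same arm and rolling back is a config change rather than a deploy.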
See economics clearly
A smarter route that costs more than it returns is not smart.
Track:
- per-request cost
- route mix drift
- premium model leakage
- escalation efficiency
- quality gained per dollar spent
That's where executive trust comes from.
Who this is for
Platform leaders. Tired of paying flagship-model prices for average requests.
AI product teams. Need better outcomes without detonating margins.
Infrastructure teams. Need safe rollout mechanics, not notebook demos.
Builders with standards. Who know "send everything to the best model" is laziness wearing confidence.
What changes after adoption
Before:
- routing debates
- invisible waste
- reactive spend control
- fragile launches
- no accountability
After:
- measurable policy
- controlled rollout
- explainable decisions
- budget discipline
- faster iteration
That delta compounds.
What it does not pretend to be
No fake promises.
This is not:
- magic autonomous intelligence
- finished enterprise infrastructure
- your IAM layer
- your secrets manager
- your observability platform
- a substitute for engineering judgment
It is something rarer:
a sober, usable control plane for routing decisions that actually matter.
The uncomfortable truth
If you're spending serious money on LLM inference and still routing by instinct, defaults, or patched business logic—
you do not have an AI platform.
You have a billing relationship.
The close
The companies that win this wave won't be the ones with access to the best models.
They'll be the ones who know when not to use them.
ModelSwitchboard gives you that discipline.
Get started
pip install -e ".[dev]"
python -m pytest -m "not live" -q
# Run the offline benchmark.
python -m model_switchboard.main --split test --output-dir data/results/local_test
# Stand up the service.
pip install -e ".[service]"
uvicorn model_switchboard.service.app:build_app --factory --host 0.0.0.0 --port 8080
Python 3.11+. Apache-2.0 licensed. See LICENSE and CONTRIBUTING.md.
File details
Details for the file model_switchboard-0.3.0.tar.gz.
File metadata
- Download URL: model_switchboard-0.3.0.tar.gz
- Upload date:
- Size: 114.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3d6ff22174ac912bbdadced041f1e2fd4b1f122ea5a9c2430e4bf07014a7996f |
| MD5 | 76ee41a7c1b7225ff228bef655a76d54 |
| BLAKE2b-256 | a4314afd215bc5150b0cc6ff6765f2bbd8ae4573c30bdc4671daf1958e8f2676 |
File details
Details for the file model_switchboard-0.3.0-py3-none-any.whl.
File metadata
- Download URL: model_switchboard-0.3.0-py3-none-any.whl
- Upload date:
- Size: 95.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 065fe0e2e68c2b312a08f1b5d9b75282b886dac5fb7a7a57cfdf940febd20e2d |
| MD5 | 6a6f87ca202e83e7ce044cc56fc92082 |
| BLAKE2b-256 | c80a267bb0a525596a41efdf5f125ba4747499deca01e3d7ad491ec1adfa7432 |