Prompt-aware LLM routing-decision service: predicts which model can complete a prompt and picks the cheapest one.
Stop sending every prompt to your most expensive LLM.
xrouter-llm is a prompt-aware LLM routing-decision service: it predicts
which models can complete a prompt, then chooses the cheapest model that clears
the bar. On our tested dataset, it cuts realized cost by 52.4% while
improving completion by +1.7 pts.
It answers "which model should serve this prompt?" and records the choice — it does NOT call the underlying LLMs.
Install
pip install xrouter-llm # ships a trained router + model registry
# or, for development:
pip install -e ".[dev]"
The wheel bundles a trained router artifact, the model-profile registry, and the router configs, so a fresh install can serve immediately with no extra files.
Serve
The bundled router, registry, and configs are the defaults, so a bare invocation works out of the box:
xrouter-llm serve --port 8080
Override any of them to use your own trained model or registry:
xrouter-llm serve \
--model artifacts/models/irt_router_350k.joblib \
--models-dir path/to/models --routers-dir path/to/routers \
--db artifacts/calls.db --port 8080
- GET /: single-page UI (prompt box, config picker, decision table, history)
- GET /api/configs, POST /api/route ({prompt, config, task?}), GET /api/history?limit=N
- Every decision is logged to SQLite (*.db / *.sqlite are gitignored because the log holds user prompts).
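A minimal client sketch for the routing endpoint, assuming the server from `xrouter-llm serve --port 8080` is listening on localhost and accepts a JSON body. The payload fields (prompt, config, task) come from the endpoint list above; the helper names and the config name used in the usage note are hypothetical, and the response is treated as opaque JSON because its shape is not documented here.

```python
import json
import urllib.request

# Assumption: a local server started with `xrouter-llm serve --port 8080`.
BASE_URL = "http://localhost:8080"


def build_route_request(prompt, config, task=None, base_url=BASE_URL):
    """Build (but do not send) a POST /api/route request."""
    payload = {"prompt": prompt, "config": config}
    if task is not None:  # `task` is optional in the documented payload
        payload["task"] = task
    return urllib.request.Request(
        f"{base_url}/api/route",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def route(prompt, config, task=None, base_url=BASE_URL):
    """Send the request and return the decoded JSON decision (shape not assumed)."""
    request = build_route_request(prompt, config, task, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With a server running, `route("Summarize this log file", "my-config")` would return the routing decision; valid config names can be read from GET /api/configs first.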
Model registry
One YAML per supported model, bundled under
src/xrouter_llm/resources/config/models/ (capability profile: provider, costs,
context, published benchmarks as 0-100 percentages). model_id is the model's
canonical OpenRouter slug (e.g. anthropic/claude-opus-4.8). The bundled
registry is the default for --benchmark-profiles; point it at your own
directory or file to extend it. Add a model = add a file.
from xrouter_llm import IRTRouter, default_model_path, default_models_dir, load_benchmark_profiles

router = IRTRouter.load(default_model_path())
for profile in load_benchmark_profiles(default_models_dir()).profiles():
    router.add_benchmark_profile(profile)

preds = router.predict(
    "Design a distributed consensus algorithm",
    model_ids=["anthropic/claude-opus-4.8", "deepseek/deepseek-v4-pro"],
)
print({p.model_id: round(p.mu, 3) for p in preds})
How it works
Do not train: prompt -> selected model
Train: prompt + model -> probability the model completes the prompt
Decide: predicted completion + cost -> cheapest model that can complete
Completion is factored into two decoupled axes (an IRT-style model):
P(complete) = sigmoid(a * capability(model) + b * difficulty(prompt) + c)
- capability(model) = the mean of the model's published gpqa_diamond and livecodebench (both full-coverage on the training side). Going wider doesn't help at this data scale: a flat mean dilutes and learned weights overfit at 37 profiled models; see AGENTS.md "Capability benchmarks". Used directly, so a brand-new model's benchmarks drive its ranking.
- difficulty(prompt) = a Ridge regressor on a multilingual embedding (Qwen/Qwen3-Embedding-0.6B), trained on each prompt's empirical pass-rate. Multilingual (Chinese transfers from English training data). Picked over bge-m3 by a controlled probe (scripts/probe_qwen_difficulty.py): higher held-out Pearson and it no longer rates trivial prompts ("1+1=?") as maximally hard.
This factoring is the key lesson: a single joint classifier could not rank unseen models by their benchmarks (on this data, model capability barely explains completion marginally — but it does once difficulty is controlled, which is exactly what the factored model exploits).
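The factored formula above can be illustrated with a small sketch. Everything numeric here is an assumption for illustration: the coefficients a, b, c are placeholders rather than the trained values, the benchmark scores are made up, the 0-100 benchmarks are rescaled to 0-1, and difficulty is taken to lie in [0, 1] with larger meaning harder (the sign convention of the shipped model is not stated here).

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def capability(benchmarks):
    """Mean of gpqa_diamond and livecodebench, rescaled from 0-100 to 0-1 (assumed scaling)."""
    return (benchmarks["gpqa_diamond"] + benchmarks["livecodebench"]) / 2.0 / 100.0


def p_complete(benchmarks, difficulty, a=4.0, b=-4.0, c=0.0):
    """P(complete) = sigmoid(a * capability + b * difficulty + c).

    a, b, c are placeholder coefficients, not the trained values.
    """
    return sigmoid(a * capability(benchmarks) + b * difficulty + c)


strong = {"gpqa_diamond": 80.0, "livecodebench": 70.0}  # made-up scores
weak = {"gpqa_diamond": 40.0, "livecodebench": 30.0}  # made-up scores

# Compare both hypothetical models on an easy and a hard prompt.
for label, difficulty in [("easy", 0.1), ("hard", 0.9)]:
    print(label, round(p_complete(strong, difficulty), 3), round(p_complete(weak, difficulty), 3))
```

Because capability depends only on the model's published benchmarks, a model that never appeared in training can still be scored by supplying its two benchmark numbers, which is the property the factoring is meant to provide.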
Datasets
The production difficulty model is trained on multiple datasets combined (all feed the difficulty axis; only profiled models feed the capability axis):
| Source | Type | Scale | In production train? |
|---|---|---|---|
| NPULH/LLMRouterBench (350k stream sample) | single-turn QA / code / math (22 tasks) | 37 models x ~13.8k prompts | ✅ |
| agent-psychometrics — Terminal-Bench 2.0 | terminal agent | 89 tasks x 112 subjects | ✅ --dataset agentic:agentic/terminalbench |
| agent-psychometrics — SWE-bench Verified | coding agent | 500 tasks x 134 subjects | ✅ task text joined from princeton-nlp/SWE-bench_Verified |
| agent-psychometrics — SWE-bench Pro / GSO | coding agent | 730x14 / 102x15 | ⛔ ship no local task text, external join needed |
The current artifact trains on LLMRouterBench 350k + Terminal-Bench +
SWE-bench Verified (377,997 rows / ~14,364 prompts / 283 subjects). The
agentic matrices come from
agent-psychometrics
(MIT) via agentic.py. Only the 37 profiled llmrouterbench models feed the
capability axis; agentic subjects feed difficulty only. RouterBench
(withmartian/routerbench) remains a smaller legacy baseline. Local datasets and
trained artifacts are not committed (data/, artifacts/ are gitignored).
Adding more agentic prompt types (e.g. your own traffic) is the only way to make difficulty accurate for task mixes outside coding/terminal — see AGENTS.md.
Train
xrouter-llm train-irt \
--dataset llmrouterbench:data/raw/llmrouterbench_stream_sample_350k \
--dataset agentic:agentic/terminalbench \
--dataset agentic:agentic/swebench_verified \
--benchmark-profiles artifacts/profiles/llmrouterbench_350k_profiles_priority_collected.json \
--output artifacts/models/irt_router_350k.joblib
Diagnostics: sweep-thresholds (cost/completion frontier + calibration) and
eval-model-holdout (leave-one-model-out generalization).
Components
- IRTRouter (irt_router.py): the predictor (difficulty x capability).
- RoutingPolicy (policy.py): "cheapest model whose predicted completion clears completion_threshold; else the cheapest within fallback_quality_margin of the best predicted completion".
- serving.py / server.py: HTTP routing-decision API + single-page web UI.
- resources/config/models/: a per-model YAML registry of capability profiles (bundled in the package; resolve with default_models_dir()).
- resources/config/routers/: named "auto configs", each a candidate model set + policy (bundled; default_routers_dir()).
- resources/models/irt_router_350k.joblib: the trained router shipped with the package (default_model_path()).
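The RoutingPolicy rule quoted above can be sketched in a few lines. This is an illustration of the stated rule only, not the code in policy.py: the `Candidate` class, the `choose` function, and the default values for completion_threshold and fallback_quality_margin are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    model_id: str
    predicted_completion: float  # e.g. the router's predicted P(complete)
    cost: float  # any consistent cost unit


def choose(candidates, completion_threshold=0.7, fallback_quality_margin=0.05):
    """Pick the cheapest model clearing the threshold, with a near-best fallback.

    The default threshold and margin are placeholders, not the shipped configs.
    """
    if not candidates:
        raise ValueError("no candidate models")

    # Primary rule: cheapest model whose predicted completion clears the bar.
    clearing = [c for c in candidates if c.predicted_completion >= completion_threshold]
    if clearing:
        return min(clearing, key=lambda c: c.cost)

    # Fallback: cheapest model within the margin of the best predicted completion.
    best = max(c.predicted_completion for c in candidates)
    near_best = [
        c for c in candidates
        if c.predicted_completion >= best - fallback_quality_margin
    ]
    return min(near_best, key=lambda c: c.cost)
```

For example, with candidates scored 0.9, 0.75, and 0.4 at costs 10, 3, and 1, a threshold of 0.7 selects the middle model: it is the cheapest of the two that clear the bar. The fallback keeps the router from jumping to the cheapest model overall when nothing clears the threshold.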
License
xrouter-llm is released under the Xagent Source License (© Xorbits Inc.) —
see LICENSE. It is source-available, not an OSI-approved open
source license.
The license text is shared verbatim with Xagent;
for this project the licensed "Software" is xrouter-llm, and the
"Restricted Functionality" / hosted-service and competitive-use clauses apply to
its routing-decision and model-selection capabilities. In short: use,
modification, and internal/single-tenant deployment are permitted; offering it as
a multi-tenant hosted/managed service, or a directly competing service, is not.
See LICENSE for the controlling terms.