Prompt-aware LLM routing-decision service: predicts which model can complete a prompt and picks the cheapest one.

xrouter-llm

xrouter-llm is a prompt-aware LLM routing-decision service. It answers "which model should serve this prompt?" and records the choice — it does NOT call the underlying LLMs.

Invariant

Do not train:  prompt -> selected model
Train:         prompt + model -> probability the model completes the prompt
Decide:        predicted completion + cost -> cheapest model that can complete

Completion is factored into two decoupled axes (an IRT-style model):

P(complete) = sigmoid(a * capability(model) + b * difficulty(prompt) + c)
  • capability(model) = the mean of the model's published gpqa_diamond and livecodebench scores (both have full coverage on the training side). Adding more benchmarks doesn't help at this data scale: a flat mean over more benchmarks dilutes the signal, and learned weights overfit with only 37 profiled models; see AGENTS.md "Capability benchmarks". The mean is used directly, so a brand-new model's benchmarks drive its ranking.
  • difficulty(prompt) = a Ridge regressor on a multilingual embedding (Qwen/Qwen3-Embedding-0.6B), trained on each prompt's empirical pass-rate. Because the embedding is multilingual, difficulty estimates transfer to Chinese prompts from English training data. Qwen3-Embedding was picked over bge-m3 by a controlled probe (scripts/probe_qwen_difficulty.py): it reaches a higher held-out Pearson correlation and no longer rates trivial prompts ("1+1=?") as maximally hard.
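
The factored formula above can be sketched in a few lines of Python. This is a minimal illustration, not the package's IRTRouter code: the coefficient values, the 0-1 difficulty scale (higher meaning harder), and the function names are all assumptions made for the example.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def capability_from_benchmarks(gpqa_diamond: float, livecodebench: float) -> float:
    """Mean of the two published benchmark scores (0-100 percentages)."""
    return (gpqa_diamond + livecodebench) / 2.0


def p_complete(capability: float, difficulty: float,
               a: float, b: float, c: float) -> float:
    """P(complete) = sigmoid(a * capability + b * difficulty + c)."""
    return sigmoid(a * capability + b * difficulty + c)


# Hypothetical coefficients, chosen only for illustration.
# b is negative here because difficulty is assumed to mean "higher = harder".
A, B, C = 0.05, -4.0, -1.0

strong = capability_from_benchmarks(80.0, 70.0)  # hypothetical strong model
weak = capability_from_benchmarks(40.0, 30.0)    # hypothetical weak model

for difficulty in (0.2, 0.8):
    print(difficulty,
          round(p_complete(strong, difficulty, A, B, C), 3),
          round(p_complete(weak, difficulty, A, B, C), 3))
```

Because capability enters only through the benchmark mean, a model that was never seen in training can be scored as soon as its two benchmark numbers are known.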

This factoring is the key lesson: a single joint classifier could not rank unseen models by their benchmarks. On this data, model capability on its own barely explains completion, but it does once difficulty is controlled for, which is exactly what the factored model exploits.

Components

  • IRTRouter (irt_router.py): the predictor (difficulty x capability).
  • RoutingPolicy (policy.py): "cheapest model whose predicted completion clears completion_threshold; else the highest predicted completion".
  • serving.py / server.py: HTTP routing-decision API + single-page web UI.
  • resources/config/models/: a per-model YAML registry of capability profiles (bundled in the package; resolve with default_models_dir()).
  • resources/config/routers/: named "auto configs" — a candidate model set + policy (bundled; default_routers_dir()).
  • resources/models/irt_router_350k.joblib: the trained router shipped with the package (default_model_path()).
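
The RoutingPolicy rule quoted above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code in policy.py: the Candidate dataclass, the choose function, the tie-breaking, and the reading of "clears" as greater-than-or-equal are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    model_id: str
    p_complete: float  # predicted completion probability
    cost: float        # any consistent cost unit, e.g. price per request


def choose(candidates: Sequence[Candidate], completion_threshold: float) -> Candidate:
    """Cheapest candidate that clears the threshold; else the most likely to complete."""
    if not candidates:
        raise ValueError("no candidate models")
    viable = [c for c in candidates if c.p_complete >= completion_threshold]
    if viable:
        # Ties on cost are broken in favour of the higher predicted completion.
        return min(viable, key=lambda c: (c.cost, -c.p_complete))
    return max(candidates, key=lambda c: c.p_complete)


# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("big", 0.95, 10.0),
    Candidate("mid", 0.85, 2.0),
    Candidate("small", 0.40, 0.5),
]
print(choose(candidates, 0.80).model_id)  # mid: cheapest of the two that clear 0.80
print(choose(candidates, 0.99).model_id)  # big: nothing clears 0.99, so best completion wins
```

The fallback branch means the policy always returns a decision, even when no model is predicted to be good enough.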

Install

pip install xrouter-llm        # ships a trained router + model registry
# or, for development:
pip install -e ".[dev]"

The wheel bundles a trained router artifact, the model-profile registry, and the router configs, so a fresh install can serve immediately with no extra files.

Datasets

The production difficulty model is trained on multiple datasets combined (all feed the difficulty axis; only profiled models feed the capability axis):

| Source | Type | Scale | In production train? |
|---|---|---|---|
| NPULH/LLMRouterBench (350k stream sample) | single-turn QA / code / math (22 tasks) | 37 models x ~13.8k prompts | ✅ |
| agent-psychometrics: Terminal-Bench 2.0 | terminal agent | 89 tasks x 112 subjects | ✅ --dataset agentic:agentic/terminalbench |
| agent-psychometrics: SWE-bench Verified | coding agent | 500 tasks x 134 subjects | ✅ task text joined from princeton-nlp/SWE-bench_Verified |
| agent-psychometrics: SWE-bench Pro / GSO | coding agent | 730 x 14 / 102 x 15 | ⛔ ship no local task text, external join needed |

The current artifact trains on LLMRouterBench 350k + Terminal-Bench + SWE-bench Verified (377,997 rows / ~14,364 prompts / 283 subjects). The agentic matrices come from agent-psychometrics (MIT-licensed) via agentic.py. Only the 37 profiled LLMRouterBench models feed the capability axis; agentic subjects feed difficulty only. RouterBench (withmartian/routerbench) remains a smaller legacy baseline. Local datasets and trained artifacts are not committed (data/ and artifacts/ are gitignored).
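
The difficulty axis is trained on each prompt's empirical pass-rate, which is an aggregate over rows like the ones counted above. A minimal sketch of that aggregation, assuming each row is a (prompt_id, subject_id, passed) tuple; the row layout and function name are hypothetical, not the package's data format:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, bool]  # (prompt_id, subject_id, passed), an assumed layout


def empirical_pass_rates(rows: Iterable[Row]) -> Dict[str, float]:
    """Fraction of subjects that completed each prompt."""
    attempts: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for prompt_id, _subject_id, passed in rows:
        attempts[prompt_id] += 1
        passes[prompt_id] += int(passed)
    return {pid: passes[pid] / attempts[pid] for pid in attempts}


# Hypothetical rows, for illustration only.
rows = [
    ("p1", "model-a", True),
    ("p1", "model-b", False),
    ("p2", "model-a", True),
    ("p2", "model-b", True),
]
print(empirical_pass_rates(rows))
```

Every subject contributes to this target, which is why agentic subjects can feed the difficulty axis without having a capability profile.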

Adding more agentic prompt types (e.g. your own traffic) is the only way to make difficulty accurate for task mixes outside coding/terminal — see AGENTS.md.

Train

xrouter-llm train-irt \
  --dataset llmrouterbench:data/raw/llmrouterbench_stream_sample_350k \
  --dataset agentic:agentic/terminalbench \
  --dataset agentic:agentic/swebench_verified \
  --benchmark-profiles artifacts/profiles/llmrouterbench_350k_profiles_priority_collected.json \
  --output artifacts/models/irt_router_350k.joblib

Diagnostics: sweep-thresholds (cost/completion frontier + calibration) and eval-model-holdout (leave-one-model-out generalization).
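
To make "leave-one-model-out" concrete, the sketch below shows the kind of split such an evaluation relies on: each model's rows are held out in turn and the rest are used for training. It illustrates the idea only and is not the eval-model-holdout implementation; the row layout and function name are assumptions.

```python
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[str, str, bool]  # (prompt_id, model_id, passed), an assumed layout


def leave_one_model_out(rows: Sequence[Row]) -> Iterator[Tuple[str, List[Row], List[Row]]]:
    """Yield (held_out_model, train_rows, test_rows), one fold per model."""
    model_ids = sorted({model_id for _, model_id, _ in rows})
    for held_out in model_ids:
        train = [r for r in rows if r[1] != held_out]
        test = [r for r in rows if r[1] == held_out]
        yield held_out, train, test


# Hypothetical rows, for illustration only.
rows = [
    ("p1", "model-a", True),
    ("p1", "model-b", False),
    ("p1", "model-c", True),
    ("p2", "model-a", False),
    ("p2", "model-b", False),
    ("p2", "model-c", True),
]
for held_out, train, test in leave_one_model_out(rows):
    print(held_out, len(train), len(test))
```

Scoring the held-out model from its benchmarks alone is what tests whether the capability axis generalizes to models absent from training.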

Serve

The bundled router, registry, and configs are the defaults, so a bare invocation works out of the box:

xrouter-llm serve --port 8080

Override any of them to use your own trained model or registry:

xrouter-llm serve \
  --model artifacts/models/irt_router_350k.joblib \
  --models-dir config/models --routers-dir config/routers \
  --db artifacts/calls.db --port 8080

  • GET /: single-page UI (prompt box, config picker, decision table, history)
  • GET /api/configs
  • POST /api/route ({prompt, config, task?})
  • GET /api/history?limit=N
  • Every decision is logged to SQLite (*.db/*.sqlite are gitignored because the log holds user prompts).
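
A client call to POST /api/route might look like the sketch below, using only the standard library. It assumes the endpoint accepts and returns JSON and that the server runs on localhost:8080; the response schema is not documented here, so the result is returned as a plain dict. The helper names and the "default" config name are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_route_request(prompt: str, config: str, task: Optional[str] = None,
                        base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build the POST /api/route request; task is optional, as in {prompt, config, task?}."""
    payload: Dict[str, Any] = {"prompt": prompt, "config": config}
    if task is not None:
        payload["task"] = task
    return urllib.request.Request(
        url=f"{base_url}/api/route",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed: JSON request body
        method="POST",
    )


def route(prompt: str, config: str, task: Optional[str] = None,
          base_url: str = "http://localhost:8080", timeout: float = 10.0) -> Dict[str, Any]:
    """Send the request and decode the response, assuming it is a JSON object."""
    request = build_route_request(prompt, config, task, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running `xrouter-llm serve --port 8080`; "default" is a placeholder config name.
    print(route("Write a binary search in Python", config="default"))
```

Since each routed prompt is written to the SQLite log, treat the database file as sensitive when calling the API with real user traffic.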

Model registry

One YAML file per supported model, bundled under src/xrouter_llm/resources/config/models/. Each file is a capability profile: provider, costs, context, and published benchmarks as 0-100 percentages. model_id is the model's canonical OpenRouter slug (e.g. anthropic/claude-opus-4.8). The bundled registry is the default for --benchmark-profiles; point it at your own directory or file to extend it. To add a model, add a file.

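from xrouter_llm import IRTRouter, default_model_path, default_models_dir, load_benchmark_profiles

# Load the trained router that ships with the package.
router = IRTRouter.load(default_model_path())
# Register every capability profile from the bundled model registry.
for profile in load_benchmark_profiles(default_models_dir()).profiles():
    router.add_benchmark_profile(profile)

# The prompt is Chinese for "Implement a distributed consensus algorithm".
preds = router.predict("实现一个分布式一致性算法", model_ids=["claude-opus-4-8", "deepseek-v4-pro"])
# Print each model's predicted value (p.mu), rounded to three decimals.
print({p.model_id: round(p.mu, 3) for p in preds})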

License

xrouter-llm is released under the Xagent Source License (© Xorbits Inc.) — see LICENSE. It is source-available, not an OSI-approved open source license.

The license text is shared verbatim with Xagent; for this project the licensed "Software" is xrouter-llm, and the "Restricted Functionality" / hosted-service and competitive-use clauses apply to its routing-decision and model-selection capabilities. In short: use, modification, and internal/single-tenant deployment are permitted; offering it as a multi-tenant hosted/managed service, or a directly competing service, is not. See LICENSE for the controlling terms.
