Bayesian Router

Autonomous LLM model routing with Thompson Sampling — cut API costs by 40-70% with <1% accuracy drop.

Features

  • Label-free learning — Composite reward from 3 objective signals (validity, latency, no-retry), zero human labels
  • Model rot adaptation — Decaying memory detects provider degradation and reroutes automatically
  • Cold-start solution — Expert priors from benchmarks converge in ~20 queries instead of 100
  • Safety guarantees — Confidence fallback plus automated circuit-breaker states
  • Shadow evaluation — Mirror 5% of traffic to a hidden candidate model
  • Framework-agnostic — Works with any LLM API (OpenAI, Anthropic, Google, local models)
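
The routing idea behind the features above can be illustrated with a minimal Thompson Sampling sketch. This is not the library's implementation: it assumes each model keeps a Beta(alpha, beta) belief over its reward, and the helper names `sample_route` and `update_posterior`, as well as the model names and priors, are hypothetical.

```python
import random

def sample_route(posteriors, rng=random):
    """Draw one sample per model from its Beta belief and route to the highest draw."""
    draws = {name: rng.betavariate(a, b) for name, (a, b) in posteriors.items()}
    return max(draws, key=draws.get)

def update_posterior(posteriors, model, reward):
    """Fold a reward in [0, 1] into the chosen model's Beta parameters."""
    a, b = posteriors[model]
    posteriors[model] = (a + reward, b + (1.0 - reward))

# Hypothetical models and priors, for illustration only
posteriors = {"strong-model": (8.0, 3.0), "cheap-model": (5.0, 4.0)}
rng = random.Random(0)  # seeded so repeated runs pick the same route

choice = sample_route(posteriors, rng)
update_posterior(posteriors, choice, reward=0.9)
print(choice, posteriors[choice])
```

Because routes are sampled rather than chosen greedily, a model with a wide, uncertain belief still gets occasional traffic, which is how exploration happens without a separate exploration parameter.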

Installation

pip install bayesian-router

With optional dependencies:

pip install "bayesian-router[demo]"    # Streamlit demo + Plotly charts
pip install "bayesian-router[dev]"     # pytest
pip install "bayesian-router[all]"     # Everything (quotes stop shells such as zsh from globbing the brackets)

Quick Start

from bayesian_router import Router

# Zero-config with sensible defaults
router = Router()

# Select model for next request
result = router.select()
print(f"Route to: {result.model}")

# After LLM call, update with telemetry — no human labels needed
reward = router.update(
    result.model,
    latency_ms=450,
    is_valid=True,
    retried=False,
)
print(f"Reward: {reward.total:.2f}")

# Optional: mirror a hidden candidate model on shadow traffic
if result.shadow_model:
    router.update_shadow(
        result.shadow_model,
        latency_ms=520,
        is_valid=True,
        retried=False,
    )
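
How a shadow model ends up attached to a routing result is not described here. One plausible reading of "mirror 5% of traffic" is a per-request coin flip, sketched below. The function `pick_shadow`, the candidate names, and the rule for choosing among candidates are assumptions for illustration.

```python
import random

def pick_shadow(candidates, primary, shadow_rate=0.05, rng=random):
    """Return a shadow model for roughly shadow_rate of requests, else None."""
    if rng.random() >= shadow_rate:
        return None
    # Never mirror to the model that is already serving the request
    others = [m for m in candidates if m != primary]
    return rng.choice(others) if others else None

rng = random.Random(42)
picks = [pick_shadow(["a", "b", "c"], "a", rng=rng) for _ in range(10_000)]
mirrored = sum(p is not None for p in picks)
print(f"mirrored {mirrored / len(picks):.1%} of requests")
```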

With Any LLM

Bayesian Router manages routing decisions — you bring your own model client:

OpenAI

from openai import OpenAI
from bayesian_router import Router

client = OpenAI()
router = Router()

result = router.select()

response = client.chat.completions.create(
    model=result.model,
    messages=[{"role": "user", "content": "Hello"}],
)

router.update(
    result.model,
    latency_ms=response.usage.completion_tokens * 10,  # rough proxy from token count, not measured latency
    is_valid=True,
    retried=False,
)

if result.shadow_model:
    shadow = client.chat.completions.create(
        model=result.shadow_model,
        messages=[{"role": "user", "content": "Hello"}],
    )
    router.update_shadow(
        result.shadow_model,
        latency_ms=shadow.usage.completion_tokens * 10,
        is_valid=True,
        retried=False,
    )
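
The token-count latency above is labelled a rough proxy. If real wall-clock latency is wanted instead, a small timing wrapper is one option. The sketch below uses only the standard library; `timed_call` and `fake_llm_call` are hypothetical names, and the fake call stands in for a real client request.

```python
import time

def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

def fake_llm_call(prompt):
    """Stand-in for a real request such as client.chat.completions.create."""
    time.sleep(0.01)
    return f"echo: {prompt}"

response, latency_ms = timed_call(fake_llm_call, "Hello")
print(f"{latency_ms:.1f} ms")
```

The measured `latency_ms` can then be passed to `router.update(...)` in place of the proxy.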

Anthropic

from anthropic import Anthropic
from bayesian_router import Router, ModelConfig

client = Anthropic()
router = Router(models={
    "claude-sonnet": ModelConfig(alpha=8, beta=3, cost_per_1k=0.003),
    "claude-haiku":  ModelConfig(alpha=5, beta=4, cost_per_1k=0.00025),
})

result = router.select()
response = client.messages.create(model=result.model, ...)
router.update(result.model, latency_ms=..., is_valid=..., retried=...)
if result.shadow_model:
    shadow = client.messages.create(model=result.shadow_model, ...)
    router.update_shadow(
        result.shadow_model, latency_ms=..., is_valid=..., retried=...
    )

Custom Reward Weights

from bayesian_router import Router, CompositeReward

# Prioritise validity over latency for high-stakes applications
reward = CompositeReward(
    validity_weight=0.70,
    latency_weight=0.15,
    retry_weight=0.15,
    latency_midpoint_ms=3000,
)

router = Router(reward_fn=reward)
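
To make the weights concrete, here is a sketch of how three signals could be combined into one score. It is an assumption, not the formula `CompositeReward` uses: in particular, the logistic latency curve that equals 0.5 at `latency_midpoint_ms` and the `steepness_ms` parameter are guesses made for illustration.

```python
import math

def composite_reward(latency_ms, is_valid, retried,
                     validity_weight=0.70, latency_weight=0.15,
                     retry_weight=0.15, latency_midpoint_ms=3000.0,
                     steepness_ms=500.0):
    """Weighted sum of three signals, each scaled to [0, 1]."""
    validity = 1.0 if is_valid else 0.0
    no_retry = 0.0 if retried else 1.0
    # Assumed shape: logistic curve, 0.5 at the midpoint, near 0 for very slow calls.
    # The exponent is clamped so extreme latencies cannot overflow math.exp.
    z = min((latency_ms - latency_midpoint_ms) / steepness_ms, 50.0)
    latency = 1.0 / (1.0 + math.exp(z))
    return (validity_weight * validity
            + latency_weight * latency
            + retry_weight * no_retry)

fast = composite_reward(450, is_valid=True, retried=False)
slow_invalid = composite_reward(6000, is_valid=False, retried=True)
print(f"{fast:.3f} {slow_invalid:.3f}")
```

With validity weighted at 0.70, an invalid response scores at most 0.30 however fast it is, which is the point of this weighting for high-stakes use.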

Model Rot Handling

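The router uses decaying memory (exponential discounting) so recent observations weigh more than old ones: every decay_interval queries, accumulated evidence is discounted by gamma. If a provider ships a regression overnight, the router adapts within minutes instead of trusting stale history. A lower gamma or a shorter interval makes it react faster: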

router = Router(
    gamma=0.90,          # Stronger decay (default 0.95)
    decay_interval=30,   # Apply every 30 queries (default 50)
)
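
The effect of these two settings can be estimated with a little arithmetic. The sketch below assumes decay multiplies both Beta parameters by gamma once every `decay_interval` queries; that mechanism and the helper names are assumptions, not taken from the library's source.

```python
def apply_decay(alpha, beta, gamma):
    """Shrink accumulated evidence; the mean alpha / (alpha + beta) is unchanged."""
    return alpha * gamma, beta * gamma

def remaining_weight(gamma, decay_interval, queries_ago):
    """Weight left on an observation after the decays applied since it was recorded."""
    decays_applied = queries_ago // decay_interval
    return gamma ** decays_applied

default_weight = remaining_weight(0.95, 50, 500)  # 10 decays applied
tuned_weight = remaining_weight(0.90, 30, 500)    # 16 decays applied
print(f"{default_weight:.3f} {tuned_weight:.3f}")
```

Under this assumption, an observation from 500 queries ago keeps about 60% of its weight with the defaults and about 19% with the stronger settings shown above.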

Cold Start with Expert Priors

from bayesian_router import Router, EXPERT_PRIORS, UNIFORM_PRIORS

# Expert priors from public benchmarks — converge fast
router_fast = Router(models=EXPERT_PRIORS)

# Uniform priors — maximum uncertainty, needs ~100 queries
router_slow = Router(models=UNIFORM_PRIORS)
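
Why informative priors converge faster can be seen from the standard mean and variance of a Beta distribution. In the sketch below, Beta(1, 1) is an assumed stand-in for a uniform prior, and the expert values reuse the alpha=8, beta=3 shown for "claude-sonnet" earlier; the actual contents of `EXPERT_PRIORS` and `UNIFORM_PRIORS` are not listed in this document.

```python
def beta_mean(alpha, beta):
    """Expected reward under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)

def beta_variance(alpha, beta):
    """Spread of the belief; smaller means the router starts out more certain."""
    total = alpha + beta
    return (alpha * beta) / (total * total * (total + 1.0))

uniform = (1.0, 1.0)  # assumed stand-in for a uniform prior
expert = (8.0, 3.0)   # same alpha/beta as the "claude-sonnet" entry above

for label, (a, b) in {"uniform": uniform, "expert": expert}.items():
    print(f"{label}: mean={beta_mean(a, b):.3f} variance={beta_variance(a, b):.4f}")
```

The expert prior starts with a much narrower belief, so fewer live queries are needed before the sampled routes settle. The trade-off is that a wrong prior also takes longer to unlearn.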

Health Monitoring

stats = router.get_stats()
print(stats["model_share"])     # {"gpt-4o": 0.25, "gpt-4o-mini": 0.15, ...}
print(stats["distributions"])   # {"gpt-4o": "α=12.3 β=4.1", ...}

for name, state in router.get_distributions().items():
    print(f"{name}: confidence={state.confidence:.2f}, selected={state.selections}")

Examples

See the examples/ folder for complete working demos:

Example               Description
01_basic_usage.py     Create a router, select models, update with telemetry
02_model_rot.py       Watch the router adapt when a model degrades
03_cold_start.py      Expert priors vs uniform: convergence speed comparison
04_streamlit_demo.py  Interactive demo with live charts (DevConf talk)

Running Examples

git clone https://github.com/shrinidhi-mahishi/bayesian-router.git
cd bayesian-router
python -m venv venv && source venv/bin/activate
pip install -e ".[all]"

python examples/01_basic_usage.py
streamlit run examples/04_streamlit_demo.py

API Reference

Router

Method                                                  Description
select()                                                Pick a primary model and optional shadow model → RoutingResult
update(model, *, latency_ms, is_valid, retried)         Update Beta distribution → RewardResult
update_shadow(model, *, latency_ms, is_valid, retried)  Update mirrored shadow telemetry → RewardResult
get_distributions()                                     Current α/β/confidence for every model
get_stats()                                             Summary statistics (JSON-serialisable)

CompositeReward

Method                                  Description
compute(latency_ms, is_valid, retried)  Score a single response → RewardResult

ModelSimulator

Method                   Description
call(model, tokens=500)  Simulate an LLM call → telemetry dict
degrade(model, factor)   Inject model rot
reset(model)             Remove degradation

Configuration

router = Router(
    models=EXPERT_PRIORS,       # Model priors (or custom dict)
    reward_fn=CompositeReward(),# Reward function
    gamma=0.95,                 # Memory decay factor
    decay_interval=50,          # Apply decay every N queries
    confidence_floor=0.50,      # Safety floor before serving a model
    shadow_rate=0.05,           # Fraction of mirrored shadow traffic
    fallback_model="gpt-4o",    # Trusted model for fallback
    circuit_window_size=5,      # Recent outcomes tracked per model
    circuit_failure_threshold=3,# Failures needed to open a breaker
    circuit_reset_queries=20,   # Cooldown before half-open probe
    half_open_max_requests=2,   # Successful probes to close breaker
)
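
The four circuit_* and half_open_* settings describe a closed, open, half-open breaker counted in queries rather than seconds. The sketch below is one way such a state machine could behave given those parameter descriptions. The class, its method names, and the exact transition rules are assumptions, not the library's code.

```python
from collections import deque

class CircuitBreaker:
    """Closed -> open -> half-open state machine for one model."""

    def __init__(self, window_size=5, failure_threshold=3,
                 reset_queries=20, half_open_max_requests=2):
        self.window = deque(maxlen=window_size)  # recent outcomes, True = failure
        self.failure_threshold = failure_threshold
        self.reset_queries = reset_queries
        self.half_open_max_requests = half_open_max_requests
        self.state = "closed"
        self.queries_since_open = 0
        self.probe_successes = 0

    def tick(self):
        """Call once per routed query so an open breaker can cool down."""
        if self.state == "open":
            self.queries_since_open += 1
            if self.queries_since_open >= self.reset_queries:
                self.state = "half_open"
                self.probe_successes = 0

    def record(self, failed):
        """Record the outcome of a request that was sent to this model."""
        if self.state == "open":
            return  # no traffic should reach an open breaker
        if self.state == "half_open":
            if failed:
                self._open()  # a failed probe restarts the cooldown
            else:
                self.probe_successes += 1
                if self.probe_successes >= self.half_open_max_requests:
                    self.state = "closed"
                    self.window.clear()
            return
        self.window.append(failed)
        if sum(self.window) >= self.failure_threshold:
            self._open()

    def _open(self):
        self.state = "open"
        self.queries_since_open = 0
        self.window.clear()

breaker = CircuitBreaker()
for _ in range(3):
    breaker.record(failed=True)
state_after_failures = breaker.state
for _ in range(20):
    breaker.tick()
state_after_cooldown = breaker.state
breaker.record(failed=False)
breaker.record(failed=False)
print(state_after_failures, state_after_cooldown, breaker.state)
```

While a breaker is open, requests for that model would go to `fallback_model` or another healthy model; how the router combines this with `confidence_floor` is not shown here.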

License

MIT
