Bayesian Router

Autonomous LLM model routing with Thompson Sampling — cut API costs by 40-70% with <1% accuracy drop.

Features

- Label-free learning: composite reward built from three objective signals (validity, latency, no-retry), with no human labels
- Model rot adaptation: decaying memory detects provider degradation and reroutes automatically
- Cold-start solution: expert priors from benchmarks converge in ~20 queries instead of ~100
- Safety guarantees: confidence fallback plus automated circuit-breaker states
- Shadow evaluation: mirror 5% of traffic to a hidden candidate model
- Framework-agnostic: works with any LLM API (OpenAI, Anthropic, Google, local models)
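
For readers new to Thompson Sampling, the core loop can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the package's implementation: the model names, the true success rates, and the helper names `thompson_select` and `thompson_update` are all hypothetical.

```python
import random


def thompson_select(posteriors, rng):
    """Draw one sample per model from Beta(alpha, beta) and pick the largest."""
    samples = {
        name: rng.betavariate(alpha, beta)
        for name, (alpha, beta) in posteriors.items()
    }
    return max(samples, key=samples.get)


def thompson_update(posteriors, name, reward):
    """Treat a reward in [0, 1] as a fractional success for the chosen model."""
    alpha, beta = posteriors[name]
    posteriors[name] = (alpha + reward, beta + (1.0 - reward))


# Hypothetical models with made-up success rates, for illustration only.
true_rates = {"model-a": 0.9, "model-b": 0.6}
posteriors = {name: (1.0, 1.0) for name in true_rates}  # uniform Beta(1, 1) priors
counts = {name: 0 for name in true_rates}
rng = random.Random(0)

for _ in range(500):
    choice = thompson_select(posteriors, rng)
    reward = 1.0 if rng.random() < true_rates[choice] else 0.0
    thompson_update(posteriors, choice, reward)
    counts[choice] += 1

print(counts)  # the better model should receive most of the traffic
```

Sampling from each posterior, rather than always picking the highest mean, keeps some exploration alive while uncertainty is high and lets it fade as evidence accumulates.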

Installation

pip install bayesian-router

With optional dependencies:

pip install bayesian-router[demo]    # Streamlit demo + Plotly charts
pip install bayesian-router[dev]     # pytest
pip install bayesian-router[all]     # Everything

Quick Start

from bayesian_router import Router

# Zero-config with sensible defaults
router = Router()

# Select model for next request
result = router.select()
print(f"Route to: {result.model}")

# After LLM call, update with telemetry — no human labels needed
reward = router.update(
    result.model,
    latency_ms=450,
    is_valid=True,
    retried=False,
)
print(f"Reward: {reward.total:.2f}")

# Optional: mirror a hidden candidate model on shadow traffic
if result.shadow_model:
    router.update_shadow(
        result.shadow_model,
        latency_ms=520,
        is_valid=True,
        retried=False,
    )
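
The three telemetry fields have to come from somewhere in your own request code. The sketch below shows one way to collect them; it assumes that "valid" means "parses as JSON", which is only one possible check, and `call_with_telemetry` is a hypothetical helper that is not part of the package.

```python
import json
import time


def call_with_telemetry(call, max_attempts=2):
    """Run call() and return (output, telemetry).

    `call` is any zero-argument function returning the model's text.
    The JSON check is an assumption; substitute your own validator.
    """
    start = time.perf_counter()
    output, is_valid, attempts = None, False, 0
    while attempts < max_attempts and not is_valid:
        attempts += 1
        output = call()
        try:
            json.loads(output)
            is_valid = True
        except (TypeError, ValueError):
            is_valid = False
    telemetry = {
        "latency_ms": (time.perf_counter() - start) * 1000.0,
        "is_valid": is_valid,
        "retried": attempts > 1,
    }
    return output, telemetry


# Stand-in for a real LLM call: fails once, then returns valid JSON.
replies = iter(["not json", '{"answer": 42}'])
output, telemetry = call_with_telemetry(lambda: next(replies))
print(output, telemetry)
```

Because the dictionary keys match the keyword arguments shown above, the result could be passed along as `router.update(result.model, **telemetry)`.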

With Any LLM

Bayesian Router manages routing decisions — you bring your own model client:

OpenAI

import time

from openai import OpenAI
from bayesian_router import Router

client = OpenAI()
router = Router()
messages = [{"role": "user", "content": "Hello"}]

result = router.select()

# Measure wall-clock latency around the call rather than estimating it from token counts
start = time.perf_counter()
response = client.chat.completions.create(
    model=result.model,
    messages=messages,
)
latency_ms = (time.perf_counter() - start) * 1000

router.update(
    result.model,
    latency_ms=latency_ms,
    is_valid=True,   # replace with your own validity check
    retried=False,
)

if result.shadow_model:
    start = time.perf_counter()
    shadow = client.chat.completions.create(
        model=result.shadow_model,
        messages=messages,
    )
    router.update_shadow(
        result.shadow_model,
        latency_ms=(time.perf_counter() - start) * 1000,
        is_valid=True,   # replace with your own validity check
        retried=False,
    )

Anthropic

from anthropic import Anthropic
from bayesian_router import Router, ModelConfig

client = Anthropic()
router = Router(models={
    "claude-sonnet": ModelConfig(alpha=8, beta=3, cost_per_1k=0.003),
    "claude-haiku":  ModelConfig(alpha=5, beta=4, cost_per_1k=0.00025),
})

result = router.select()
response = client.messages.create(model=result.model, ...)
router.update(result.model, latency_ms=..., is_valid=..., retried=...)
if result.shadow_model:
    shadow = client.messages.create(model=result.shadow_model, ...)
    router.update_shadow(
        result.shadow_model, latency_ms=..., is_valid=..., retried=...
    )

Custom Reward Weights

from bayesian_router import Router, CompositeReward

# Prioritise validity over latency for high-stakes applications
reward = CompositeReward(
    validity_weight=0.70,
    latency_weight=0.15,
    retry_weight=0.15,
    latency_midpoint_ms=3000,
)

router = Router(reward_fn=reward)
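
To make the weights concrete, here is a rough sketch of how a weighted composite of the three signals might be computed. The exact formula used by `CompositeReward` is not documented here, so the logistic latency curve and the `latency_scale_ms` parameter are assumptions made purely for illustration.

```python
import math


def composite_reward(latency_ms, is_valid, retried,
                     validity_weight=0.70, latency_weight=0.15,
                     retry_weight=0.15, latency_midpoint_ms=3000.0,
                     latency_scale_ms=1000.0):
    """Weighted sum of three signals, each mapped to [0, 1].

    Assumption: latency is scored with a logistic curve that equals 0.5
    at latency_midpoint_ms. latency_scale_ms is a hypothetical parameter.
    """
    validity = 1.0 if is_valid else 0.0
    latency = 1.0 / (1.0 + math.exp((latency_ms - latency_midpoint_ms) / latency_scale_ms))
    no_retry = 0.0 if retried else 1.0
    return (validity_weight * validity
            + latency_weight * latency
            + retry_weight * no_retry)


fast_valid = composite_reward(450, True, False)
at_midpoint = composite_reward(3000, True, False)   # 0.70 + 0.15 * 0.5 + 0.15
slow_invalid = composite_reward(6000, False, True)
print(f"{fast_valid:.3f} {at_midpoint:.3f} {slow_invalid:.3f}")
```

With validity weighted at 0.70, an invalid response can never score above 0.30 no matter how fast it is, which is the point of raising that weight for high-stakes use.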

Model Rot Handling

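The router uses decaying memory (exponential discounting) so that recent observations weigh more than old ones. If a provider ships a regression overnight, the router adapts within minutes, given enough traffic to observe the change. A lower gamma or a shorter decay_interval makes it forget old evidence faster: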

router = Router(
    gamma=0.90,          # Stronger decay (default 0.95)
    decay_interval=30,   # Apply every 30 queries (default 50)
)
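
The effect of discounting can be seen in a small simulation. This sketch assumes the decay shrinks accumulated evidence toward a Beta(1, 1) prior by a factor of `gamma` every `decay_interval` queries; the package may implement decay differently, and the history values are invented.

```python
def decay(alpha, beta, gamma, prior=(1.0, 1.0)):
    """Shrink accumulated evidence toward the prior by a factor gamma."""
    a0, b0 = prior
    return a0 + gamma * (alpha - a0), b0 + gamma * (beta - b0)


def mean(alpha, beta):
    """Posterior mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical history: 200 successes and 20 failures before a regression.
start = (201.0, 21.0)
gamma, decay_interval = 0.90, 30

with_decay = start
no_decay = start
# After the regression, every one of the next 150 responses fails (reward 0).
for query in range(1, 151):
    with_decay = (with_decay[0], with_decay[1] + 1.0)
    no_decay = (no_decay[0], no_decay[1] + 1.0)
    if query % decay_interval == 0:
        with_decay = decay(with_decay[0], with_decay[1], gamma)

print(f"no decay:   mean={mean(*no_decay):.3f}")
print(f"with decay: mean={mean(*with_decay):.3f}")
```

The decayed posterior drops further because the pre-regression successes have been discounted five times, and its smaller total evidence also widens the distribution, which invites re-exploration.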

Cold Start with Expert Priors

from bayesian_router import Router, EXPERT_PRIORS, UNIFORM_PRIORS

# Expert priors from public benchmarks — converge fast
router_fast = Router(models=EXPERT_PRIORS)

# Uniform priors — maximum uncertainty, needs ~100 queries
router_slow = Router(models=UNIFORM_PRIORS)
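
Why an informative prior shortens the cold start can be shown with plain Beta arithmetic. In this sketch the true success rate is a made-up value, and the expert prior reuses the alpha=8, beta=3 pair from the `ModelConfig` example above; the actual contents of `EXPERT_PRIORS` are not shown in this document.

```python
def posterior_mean(prior, successes, failures):
    """Mean of the Beta posterior after observing successes and failures."""
    alpha, beta = prior
    return (alpha + successes) / (alpha + beta + successes + failures)


true_rate = 0.75            # hypothetical success rate of the model
expert_prior = (8.0, 3.0)   # informative prior, mean ~0.73
uniform_prior = (1.0, 1.0)  # maximum uncertainty, mean 0.50

errors = []
for n in (0, 4, 8, 20, 100):
    successes = int(n * true_rate)
    failures = n - successes
    expert_err = abs(posterior_mean(expert_prior, successes, failures) - true_rate)
    uniform_err = abs(posterior_mean(uniform_prior, successes, failures) - true_rate)
    errors.append((n, expert_err, uniform_err))
    print(f"n={n:3d}  expert error={expert_err:.3f}  uniform error={uniform_err:.3f}")
```

The advantage holds only when the prior is roughly right. A prior that is confidently wrong would need extra observations to overcome, so benchmark-derived priors are best kept modest in strength.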

Health Monitoring

stats = router.get_stats()
print(stats["model_share"])     # {"gpt-4o": 0.25, "gpt-4o-mini": 0.15, ...}
print(stats["distributions"])   # {"gpt-4o": "α=12.3 β=4.1", ...}

for name, state in router.get_distributions().items():
    print(f"{name}: confidence={state.confidence:.2f}, selected={state.selections}")
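
When reading the α/β values, it can help to convert them into a mean and a spread. The sketch below uses the standard Beta formulas; how the package derives its own `confidence` field is not specified here, so this is not a reimplementation of it.

```python
import math


def beta_summary(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = (alpha * beta) / (total ** 2 * (total + 1.0))
    return {"mean": mean, "std": math.sqrt(variance)}


summary = beta_summary(12.3, 4.1)      # the example values shown above
more_data = beta_summary(123.0, 41.0)  # same ratio, ten times the evidence
print(summary)
print(more_data)
```

Both distributions have the same mean, but the second is much narrower, which is why a model with little history can still be sampled above a well-established one.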

Examples

See the examples/ folder for complete working demos:

| Example | Description |
|---|---|
| `01_basic_usage.py` | Create a router, select models, update with telemetry |
| `02_model_rot.py` | Watch the router adapt when a model degrades |
| `03_cold_start.py` | Expert priors vs uniform: convergence speed comparison |
| `04_streamlit_demo.py` | Interactive demo with live charts (DevConf talk) |

Running Examples

git clone https://github.com/shrinidhi-mahishi/bayesian-router.git
cd bayesian-router
python -m venv venv && source venv/bin/activate
pip install -e ".[all]"

python examples/01_basic_usage.py
streamlit run examples/04_streamlit_demo.py

API Reference

Router

| Method | Description |
|---|---|
| `select()` | Pick a primary model and optional shadow model → `RoutingResult` |
| `update(model, *, latency_ms, is_valid, retried)` | Update Beta distribution → `RewardResult` |
| `update_shadow(model, *, latency_ms, is_valid, retried)` | Update mirrored shadow telemetry → `RewardResult` |
| `get_distributions()` | Current α/β/confidence for every model |
| `get_stats()` | Summary statistics (JSON-serialisable) |
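
As an illustration of what "optional shadow model" means in practice, the sketch below mirrors a fixed fraction of calls to a different model. The selection rule, the helper name `pick_shadow`, and the model name "candidate-x" are assumptions; the package's own shadow selection logic may differ.

```python
import random


def pick_shadow(primary, candidates, shadow_rate, rng):
    """Return a shadow model for roughly shadow_rate of calls, else None."""
    others = [model for model in candidates if model != primary]
    if not others or rng.random() >= shadow_rate:
        return None
    return rng.choice(others)


rng = random.Random(42)
models = ["gpt-4o", "gpt-4o-mini", "candidate-x"]  # "candidate-x" is hypothetical
trials = 10_000
shadowed = sum(
    pick_shadow("gpt-4o", models, 0.05, rng) is not None
    for _ in range(trials)
)
print(f"shadow fraction: {shadowed / trials:.3f}")  # close to the 0.05 rate
```

The shadow response is never returned to the user; it only produces telemetry, so a candidate can be evaluated without putting live traffic at risk.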

CompositeReward

| Method | Description |
|---|---|
| `compute(latency_ms, is_valid, retried)` | Score a single response → `RewardResult` |

ModelSimulator

| Method | Description |
|---|---|
| `call(model, tokens=500)` | Simulate an LLM call → telemetry dict |
| `degrade(model, factor)` | Inject model rot |
| `reset(model)` | Remove degradation |

Configuration

from bayesian_router import Router, CompositeReward, EXPERT_PRIORS

router = Router(
    models=EXPERT_PRIORS,         # Model priors (or custom dict)
    reward_fn=CompositeReward(),  # Reward function
    gamma=0.95,                   # Memory decay factor
    decay_interval=50,            # Apply decay every N queries
    confidence_floor=0.50,        # Safety floor before serving a model
    shadow_rate=0.05,             # Fraction of mirrored shadow traffic
    fallback_model="gpt-4o",      # Trusted model for fallback
    circuit_window_size=5,        # Recent outcomes tracked per model
    circuit_failure_threshold=3,  # Failures needed to open a breaker
    circuit_reset_queries=20,     # Cooldown before half-open probe
    half_open_max_requests=2,     # Successful probes to close breaker
)
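
The four circuit-breaker parameters describe a closed, open, half-open state machine. The class below is a minimal sketch of how such a breaker could behave with the default values above; it is an assumption-based illustration, and the `CircuitBreaker` class and its method names are hypothetical rather than the package's API.

```python
from collections import deque


class CircuitBreaker:
    """Closed -> open after too many recent failures; half-open probes close it."""

    def __init__(self, window_size=5, failure_threshold=3,
                 reset_queries=20, half_open_max_requests=2):
        self.window = deque(maxlen=window_size)  # most recent outcomes only
        self.failure_threshold = failure_threshold
        self.reset_queries = reset_queries
        self.half_open_max_requests = half_open_max_requests
        self.state = "closed"
        self.cooldown = 0
        self.probe_successes = 0

    def tick(self):
        """Call once per routed query; moves open -> half_open after the cooldown."""
        if self.state == "open":
            self.cooldown -= 1
            if self.cooldown <= 0:
                self.state = "half_open"
                self.probe_successes = 0

    def record(self, success):
        """Record the outcome of a request served by this model."""
        if self.state == "half_open":
            if not success:
                self._open()  # a failed probe reopens the breaker
            else:
                self.probe_successes += 1
                if self.probe_successes >= self.half_open_max_requests:
                    self.state = "closed"
                    self.window.clear()
            return
        self.window.append(success)
        failures = sum(1 for ok in self.window if not ok)
        if failures >= self.failure_threshold:
            self._open()

    def _open(self):
        self.state = "open"
        self.cooldown = self.reset_queries
        self.window.clear()


breaker = CircuitBreaker()
for ok in (True, False, False, False):
    breaker.record(ok)
state_after_failures = breaker.state   # "open": 3 failures in the last 5 outcomes

for _ in range(20):
    breaker.tick()
state_after_cooldown = breaker.state   # "half_open": cooldown of 20 queries elapsed

breaker.record(True)
breaker.record(True)
state_after_probes = breaker.state     # "closed": 2 successful probes
print(state_after_failures, state_after_cooldown, state_after_probes)
```

While a breaker is open, a router would skip that model and serve the fallback instead, so a failing provider stops receiving traffic even before its Beta posterior has fully caught up.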

License

MIT

Release history

| Version | Released |
|---|---|
| 0.1.1 (this version) | May 16, 2026 |
| 0.1.0 | Mar 7, 2026 |
