ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving

A local-first, adaptive router for intelligent LLM model selection using contextual bandits.

Python 3.10+ | License: Apache 2.0 | Docs

ParetoBandit is an open-source, cost-aware contextual bandit router for LLM serving. It enforces dollar-denominated per-request budgets, adapts online to price and quality shifts, and onboards new models at runtime — all with sub-millisecond routing latency on CPU.

Paper: *ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving*
Author: Annette Taberner-Miller


Key Features

  • Online budget control. A primal–dual budget pacer enforces a per-request cost ceiling over an open-ended stream with closed-loop control — no offline penalty tuning required.
  • Non-stationarity resilience. Geometric forgetting on sufficient statistics enables rapid adaptation to price cuts, quality regressions, and distribution shifts, bootstrapped from optional offline priors.
  • Runtime model onboarding. A hot-swap registry lets operators add or remove models at runtime; the bandit's exploration bonus discovers each newcomer's niche from live traffic alone.
  • Sub-millisecond routing. The routing decision itself takes on the order of microseconds on CPU; end-to-end latency (including embedding) is <1% of typical LLM inference time.
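The geometric-forgetting mechanism behind the non-stationarity resilience can be sketched in a few lines. This is an illustrative sketch only — `discounted_update`, the default discount factor, and the list-based statistics are assumptions for the example, not the library's actual internals:

```python
def discounted_update(A, b, x, reward, gamma=0.98):
    """Decay past evidence geometrically, then add the new observation.

    A : d x d list-of-lists (regularized Gram matrix of features)
    b : length-d list (reward-weighted feature sum)
    x : length-d feature vector for the current request
    """
    d = len(x)
    for i in range(d):
        for j in range(d):
            # old evidence shrinks by gamma each step; new evidence enters at full weight
            A[i][j] = gamma * A[i][j] + x[i] * x[j]
        b[i] = gamma * b[i] + reward * x[i]
    return A, b
```

Because an observation from t steps ago carries weight gamma**t, fresh evidence dominates the estimate within roughly 1 / (1 - gamma) requests — which is how a price cut or quality regression gets picked up quickly.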

Installation

The Quick Start example uses the built-in embedding pipeline, which requires PyTorch and sentence-transformers:

pip install paretobandit[embeddings]

With the interactive demo (adds matplotlib):

pip install paretobandit[demo]

Core only (for precomputed features or custom encoders):

pip install paretobandit

For development (from source):

git clone https://github.com/ParetoBandit/ParetoBandit.git
cd ParetoBandit
pip install -e ".[dev]"

Quick Start

from pareto_bandit import BanditRouter

# Create a router with default settings (cold start, safe exploration)
router = BanditRouter.create()

# Route a prompt — returns (selected_model, routing_log)
model, log = router.route("Explain the transformer architecture", max_cost=0.01)
print(f"Model: {model}, Cost: ${log.cost_usd:.6f}")

# After observing quality, feed back a reward to update the bandit
router.process_feedback(log.request_id, reward=0.85)

CLI usage:

# Route a prompt
paretobandit "Summarize this document" --max-cost 0.005

# Download embedding model for offline/Docker use
paretobandit --download-models

Feature Engineering

ParetoBandit supports three embedding paths, from turnkey to fully custom:

1. Default pipeline (requires embeddings extra)

The default uses all-MiniLM-L6-v2 with a shipped 25-component PCA projection trained on 80K prompts from the paper's evaluation corpus.

router = BanditRouter.create()  # loads pca_25.joblib automatically

2. Custom encoder

Bring any encoder function — no sentence-transformers dependency required. Raw embeddings are used directly (+ bias term); optionally pair with your own PCA artifact.

from pareto_bandit import BanditRouter
from pareto_bandit.feature_service import FeatureService

# Without PCA (raw embeddings)
fs = FeatureService(custom_encoder=my_encode_fn, embedding_dim=768)

# With your own PCA
fs = FeatureService(custom_encoder=my_encode_fn, embedding_dim=768, pca_path="my_pca.joblib")

router = BanditRouter.create(feature_service=fs, priors="none")
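For concreteness, here is a toy stand-in for `my_encode_fn`. It is hypothetical — a real encoder would wrap your embedding model, and the exact signature `FeatureService` expects (batch of strings in, 2-D array out is assumed here) should be checked against the API reference:

```python
def my_encode_fn(prompts):
    """Map a batch of prompts to fixed-width vectors.

    Toy bag-of-hashed-tokens encoder: each token increments one of 768
    hash buckets. Stands in for a real embedding model in this sketch.
    """
    dim = 768
    vectors = []
    for prompt in prompts:
        vec = [0.0] * dim
        for token in prompt.lower().split():
            vec[hash(token) % dim] += 1.0  # token-count bucket
        vectors.append(vec)
    return vectors
```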

3. Precomputed feature vectors

If you already have embeddings (e.g., from an upstream service), skip encoding entirely:

import numpy as np
from pareto_bandit import BanditRouter
from pareto_bandit.feature_service import FeatureService

fs = FeatureService.for_precomputed(dimension=25)
router = BanditRouter.create(feature_service=fs, priors="none")

# Pass numpy arrays instead of strings
features = np.random.randn(25)
model, log = router.route(features, max_cost=0.01)

Training a custom PCA

When using a different SentenceTransformer model, the shipped PCA is incompatible. Generate a matching artifact with train_pca:

from pareto_bandit import train_pca

pca = train_pca(
    prompts=my_prompt_corpus,           # list[str], >=100 recommended
    encoder_model="your-model-name",
    n_components=25,
    output_path="my_pca.joblib",
)

router = BanditRouter.create(
    context_model="your-model-name",
    pca_path="my_pca.joblib",
)

API Overview

Full API documentation: API Reference

| Class / Function | Purpose |
| --- | --- |
| `BanditRouter.create()` | Factory for a fully initialized router (default or custom models) |
| `BanditRouter.route()` | Route a prompt to the best model under cost/latency constraints |
| `BanditRouter.process_feedback()` | Feed back a reward signal (supports delayed feedback) |
| `BanditRouter.register_model()` | Hot-add a model at runtime |
| `BanditRouter.exploit()` | Context manager for greedy evaluation (no exploration) |
| `FeatureService` | Embedding + PCA pipeline (default, custom encoder, or precomputed) |
| `FeatureService.for_precomputed()` | Lightweight service for pre-embedded vectors |
| `BudgetPacer` | Online primal–dual budget controller (hard/soft/adaptive modes) |
| `RouterConfig` | Hyperparameter dataclass (reward range, cost anchors, etc.) |
| `train_pca()` | Train a custom PCA artifact for a non-default encoder |
| `generate_warmup_priors()` | Build offline warmup priors from labelled data |
| `SqliteContextStore` | Production context store with TTL (for delayed RLHF feedback) |

Architecture

src/pareto_bandit/
├── router.py            # BanditRouter — main entry point, arm selection, update loop
├── policy.py            # DisjointLinUCB, prior calibration
├── budget_pacer.py      # Online primal–dual budget pacer (hard/soft/adaptive modes)
├── feature_service.py   # SentenceTransformer embedding + PCA compression
├── calibration.py       # train_pca(), generate_warmup_priors()
├── storage.py           # SqliteContextStore (delayed feedback), EphemeralContextStore
├── costs.py             # Cost model and heuristics
├── rewards.py           # Reward normalization and aggregation
├── config/              # Model registry, default hyperparameters, packaged artifacts
└── utils/               # Validation, warmup, synthetic data generation

Design Principles

| Principle | Mechanism |
| --- | --- |
| Budget enforcement | Primal–dual ascent on per-request cost ceiling; no horizon assumption |
| Non-stationarity | Geometric forgetting on A⁻¹ and b sufficient statistics |
| Cold-start mitigation | Optional warm-start priors from offline data (80K RouteLLM battles) |
| Lock-minimal concurrency | Snapshot-swap during O(d³) matrix inversions (250× lock-time reduction) |
| Self-healing | Missing PCA/prior artifacts trigger JIT recovery, not crashes |
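The primal–dual budget idea can be sketched as a one-variable dual ascent. This is a minimal sketch under assumed names (`pace`, `eta`) — the packaged `BudgetPacer` is more involved, with hard/soft/adaptive modes:

```python
def pace(costs, budget, eta=0.05):
    """Return the dual price after processing a stream of request costs.

    The price lam is what the router would charge arms per dollar of
    predicted cost when scoring them.
    """
    lam = 0.0
    for c in costs:
        # dual ascent: overspending the per-request ceiling raises the
        # price of cost; underspending relaxes it (clipped at zero)
        lam = max(0.0, lam + eta * (c - budget))
    return lam
```

Because the update reacts only to the running overspend, no fixed horizon or offline penalty tuning is needed — the price settles wherever average cost meets the ceiling.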

Reproducing Paper Experiments

All experiments map 1:1 to figures and tables in the paper. Results are deterministic given fixed seeds.

Full Reproduction

python experiments/reproduce.py

This runs all experiments in dependency order, then regenerates LaTeX macros and publication figures.

Selective Execution

# List available experiments
python experiments/reproduce.py --list

# Run a single experiment
python experiments/reproduce.py --only 01_stationary_budget_pacing

# Regenerate LaTeX + figures only (skip expensive simulations)
python experiments/reproduce.py --skip-run

Experiment Overview

| Key | Section | Description |
| --- | --- | --- |
| hparam_optimization | Appendix | Hyperparameter sweep with Pareto knee-point selection |
| cost_heuristic_validation | Appendix | Cost heuristic validation |
| 01_stationary_budget_pacing | §4.1 | Stationary budget pacing across 7 budget ceilings |
| 02_budget_plus_drift | §4.2 | Budget pacing under cost drift (10× price cut) |
| 03_catastrophic_failure | §4.3 | Catastrophic quality regression detection and rerouting |
| 04_model_onboarding | §4.4 | Runtime model onboarding (K=3 → K=4) |
| warmup_ablation | Appendix | Warmup priors vs. cold-start ablation |
| prior_mismatch | Appendix | Prior mismatch sensitivity analysis |
| judge_robustness | Appendix | Cross-judge regret comparison |
| recovery_limit | Appendix | Recovery limit under degradation |
| latency_benchmark | Appendix | Routing and end-to-end latency microbenchmark |

Each experiment directory contains:

  • run_*.py — simulation script producing result JSONs
  • generate_latex.py — reads results, emits _autogen.tex macros consumed by the paper
  • generate_figure.py — reads results, produces PNG/PDF figures
  • results/ — output artifacts (JSON, figures, autogen LaTeX)

Testing

# Full test suite
python -m pytest tests/ -v

# Skip slow tests
python -m pytest tests/ -v -m "not slow"

# With coverage
python -m pytest tests/ --cov=pareto_bandit --cov-report=term-missing

# Experiment regression tests
python -m pytest experiments/tests/ -v

Project Structure

paretobandit/
├── src/pareto_bandit/       # Core Python package
├── experiments/             # Paper experiment suite
│   ├── reproduce.py         # Master orchestrator
│   ├── 01_–_04_*/           # Main experiments (§4)
│   ├── appendix/            # Appendix experiments
│   ├── utils/               # Shared simulation and LaTeX utilities
│   └── tests/               # Experiment regression tests
├── tests/                   # Unit and integration tests (135+)
├── paper/                   # LaTeX source for the MLSys paper
├── data_collection/         # Raw reward data and PCA training scripts
├── docs/                    # API reference
├── pyproject.toml           # Build config (Hatch), dependencies, tool settings
├── CONTRIBUTING.md          # Development guide
└── CHANGELOG.md             # Version history

Requirements

  • Python ≥ 3.10
  • Core: numpy, joblib, scikit-learn, tqdm
  • Embeddings (optional): torch, sentence-transformers, transformers
  • Experiments: matplotlib, scipy, python-dotenv

Full dependency specifications are in pyproject.toml. A pinned lockfile for exact reproduction of paper results is available in requirements-lock.txt.


Contributing

Contributions are welcome! Please see CONTRIBUTING.md for development setup, coding standards, and the pull request workflow. By participating you agree to abide by the Code of Conduct.


License

This project is licensed under the Apache License 2.0. See LICENSE for details.
