dial
Online weight optimization via Thompson Sampling. Learns optimal configurations from outcome feedback — no grid search, no manual tuning. Converges in ~50 observations. +41% NDCG@5 over fixed-weight baselines in controlled experiments.
pip install kusp-dial
Quick start
from thompson_bandits import ThompsonBandit, InMemoryStore
store = InMemoryStore(arm_ids=["relevance_heavy", "balanced", "recency_heavy"])
bandit = ThompsonBandit(store)
# Run the loop: select → observe → update
for query in queries:
    arm = bandit.select()
    reward = run_query(query, strategy=arm)
    bandit.update(arm, reward=reward)
print(bandit.get_summary())
After 50 iterations:
BanditSummary(
    best_arm='relevance_heavy',
    total_pulls=50,
    arms=[
        ArmSummary(arm_id='balanced', mean=0.5765, pulls=11),
        ArmSummary(arm_id='recency_heavy', mean=0.4210, pulls=8),
        ArmSummary(arm_id='relevance_heavy', mean=0.8903, pulls=31),
    ]
)
The bandit explores all three options early, then converges — 31 of 50 pulls on the winner, without you telling it which arm is best.
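The convergence behavior above can be sketched in plain Python, independent of the dial API. This simulation uses assumed hidden success rates for the three arms and runs the same select → observe → update loop: each arm keeps a Beta(alpha, beta) posterior, one value is sampled from every posterior, and the arm with the highest draw is pulled.

```python
# Illustrative sketch of the Thompson Sampling loop (not the dial API).
# The true_rates below are made-up hidden success probabilities.
import random

random.seed(0)

true_rates = {"relevance_heavy": 0.9, "balanced": 0.6, "recency_heavy": 0.4}
posteriors = {arm: [1.0, 1.0] for arm in true_rates}  # [alpha, beta], uniform prior
pulls = {arm: 0 for arm in true_rates}

for _ in range(50):
    # sample one draw per arm from its Beta posterior, pull the argmax
    draws = {arm: random.betavariate(a, b) for arm, (a, b) in posteriors.items()}
    arm = max(draws, key=draws.get)
    reward = 1.0 if random.random() < true_rates[arm] else 0.0
    # Bernoulli update: success raises alpha, failure raises beta
    posteriors[arm][0] += reward
    posteriors[arm][1] += 1.0 - reward
    pulls[arm] += 1

best = max(pulls, key=pulls.get)
```

Early on the draws are noisy and every arm gets sampled; as evidence accumulates, the posterior of the strongest arm concentrates and wins most of the argmax comparisons.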
Why Dial?
vs. grid search / random search — Those require running every combination upfront. Dial learns online, one observation at a time. No batch experiments needed.
vs. manual tuning — Manual weights are a guess that stays frozen. Dial adapts when the best option shifts — user behavior drifts, data distributions change, what worked in January fails in March.
vs. contextual bandits (LinUCB, neural) — Those need feature engineering and thousands of observations. Dial works with 50 observations and zero features. Start with Dial; graduate to contextual bandits when you have the data to justify them.
vs. Bayesian optimization (Optuna, Ax) — Those optimize over continuous parameter spaces. Dial optimizes over discrete options (strategies, presets, model choices). Different problem shape.
Use cases
- Retrieval weight tuning — learn the optimal blend of relevance, recency, and importance for RAG systems
- Model routing — discover which LLM performs best for different query types
- Prompt selection — A/B test prompt variants with automatic convergence
- Feature flag rollout — promote variants based on measured outcomes
- Any multi-option decision where you can observe a reward signal
Features
- Beta posteriors — each arm maintains a Beta(alpha, beta) distribution updated with observed rewards
- Discounted Thompson Sampling — optional decay factor for non-stationary environments where the best arm shifts over time
- Cost-aware rewards — built-in cost_aware_reward() scales outcomes by resource efficiency
- Pluggable storage — InMemoryStore for testing, SQLiteStore for persistence, or implement the ArmStore protocol for anything else
- Zero SQLite dependency in core — bandit logic talks only to the ArmStore protocol
- Type-safe — full annotations, runtime_checkable Protocol
Storage backends
In-memory (ephemeral)
from thompson_bandits import InMemoryStore
store = InMemoryStore(arm_ids=["a", "b", "c"], prior_alpha=1.0, prior_beta=1.0)
SQLite (persistent)
from thompson_bandits import SQLiteStore
# From a file path (store owns the connection)
store = SQLiteStore.from_path("bandits.db", arm_ids=["a", "b", "c"])
# From an existing connection (you own the connection)
import sqlite3
conn = sqlite3.connect("bandits.db")
store = SQLiteStore(conn, arm_ids=["a", "b", "c"])
Custom storage
Implement the ArmStore protocol — any class with the right methods works, no inheritance required:
from thompson_bandits import ArmStore, ArmStats
class RedisStore:
    def get_stats(self, arm_id: str) -> ArmStats | None: ...
    def update_stats(self, arm_id: str, alpha_delta: float, beta_delta: float, reward: float) -> None: ...
    def get_all_arms(self) -> list[ArmStats]: ...
    def decay(self, arm_id: str, factor: float) -> None: ...
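As a concrete sketch, here is a dict-backed store that satisfies those four methods. The ArmStats stand-in below is an assumption for illustration (the real class's fields may differ); with the library installed you would import ArmStats from thompson_bandits instead:

```python
# Minimal dict-backed store sketch. ArmStats here is a stand-in with assumed
# fields (arm_id, alpha, beta) — check the real definition in thompson_bandits.
from dataclasses import dataclass

@dataclass
class ArmStats:
    arm_id: str
    alpha: float
    beta: float

class DictStore:
    """Structurally satisfies the ArmStore protocol — no inheritance needed."""

    def __init__(self, arm_ids, prior_alpha=1.0, prior_beta=1.0):
        self._stats = {a: ArmStats(a, prior_alpha, prior_beta) for a in arm_ids}

    def get_stats(self, arm_id):
        return self._stats.get(arm_id)

    def update_stats(self, arm_id, alpha_delta, beta_delta, reward):
        s = self._stats[arm_id]
        s.alpha += alpha_delta
        s.beta += beta_delta

    def get_all_arms(self):
        return list(self._stats.values())

    def decay(self, arm_id, factor):
        s = self._stats[arm_id]
        s.alpha *= factor
        s.beta *= factor

store = DictStore(["a", "b"])
store.update_stats("a", alpha_delta=1.0, beta_delta=0.0, reward=1.0)
```

Because ArmStore is a runtime_checkable Protocol, any object with matching method signatures passes isinstance checks; the bandit never cares where the numbers live.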
Non-stationary environments
When the best option changes over time, enable discounting:
from thompson_bandits import ThompsonBandit, InMemoryStore, BanditConfig
config = BanditConfig(discount=0.95) # decay factor in (0, 1)
bandit = ThompsonBandit(store, config=config)
Before each update, existing evidence is decayed by the discount factor. Recent observations carry more weight than old ones.
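As a sketch of one common discounting scheme (the library's exact rule may differ), each update first multiplies both Beta parameters by the discount factor, then adds the new evidence, so old observations fade geometrically:

```python
# One common formulation of discounted Thompson Sampling updates — an
# assumption for illustration, not necessarily dial's exact implementation.
def discounted_update(alpha, beta, reward, discount=0.95):
    # decay accumulated evidence, then add the new observation
    alpha, beta = alpha * discount, beta * discount
    return alpha + reward, beta + (1.0 - reward)

alpha, beta = 1.0, 1.0
for r in [1.0, 1.0, 0.0]:
    alpha, beta = discounted_update(alpha, beta, r)
```

With discount=1.0 this reduces to the standard (stationary) update; smaller values make the posterior forget faster and keep exploration alive.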
Cost-aware optimization
When options have different costs (tokens, latency, dollars), scale rewards accordingly:
from thompson_bandits import cost_aware_reward
raw_reward = 0.9
token_cost = 1500
baseline_cost = 1000
adjusted = cost_aware_reward(raw_reward, cost=token_cost, baseline_cost=baseline_cost)
bandit.update(arm, reward=adjusted)
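The exact formula inside cost_aware_reward() isn't shown here. For intuition only, one plausible scaling (an assumption, not the library's documented behavior) divides the reward by the cost ratio, so an option that spends more than the baseline earns proportionally less:

```python
# Hypothetical cost scaling for illustration — the real cost_aware_reward()
# in thompson_bandits may use a different formula (e.g. clamping to [0, 1]).
def cost_scaled_reward(raw_reward, cost, baseline_cost):
    return raw_reward * (baseline_cost / cost)

adjusted = cost_scaled_reward(0.9, cost=1500, baseline_cost=1000)  # 0.9 * (1000/1500) = 0.6
```

Under this scaling, two arms with equal raw quality are separated by efficiency: the cheaper one accumulates higher effective rewards and wins more pulls.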
Inspecting state
summary = bandit.get_summary()
print(summary.best_arm) # 'relevance_heavy'
print(summary.total_pulls) # 50
for arm in summary.arms:
    print(f"{arm.arm_id}: mean={arm.mean:.3f}, pulls={arm.pulls}")
# balanced: mean=0.577, pulls=11
# recency_heavy: mean=0.421, pulls=8
# relevance_heavy: mean=0.890, pulls=31
Research
Dial extracts the Thompson Sampling engine from a published research experiment on gradient-free retrieval weight learning. The experiment ran 1,200 episodes across 4 conditions on a $50/month API budget.
Citation (BibTeX)
@article{dirocco2026gradient,
title = {Gradient-Free Retrieval Weight Learning via Thompson Sampling
with LLM Self-Assessment},
author = {DiRocco, Alfonso},
year = {2026},
url = {https://github.com/kusp-dev/retrieval-weight-experiment},
note = {1,200 episodes, 4 conditions, +41\% NDCG@5 over fixed baselines}
}
Development
git clone https://github.com/fonz-ai/dial.git
cd dial
uv sync --extra dev
uv run pytest tests/ -v
uv run ruff check src/ tests/
License
MIT
Download files
Source Distribution
Built Distribution
File details
Details for the file kusp_dial-0.1.0.tar.gz.
File metadata
- Download URL: kusp_dial-0.1.0.tar.gz
- Upload date:
- Size: 13.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c9bfbc6cfc2ff5992f5afd5e9cd30ec701d8ede1f7e238bd2c15ac6b75f3c624 |
| MD5 | 403ec21d5f9ed3181832adb91dd3b2b3 |
| BLAKE2b-256 | 386b42aa7024d82e906b2d4e1bcabd4be7705f905be143939f41fee40b4038fd |
File details
Details for the file kusp_dial-0.1.0-py3-none-any.whl.
File metadata
- Download URL: kusp_dial-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 32cb7e4bd3ebd9a3831380d51851b449c8fa66d988b33e6cd01393741319f875 |
| MD5 | f5a486943144622e19bc0c41f9e12b29 |
| BLAKE2b-256 | 4ac1b7af0d2c63552bd1f9ff6b6e97c144824ec5b230afc9233a4d0a50b36fc7 |