
dial

Online weight optimization via Thompson Sampling. Learns optimal configurations from outcome feedback — no grid search, no manual tuning. Converges in ~50 observations. +41% NDCG@5 over fixed-weight baselines in controlled experiments.

Python 3.11+ · MIT License

pip install kusp-dial

Quick start

from thompson_bandits import ThompsonBandit, InMemoryStore

store = InMemoryStore(arm_ids=["relevance_heavy", "balanced", "recency_heavy"])
bandit = ThompsonBandit(store)

# Run the loop: select → observe → update
for query in queries:
    arm = bandit.select()
    reward = run_query(query, strategy=arm)
    bandit.update(arm, reward=reward)

print(bandit.get_summary())

After 50 iterations:

BanditSummary(
  best_arm='relevance_heavy',
  total_pulls=50,
  arms=[
    ArmSummary(arm_id='balanced',        mean=0.5765, pulls=11),
    ArmSummary(arm_id='recency_heavy',   mean=0.4210, pulls=8),
    ArmSummary(arm_id='relevance_heavy', mean=0.8903, pulls=31),
  ]
)

The bandit explores all three options early, then converges — 31 of 50 pulls on the winner, without you telling it which arm is best.

Why Dial?

vs. grid search / random search — Those require batch experiments upfront: grid search runs every combination, random search a large sample of them. Dial learns online, one observation at a time. No batch experiments needed.

vs. manual tuning — Manual weights are a guess that stays frozen. Dial adapts when the best option shifts — user behavior drifts, data distributions change, what worked in January fails in March.

vs. contextual bandits (LinUCB, neural) — Those need feature engineering and thousands of observations. Dial works with 50 observations and zero features. Start with Dial; graduate to contextual bandits when you have the data to justify them.

vs. Bayesian optimization (Optuna, Ax) — Those optimize over continuous parameter spaces. Dial optimizes over discrete options (strategies, presets, model choices). Different problem shape.

Use cases

  • Retrieval weight tuning — learn the optimal blend of relevance, recency, and importance for RAG systems
  • Model routing — discover which LLM performs best for different query types
  • Prompt selection — A/B test prompt variants with automatic convergence
  • Feature flag rollout — promote variants based on measured outcomes
  • Any multi-option decision where you can observe a reward signal
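
To make the model-routing case concrete, here is a standalone simulation of the same select → observe → update loop (plain Python, no dial dependency; the router targets and their success rates are invented for illustration):

```python
import random

random.seed(42)

# Hypothetical routing targets with latent success rates the bandit must discover.
true_rates = {"gpt_small": 0.55, "gpt_large": 0.80, "local_llm": 0.40}

# One Beta(1, 1) posterior per arm, stored as [alpha, beta] pseudo-counts.
posteriors = {arm: [1.0, 1.0] for arm in true_rates}
pulls = {arm: 0 for arm in true_rates}

for _ in range(300):
    # Select: draw one sample from each posterior, route to the argmax.
    samples = {arm: random.betavariate(a, b) for arm, (a, b) in posteriors.items()}
    arm = max(samples, key=samples.get)
    # Observe: simulate a binary outcome for the routed query.
    reward = 1.0 if random.random() < true_rates[arm] else 0.0
    # Update: fold the outcome back into the pseudo-counts.
    posteriors[arm][0] += reward
    posteriors[arm][1] += 1.0 - reward
    pulls[arm] += 1

best = max(pulls, key=pulls.get)  # most-routed target so far
```

The pull counts concentrate on the highest-rate target for the same reason as in the quick-start run: arms that keep paying off produce higher posterior samples, so they win the argmax more often.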

Features

  • Beta posteriors — each arm maintains a Beta(alpha, beta) distribution updated with observed rewards
  • Discounted Thompson Sampling — optional decay factor for non-stationary environments where the best arm shifts over time
  • Cost-aware rewards — built-in cost_aware_reward() scales outcomes by resource efficiency
  • Pluggable storage — InMemoryStore for testing, SQLiteStore for persistence, or implement the ArmStore protocol for anything else
  • Zero SQLite dependency in core — bandit logic talks only to the ArmStore protocol
  • Type-safe — full annotations, runtime_checkable Protocol
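
The Beta-posterior bookkeeping is small enough to show directly. A standalone sketch of the statistics involved (the update rule shown — splitting a fractional reward into alpha/beta pseudo-counts — is one common convention, not necessarily dial's exact internals):

```python
# Beta(alpha, beta): alpha counts evidence for success, beta for failure.
alpha, beta = 1.0, 1.0  # uniform Beta(1, 1) prior, mean 0.5

for reward in [0.9, 0.8, 1.0, 0.7]:  # four mostly-good outcomes
    alpha += reward
    beta += 1.0 - reward

mean = alpha / (alpha + beta)  # posterior mean success rate
# Variance shrinks as evidence accumulates, so posterior samples concentrate.
variance = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1.0))
```

That shrinking variance is what turns early exploration into later exploitation: well-observed arms produce tightly clustered samples, so a confidently good arm stops losing the argmax to noise.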

Storage backends

In-memory (ephemeral)

from thompson_bandits import InMemoryStore

store = InMemoryStore(arm_ids=["a", "b", "c"], prior_alpha=1.0, prior_beta=1.0)

SQLite (persistent)

from thompson_bandits import SQLiteStore

# From a file path (store owns the connection)
store = SQLiteStore.from_path("bandits.db", arm_ids=["a", "b", "c"])

# From an existing connection (you own the connection)
import sqlite3
conn = sqlite3.connect("bandits.db")
store = SQLiteStore(conn, arm_ids=["a", "b", "c"])

Custom storage

Implement the ArmStore protocol — any class with the right methods works, no inheritance required:

from thompson_bandits import ArmStore, ArmStats

class RedisStore:
    def get_stats(self, arm_id: str) -> ArmStats | None: ...
    def update_stats(self, arm_id: str, alpha_delta: float, beta_delta: float, reward: float) -> None: ...
    def get_all_arms(self) -> list[ArmStats]: ...
    def decay(self, arm_id: str, factor: float) -> None: ...

Non-stationary environments

When the best option changes over time, enable discounting:

from thompson_bandits import ThompsonBandit, InMemoryStore, BanditConfig

config = BanditConfig(discount=0.95)  # decay factor in (0, 1)
bandit = ThompsonBandit(store, config=config)

Before each update, existing evidence is decayed by the discount factor. Recent observations carry more weight than old ones.
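
The arithmetic is worth seeing once. A standalone sketch of a discounted update (whether dial decays toward the prior or toward zero is an internal detail; decaying only the evidence above the Beta(1, 1) prior is one reasonable choice, shown here):

```python
def discounted_update(alpha: float, beta: float,
                      reward: float, discount: float) -> tuple[float, float]:
    # Decay the existing evidence (the counts above Beta(1, 1)) first...
    alpha = 1.0 + (alpha - 1.0) * discount
    beta = 1.0 + (beta - 1.0) * discount
    # ...then add the fresh observation at full weight.
    return alpha + reward, beta + (1.0 - reward)


alpha, beta = 9.0, 3.0  # accumulated evidence from past observations
alpha, beta = discounted_update(alpha, beta, reward=1.0, discount=0.95)
```

With discount = 0.95 the evidence total can never grow past roughly 1 / (1 - 0.95) = 20 observation-equivalents, which is what keeps the posterior responsive when the best arm shifts.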

Cost-aware optimization

When options have different costs (tokens, latency, dollars), scale rewards accordingly:

from thompson_bandits import cost_aware_reward

raw_reward = 0.9
token_cost = 1500
baseline_cost = 1000

adjusted = cost_aware_reward(raw_reward, cost=token_cost, baseline_cost=baseline_cost)
bandit.update(arm, reward=adjusted)
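
The exact formula cost_aware_reward uses is not shown here; a plausible version (an assumption for illustration, not dial's documented behavior) scales the raw reward by relative efficiency and clips to [0, 1]:

```python
def cost_aware_reward_sketch(raw_reward: float, cost: float,
                             baseline_cost: float) -> float:
    # Cheaper than baseline boosts the reward, costlier dampens it.
    efficiency = baseline_cost / cost
    return max(0.0, min(1.0, raw_reward * efficiency))


# 50% over budget drags a 0.9 outcome down to 0.6.
adjusted = cost_aware_reward_sketch(0.9, cost=1500, baseline_cost=1000)
```

The clip matters: rewards must stay in [0, 1] for the Beta pseudo-count update to remain well-formed.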

Inspecting state

summary = bandit.get_summary()
print(summary.best_arm)      # 'relevance_heavy'
print(summary.total_pulls)   # 50

for arm in summary.arms:
    print(f"{arm.arm_id}: mean={arm.mean:.3f}, pulls={arm.pulls}")
# balanced:        mean=0.577, pulls=11
# recency_heavy:   mean=0.421, pulls=8
# relevance_heavy: mean=0.890, pulls=31

Warm-start transfer

When you have prior knowledge (from a previous experiment, a related task, or domain expertise), encode it as informative priors instead of starting from uniform:

from thompson_bandits import ThompsonBandit, InMemoryStore, BanditConfig

# Previous experiment found relevance_heavy won ~63% of pulls.
# Encode that as Beta(6.3, 3.7) instead of the default Beta(1, 1).
config = BanditConfig(prior_alpha=1.0, prior_beta=1.0)  # uniform default for the other arms
store = InMemoryStore(arm_ids=["relevance_heavy", "balanced", "recency_heavy"])

# Override priors for the arm with known history
arm = store.get_stats("relevance_heavy")
arm.alpha = 6.3
arm.beta = 3.7

bandit = ThompsonBandit(store, config=config)

The bandit starts biased toward the prior winner but remains open to switching if the data disagrees. With shrinkage (e.g., scaling the prior by 0.15), the prior influence fades within ~20 observations.
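
The shrinkage mentioned above is just scaling the prior's pseudo-counts before installing them. A quick sketch of the arithmetic (the 0.15 factor and the Beta(6.3, 3.7) prior come from the example above):

```python
shrinkage = 0.15
prior_alpha, prior_beta = 6.3, 3.7  # full-strength transferred prior

# Shrink the evidence above Beta(1, 1) so the prior is a nudge, not a verdict.
alpha = 1.0 + (prior_alpha - 1.0) * shrinkage
beta = 1.0 + (prior_beta - 1.0) * shrinkage

# Extra evidence the shrunk prior contributes, in observation-equivalents.
extra = (alpha - 1.0) + (beta - 1.0)
```

The shrunk prior carries only about one observation's worth of extra evidence, so a couple of dozen real observations dominate it — consistent with the prior's influence fading within roughly 20 observations.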

Research

Dial extracts the Thompson Sampling engine from a research experiment on gradient-free retrieval weight learning. The experiment ran 1,200 episodes across 4 conditions on a $50/month API budget.

Citation (BibTeX)
@article{dirocco2026gradient,
  title   = {Gradient-Free Retrieval Weight Learning via Thompson Sampling
             with LLM Self-Assessment},
  author  = {DiRocco, Alfonso},
  year    = {2026},
  url     = {https://github.com/kusp-dev/retrieval-weight-experiment},
  note    = {1,200 episodes, 4 conditions, +41\% NDCG@5 over fixed baselines}
}

Development

git clone https://github.com/fonz-ai/dial.git
cd dial
uv sync --extra dev
uv run pytest tests/ -v
uv run ruff check src/ tests/

License

MIT
