Official Python SDK for BanditDB
BanditDB Python SDK
The official Python client and Model Context Protocol (MCP) server for BanditDB — the ultra-fast, lock-free Contextual Bandit database written in Rust.
BanditDB abstracts away the complex linear algebra of Reinforcement Learning (LinUCB, Thompson Sampling) behind a dead-simple API. Build real-time personalizers, dynamic A/B tests, and give LLM agents mathematically rigorous persistent memory.
Installation
pip install banditdb-python
Requires a running BanditDB Rust server (default: http://localhost:8080).
1. Standard SDK Usage
The client features automatic connection pooling, exponential backoff retries, and strict timeouts.
from banditdb import Client, BanditDBError

# Connect to the BanditDB server.
# Pass api_key if BANDITDB_API_KEY is set on the server.
db = Client(
    url="http://localhost:8080",
    timeout=2.0,
    api_key="your-secret-key",  # omit if server runs without auth
)

try:
    # 1. Create a campaign (run once at startup)
    # algorithm defaults to "linucb"; use "thompson_sampling" for Bayesian exploration
    db.create_campaign(
        campaign_id="checkout_upsell",
        arms=["offer_discount", "offer_free_shipping"],
        feature_dim=3,
    )
    # or: db.create_campaign(..., algorithm="thompson_sampling")

    # 2. A user arrives: ask the database what to show them
    # Context: [is_mobile, cart_value_normalized, is_returning_user]
    arm_id, interaction_id = db.predict("checkout_upsell", [1.0, 0.8, 0.0])
    print(f"Showing: {arm_id}")  # e.g., "offer_free_shipping"

    # 3. The user clicked: send the reward
    db.reward(interaction_id, reward=1.0)
except BanditDBError as e:
    print(f"Database error: {e}")
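The context passed to predict() is a plain list of floats, and in the example above its length equals feature_dim. A minimal sketch of building that three-feature vector follows; the helper name build_context and the 500.0 cap used for normalisation are hypothetical choices, not part of the SDK.

```python
from typing import List


def build_context(is_mobile: bool, cart_value: float, is_returning: bool,
                  max_cart_value: float = 500.0) -> List[float]:
    """Encode a visit as [is_mobile, cart_value_normalized, is_returning_user]."""
    # Clamp to [0, 1] so one very large cart does not dominate the other features.
    cart_norm = min(max(cart_value / max_cart_value, 0.0), 1.0)
    return [float(is_mobile), cart_norm, float(is_returning)]


context = build_context(is_mobile=True, cart_value=400.0, is_returning=False)
print(context)  # [1.0, 0.8, 0.0]
```

Keeping every feature on a comparable scale is a common choice for linear bandits, since the learned weights and the exploration bonus both depend on the magnitude of the context.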
All Client methods
Health
| Method | Description |
|---|---|
| `health()` | Returns `True` if the server is reachable and the WAL writer is healthy. |
| `health_detail()` | Returns the full health dict including per-campaign entropy and status (`"ok"` / `"warning"` / `"critical"`). |
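The per-campaign entropy in health_detail() summarises how evenly traffic is spread across arms. The server's exact formula and thresholds are not documented here, so the sketch below is only an assumption: it computes normalised Shannon entropy over per-arm selection counts, and the function name is hypothetical.

```python
import math
from typing import Sequence


def selection_entropy(counts: Sequence[int]) -> float:
    """Normalised Shannon entropy of arm selection counts, in [0, 1]."""
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = sum(-p * math.log(p) for p in probs)
    # Dividing by log(K) maps perfectly even traffic to 1.0.
    return entropy / math.log(len(counts))


print(selection_entropy([50, 50]))   # even traffic, close to 1.0
print(selection_entropy([100, 0]))   # one arm takes everything, 0.0
print(selection_entropy([90, 10]))   # skewed traffic, somewhere in between
```

A value near zero early in a campaign would suggest one arm is dominating before the model has explored, which is the situation the status field is meant to flag.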
Campaigns
| Method | Description |
|---|---|
| `create_campaign(campaign_id, arms, feature_dim, alpha=1.0, algorithm="linucb", metadata=None)` | Register a new campaign. `algorithm` accepts `"linucb"`, `"thompson_sampling"`, `NeuralLinUCBConfig`, or `ProgressiveConfig`. `metadata` is an arbitrary JSON dict (≤ 64 KB). |
| `list_campaigns()` | Returns a list of all campaigns (active and archived) with `alpha`, `arm_count`, and `algorithm`. |
| `campaign_info(campaign_id)` | Returns full per-arm state: `theta`, `theta_norm`, prediction and reward counters. Raises `APIError` (404) if not found. |
| `report(campaign_id)` | Business-level convergence report. `converged=True` means one arm has a statistically significant lead at 95% CI, so it is safe to stop. `converged=False` means one arm is leading but the confidence intervals still overlap. `converged=None` means not enough data yet (< 30 rewards per arm). |
| `diagnostics(campaign_id)` | Operator diagnostics: per-arm theta norms, `A_inv` uncertainty bounds, entropy health (`selection_entropy`, `entropy_status`, `entropy_trend`, `likely_cause`, `suggested_action`), tournament traffic, and neural buffer size. |
| `archive_campaign(campaign_id)` | Soft-delete: pauses predictions/rewards but preserves all learned weights. Recoverable with `restore_campaign()`. |
| `restore_campaign(campaign_id)` | Restore an archived campaign to active status with all weights intact. |
| `delete_campaign(campaign_id)` | Permanently delete a campaign. Returns `False` if not found. |
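The three-valued converged flag returned by report() can be read as an interval-overlap test. How the server actually computes it is not specified here; the sketch below assumes rewards in [0, 1], a normal-approximation 95% interval per arm using the Bernoulli variance bound, and hypothetical names throughout.

```python
import math
from typing import Dict, Optional, Tuple

MIN_REWARDS = 30   # below this per arm, report "not enough data"
Z_95 = 1.96        # two-sided 95% normal quantile


def mean_ci(total_reward: float, n: int) -> Tuple[float, float]:
    """Normal-approximation 95% interval for a mean reward in [0, 1]."""
    mean = total_reward / n
    half = Z_95 * math.sqrt(mean * (1.0 - mean) / n)
    return mean - half, mean + half


def converged(arms: Dict[str, Tuple[float, int]]) -> Optional[bool]:
    """arms maps arm_id -> (sum of rewards, number of rewards)."""
    if any(n < MIN_REWARDS for _, n in arms.values()):
        return None
    intervals = {arm: mean_ci(s, n) for arm, (s, n) in arms.items()}
    # The leader is the arm with the highest mean (the interval midpoint).
    leader = max(intervals, key=lambda a: intervals[a][0] + intervals[a][1])
    leader_low = intervals[leader][0]
    # Converged when the leader's lower bound clears every other upper bound.
    return all(leader_low > high
               for arm, (_, high) in intervals.items() if arm != leader)


print(converged({"a": (10, 20), "b": (5, 20)}))         # None
print(converged({"a": (800, 1000), "b": (500, 1000)}))  # True
print(converged({"a": (52, 100), "b": (50, 100)}))      # False
```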
Predict & Reward
| Method | Description |
|---|---|
| `predict(campaign_id, context)` | Returns `(arm_id, interaction_id)`. Pass `interaction_id` to `reward()` to close the loop. |
| `batch_predict(predictions)` | Predict for up to 100 campaign/context pairs in a single round-trip. Each item: `{"campaign_id": str, "context": List[float]}`. Returns a list of `{arm_id, interaction_id}` or `{error}` per item. |
| `reward(interaction_id, reward)` | Record outcome. `reward` must be in [0.0, 1.0]. Raises `APIError` if the interaction has already been rewarded or has expired (default TTL: 24 h). |
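Because batch_predict() takes at most 100 items per call, larger workloads need to be split before sending. A minimal sketch of that chunking follows; the helper name chunked is hypothetical, and the commented loop only indicates where the SDK call would go.

```python
from typing import Dict, Iterator, List

MAX_BATCH = 100  # per-call limit documented for batch_predict()


def chunked(items: List[Dict], size: int = MAX_BATCH) -> Iterator[List[Dict]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


requests = [{"campaign_id": "checkout_upsell", "context": [1.0, 0.8, 0.0]}
            for _ in range(250)]
batches = list(chunked(requests))
print([len(b) for b in batches])  # [100, 100, 50]

# With a connected client, each batch would then be sent in turn:
#   results = []
#   for batch in batches:
#       results.extend(db.batch_predict(batch))
```

Since each returned item is either a decision or an error, callers would still need to check every entry before passing its interaction_id to reward().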
Data & Export
| Method | Description |
|---|---|
| `checkpoint()` | Flush WAL, snapshot models, write Parquet shards, run neural retrain + tournament eval, rotate WAL. Returns a summary string. |
| `export()` | List Parquet export shards grouped by campaign. Returns `{export_dir, shards}`. |
2. The AI "Hive Mind" (Model Context Protocol)
Standard LLM agents are stateless — if they route a task to the wrong model and fail, they repeat the same mistake tomorrow. BanditDB's built-in MCP server gives the entire agent swarm shared persistent memory.
Starting the MCP server
# Set environment variables before starting
export BANDITDB_URL=http://localhost:8080
export BANDITDB_API_KEY=your-secret-key # omit if server runs without auth
banditdb-mcp
Connecting to Claude Desktop
Add to your Claude configuration file:
- Mac: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
{
  "mcpServers": {
    "banditdb": {
      "command": "banditdb-mcp",
      "args": [],
      "env": {
        "BANDITDB_URL": "http://localhost:8080",
        "BANDITDB_API_KEY": "your-secret-key"
      }
    }
  }
}
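If the configuration file already lists other MCP servers, the banditdb entry should be merged in rather than pasted over them. The sketch below uses only the standard library and works on an in-memory dict, so it touches no real file; the function name and the "other" server entry are hypothetical.

```python
import json
from typing import Any, Dict


def add_banditdb_server(config: Dict[str, Any], url: str,
                        api_key: str = "") -> Dict[str, Any]:
    """Return a copy of a Claude Desktop config with the banditdb entry added."""
    merged = json.loads(json.dumps(config))  # cheap deep copy of JSON data
    env = {"BANDITDB_URL": url}
    if api_key:  # leave the key out when the server runs without auth
        env["BANDITDB_API_KEY"] = api_key
    merged.setdefault("mcpServers", {})["banditdb"] = {
        "command": "banditdb-mcp",
        "args": [],
        "env": env,
    }
    return merged


existing = {"mcpServers": {"other": {"command": "other-mcp", "args": []}}}
updated = add_banditdb_server(existing, "http://localhost:8080")
print(json.dumps(updated, indent=2))
```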
The agent swarm now has nine tools:
| Tool | What it does |
|---|---|
| `create_campaign` | Create a new decision campaign. Accepts `algorithm` (`"linucb"` or `"thompson_sampling"`) and `alpha`. Use Thompson Sampling for natural Bayesian exploration with no tuning needed. |
| `list_campaigns` | List all active campaigns (shows algorithm and alpha). Useful to check what exists before calling `get_intuition`. |
| `campaign_diagnostics` | Inspect per-arm learning state: `theta_norm`, prediction counts, reward rates, and entropy health. Use when a campaign doesn't seem to be learning or one arm is dominating. |
| `campaign_report` | Business-level convergence report. Tells you whether the campaign has statistically converged and which arm is winning with confidence intervals. |
| `get_intuition` | Ask BanditDB which arm to pick for a given context. Returns the arm and an `interaction_id` to save. |
| `batch_get_intuition` | Get decisions for multiple campaigns in a single round-trip. Pass a list of `{campaign_id, context}` dicts. |
| `record_outcome` | Report whether the chosen action succeeded (1.0) or failed (0.0). Updates the shared model. |
| `archive_campaign` | Soft-delete a campaign. Pauses predictions/rewards but preserves all learned weights. |
| `restore_campaign` | Restore an archived campaign to active status with all weights intact. |
Every decision made by any agent in the network improves the routing for all future agents.
3. Data Science & Offline Evaluation
BanditDB event-sources every prediction and reward to a Write-Ahead Log (WAL). Calling checkpoint() compiles completed prediction→reward pairs into Snappy-compressed Parquet files — one per campaign — for offline analysis with Polars or Pandas.
Every prediction is guaranteed to appear in the Parquet file even if its reward arrives hours later: BanditDB re-emits in-flight interactions at each checkpoint so delayed rewards are always captured in a future cycle.
# Checkpoint: snapshot models, write Parquet, rotate the WAL.
# Call this on a schedule or after significant traffic.
summary = db.checkpoint()
print(summary)
# "Checkpoint written and WAL rotated: 2 campaigns, offset 4821 bytes,
# 150 interactions exported, 3 in-flight re-emitted"
# List which Parquet files are available
print(db.export())
# e.g. the export directory and the Parquet shards found for each campaign
# Load directly from the mounted volume into Polars.
# Flat schema: interaction_id | arm_id | reward | predicted_at | rewarded_at | propensity | feature_0 | ...
import polars as pl
df = pl.read_parquet("/data/exports/llm_routing.parquet")
print(df.head())
print(df.columns)
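The pairing and re-emission described above can be pictured as a join over an event log. The sketch below illustrates the idea only and is not BanditDB's WAL format: the event tuples, field names, and function name are all hypothetical.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str, object]  # (kind, interaction_id, payload)


def compile_checkpoint(events: List[Event]) -> Tuple[List[dict], List[Event]]:
    """Split a log into completed rows and in-flight predictions to carry over."""
    pending: Dict[str, object] = {}  # interaction_id -> arm_id awaiting a reward
    rows: List[dict] = []
    for kind, interaction_id, payload in events:
        if kind == "predict":
            pending[interaction_id] = payload
        elif kind == "reward" and interaction_id in pending:
            rows.append({"interaction_id": interaction_id,
                         "arm_id": pending.pop(interaction_id),
                         "reward": payload})
    # Unrewarded predictions are re-emitted so a later reward can still match.
    carry_over = [("predict", i, arm) for i, arm in pending.items()]
    return rows, carry_over


log = [("predict", "i1", "fast"), ("predict", "i2", "cheap"), ("reward", "i1", 1.0)]
rows, carry = compile_checkpoint(log)
print(rows)   # one completed row, for i1
print(carry)  # i2 is still in flight

# Next cycle: the carried-over prediction meets its delayed reward.
rows2, carry2 = compile_checkpoint(carry + [("reward", "i2", 0.0)])
print(rows2)
```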
Offline Policy Evaluation (OPE)
The SDK ships three OPE estimators in banditdb.eval. They answer the question: "what would my average reward have been under a different policy — without running a live experiment?"
Install the eval dependencies:
pip install "banditdb-python[eval]"
| Estimator | Function | How it works | When to use |
|---|---|---|---|
| Replay | `replay(df)` | Accepts each interaction with probability (1/K) / propensity (Li et al. 2010). Unbiased sample of the uniform random policy. | Sanity check baseline. Low coverage is expected: about 1/K of interactions are used. |
| IPS / SNIPS | `ips(df, clip=10.0)` | Uses every interaction with importance weight (1/K) / propensity. Self-normalised to reduce variance. Weight clipping (default 10×) controls the bias-variance tradeoff. | Primary estimator. Use when you have enough data but want full coverage. |
| Doubly Robust | `doubly_robust(df, clip=10.0)` | Fits a linear reward model, then applies an IPS correction on residuals. Consistent if either the reward model or the propensities are correct. | Best statistical efficiency. Use when comparing multiple policies or sweeping alpha. |
All three estimators:
- Accept a Polars or pandas DataFrame loaded from a BanditDB Parquet export
- Evaluate the uniform random policy as the target (the unbiased baseline to beat)
- Raise `ValueError` for Thompson Sampling campaigns (the propensity column is null because TS does not log propensities)
- Return an `OPEResult` with `estimate`, `std_error`, `n_used`, `n_total`, and `method`
import polars as pl
from banditdb.eval import replay, ips, doubly_robust
df = pl.read_parquet("/data/exports/llm_routing.parquet")
# How much reward would a uniform random policy have earned?
print(replay(df))
# OPEResult(method='replay', estimate=0.4821, std_error=0.0312, coverage=22.1% [33/149])
print(ips(df))
# OPEResult(method='ips', estimate=0.5103, std_error=0.0187, coverage=100.0% [149/149])
print(doubly_robust(df))
# OPEResult(method='doubly_robust', estimate=0.5219, std_error=0.0141, coverage=100.0% [149/149])
# Compare against the observed reward of the logging policy:
print("Observed (logging policy):", df["reward"].mean())
# If observed >> estimate, the campaign has learned something real — it outperforms random.
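To make the replay estimator less of a black box, here is a pure-Python sketch of the acceptance rule from the table: keep each logged interaction with probability (1/K) / propensity and average the rewards that survive. It is an illustration under assumptions, not the code in banditdb.eval; the function name, the (reward, propensity) tuple layout, and the seed argument are hypothetical.

```python
import random
from typing import List, Tuple


def replay_uniform(logs: List[Tuple[float, float]], n_arms: int,
                   seed: int = 0) -> Tuple[float, int]:
    """logs holds (reward, propensity) pairs; returns (estimate, n_used)."""
    rng = random.Random(seed)
    kept = []
    for reward, propensity in logs:
        # Keep the row with probability (1/K) / propensity, capped at 1.
        accept = min(1.0, (1.0 / n_arms) / propensity)
        if rng.random() < accept:
            kept.append(reward)
    estimate = sum(kept) / len(kept) if kept else float("nan")
    return estimate, len(kept)


# A logging policy that always picked its arm with propensity 1.0 and K = 2:
# roughly half of the rows should survive.
estimate, n_used = replay_uniform([(1.0, 1.0)] * 1000, n_arms=2)
print(estimate, n_used)
```

The low coverage noted in the table falls straight out of the acceptance rule: with propensity 1.0, each row is kept with probability 1/K.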
Practical use: sweep alpha offline before deploying. Train a campaign on real traffic, checkpoint to Parquet, then replay different alpha values through doubly_robust() to find the best exploration level — no live experiment needed.
Note: OPE requires the `propensity` column, which is only written for LinUCB campaigns. Thompson Sampling campaigns log `null` propensities: TS picks arms by sampling from the posterior, so the probability of selecting a given arm has no simple closed form and is not logged.
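The role the propensity column plays is easiest to see in a self-normalised IPS (SNIPS) calculation. The sketch below follows the description in the estimator table (weight (1/K) / propensity, clipped, then self-normalised) but is not the banditdb.eval implementation; the function name and the toy log are hypothetical.

```python
from typing import List, Tuple


def snips_uniform(logs: List[Tuple[float, float]], n_arms: int,
                  clip: float = 10.0) -> float:
    """Self-normalised IPS estimate of the uniform policy's mean reward."""
    target_prob = 1.0 / n_arms
    # Importance weight = target probability / logging propensity, clipped.
    weights = [min(target_prob / propensity, clip) for _, propensity in logs]
    weighted_reward = sum(w * r for w, (r, _) in zip(weights, logs))
    return weighted_reward / sum(weights)


# Toy log of (reward, propensity) pairs: the logging policy favoured the good arm.
logs = [(1.0, 0.9), (1.0, 0.9), (0.0, 0.1), (1.0, 0.9), (0.0, 0.1)]
naive_mean = sum(r for r, _ in logs) / len(logs)
print(round(naive_mean, 4))                      # 0.6
print(round(snips_uniform(logs, n_arms=2), 4))   # 0.1429
```

Rows the logging policy rarely chose get large weights, which is why the uniform-policy estimate here lands well below the naive mean. Without a logged propensity there is nothing to divide by, hence the requirement in the note above.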
Choosing an Algorithm
BanditDB supports four algorithms, selected at campaign creation time.
| Algorithm | `algorithm` value | Exploration style | When to use |
|---|---|---|---|
| LinUCB | `"linucb"` (default) | Deterministic UCB bonus: θ·x + α√(x·A⁻¹·x) | Predictable, tunable. Sweep alpha offline to calibrate. |
| Linear Thompson Sampling | `"thompson_sampling"` | Samples θ̃ ~ N(θ, α²·A⁻¹), scores by θ̃·x | Bayesian posterior; no alpha sweep needed. Concurrent users automatically diversify choices. |
| NeuralLinUCB | `NeuralLinUCBConfig(...)` | Deep MLP embedding + LinUCB in embedding space | Non-linear reward functions. Retrains the MLP every N rewards. |
| Progressive | `ProgressiveConfig(...)` | Self-tuning tournament: runs base + challenger in parallel, shifts traffic to the winner | Zero-configuration model selection. Picks the best algorithm automatically. |
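The LinUCB score θ·x + α√(x·A⁻¹·x) from the table can be worked through by hand for a single arm. The pure-Python sketch below assumes the standard disjoint LinUCB formulation with A initialised to the identity and A⁻¹ maintained by a Sherman-Morrison update; the class and method names are hypothetical, and BanditDB's Rust implementation may differ in detail.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


class LinUCBArm:
    """One arm of disjoint LinUCB, tracking A^-1 and b for ridge regression."""

    def __init__(self, dim: int) -> None:
        # A starts as the identity, so A^-1 is the identity too.
        self.a_inv: Matrix = [[float(i == j) for j in range(dim)] for i in range(dim)]
        self.b: Vector = [0.0] * dim

    def score(self, x: Vector, alpha: float) -> float:
        theta = mat_vec(self.a_inv, self.b)                # theta = A^-1 b
        bonus = math.sqrt(dot(x, mat_vec(self.a_inv, x)))  # sqrt(x . A^-1 . x)
        return dot(theta, x) + alpha * bonus

    def update(self, x: Vector, reward: float) -> None:
        # Sherman-Morrison: (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x . A^-1 . x)
        ax = mat_vec(self.a_inv, x)
        denom = 1.0 + dot(x, ax)
        n = len(x)
        self.a_inv = [[self.a_inv[i][j] - ax[i] * ax[j] / denom for j in range(n)]
                      for i in range(n)]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


arm = LinUCBArm(dim=2)
x = [1.0, 0.0]
print(arm.score(x, alpha=1.0))  # pure exploration bonus before any data
arm.update(x, reward=1.0)
print(arm.score(x, alpha=1.0))  # estimate rises, uncertainty bonus shrinks
```

A larger alpha inflates the second term, so arms with little data in the direction of x are tried more often; that is the knob the table suggests sweeping offline.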
from banditdb import Client, NeuralLinUCBConfig, ProgressiveConfig

db = Client("http://localhost:8080")

# LinUCB (default)
db.create_campaign("routing", ["fast", "cheap"], feature_dim=4, alpha=1.5)

# Thompson Sampling: natural Bayesian exploration, alpha=1.0 is ideal
db.create_campaign("routing_ts", ["fast", "cheap"], feature_dim=4,
                   algorithm="thompson_sampling")

# NeuralLinUCB: learns a deep embedding of the context, then applies LinUCB
cfg = NeuralLinUCBConfig(
    context_dim=4,      # must match feature_dim
    embed_dim=32,       # arm matrix dimension (default 32)
    hidden_dim=128,     # MLP hidden layer width (default 128)
    retrain_every=200,  # retrain the MLP every N cumulative rewards
)
db.create_campaign("routing_neural", ["fast", "cheap"], feature_dim=4, algorithm=cfg)

# Progressive: runs LinUCB vs NeuralLinUCB, shifts traffic to whoever wins SNIPS checkpoints
cfg = ProgressiveConfig(
    base="linucb",
    challenger=NeuralLinUCBConfig(context_dim=4, embed_dim=32),
    min_obs=100,      # minimum buffer entries per arm before any traffic shift
    required_wins=3,  # consecutive checkpoint wins to earn one traffic step
    step_bps=1000,    # traffic delta per win run, in basis points (1000 = 10%)
)
db.create_campaign("routing_prog", ["fast", "cheap"], feature_dim=4, algorithm=cfg)
All four algorithms share the same predict → reward loop.
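The required_wins and step_bps parameters of ProgressiveConfig suggest a simple streak counter. The sketch below is one reading of those two comments, not the server's logic: it assumes a loss only resets the streak (whether traffic ever shifts back is not documented here), and the class name and fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrafficShifter:
    """Moves challenger traffic up one step after a run of consecutive wins."""
    required_wins: int = 3
    step_bps: int = 1000      # 1000 basis points = 10% of traffic
    challenger_bps: int = 0   # current challenger share, 0..10000
    streak: int = 0

    def record_checkpoint(self, challenger_won: bool) -> int:
        if not challenger_won:
            self.streak = 0   # a loss resets the run
            return self.challenger_bps
        self.streak += 1
        if self.streak == self.required_wins:
            # One completed win run earns one traffic step, capped at 100%.
            self.challenger_bps = min(10_000, self.challenger_bps + self.step_bps)
            self.streak = 0
        return self.challenger_bps


shifter = TrafficShifter()
outcomes = [True, True, False, True, True, True]
print([shifter.record_checkpoint(won) for won in outcomes])
# [0, 0, 0, 0, 0, 1000]
```

Requiring several consecutive wins before each step trades speed for stability: a single lucky checkpoint cannot move traffic on its own.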
Error Handling
| Exception | When raised |
|---|---|
| `BanditDBError` | Base exception. Catch this to handle all SDK errors. |
| `ConnectionError` | Server is offline or unreachable. |
| `TimeoutError` | Request exceeded the configured timeout. |
| `APIError` | Server returned an error (e.g., campaign not found, unauthorized). |
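The client already retries internally, but application code may still want to treat transient failures (connection, timeout) differently from permanent ones (API errors). The sketch below defines local stand-in classes that mirror the table so it runs without the SDK installed; the assumption that all three derive from BanditDBError, their import path in the real package, and the with_retries helper are not confirmed by this page.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


# Stand-ins mirroring the table above (assumed hierarchy, not the SDK's classes).
class BanditDBError(Exception): ...
class ConnectionError(BanditDBError): ...  # shadows the builtin in this module
class TimeoutError(BanditDBError): ...     # shadows the builtin in this module
class APIError(BanditDBError): ...


def with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry transient failures with exponential backoff; let APIError propagate."""
    for attempt in range(attempts):
        try:
            return call()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last transient error
            sleep(base_delay * 2 ** attempt)  # 0.1 s, 0.2 s, 0.4 s, ...
    raise ValueError("attempts must be at least 1")


# With a connected client, a call could be wrapped like this:
#   arm_id, interaction_id = with_retries(
#       lambda: db.predict("checkout_upsell", [1.0, 0.8, 0.0]))
```

An APIError such as "campaign not found" will not fix itself on retry, so the sketch lets it propagate immediately instead of sleeping.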
License
Apache-2.0 — Copyright (C) 2026 Simeon Lukov and Dynamic Pricing Ltd. See the main repository for details.