A lightweight reinforcement learning package
LightRL
A lightweight multi-armed bandit library for Python. Zero heavy dependencies. Built for agents.
Why LightRL
Most RL packages (Vowpal Wabbit, RLlib, MABWiser) are built for data science pipelines — they require heavy dependencies, custom serialization, and framework buy-in. LightRL is built for a different use case: operational decisions in software systems and AI agents.
Think of LightRL as functools.lru_cache for decision-making. You don't reach for Redis when you need to memoize a function. You don't reach for Vowpal Wabbit when you need an agent to learn which API endpoint is fastest.
The case for lightweight bandits
| | LightRL | Vowpal Wabbit | MABWiser | RLlib |
|---|---|---|---|---|
| Dependencies | tqdm | C++ runtime | sklearn, numpy, scipy | Ray, torch |
| Install size | ~50KB | ~50MB | ~200MB+ | ~1GB+ |
| Core code | ~300 lines | ~100k lines | ~3k lines | ~500k lines |
| Persistence | `bandit.save("state.json")` | Custom model files | Pickle | Ray checkpoints |
| Time to integrate | Minutes | Hours | Hours | Days |
| Agent-native API | `BanditRouter` | No | No | No |
When to use LightRL
LightRL is the right choice when:
- You need a decision, not a research paper. Which endpoint is fastest? What batch size avoids rate limits? Which prompt template works best? These don't need gradient descent.
- You're building an agent. LLMs burn tokens and latency "reasoning" about operational choices. A bandit answers in microseconds and, for simple choices like these, can be more accurate than that reasoning after about 50 observations.
- Dependencies matter. Lambda functions, edge devices, minimal containers, CI pipelines — anywhere scipy is too heavy.
- You want to understand the code. The entire library is auditable in 10 minutes. No hidden complexity.
When NOT to use LightRL
- You need industrial-scale contextual bandits processing billions of events (use Vowpal Wabbit)
- You need full reinforcement learning with environments and policies (use RLlib)
- You need Bayesian optimization with Gaussian processes (use BoTorch)
Installation
```bash
pip install lightrl
```
Or with uv:
```bash
uv pip install lightrl
```
Quick Start
Simple bandit
```python
from lightrl import EpsilonGreedyBandit

bandit = EpsilonGreedyBandit(arms=["model_a", "model_b", "model_c"], epsilon=0.1)

for _ in range(1000):
    arm = bandit.select_arm()              # index of the chosen arm
    reward = get_reward(bandit.arms[arm])  # your reward function (not part of LightRL)
    bandit.update(arm, reward)

bandit.report()
```
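To show what an epsilon-greedy loop of this kind does, here is a minimal standalone sketch of the rule itself. It is not LightRL's implementation: `TinyEpsilonGreedy`, `simulate`, and the reward probabilities are hypothetical names and numbers chosen for illustration, and the real library may differ in details such as tie-breaking.

```python
import random


class TinyEpsilonGreedy:
    """Illustrative epsilon-greedy bandit (not LightRL's actual code)."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(true_rates, steps=5000, seed=0):
    """Run the bandit against hypothetical Bernoulli arms."""
    bandit = TinyEpsilonGreedy(len(true_rates), epsilon=0.1, seed=seed)
    env = random.Random(seed + 1)
    for _ in range(steps):
        arm = bandit.select_arm()
        reward = 1.0 if env.random() < true_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


bandit = simulate([0.2, 0.5, 0.8])
print(bandit.counts)  # the last arm should receive most of the pulls
```

The fixed `epsilon` keeps a constant share of pulls exploratory, which is why the weaker arms still accumulate some observations.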
Agent router with persistence
```python
from lightrl import BanditRouter, ThompsonBandit

router = BanditRouter()
router.register("model", ThompsonBandit(arms=["haiku", "sonnet", "opus"]))

# agent loop
model_idx = router.select("model")
model = router._bandits["model"].arms[model_idx]
# ... agent does work, gets quality score ...
router.update("model", model_idx, reward=quality_score)

# persist across restarts
router.save("agent_state.json")
router = BanditRouter.load("agent_state.json")
```
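JSON persistence works for bandits because their learned state is just a handful of numbers per arm. The sketch below round-trips such state with the standard library only. The field names (`arms`, `counts`, `values`) and the helpers `save_state` and `load_state` are assumptions made for illustration; LightRL's actual file layout may differ.

```python
import json
import os
import tempfile


def save_state(path, arms, counts, values):
    """Write bandit statistics to a JSON file (hypothetical layout)."""
    state = {"arms": arms, "counts": counts, "values": values}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)


def load_state(path):
    """Read the statistics back as plain Python lists."""
    with open(path, "r", encoding="utf-8") as f:
        state = json.load(f)
    return state["arms"], state["counts"], state["values"]


# Round-trip through a temporary file to show nothing is lost.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "agent_state.json")
    save_state(path, ["haiku", "sonnet", "opus"], [12, 30, 8], [0.41, 0.77, 0.62])
    arms, counts, values = load_state(path)

print(arms, counts, values)
```

Unlike pickle, a file like this can be read, diffed, and edited by hand, which helps when debugging an agent's learned preferences.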
Contextual decisions
```python
from lightrl import LinUCBBandit

bandit = LinUCBBandit(n_arms=3, n_features=4, alpha=1.0)

context = [task_complexity, input_length, is_code, urgency]
arm = bandit.select_arm(context)
# ... execute with chosen arm ...
bandit.update(arm, context, reward)
```
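For readers curious how a contextual choice like this can be computed, here is a minimal sketch of the standard disjoint LinUCB rule in pure Python. It mirrors the `n_arms`, `n_features`, and `alpha` parameters shown above, but `TinyLinUCB` and its internals are assumptions for illustration, not LightRL's code. Each arm keeps a ridge-regression estimate and adds an uncertainty bonus scaled by `alpha`.

```python
import math


class TinyLinUCB:
    """Illustrative disjoint LinUCB (not LightRL's actual code)."""

    def __init__(self, n_arms, n_features, alpha=1.0):
        self.alpha = alpha
        d = n_features
        # Per arm: inverse of A = I + sum(x x^T), and b = sum(reward * x).
        self.A_inv = [[[float(i == j) for j in range(d)] for i in range(d)]
                      for _ in range(n_arms)]
        self.b = [[0.0] * d for _ in range(n_arms)]

    @staticmethod
    def _matvec(m, v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

    def _score(self, arm, x):
        a_inv_x = self._matvec(self.A_inv[arm], x)
        theta = self._matvec(self.A_inv[arm], self.b[arm])  # ridge estimate
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * ai for xi, ai in zip(x, a_inv_x)))
        return mean + self.alpha * width  # optimistic estimate

    def select_arm(self, x):
        return max(range(len(self.b)), key=lambda arm: self._score(arm, x))

    def update(self, arm, x, reward):
        # Sherman-Morrison: update A^-1 after A += x x^T without re-inverting.
        a_inv = self.A_inv[arm]
        u = self._matvec(a_inv, x)
        denom = 1.0 + sum(xi * ui for xi, ui in zip(x, u))
        for i in range(len(x)):
            for j in range(len(x)):
                a_inv[i][j] -= u[i] * u[j] / denom
        self.b[arm] = [bi + reward * xi for bi, xi in zip(self.b[arm], x)]


# Hypothetical task: arm 0 pays off for context [1, 0], arm 1 for [0, 1].
bandit = TinyLinUCB(n_arms=2, n_features=2, alpha=1.0)
contexts = [[1.0, 0.0], [0.0, 1.0]]
for step in range(200):
    x = contexts[step % 2]
    arm = bandit.select_arm(x)
    reward = 1.0 if x[arm] == 1.0 else 0.0
    bandit.update(arm, x, reward)

print(bandit.select_arm([1.0, 0.0]), bandit.select_arm([0.0, 1.0]))  # 0 1
```

A larger `alpha` widens the bonus term and so makes the bandit try uncertain arms more often.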
Features
Bandit Strategies
| Strategy | Best for |
|---|---|
| `EpsilonGreedyBandit` | Simple explore/exploit with fixed exploration rate |
| `EpsilonFirstBandit` | Pure exploration phase followed by exploitation |
| `EpsilonDecreasingBandit` | Exploration that decays over time |
| `UCB1Bandit` | Upper confidence bound: principled exploration |
| `ThompsonBandit` | Bayesian posterior sampling: best general-purpose choice |
| `GreedyBanditWithHistory` | Sliding window for non-stationary environments |
| `LinUCBBandit` | Context-dependent decisions with linear features |
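As an illustration of the "Bayesian posterior sampling" row, the sketch below implements textbook Beta-Bernoulli Thompson sampling with the standard library. It assumes 0/1 rewards and a uniform Beta(1, 1) prior. `TinyThompson`, `simulate`, and the success rates are hypothetical, and `ThompsonBandit` itself may be implemented differently.

```python
import random


class TinyThompson:
    """Illustrative Beta-Bernoulli Thompson sampling (not LightRL's code)."""

    def __init__(self, n_arms, seed=None):
        # Beta(1, 1) prior on each arm's success probability.
        self.successes = [1.0] * n_arms
        self.failures = [1.0] * n_arms
        self.rng = random.Random(seed)

    def select_arm(self):
        # Draw one plausible success rate per arm and play the largest draw.
        draws = [self.rng.betavariate(s, f)
                 for s, f in zip(self.successes, self.failures)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, reward):
        # Treat the reward as a 0/1 outcome and update the posterior counts.
        if reward > 0:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def simulate(true_rates, steps=3000, seed=0):
    """Count pulls per arm against hypothetical Bernoulli arms."""
    bandit = TinyThompson(len(true_rates), seed=seed)
    env = random.Random(seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(steps):
        arm = bandit.select_arm()
        reward = 1 if env.random() < true_rates[arm] else 0
        bandit.update(arm, reward)
        pulls[arm] += 1
    return pulls


pulls = simulate([0.2, 0.5, 0.8])
print(pulls)  # the last arm should dominate
```

Exploration here needs no tuning parameter: arms with little data have wide posteriors, so they occasionally produce the largest draw and get tried.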
Cross-cutting features
- Warm start: initialize with prior beliefs: `EpsilonGreedyBandit(arms=[...], priors=[0.3, 0.7, 0.9])`
- EMA decay: make any bandit adaptive to change: `EpsilonGreedyBandit(arms=[...], ema_alpha=0.1)`
- Persistence: JSON save/load on all bandits and the router
- BanditRouter: manage multiple named decision points with one object
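The difference between a cumulative average and an EMA estimate can be shown in a few lines. This is a sketch of the general technique, assuming the common update `value += alpha * (reward - value)`. How LightRL applies `ema_alpha` and `priors` internally is not confirmed here, and the function names and reward sequence are hypothetical.

```python
def cumulative_mean(rewards):
    """Plain average: every observation has equal weight."""
    mean = 0.0
    for n, r in enumerate(rewards, start=1):
        mean += (r - mean) / n
    return mean


def ema(rewards, alpha=0.1, prior=0.0):
    """Exponential moving average: recent rewards dominate."""
    value = prior  # a warm-start prior would be the starting estimate
    for r in rewards:
        value += alpha * (r - value)
    return value


# Hypothetical regime change: the arm pays 1.0 for 100 steps, then 0.0.
rewards = [1.0] * 100 + [0.0] * 50
print(round(cumulative_mean(rewards), 3))  # 0.667
print(round(ema(rewards, alpha=0.1), 3))   # 0.005
```

After the regime change the cumulative mean still rates the arm highly, while the EMA has almost fully adapted. The cost is a noisier estimate when the environment is stable.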
Runners
- `two_state_time_dependent_process`: alive/waiting state machine for rate-limited systems
Examples
See examples/ for 17 runnable scripts:
| Example | Demonstrates |
|---|---|
| **Classics** | |
| `ab_testing.py` | A/B test across landing page variants |
| `ad_serving.py` | Ad placement optimization with explore-first |
| `resource_allocation.py` | Dynamic worker pool sizing |
| `hyperparameter_search.py` | Bandit-based learning rate search |
| `network_routing.py` | Endpoint selection with sliding window |
| `minimal_example.py` | Two-state process with failure simulation |
| **Agent & LLM** | |
| `prompt_template_selection.py` | Per-task-type prompt template optimization |
| `agent_router.py` | Multi-decision agent loop with save/load |
| `contextual_model_routing.py` | LinUCB routes tasks to LLM models by features |
| `llm_cost_optimizer.py` | Quality-vs-cost optimization across haiku/sonnet/opus |
| `agent_fleet_roi.py` | Dispatch the right AI agent per task category |
| **Infrastructure** | |
| `retry_backoff.py` | Learning optimal retry wait times |
| `db_query_routing.py` | Query execution strategy per table size |
| `dynamic_pricing.py` | Price point optimization with seasonal shifts |
| **Techniques** | |
| `warm_start.py` | Prior beliefs vs naive cold start |
| `ema_nonstationary.py` | EMA decay vs cumulative average on regime change |
| `persistence.py` | JSON save/load across process restarts |
Development
```bash
uv venv --python 3.13
source .venv/bin/activate
uv pip install -e ".[dev]"
pre-commit install && pre-commit install --hook-type commit-msg
```
Run tests:
```bash
pytest -v tests/
tox
```
Read more about Multi-armed bandits.