A lightweight reinforcement learning package

LightRL

A lightweight multi-armed bandit library for Python. Zero heavy dependencies. Built for agents.

Why LightRL

Most RL packages (Vowpal Wabbit, RLlib, MABWiser) are built for data science pipelines — they require heavy dependencies, custom serialization, and framework buy-in. LightRL is built for a different use case: operational decisions in software systems and AI agents.

Think of LightRL as functools.lru_cache for decision-making. You don't reach for Redis when you need to memoize a function. You don't reach for Vowpal Wabbit when you need an agent to learn which API endpoint is fastest.

The case for lightweight bandits

| | LightRL | Vowpal Wabbit | MABWiser | RLlib |
|---|---|---|---|---|
| Dependencies | tqdm | C++ runtime | sklearn, numpy, scipy | Ray, torch |
| Install size | ~50 KB | ~50 MB | ~200 MB+ | ~1 GB+ |
| Core code | ~300 lines | ~100k lines | ~3k lines | ~500k lines |
| Persistence | bandit.save("state.json") | Custom model files | Pickle | Ray checkpoints |
| Time to integrate | Minutes | Hours | Hours | Days |
| Agent-native API | BanditRouter | No | No | No |

When to use LightRL

LightRL is the right choice when:

  • You need a decision, not a research paper. Which endpoint is fastest? What batch size avoids rate limits? Which prompt template works best? These don't need gradient descent.
  • You're building an agent. LLMs burn tokens and add latency when "reasoning" about operational choices. A bandit answers in microseconds and, for simple choices like these, can be more accurate once it has seen around 50 observations.
  • Dependencies matter. Lambda functions, edge devices, minimal containers, CI pipelines — anywhere scipy is too heavy.
  • You want to understand the code. The entire library is auditable in 10 minutes. No hidden complexity.

When NOT to use LightRL

  • You need industrial-scale contextual bandits processing billions of events (use Vowpal Wabbit)
  • You need full reinforcement learning with environments and policies (use RLlib)
  • You need Bayesian optimization with Gaussian processes (use BoTorch)

Installation

pip install lightrl

Or with uv:

uv pip install lightrl

Quick Start

Simple bandit

from lightrl import EpsilonGreedyBandit

bandit = EpsilonGreedyBandit(arms=["model_a", "model_b", "model_c"], epsilon=0.1)

for _ in range(1000):
    arm = bandit.select_arm()
    reward = get_reward(bandit.arms[arm])  # your reward function
    bandit.update(arm, reward)

bandit.report()
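
To make the select/update loop above concrete, here is a minimal from-scratch sketch of the epsilon-greedy idea. It is not LightRL's implementation: the class name TinyEpsilonGreedy, the seed parameter, and the success rates in simulate are all assumptions made for illustration.

```python
import random


class TinyEpsilonGreedy:
    """Illustrative epsilon-greedy bandit; NOT LightRL's implementation."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = [0] * len(self.arms)    # pulls per arm
        self.values = [0.0] * len(self.arms)  # running mean reward per arm
        self._rng = random.Random(seed)

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self._rng.random() < self.epsilon:
            return self._rng.randrange(len(self.arms))
        return max(range(len(self.arms)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental mean: new = old + (reward - old) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(steps=2000, seed=0):
    """Run the bandit against hypothetical Bernoulli arms."""
    rng = random.Random(seed)
    true_rates = [0.2, 0.5, 0.8]  # hypothetical success rates, one per arm
    bandit = TinyEpsilonGreedy(
        ["model_a", "model_b", "model_c"], epsilon=0.1, seed=seed
    )
    for _ in range(steps):
        arm = bandit.select_arm()
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit
```

The incremental mean keeps only two numbers per arm, which is why this kind of state is cheap to hold in memory and to serialize.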

Agent router with persistence

from lightrl import BanditRouter, ThompsonBandit

router = BanditRouter()
router.register("model", ThompsonBandit(arms=["haiku", "sonnet", "opus"]))

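# agent loop
model_idx = router.select("model")
# Map the index back to its arm name. Note: _bandits is a private attribute
# and may change between releases.
model = router._bandits["model"].arms[model_idx]
# ... agent does work, gets quality score ...
router.update("model", model_idx, reward=quality_score)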

# persist across restarts
router.save("agent_state.json")
router = BanditRouter.load("agent_state.json")
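
As a rough picture of why JSON is enough for this kind of state, the sketch below round-trips per-arm counts and mean rewards through a file. The layout (a dict keyed by decision name holding "arms", "counts", and "values"), the helper names, and the numbers are assumptions for illustration; the actual file format written by router.save is not documented here.

```python
import json
import os
import tempfile


def save_state(path, bandits):
    """Write {decision name: {"arms", "counts", "values"}} as JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(bandits, fh, indent=2)


def load_state(path):
    """Read the state back as plain dicts and lists."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical learned state for one decision point.
state = {
    "model": {
        "arms": ["haiku", "sonnet", "opus"],
        "counts": [12, 30, 8],
        "values": [0.41, 0.77, 0.80],
    }
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "agent_state.json")
    save_state(path, state)
    restored = load_state(path)
```

Unlike pickle, a JSON file is human-readable and loading it does not execute code, which matters when state files move between machines or agents.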

Contextual decisions

from lightrl import LinUCBBandit

bandit = LinUCBBandit(n_arms=3, n_features=4, alpha=1.0)

context = [task_complexity, input_length, is_code, urgency]  # your numeric features, one per n_features
arm = bandit.select_arm(context)
# ... execute with chosen arm ...
bandit.update(arm, context, reward)
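
For readers who want to see what a LinUCB-style bandit computes, here is a dependency-free sketch of the standard disjoint LinUCB algorithm. It is not LightRL's code: the class name TinyLinUCB, the identity ridge prior, and the Sherman-Morrison update are choices made for this illustration, and the two-feature task vectors are hypothetical.

```python
import math


class TinyLinUCB:
    """Illustrative disjoint LinUCB; NOT LightRL's implementation."""

    def __init__(self, n_arms, n_features, alpha=1.0):
        self.alpha = alpha
        d = n_features
        # Per arm: A_inv starts as the identity (ridge prior), b as zeros.
        self.A_inv = [
            [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
            for _ in range(n_arms)
        ]
        self.b = [[0.0] * d for _ in range(n_arms)]

    @staticmethod
    def _matvec(m, v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

    @staticmethod
    def _dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def score(self, arm, x):
        # Predicted reward plus an uncertainty bonus scaled by alpha.
        a_inv_x = self._matvec(self.A_inv[arm], x)
        theta = self._matvec(self.A_inv[arm], self.b[arm])
        bonus = self.alpha * math.sqrt(max(self._dot(x, a_inv_x), 0.0))
        return self._dot(theta, x) + bonus

    def select_arm(self, x):
        return max(range(len(self.b)), key=lambda a: self.score(a, x))

    def update(self, arm, x, reward):
        # Sherman-Morrison rank-one update of the inverse:
        # (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x)
        a_inv = self.A_inv[arm]
        u = self._matvec(a_inv, x)
        denom = 1.0 + self._dot(x, u)
        for i in range(len(x)):
            for j in range(len(x)):
                a_inv[i][j] -= u[i] * u[j] / denom
        self.b[arm] = [bi + reward * xi for bi, xi in zip(self.b[arm], x)]


# Hypothetical setup: arm 0 is good for code tasks, arm 1 for chat tasks.
bandit = TinyLinUCB(n_arms=2, n_features=2, alpha=0.1)
code_task, chat_task = [1.0, 0.0], [0.0, 1.0]
for _ in range(20):
    bandit.update(0, code_task, 1.0)
    bandit.update(0, chat_task, 0.0)
    bandit.update(1, code_task, 0.0)
    bandit.update(1, chat_task, 1.0)
```

After these updates the sketch prefers arm 0 for code_task and arm 1 for chat_task. The alpha parameter controls how much weight the uncertainty bonus gets relative to the predicted reward.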

Features

Bandit Strategies

| Strategy | Best for |
|---|---|
| EpsilonGreedyBandit | Simple explore/exploit with fixed exploration rate |
| EpsilonFirstBandit | Pure exploration phase followed by exploitation |
| EpsilonDecreasingBandit | Exploration that decays over time |
| UCB1Bandit | Upper confidence bound; principled exploration |
| ThompsonBandit | Bayesian posterior sampling; best general-purpose |
| GreedyBanditWithHistory | Sliding window for non-stationary environments |
| LinUCBBandit | Context-dependent decisions with linear features |
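
The two less obvious selection rules in the table, UCB1 and Thompson sampling, can be sketched in a few lines each. These are the textbook forms, written as standalone functions for illustration. The function names are hypothetical, the Thompson version assumes binary (success/failure) rewards with a uniform Beta(1, 1) prior, and LightRL's own classes may differ in detail.

```python
import math
import random


def ucb1_select(counts, values):
    """Pick the arm with the highest mean + sqrt(2 ln t / n) bonus."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm once before using the formula
    total = sum(counts)
    scores = [
        v + math.sqrt(2.0 * math.log(total) / n)
        for v, n in zip(values, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def thompson_select(successes, failures, rng=random):
    """Sample each arm's Beta(1 + s, 1 + f) posterior and take the argmax."""
    samples = [
        rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)


# An untried arm is always selected first under UCB1.
first = ucb1_select(counts=[0, 5], values=[0.0, 0.9])

# A rarely tried arm can beat a well-known one thanks to its larger bonus.
second = ucb1_select(counts=[100, 2], values=[0.6, 0.5])
```

UCB1 explores deterministically by inflating the score of rarely tried arms, while Thompson sampling explores through randomness in the posterior draws, so an arm with little data produces widely varying samples.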

Cross-cutting features

  • Warm start — initialize with prior beliefs: EpsilonGreedyBandit(arms=[...], priors=[0.3, 0.7, 0.9])
  • EMA decay — make any bandit adaptive to change: EpsilonGreedyBandit(arms=[...], ema_alpha=0.1)
  • Persistence — JSON save/load on all bandits and the router
  • BanditRouter — manage multiple named decision points with one object
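
The difference between a cumulative average and EMA decay is easiest to see on a regime change. The sketch below is illustrative only: the function names are hypothetical, the reward sequence is made up, and treating a prior as one pseudo-observation is an assumption about how a warm start could work, not a description of LightRL's internals.

```python
def cumulative_update(value, count, reward):
    """Plain running mean; every observation carries equal weight."""
    count += 1
    return value + (reward - value) / count, count


def ema_update(value, reward, ema_alpha=0.1):
    """Exponential moving average; recent rewards weigh more."""
    return value + ema_alpha * (reward - value)


# Hypothetical regime change: an arm pays 1.0 for 100 steps, then 0.0 for 20.
rewards = [1.0] * 100 + [0.0] * 20

# Warm start: begin from a prior belief of 0.9 instead of 0.0.
mean, n = 0.9, 1  # assumption: the prior counts as one pseudo-observation
ema = 0.9

for r in rewards:
    mean, n = cumulative_update(mean, n, r)
    ema = ema_update(ema, r, ema_alpha=0.1)
```

After the 20 zero-reward steps, the EMA estimate has dropped close to 0.12, while the cumulative mean is still above 0.8 because the first 100 rewards keep dominating it. A larger ema_alpha reacts faster but is noisier.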

Runners

  • two_state_time_dependent_process — alive/waiting state machine for rate-limited systems

Examples

See examples/ for 17 runnable scripts:

| Category | Example | Demonstrates |
|---|---|---|
| Classics | ab_testing.py | A/B test across landing page variants |
| Classics | ad_serving.py | Ad placement optimization with explore-first |
| Classics | resource_allocation.py | Dynamic worker pool sizing |
| Classics | hyperparameter_search.py | Bandit-based learning rate search |
| Classics | network_routing.py | Endpoint selection with sliding window |
| Classics | minimal_example.py | Two-state process with failure simulation |
| Agent & LLM | prompt_template_selection.py | Per-task-type prompt template optimization |
| Agent & LLM | agent_router.py | Multi-decision agent loop with save/load |
| Agent & LLM | contextual_model_routing.py | LinUCB routes tasks to LLM models by features |
| Agent & LLM | llm_cost_optimizer.py | Quality-vs-cost optimization across haiku/sonnet/opus |
| Agent & LLM | agent_fleet_roi.py | Dispatch the right AI agent per task category |
| Infrastructure | retry_backoff.py | Learning optimal retry wait times |
| Infrastructure | db_query_routing.py | Query execution strategy per table size |
| Infrastructure | dynamic_pricing.py | Price point optimization with seasonal shifts |
| Techniques | warm_start.py | Prior beliefs vs naive cold start |
| Techniques | ema_nonstationary.py | EMA decay vs cumulative average on regime change |
| Techniques | persistence.py | JSON save/load across process restarts |

Development

uv venv --python 3.13
source .venv/bin/activate
uv pip install -e ".[dev]"
pre-commit install && pre-commit install --hook-type commit-msg

Run tests:

pytest -v tests/
tox

Read more about Multi-armed bandits.
