A lightweight reinforcement learning package

These details have not been verified by PyPI

Project description

LightRL

A lightweight multi-armed bandit library for Python. Zero heavy dependencies. Built for agents.

Why LightRL

Most RL packages (Vowpal Wabbit, RLlib, MABWiser) are built for data science pipelines — they require heavy dependencies, custom serialization, and framework buy-in. LightRL is built for a different use case: operational decisions in software systems and AI agents.

Think of LightRL as functools.lru_cache for decision-making. You don't reach for Redis when you need to memoize a function. You don't reach for Vowpal Wabbit when you need an agent to learn which API endpoint is fastest.

The case for lightweight bandits

	LightRL	Vowpal Wabbit	MABWiser	RLlib
Dependencies	`tqdm`	C++ runtime	sklearn, numpy, scipy	Ray, torch
Install size	~50KB	~50MB	~200MB+	~1GB+
Core code	~300 lines	~100k lines	~3k lines	~500k lines
Persistence	`bandit.save("state.json")`	Custom model files	Pickle	Ray checkpoints
Time to integrate	Minutes	Hours	Hours	Days
Agent-native API	`BanditRouter`	No	No	No

When to use LightRL

LightRL is the right choice when:

You need a decision, not a research paper. Which endpoint is fastest? What batch size avoids rate limits? Which prompt template works best? These don't need gradient descent.
You're building an agent. LLMs burn tokens and latency "reasoning" about operational choices. A bandit answers in microseconds, and after roughly 50 observations it is typically more accurate than an LLM's guess.
Dependencies matter. Lambda functions, edge devices, minimal containers, CI pipelines — anywhere scipy is too heavy.
You want to understand the code. The entire library is auditable in 10 minutes. No hidden complexity.

When NOT to use LightRL

You need industrial-scale contextual bandits processing billions of events (use Vowpal Wabbit)
You need full reinforcement learning with environments and policies (use RLlib)
You need Bayesian optimization with Gaussian processes (use BoTorch)

Installation

pip install lightrl

Or with uv:

uv pip install lightrl

Quick Start

Simple bandit

from lightrl import EpsilonGreedyBandit

bandit = EpsilonGreedyBandit(arms=["model_a", "model_b", "model_c"], epsilon=0.1)

for _ in range(1000):
    arm = bandit.select_arm()
    reward = get_reward(bandit.arms[arm])  # your reward function
    bandit.update(arm, reward)

bandit.report()
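
To make the loop above less of a black box, here is a minimal from-scratch sketch of what an epsilon-greedy bandit does on each call. It is not LightRL's implementation: `TinyEpsilonGreedy`, `simulate`, and the success rates are hypothetical names and values chosen for illustration, and only the `select_arm` / `update` shape mirrors the Quick Start.

```python
import random


class TinyEpsilonGreedy:
    """Illustrative epsilon-greedy bandit (not LightRL's implementation)."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = [0] * len(self.arms)    # pulls per arm
        self.values = [0.0] * len(self.arms)  # running mean reward per arm
        self._rng = random.Random(seed)

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self._rng.random() < self.epsilon:
            return self._rng.randrange(len(self.arms))
        return max(range(len(self.arms)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        # Incremental mean: new = old + (reward - old) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(true_rates, steps=2000, seed=0):
    """Run the bandit against Bernoulli arms with made-up success rates."""
    rng = random.Random(seed)
    bandit = TinyEpsilonGreedy(arms=list(true_rates), epsilon=0.1, seed=seed)
    for _ in range(steps):
        arm = bandit.select_arm()
        reward = 1.0 if rng.random() < true_rates[bandit.arms[arm]] else 0.0
        bandit.update(arm, reward)
    return bandit


bandit = simulate({"model_a": 0.2, "model_b": 0.5, "model_c": 0.8})
```

After the run, `bandit.counts` shows how often each arm was pulled and `bandit.values` holds the estimated reward per arm. With a fixed `epsilon`, about 10% of pulls stay exploratory no matter how confident the estimates become.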

Agent router with persistence

from lightrl import BanditRouter, ThompsonBandit

router = BanditRouter()
router.register("model", ThompsonBandit(arms=["haiku", "sonnet", "opus"]))

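# agent loop: select() returns an arm index
model_idx = router.select("model")
# Map the index back to the arm name. _bandits is a private attribute, so this lookup may change between releases.
model = router._bandits["model"].arms[model_idx]
# ... agent does work and produces quality_score (your own metric) ...
router.update("model", model_idx, reward=quality_score)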

# persist across restarts
router.save("agent_state.json")
router = BanditRouter.load("agent_state.json")
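
For readers wondering why JSON persistence is enough here: a bandit's learned state is a handful of numbers per arm. The sketch below shows one way such state could be written and restored with only the standard library. The schema (`arms`, `counts`, `values`) and the helper names `save_state` / `load_state` are assumptions for illustration; the README does not show the file format LightRL actually writes.

```python
import json
import os
import tempfile


def save_state(path, arms, counts, values):
    """Write bandit statistics as plain JSON (hypothetical schema)."""
    state = {"arms": arms, "counts": counts, "values": values}
    # Write to a temp file first, then rename, so a crash mid-write
    # does not leave a truncated state file behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_state(path):
    """Read the statistics back; returns (arms, counts, values)."""
    with open(path, encoding="utf-8") as handle:
        state = json.load(handle)
    return state["arms"], state["counts"], state["values"]


with tempfile.TemporaryDirectory() as workdir:
    state_file = os.path.join(workdir, "agent_state.json")
    save_state(state_file, ["haiku", "sonnet", "opus"], [12, 30, 8], [0.61, 0.78, 0.9])
    restored = load_state(state_file)
```

Because the file is plain JSON, it can be inspected, diffed, or edited by hand, which is harder with pickle files or framework checkpoints.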

Contextual decisions

from lightrl import LinUCBBandit

bandit = LinUCBBandit(n_arms=3, n_features=4, alpha=1.0)

context = [task_complexity, input_length, is_code, urgency]
arm = bandit.select_arm(context)
# ... execute with chosen arm ...
bandit.update(arm, context, reward)
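
As background for the contextual example, the following is a small pure-Python sketch of disjoint LinUCB: one ridge-regression model per arm, scored as predicted reward plus `alpha` times an uncertainty width. `TinyLinUCB` is a hypothetical class written for this illustration, not LightRL's `LinUCBBandit`, and it keeps the inverse matrix up to date with a Sherman-Morrison rank-one update to avoid needing numpy.

```python
import math


class TinyLinUCB:
    """Disjoint LinUCB with one ridge-regression model per arm (illustrative)."""

    def __init__(self, n_arms, n_features, alpha=1.0):
        self.alpha = alpha
        # A_inv starts as the identity (ridge regularizer of 1); b starts at zero.
        self.a_inv = [
            [[1.0 if i == j else 0.0 for j in range(n_features)]
             for i in range(n_features)]
            for _ in range(n_arms)
        ]
        self.b = [[0.0] * n_features for _ in range(n_arms)]

    @staticmethod
    def _matvec(matrix, vector):
        return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

    def score(self, arm, context):
        a_inv_x = self._matvec(self.a_inv[arm], context)
        theta = self._matvec(self.a_inv[arm], self.b[arm])
        mean = sum(t * x for t, x in zip(theta, context))
        # Guard against tiny negative values caused by rounding.
        width = math.sqrt(max(0.0, sum(x * v for x, v in zip(context, a_inv_x))))
        return mean + self.alpha * width

    def select_arm(self, context):
        return max(range(len(self.b)), key=lambda arm: self.score(arm, context))

    def update(self, arm, context, reward):
        # Sherman-Morrison rank-one update of A_inv after A += x x^T.
        a_inv = self.a_inv[arm]
        a_inv_x = self._matvec(a_inv, context)
        denom = 1.0 + sum(x * v for x, v in zip(context, a_inv_x))
        n = len(context)
        for i in range(n):
            for j in range(n):
                a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b[arm] = [b + reward * x for b, x in zip(self.b[arm], context)]


demo = TinyLinUCB(n_arms=2, n_features=2, alpha=0.1)
for _ in range(20):
    demo.update(0, [1.0, 0.0], 1.0)  # arm 0 pays off when feature 0 is set
    demo.update(1, [0.0, 1.0], 1.0)  # arm 1 pays off when feature 1 is set
    demo.update(0, [0.0, 1.0], 0.0)
    demo.update(1, [1.0, 0.0], 0.0)
choice_a = demo.select_arm([1.0, 0.0])
choice_b = demo.select_arm([0.0, 1.0])
```

In this toy setup the two contexts end up routed to different arms, which is the behavior a non-contextual bandit cannot express. A larger `alpha` widens the confidence bonus and makes the policy explore longer.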

Features

Bandit Strategies

Strategy	Best for
`EpsilonGreedyBandit`	Simple explore/exploit with fixed exploration rate
`EpsilonFirstBandit`	Pure exploration phase followed by exploitation
`EpsilonDecreasingBandit`	Exploration that decays over time
`UCB1Bandit`	Upper confidence bound — principled exploration
`ThompsonBandit`	Bayesian posterior sampling, the best general-purpose choice
`GreedyBanditWithHistory`	Sliding window for non-stationary environments
`LinUCBBandit`	Context-dependent decisions with linear features
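
To give a feel for how two of the strategies in the table differ, here is a hedged sketch of the textbook UCB1 index and of Thompson sampling for 0/1 rewards. The function names `ucb1_select` and `thompson_select` are hypothetical, and the formulas are the standard ones rather than anything confirmed from LightRL's source.

```python
import math
import random


def ucb1_select(counts, means):
    """Pick the arm with the highest UCB1 index: mean + sqrt(2 ln t / n)."""
    # Play every arm once before the index is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    return max(
        range(len(counts)),
        key=lambda a: means[a] + math.sqrt(2.0 * math.log(total) / counts[a]),
    )


def thompson_select(successes, failures, rng=random):
    """Sample each arm's Beta(1 + s, 1 + f) posterior and pick the largest draw."""
    draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda a: draws[a])


# A rarely tried arm gets a large exploration bonus under UCB1,
# even though its observed mean is lower.
ucb_choice = ucb1_select(counts=[1000, 1], means=[0.6, 0.5])

# Thompson sampling with lopsided evidence almost always picks the strong arm.
ts_choice = thompson_select([1000, 0], [0, 1000], rng=random.Random(0))
```

UCB1 is deterministic given the statistics, while Thompson sampling randomizes in proportion to how plausible it is that each arm is the best one.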

Cross-cutting features

Warm start — initialize with prior beliefs: EpsilonGreedyBandit(arms=[...], priors=[0.3, 0.7, 0.9])
EMA decay — make any bandit adaptive to change: EpsilonGreedyBandit(arms=[...], ema_alpha=0.1)
Persistence — JSON save/load on all bandits and the router
BanditRouter — manage multiple named decision points with one object
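
The effect of EMA decay is easiest to see with a little arithmetic. The sketch below compares an equal-weight running mean with an exponential moving average on a reward stream that changes regime. The helper names are hypothetical and the update rules are the standard formulas; how LightRL applies `ema_alpha` internally is not shown in this README.

```python
def cumulative_mean_update(value, count, reward):
    """Return (new_value, new_count) using an equal-weight running mean."""
    count += 1
    return value + (reward - value) / count, count


def ema_update(value, reward, ema_alpha):
    """Return the new estimate; the latest reward gets weight ema_alpha."""
    return value + ema_alpha * (reward - value)


# An arm pays 1.0 for 100 rounds, then its payoff drops to 0.0 for 20 rounds.
rewards = [1.0] * 100 + [0.0] * 20

mean_value, count = 0.0, 0
ema_value = 0.0
for reward in rewards:
    mean_value, count = cumulative_mean_update(mean_value, count, reward)
    ema_value = ema_update(ema_value, reward, ema_alpha=0.1)
```

After the drop, the cumulative mean still sits near 0.83 (100 successes out of 120 pulls), while the EMA has already fallen to about 0.12. That faster forgetting is what makes an EMA-based estimate useful when the environment shifts, at the cost of noisier estimates when it does not.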

Runners

two_state_time_dependent_process — alive/waiting state machine for rate-limited systems

Examples

See examples/ for 17 runnable scripts:

Example	Demonstrates
Classics
`ab_testing.py`	A/B test across landing page variants
`ad_serving.py`	Ad placement optimization with explore-first
`resource_allocation.py`	Dynamic worker pool sizing
`hyperparameter_search.py`	Bandit-based learning rate search
`network_routing.py`	Endpoint selection with sliding window
`minimal_example.py`	Two-state process with failure simulation
Agent & LLM
`prompt_template_selection.py`	Per-task-type prompt template optimization
`agent_router.py`	Multi-decision agent loop with save/load
`contextual_model_routing.py`	LinUCB routes tasks to LLM models by features
`llm_cost_optimizer.py`	Quality-vs-cost optimization across haiku/sonnet/opus
`agent_fleet_roi.py`	Dispatch the right AI agent per task category
Infrastructure
`retry_backoff.py`	Learning optimal retry wait times
`db_query_routing.py`	Query execution strategy per table size
`dynamic_pricing.py`	Price point optimization with seasonal shifts
Techniques
`warm_start.py`	Prior beliefs vs naive cold start
`ema_nonstationary.py`	EMA decay vs cumulative average on regime change
`persistence.py`	JSON save/load across process restarts

Development

uv venv --python 3.13
source .venv/bin/activate
uv pip install -e ".[dev]"
pre-commit install && pre-commit install --hook-type commit-msg

Run tests:

pytest -v tests/
tox

Project details

Release history

Version	Release date
1.0.0 (this version)	Mar 28, 2026
0.1.1	Jan 27, 2025
0.1.0	Jan 27, 2025
0.0.1	Jan 26, 2025

Download files

Source Distribution

lightrl-1.0.0.tar.gz (24.7 kB), uploaded Mar 28, 2026, Source

Built Distribution

lightrl-1.0.0-py3-none-any.whl (10.2 kB), uploaded Mar 28, 2026, Python 3

File details

Details for the file lightrl-1.0.0.tar.gz.

File metadata

Download URL: lightrl-1.0.0.tar.gz
Upload date: Mar 28, 2026
Size: 24.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for lightrl-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`2e211043497e1570639e1eeceac5499e20d1286c026055d454ed38ed758d2091`
MD5	`49f523fc4c92c83cd5b0374eb86ec6e3`
BLAKE2b-256	`12b629397d6b611e76b9962d9d85433801d501fced84818a3124f9d7964dd4f2`

File details

Details for the file lightrl-1.0.0-py3-none-any.whl.

File metadata

Download URL: lightrl-1.0.0-py3-none-any.whl
Upload date: Mar 28, 2026
Size: 10.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for lightrl-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bc59a3c5454a5eda3d47c865305eb00d7e60c6253f6dad6105fee79be9cc1ac2`
MD5	`6001ea29d991c5be1492859e946a422a`
BLAKE2b-256	`6f7ea8fba7c97ff1291d23205a07c51ddea2ef19c89d43d9dac02d8d49360030`
