Deep Reinforcement Learning kit for research.
drlab
drlab is a small deep reinforcement learning package for research code and
experiments. It provides reusable building blocks for Gymnasium environments:
- DQN and actor-critic learners
- greedy, epsilon-greedy, and stochastic controllers
- a transition runner for collecting environment interaction
- replay buffer and transition batch utilities
- lightweight experiment wrappers with TensorBoard logging
The package is designed around small, composable pieces: a PyTorch model, controller, runner, learner, and optionally an experiment wrapper.
Installation
From the repository root:
python -m pip install -e .
For experiment and development dependencies:
python -m pip install -e ".[experiments,dev]"
Package Overview
Public classes are available from the package root:
from drlab import (
    ActorCritic,
    ActorCriticConfig,
    ActorCriticExperiment,
    ActorCriticExperimentConfig,
    Controller,
    DQN,
    DQNConfig,
    DQNExperiment,
    DQNExperimentConfig,
    EpsilonGreedyController,
    GreedyController,
    ReplayBuffer,
    Runner,
    StochasticController,
    TransitionBatch,
)
They can also be imported from their subpackages:
| Subpackage | Exports | Purpose |
|---|---|---|
| drlab.learners | DQN, DQNConfig, ActorCritic, ActorCriticConfig | Update PyTorch models from transition batches. |
| drlab.controllers | Controller, GreedyController, EpsilonGreedyController, StochasticController | Convert model outputs into environment actions. |
| drlab.runners | Runner | Collect transitions from a Gymnasium environment. |
| drlab.replay | ReplayBuffer, TransitionBatch | Store, sample, move, and concatenate transitions. |
| drlab.experiments | DQNExperiment, DQNExperimentConfig, ActorCriticExperiment, ActorCriticExperimentConfig | Run training loops with logging and progress bars. |
Implemented Algorithms
| Algorithm | Type | Implementation Summary |
|---|---|---|
| DQN | Off-policy value-based RL | Trains a Q-network with one-step TD targets from (state, action, reward, done, next_state) batches. It supports replay-buffer training through DQNExperiment, target networks, Double DQN action selection, hard or soft target-network updates, gradient clipping, configurable discounting, and custom regularizers. |
| Actor-Critic | On-policy policy-gradient RL | Trains a shared policy/value network from transition batches and returns. The policy head is optimized with advantage-weighted log probabilities, while the value head can use TD targets or full returns. It supports bootstrapped advantages, optional baseline subtraction, advantage normalization, entropy regularization with annealing, gradient clipping, custom regularizers, and PPO-style clipped policy updates for extra optimization passes. |
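The hard and soft target-network updates named in the DQN row can be illustrated with plain Python lists standing in for parameter tensors. This is a minimal sketch, not drlab's own code: the function names `hard_update` and `soft_update` and the `tau` parameter are assumptions chosen for illustration.

```python
def hard_update(target, online):
    """Copy the online parameters into the target list in place."""
    target[:] = online


def soft_update(target, online, tau):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    for i, (t, o) in enumerate(zip(target, online)):
        target[i] = (1.0 - tau) * t + tau * o


online_params = [1.0, 2.0, 3.0]
target_params = [0.0, 0.0, 0.0]

soft_update(target_params, online_params, tau=0.5)
soft_target = list(target_params)  # halfway between old target and online

hard_update(target_params, online_params)
hard_target = list(target_params)  # exact copy of the online parameters
```

A hard update is typically applied every fixed number of learner steps, while a soft update is applied after every step with a small `tau`.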
The package also includes reusable action-selection controllers:
- GreedyController: deterministic argmax action selection from model scores.
- EpsilonGreedyController: epsilon-greedy exploration with linear annealing.
- StochasticController: samples actions from softmax probabilities.
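As an illustration of epsilon-greedy exploration with linear annealing, the sketch below uses only the standard library. The parameter names `max_eps`, `min_eps`, and `anneal_steps` mirror the constructor arguments shown in the DQN example, but the exact schedule drlab uses is an assumption here, and `linear_epsilon` and `epsilon_greedy` are hypothetical helper names.

```python
import random


def linear_epsilon(step, max_eps=1.0, min_eps=0.05, anneal_steps=10_000):
    """Interpolate linearly from max_eps to min_eps, then hold at min_eps."""
    fraction = min(max(step, 0) / anneal_steps, 1.0)
    return max_eps + fraction * (min_eps - max_eps)


def epsilon_greedy(scores, eps, rng):
    """With probability eps pick a uniform random action, else the argmax."""
    if rng.random() < eps:
        return rng.randrange(len(scores))
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
eps_mid = linear_epsilon(5_000)                  # halfway through annealing
greedy_action = epsilon_greedy([0.1, 0.9], 0.0, rng)  # eps=0 is purely greedy
```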
Model Output Convention
Controllers and learners expect the model output to use a shared layout:
- DQN models should output at least num_actions columns. The first num_actions columns are treated as action scores.
- Actor-critic models should output at least num_actions + 1 columns. The first num_actions columns are policy logits, and the next column is the value estimate.
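The layout above can be sketched on a single output row represented as a plain list. The helper names `action_scores` and `split_policy_value` are hypothetical and only illustrate the column convention; they are not part of drlab.

```python
def action_scores(row, num_actions):
    """Return the first num_actions columns, read as action scores."""
    if len(row) < num_actions:
        raise ValueError("model output has fewer than num_actions columns")
    return row[:num_actions]


def split_policy_value(row, num_actions):
    """Split a row into policy logits and the value estimate that follows."""
    if len(row) < num_actions + 1:
        raise ValueError("model output has fewer than num_actions + 1 columns")
    return row[:num_actions], row[num_actions]


# One actor-critic output row for a 2-action task: two logits, then the value.
logits, value = split_policy_value([0.3, -1.2, 4.5], num_actions=2)
```

With batched tensors the same split would be a column slice over the last dimension.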
Quick DQN Example
import gymnasium as gym
import torch as th

from drlab import (
    DQN,
    DQNConfig,
    DQNExperiment,
    DQNExperimentConfig,
    EpsilonGreedyController,
    GreedyController,
)

# CartPole-v1 has 4 observation features and 2 discrete actions.
env = gym.make("CartPole-v1")
model = th.nn.Sequential(
    th.nn.Linear(4, 64),
    th.nn.ReLU(),
    th.nn.Linear(64, 2),
)
optimizer = th.optim.Adam(model.parameters(), lr=1e-3)
learner = DQN(model, optimizer, DQNConfig(num_actions=2))

# Wrap the greedy controller with annealed epsilon-greedy exploration.
controller = EpsilonGreedyController(
    GreedyController(model, num_actions=2),
    num_actions=2,
    max_eps=1.0,
    min_eps=0.05,
    anneal_steps=10_000,
)

experiment = DQNExperiment(
    env,
    controller,
    learner,
    DQNExperimentConfig(
        max_steps=20_000,
        run_steps=1,
        batch_size=128,
        log_dir="runs/cartpole_dqn",
    ),
)
experiment.run()
Core Components
Learners
DQN trains a Q-network from (rewards, dones, states, actions, next_states).
Its config supports target networks, double Q-learning, hard or soft target
updates, gradient clipping, discounting, and custom regularizers.
from drlab.learners import DQN, DQNConfig
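The one-step TD target and the Double DQN action selection mentioned above can be sketched for a single transition as follows. This is an illustration of the standard formulas under assumed names (`td_target`, `gamma`, `next_q_online`), not a description of how drlab computes them internally.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


def td_target(reward, done, next_q_target, gamma=0.99, next_q_online=None):
    """One-step TD target: r + gamma * (1 - done) * bootstrap.

    Plain DQN bootstraps from the max of the target network's Q-values.
    Double DQN (next_q_online given) selects the action with the online
    network and evaluates it with the target network.
    """
    if next_q_online is None:
        bootstrap = max(next_q_target)
    else:
        bootstrap = next_q_target[argmax(next_q_online)]
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * bootstrap


plain = td_target(1.0, False, [2.0, 5.0], gamma=0.5)
double = td_target(1.0, False, [2.0, 5.0], gamma=0.5, next_q_online=[3.0, 1.0])
terminal = td_target(1.0, True, [2.0, 5.0], gamma=0.5)
```

In the example, the online network prefers action 0, so the Double DQN target bootstraps from the target network's value for action 0 rather than from its maximum.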
ActorCritic trains a policy/value network from transition batches with
returns. Its config supports TD or return-based value targets, bootstrapped
advantages, PPO-style clipping, entropy regularization, advantage
normalization, and custom regularizers.
from drlab.learners import ActorCritic, ActorCriticConfig
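To make advantage normalization and PPO-style clipping concrete, here is a standard-library sketch of both. The function names, the `clip` parameter, and the choice of population standard deviation are assumptions for illustration; drlab's configuration fields and exact formulas may differ.

```python
import math


def normalize(advantages, eps=1e-8):
    """Shift to zero mean and scale to (approximately) unit std."""
    mean = sum(advantages) / len(advantages)
    var = sum((a - mean) ** 2 for a in advantages) / len(advantages)
    return [(a - mean) / (math.sqrt(var) + eps) for a in advantages]


def ppo_clip_loss(new_logp, old_logp, advantages, clip=0.2):
    """Negative mean of min(ratio * A, clamp(ratio, 1-clip, 1+clip) * A)."""
    terms = []
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)


# Ratio of 2 with a positive advantage is clipped to 1 + clip.
loss_pos = ppo_clip_loss([math.log(2.0)], [0.0], [1.0], clip=0.2)
# With a negative advantage the unclipped (more pessimistic) term is kept.
loss_neg = ppo_clip_loss([math.log(2.0)], [0.0], [-1.0], clip=0.2)
```

On the first optimization pass the new and old log probabilities coincide, the ratio is 1, and the loss reduces to the ordinary advantage-weighted objective; clipping only matters on the extra passes.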
Controllers
Controllers wrap a PyTorch model and expose:
action = controller.choose(obs)
probs = controller.probabilities(obs)
Available controllers:
- GreedyController: selects the highest-scoring action.
- EpsilonGreedyController: wraps another controller and adds annealed random exploration.
- StochasticController: samples actions from softmax probabilities.
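Sampling from softmax probabilities, as the stochastic controller is described to do, might look like the following for a single row of logits. The helper names `softmax` and `sample_action` are hypothetical, and the sketch uses the standard library rather than PyTorch.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Draw one action index with probability proportional to softmax(logits)."""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


rng = random.Random(0)
probs = softmax([1.0, 2.0, 3.0])
action = sample_action([1.0, 2.0, 3.0], rng)
```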
Runner
Runner steps through a Gymnasium environment with a controller and returns:
batch, ep_returns, ep_lengths, last_episode = runner.run(num_steps)
num_steps <= 0 collects one complete episode. Positive values collect up to
that many transitions. The returned batch is a TransitionBatch.
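The num_steps convention can be illustrated with a toy collection loop. This is a sketch under stated assumptions, not drlab's Runner: `CountdownEnv` is a made-up environment that follows Gymnasium's `reset`/`step` return shapes, `collect` is a hypothetical function that resets at the start of every call, and the fourth return value (`last_episode`) is omitted.

```python
class CountdownEnv:
    """Toy Gymnasium-style environment that terminates after `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.length
        return self.t, 1.0, terminated, False, {}


def collect(env, policy, num_steps):
    """Collect one full episode if num_steps <= 0, else up to num_steps transitions."""
    transitions, ep_returns, ep_lengths = [], [], []
    obs, _ = env.reset()
    ep_return, ep_length = 0.0, 0
    while True:
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        transitions.append((obs, action, reward, terminated, next_obs))
        ep_return += reward
        ep_length += 1
        obs = next_obs
        if terminated or truncated:
            # Record the finished episode; stop here in one-episode mode.
            ep_returns.append(ep_return)
            ep_lengths.append(ep_length)
            if num_steps <= 0:
                break
            obs, _ = env.reset()
            ep_return, ep_length = 0.0, 0
        if num_steps > 0 and len(transitions) >= num_steps:
            break
    return transitions, ep_returns, ep_lengths


one_episode = collect(CountdownEnv(3), lambda obs: 0, num_steps=0)
five_steps = collect(CountdownEnv(3), lambda obs: 0, num_steps=5)
```

In the second call the loop finishes one episode and stops partway through the next, so only the completed episode appears in the return and length lists.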
Replay
TransitionBatch stores tensors for:
- states
- actions
- rewards
- dones
- next_states
- returns
It provides .to(device) and .cat(other) helpers.
ReplayBuffer stores fixed-capacity NumPy arrays and returns sampled or full
data as TransitionBatch instances:
buffer = ReplayBuffer(capacity=10_000, obs_shape=env.observation_space.shape)
batch = buffer.sample(128)
all_data = buffer.get_all()
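The fixed-capacity behaviour can be pictured as a ring buffer that overwrites its oldest entry once full. The sketch below is a simplified stand-in that stores Python objects in a list instead of NumPy arrays; the class name `RingBuffer`, its `add` method, and the explicit `rng` argument are assumptions and do not reflect drlab's actual interface.

```python
import random


class RingBuffer:
    """Fixed-capacity store that overwrites the oldest item when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [None] * capacity
        self.next_index = 0  # slot the next item will be written to
        self.size = 0        # number of valid items currently stored

    def add(self, transition):
        self.data[self.next_index] = transition
        self.next_index = (self.next_index + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, rng):
        """Sample uniformly with replacement from the stored items."""
        return [self.data[rng.randrange(self.size)] for _ in range(batch_size)]

    def get_all(self):
        """Return all stored items in slot order (not insertion order)."""
        return self.data[:self.size]


buf = RingBuffer(capacity=3)
for i in range(5):
    buf.add(i)  # items 0 and 1 are overwritten by 3 and 4

stored = sorted(buf.get_all())
batch = buf.sample(10, random.Random(0))
```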
Experiments
Experiment wrappers combine an environment, controller, learner, runner, replay buffer behavior, progress bar, and TensorBoard logging.
from drlab.experiments import (
    ActorCriticExperiment,
    ActorCriticExperimentConfig,
    DQNExperiment,
    DQNExperimentConfig,
)
Use DQNExperiment for off-policy DQN training and ActorCriticExperiment for
on-policy actor-critic training.
Development
Install development dependencies:
python -m pip install -e ".[dev]"
Run the test suite:
python -m unittest discover -v
Build a wheel:
python -m build --wheel