Deep Reinforcement Learning kit for research.

drlab

drlab is a small deep reinforcement learning package for research code and experiments. It provides reusable building blocks for Gymnasium environments:

  • DQN and actor-critic learners
  • greedy, epsilon-greedy, and stochastic controllers
  • a transition runner for collecting environment interaction
  • replay buffer and transition batch utilities
  • lightweight experiment wrappers with TensorBoard logging

The package is designed around small, composable pieces: a PyTorch model, controller, runner, learner, and optionally an experiment wrapper.

Installation

From the repository root:

python -m pip install -e .

For experiment and development dependencies:

python -m pip install -e ".[experiments,dev]"

Package Overview

Public classes are available from the package root:

from drlab import (
    ActorCritic,
    ActorCriticConfig,
    ActorCriticExperiment,
    ActorCriticExperimentConfig,
    Controller,
    DQN,
    DQNConfig,
    DQNExperiment,
    DQNExperimentConfig,
    EpsilonGreedyController,
    GreedyController,
    ReplayBuffer,
    Runner,
    StochasticController,
    TransitionBatch,
)

They can also be imported from their subpackages:

| Subpackage | Exports | Purpose |
| --- | --- | --- |
| drlab.learners | DQN, DQNConfig, ActorCritic, ActorCriticConfig | Update PyTorch models from transition batches. |
| drlab.controllers | Controller, GreedyController, EpsilonGreedyController, StochasticController | Convert model outputs into environment actions. |
| drlab.runners | Runner | Collect transitions from a Gymnasium environment. |
| drlab.replay | ReplayBuffer, TransitionBatch | Store, sample, move, and concatenate transitions. |
| drlab.experiments | DQNExperiment, DQNExperimentConfig, ActorCriticExperiment, ActorCriticExperimentConfig | Run training loops with logging and progress bars. |

Implemented Algorithms

| Algorithm | Type | Implementation Summary |
| --- | --- | --- |
| DQN | Off-policy value-based RL | Trains a Q-network with one-step TD targets from (state, action, reward, done, next_state) batches. It supports replay-buffer training through DQNExperiment, target networks, Double DQN action selection, hard or soft target-network updates, gradient clipping, configurable discounting, and custom regularizers. |
| Actor-Critic | On-policy policy-gradient RL | Trains a shared policy/value network from transition batches and returns. The policy head is optimized with advantage-weighted log probabilities, while the value head can use TD targets or full returns. It supports bootstrapped advantages, optional baseline subtraction, advantage normalization, entropy regularization with annealing, gradient clipping, custom regularizers, and PPO-style clipped policy updates for extra optimization passes. |
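
The one-step TD target and the Double DQN variant named in the DQN row can be sketched in plain Python. This is an illustration of the textbook formulas, not drlab's code: the function names, the gamma argument, and the use of plain lists instead of tensors are all assumptions made for the example.

```python
from typing import Sequence


def td_target(reward: float, done: bool, next_q: Sequence[float], gamma: float = 0.99) -> float:
    """One-step TD target: r + gamma * max_a Q(s', a), with no bootstrap when done."""
    if done:
        return reward
    return reward + gamma * max(next_q)


def double_dqn_target(
    reward: float,
    done: bool,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Double DQN: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical next-state values for a two-action problem.
online = [1.0, 3.0]  # the online network prefers action 1
target = [2.5, 2.0]  # the target network would prefer action 0

plain = td_target(1.0, False, target, gamma=0.9)  # bootstraps from max(target)
double = double_dqn_target(1.0, False, online, target, gamma=0.9)  # bootstraps from target[1]
terminal = td_target(1.0, True, target, gamma=0.9)  # reward only
```

In this example the Double DQN target is the smaller of the two, because the action is chosen by one network and evaluated by the other rather than taking the maximum of a single set of estimates.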

The package also includes reusable action-selection controllers:

  • GreedyController: deterministic argmax action selection from model scores.
  • EpsilonGreedyController: epsilon-greedy exploration with linear annealing.
  • StochasticController: samples actions from softmax probabilities.
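
As a rough illustration of epsilon-greedy exploration with linear annealing, the sketch below interpolates epsilon between a maximum and a minimum over a fixed number of steps. The helper names are hypothetical, and how drlab counts steps or draws random numbers is not stated here, so treat this as the general technique rather than the package's implementation.

```python
import random
from typing import Sequence


def linear_epsilon(step: int, max_eps: float, min_eps: float, anneal_steps: int) -> float:
    """Interpolate linearly from max_eps to min_eps over anneal_steps, then hold min_eps."""
    if anneal_steps <= 0:
        return min_eps
    fraction = min(max(step / anneal_steps, 0.0), 1.0)
    return max_eps + fraction * (min_eps - max_eps)


def epsilon_greedy(scores: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a uniform random action, otherwise the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(scores))
    return max(range(len(scores)), key=lambda a: scores[a])


rng = random.Random(0)
eps_start = linear_epsilon(0, max_eps=1.0, min_eps=0.05, anneal_steps=10_000)
eps_mid = linear_epsilon(5_000, max_eps=1.0, min_eps=0.05, anneal_steps=10_000)
eps_end = linear_epsilon(20_000, max_eps=1.0, min_eps=0.05, anneal_steps=10_000)

# With epsilon = 0 the random branch is never taken, so the choice is the argmax.
greedy_action = epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.0, rng=rng)
```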

Model Output Convention

Controllers and learners expect the model output to use a shared layout:

  • DQN models should output at least num_actions columns. The first num_actions columns are treated as action scores.
  • Actor-critic models should output at least num_actions + 1 columns. The first num_actions columns are policy logits, and the next column is the value estimate.
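
The column layout can be made concrete with a small sketch that splits one output row. The helper names are hypothetical and the rows are plain lists. For a batched tensor `out`, the equivalent slices would be `out[:, :num_actions]` and `out[:, num_actions]`.

```python
from typing import List, Sequence, Tuple


def split_dqn_output(row: Sequence[float], num_actions: int) -> List[float]:
    """Return the action scores: the first num_actions columns."""
    if len(row) < num_actions:
        raise ValueError("DQN output needs at least num_actions columns")
    return list(row[:num_actions])


def split_actor_critic_output(row: Sequence[float], num_actions: int) -> Tuple[List[float], float]:
    """Return (policy logits, value estimate) from a single output row."""
    if len(row) < num_actions + 1:
        raise ValueError("actor-critic output needs at least num_actions + 1 columns")
    return list(row[:num_actions]), row[num_actions]


# Two actions, as in CartPole. Any columns after the ones named above are ignored.
scores = split_dqn_output([0.3, -0.1, 9.9], num_actions=2)
logits, value = split_actor_critic_output([2.0, 0.0, 0.5], num_actions=2)
```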

Quick DQN Example

import gymnasium as gym
import torch as th

from drlab import (
    DQN,
    DQNConfig,
    DQNExperiment,
    DQNExperimentConfig,
    EpsilonGreedyController,
    GreedyController,
)

env = gym.make("CartPole-v1")

model = th.nn.Sequential(
    th.nn.Linear(4, 64),
    th.nn.ReLU(),
    th.nn.Linear(64, 2),
)
optimizer = th.optim.Adam(model.parameters(), lr=1e-3)

learner = DQN(model, optimizer, DQNConfig(num_actions=2))
controller = EpsilonGreedyController(
    GreedyController(model, num_actions=2),
    num_actions=2,
    max_eps=1.0,
    min_eps=0.05,
    anneal_steps=10_000,
)

experiment = DQNExperiment(
    env,
    controller,
    learner,
    DQNExperimentConfig(
        max_steps=20_000,
        run_steps=1,
        batch_size=128,
        log_dir="runs/cartpole_dqn",
    ),
)
experiment.run()

Core Components

Learners

DQN trains a Q-network from (rewards, dones, states, actions, next_states). Its config supports target networks, double Q-learning, hard or soft target updates, gradient clipping, discounting, and custom regularizers.

from drlab.learners import DQN, DQNConfig
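
The difference between hard and soft target updates can be shown on plain lists of parameters. This is a generic sketch: the function names and the tau parameter are assumptions for illustration, and the corresponding DQNConfig field names are not shown in this document.

```python
from typing import List


def hard_update(target: List[float], online: List[float]) -> None:
    """Copy the online parameters into the target parameters in place."""
    target[:] = online


def soft_update(target: List[float], online: List[float], tau: float) -> None:
    """Polyak averaging in place: target <- (1 - tau) * target + tau * online."""
    for i, (t, o) in enumerate(zip(target, online)):
        target[i] = (1.0 - tau) * t + tau * o


online_params = [1.0, -2.0]
target_params = [0.0, 0.0]

# A soft update moves the target a small fraction of the way toward the online values.
soft_update(target_params, online_params, tau=0.1)
after_soft = list(target_params)

# A hard update replaces the target values outright.
hard_update(target_params, online_params)
after_hard = list(target_params)
```

A hard update is usually applied every fixed number of learner steps, while a soft update is applied after every step with a small tau.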

ActorCritic trains a policy/value network from transition batches with returns. Its config supports TD or return-based value targets, bootstrapped advantages, PPO-style clipping, entropy regularization, advantage normalization, and custom regularizers.

from drlab.learners import ActorCritic, ActorCriticConfig
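
Three of the options listed for ActorCritic (bootstrapped advantages, advantage normalization, and PPO-style clipping) follow standard formulas, sketched below with plain floats. The function names, the clip default of 0.2, and the use of the population standard deviation are assumptions for this example, not a description of how drlab computes them.

```python
import math
from typing import List, Sequence


def normalize(advantages: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Shift to zero mean and scale by the population standard deviation."""
    mean = sum(advantages) / len(advantages)
    var = sum((a - mean) ** 2 for a in advantages) / len(advantages)
    return [(a - mean) / (math.sqrt(var) + eps) for a in advantages]


def ppo_clipped_objective(ratio: float, advantage: float, clip: float = 0.2) -> float:
    """PPO surrogate: min(ratio * A, clamp(ratio, 1 - clip, 1 + clip) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return min(ratio * advantage, clipped_ratio * advantage)


# Bootstrapped one-step advantage: A = r + gamma * V(s') - V(s), with V(s') = 0 when done.
gamma = 0.99
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.6, 0.4]
next_values = [0.6, 0.4, 0.0]
dones = [False, False, True]
advantages = [
    r + (0.0 if d else gamma * nv) - v
    for r, v, nv, d in zip(rewards, values, next_values, dones)
]
normalized = normalize(advantages)

# ratio = pi_new(a|s) / pi_old(a|s). Clipping caps the gain for positive advantages only.
gain = ppo_clipped_objective(ratio=1.5, advantage=2.0)
penalty = ppo_clipped_objective(ratio=1.5, advantage=-2.0)
```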

Controllers

Controllers wrap a PyTorch model and expose:

action = controller.choose(obs)
probs = controller.probabilities(obs)

Available controllers:

  • GreedyController: selects the highest-scoring action.
  • EpsilonGreedyController: wraps another controller and adds annealed random exploration.
  • StochasticController: samples actions from softmax probabilities.
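
To illustrate the choose/probabilities interface, the stand-in classes below mimic a greedy and a stochastic controller over a toy scoring function. They are not drlab classes: the names GreedySketch, StochasticSketch, and toy_model are hypothetical, and the real controllers wrap a PyTorch model rather than a Python callable.

```python
import math
import random
from typing import Callable, List, Sequence

ScoreFn = Callable[[Sequence[float]], List[float]]


class GreedySketch:
    """Argmax over the first num_actions scores."""

    def __init__(self, score_fn: ScoreFn, num_actions: int) -> None:
        self.score_fn = score_fn
        self.num_actions = num_actions

    def probabilities(self, obs: Sequence[float]) -> List[float]:
        scores = self.score_fn(obs)[: self.num_actions]
        best = max(range(self.num_actions), key=lambda a: scores[a])
        return [1.0 if a == best else 0.0 for a in range(self.num_actions)]

    def choose(self, obs: Sequence[float]) -> int:
        return self.probabilities(obs).index(1.0)


class StochasticSketch:
    """Sample an action from the softmax of the first num_actions scores."""

    def __init__(self, score_fn: ScoreFn, num_actions: int, seed: int = 0) -> None:
        self.score_fn = score_fn
        self.num_actions = num_actions
        self.rng = random.Random(seed)

    def probabilities(self, obs: Sequence[float]) -> List[float]:
        logits = self.score_fn(obs)[: self.num_actions]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def choose(self, obs: Sequence[float]) -> int:
        weights = self.probabilities(obs)
        return self.rng.choices(range(self.num_actions), weights=weights)[0]


def toy_model(obs: Sequence[float]) -> List[float]:
    """Hypothetical model: two action scores followed by one extra column."""
    return [obs[0], -obs[0], 0.0]


greedy = GreedySketch(toy_model, num_actions=2)
stochastic = StochasticSketch(toy_model, num_actions=2)

greedy_left = greedy.choose([0.3])
greedy_right = greedy.choose([-1.0])
sample_probs = stochastic.probabilities([0.3])
sampled_action = stochastic.choose([0.3])
```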

Runner

Runner steps through a Gymnasium environment with a controller and returns:

batch, ep_returns, ep_lengths, last_episode = runner.run(num_steps)

Passing num_steps <= 0 collects one complete episode. A positive value collects up to that many transitions. The returned batch is a TransitionBatch.
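
The num_steps semantics can be sketched with a toy environment that follows the Gymnasium reset/step signatures. The collect function and CountdownEnv below are hypothetical stand-ins, not drlab's Runner. In particular, resetting at the start of every call and storing terminated or truncated as done are assumptions of this sketch.

```python
from typing import Callable, Dict, List, Tuple


class CountdownEnv:
    """Toy environment with Gymnasium-style reset/step; an episode lasts `length` steps."""

    def __init__(self, length: int = 3) -> None:
        self.length = length
        self.remaining = length

    def reset(self) -> Tuple[int, dict]:
        self.remaining = self.length
        return self.remaining, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, dict]:
        self.remaining -= 1
        terminated = self.remaining == 0
        return self.remaining, 1.0, terminated, False, {}


def collect(
    env: CountdownEnv, choose: Callable[[int], int], num_steps: int
) -> Tuple[List[Dict], List[float]]:
    """Collect num_steps transitions, or one complete episode when num_steps <= 0."""
    transitions: List[Dict] = []
    ep_returns: List[float] = []
    obs, _ = env.reset()
    ep_return = 0.0
    while True:
        action = choose(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        transitions.append(
            {"state": obs, "action": action, "reward": reward, "done": done, "next_state": next_obs}
        )
        ep_return += reward
        obs = next_obs
        if done:
            ep_returns.append(ep_return)
            if num_steps <= 0:
                break  # one complete episode was requested
            ep_return = 0.0
            obs, _ = env.reset()
        if num_steps > 0 and len(transitions) >= num_steps:
            break  # step budget reached, possibly mid-episode
    return transitions, ep_returns


episode, episode_returns = collect(CountdownEnv(3), lambda obs: 0, num_steps=0)
chunk, chunk_returns = collect(CountdownEnv(3), lambda obs: 0, num_steps=4)
```

With a positive budget the last episode may be cut off, which is why only finished episodes contribute to the list of returns.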

Replay

TransitionBatch stores tensors for:

  • states
  • actions
  • rewards
  • dones
  • next_states
  • returns

It provides .to(device) and .cat(other) helpers.

ReplayBuffer stores fixed-capacity NumPy arrays and returns sampled or full data as TransitionBatch instances:

buffer = ReplayBuffer(capacity=10_000, obs_shape=env.observation_space.shape)
batch = buffer.sample(128)
all_data = buffer.get_all()
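
Fixed-capacity storage is commonly implemented as a ring buffer that overwrites the oldest entry once full. The sketch below shows that idea with a Python list. RingBufferSketch and its add method are hypothetical, and it makes no claim about how ReplayBuffer lays out its NumPy arrays or whether it samples with replacement.

```python
import random
from typing import Any, List, Optional


class RingBufferSketch:
    """Fixed-capacity store that overwrites the oldest item when full."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self.storage: List[Any] = []
        self.next_index = 0  # slot that the next add() writes to once the buffer is full
        self.rng = random.Random(seed)

    def add(self, transition: Any) -> None:
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_index] = transition
        self.next_index = (self.next_index + 1) % self.capacity

    def sample(self, batch_size: int) -> List[Any]:
        """Uniform sample without replacement, capped at the current size."""
        return self.rng.sample(self.storage, min(batch_size, len(self.storage)))

    def get_all(self) -> List[Any]:
        return list(self.storage)


buffer = RingBufferSketch(capacity=3, seed=0)
for step in range(5):
    buffer.add({"step": step})

# Steps 0 and 1 were overwritten by steps 3 and 4.
kept_steps = sorted(item["step"] for item in buffer.get_all())
minibatch = buffer.sample(2)
```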

Experiments

Experiment wrappers combine an environment, controller, learner, runner, replay buffer behavior, progress bar, and TensorBoard logging.

from drlab.experiments import (
    ActorCriticExperiment,
    ActorCriticExperimentConfig,
    DQNExperiment,
    DQNExperimentConfig,
)

Use DQNExperiment for off-policy DQN training and ActorCriticExperiment for on-policy actor-critic training.

Development

Install development dependencies:

python -m pip install -e ".[dev]"

Run the test suite:

python -m unittest discover -v

Build a wheel:

python -m build --wheel
