Deep Reinforcement Learning kit for research.

drlab

drlab is a small deep reinforcement learning package for research code and experiments. It provides reusable building blocks for Gymnasium environments:

  • DQN and actor-critic learners
  • greedy, epsilon-greedy, and stochastic controllers
  • a transition runner for collecting environment interaction
  • replay buffer and transition batch utilities
  • lightweight experiment wrappers with TensorBoard logging

The package is designed around small, composable pieces: a PyTorch model, controller, runner, learner, and optionally an experiment wrapper.

Installation

From the repository root:

python -m pip install -e .

For experiment and development dependencies:

python -m pip install -e ".[experiments,dev]"

Package Overview

Public classes are available from the package root:

from drlab import (
    ActorCritic,
    ActorCriticConfig,
    ActorCriticExperiment,
    ActorCriticExperimentConfig,
    Controller,
    DQN,
    DQNConfig,
    DQNExperiment,
    DQNExperimentConfig,
    EpsilonGreedyController,
    GreedyController,
    ReplayBuffer,
    Runner,
    StochasticController,
    TransitionBatch,
)

They can also be imported from their subpackages:

| Subpackage | Exports | Purpose |
|---|---|---|
| drlab.learners | DQN, DQNConfig, ActorCritic, ActorCriticConfig | Update PyTorch models from transition batches. |
| drlab.controllers | Controller, GreedyController, EpsilonGreedyController, StochasticController | Convert model outputs into environment actions. |
| drlab.runners | Runner | Collect transitions from a Gymnasium environment. |
| drlab.replay | ReplayBuffer, TransitionBatch | Store, sample, move, and concatenate transitions. |
| drlab.experiments | DQNExperiment, DQNExperimentConfig, ActorCriticExperiment, ActorCriticExperimentConfig | Run training loops with logging and progress bars. |

Implemented Algorithms

| Algorithm | Type | Implementation Summary |
|---|---|---|
| DQN | Off-policy value-based RL | Trains a Q-network with one-step TD targets from (state, action, reward, done, next_state) batches. It supports replay-buffer training through DQNExperiment, target networks, Double DQN action selection, hard or soft target-network updates, gradient clipping, configurable discounting, and custom regularizers. |
| Actor-Critic | On-policy policy-gradient RL | Trains a shared policy/value network from transition batches and returns. The policy head is optimized with advantage-weighted log probabilities, while the value head can use TD targets or full returns. It supports bootstrapped advantages, optional baseline subtraction, advantage normalization, entropy regularization with annealing, gradient clipping, custom regularizers, and PPO-style clipped policy updates for extra optimization passes. |
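
To make the DQN row concrete, the sketch below computes one-step TD targets for a tiny batch, with and without Double DQN action selection. It is plain Python written for illustration and is not drlab's implementation; the function name td_targets and its arguments are hypothetical.

```python
def td_targets(rewards, dones, next_q_online, next_q_target, gamma=0.99, double=True):
    """One-step TD targets y = r + gamma * (1 - done) * Q_target(s', a*).

    With double=True, a* is the argmax of the online network's scores
    (Double DQN); otherwise a* is the argmax of the target network's scores.
    Hypothetical helper, not part of drlab.
    """
    targets = []
    for r, d, q_on, q_tg in zip(rewards, dones, next_q_online, next_q_target):
        selector = q_on if double else q_tg
        best = max(range(len(selector)), key=selector.__getitem__)
        # No bootstrapping from the next state once the episode has ended.
        bootstrap = 0.0 if d else gamma * q_tg[best]
        targets.append(r + bootstrap)
    return targets


rewards = [1.0, 1.0]
dones = [False, True]
next_q_online = [[0.2, 0.9], [0.5, 0.1]]   # online network scores for s'
next_q_target = [[1.0, 0.5], [0.3, 0.4]]   # target network scores for s'

double_targets = td_targets(rewards, dones, next_q_online, next_q_target, gamma=0.5)
plain_targets = td_targets(rewards, dones, next_q_online, next_q_target, gamma=0.5, double=False)
```

In the first transition the two variants disagree: Double DQN evaluates the target network at the action the online network prefers, while the plain variant takes the target network's own maximum.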

The package also includes reusable action-selection controllers:

  • GreedyController: deterministic argmax action selection from model scores.
  • EpsilonGreedyController: epsilon-greedy exploration with linear annealing.
  • StochasticController: samples actions from softmax probabilities.

Model Output Convention

Controllers and learners expect the model output to use a shared layout:

  • DQN models should output at least num_actions columns. The first num_actions columns are treated as action scores.
  • Actor-critic models should output at least num_actions + 1 columns. The first num_actions columns are policy logits, and the next column is the value estimate.
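
A minimal sketch of the actor-critic layout, assuming a CartPole-sized problem (4 observation features, 2 actions). It uses only standard PyTorch calls; how drlab slices the output internally is inferred from the convention above, not taken from its source.

```python
import torch as th

num_actions = 2
obs_dim = 4

# One shared network; the last layer has num_actions + 1 outputs.
model = th.nn.Sequential(
    th.nn.Linear(obs_dim, 64),
    th.nn.ReLU(),
    th.nn.Linear(64, num_actions + 1),
)

obs = th.zeros(8, obs_dim)              # a batch of 8 dummy observations
out = model(obs)                        # shape: (8, num_actions + 1)

policy_logits = out[:, :num_actions]    # first num_actions columns
values = out[:, num_actions]            # the next column: value estimate
probs = th.softmax(policy_logits, dim=-1)
```

A DQN model follows the same pattern with num_actions outputs and no value column.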

Quick DQN Example

import gymnasium as gym
import torch as th

from drlab import (
    DQN,
    DQNConfig,
    DQNExperiment,
    DQNExperimentConfig,
    EpsilonGreedyController,
    GreedyController,
)

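# CartPole-v1 has 4-dimensional observations and 2 discrete actions.
env = gym.make("CartPole-v1")

# Q-network: maps an observation to one score per action.
model = th.nn.Sequential(
    th.nn.Linear(4, 64),
    th.nn.ReLU(),
    th.nn.Linear(64, 2),
)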
optimizer = th.optim.Adam(model.parameters(), lr=1e-3)

learner = DQN(model, optimizer, DQNConfig(num_actions=2))
controller = EpsilonGreedyController(
    GreedyController(model, num_actions=2),
    num_actions=2,
    max_eps=1.0,
    min_eps=0.05,
    anneal_steps=10_000,
)

experiment = DQNExperiment(
    env,
    controller,
    learner,
    DQNExperimentConfig(
        max_steps=20_000,
        run_steps=1,
        batch_size=128,
        log_dir="runs/cartpole_dqn",
    ),
)
experiment.run()

Core Components

Learners

DQN trains a Q-network from transition batches of (states, actions, rewards, dones, next_states). Its config supports target networks, double Q-learning, hard or soft target updates, gradient clipping, discounting, and custom regularizers.

from drlab.learners import DQN, DQNConfig
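
The difference between hard and soft target updates can be sketched on plain Python floats. With PyTorch the same rule would be applied to each parameter tensor. The function names and the tau argument are hypothetical; the corresponding DQNConfig field names are not shown in this README.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return {
        name: tau * online_params[name] + (1.0 - tau) * value
        for name, value in target_params.items()
    }


def hard_update(target_params, online_params):
    """Replace the target parameters with a copy of the online parameters."""
    return dict(online_params)


online = {"w": 1.0, "b": -2.0}
target = {"w": 0.0, "b": 0.0}

soft = soft_update(target, online, tau=0.5)   # moves halfway toward online
hard = hard_update(target, online)            # jumps all the way
```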

ActorCritic trains a policy/value network from transition batches with returns. Its config supports TD or return-based value targets, bootstrapped advantages, PPO-style clipping, entropy regularization, advantage normalization, and custom regularizers.

from drlab.learners import ActorCritic, ActorCriticConfig
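
Two of the options listed above, advantage normalization and PPO-style clipping, are easy to show in isolation. The following is a generic, standard-library sketch of both techniques; the helper names and the clip value are hypothetical and do not reflect drlab's internals or config fields.

```python
import math


def normalize(advantages, eps=1e-8):
    """Shift advantages to zero mean and scale them to unit standard deviation."""
    mean = sum(advantages) / len(advantages)
    var = sum((a - mean) ** 2 for a in advantages) / len(advantages)
    return [(a - mean) / (math.sqrt(var) + eps) for a in advantages]


def clipped_surrogate(new_logp, old_logp, advantage, clip=0.2):
    """PPO-style clipped objective for one sample (a quantity to maximize)."""
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return min(ratio * advantage, clipped_ratio * advantage)


norm_adv = normalize([1.0, 2.0, 3.0])
unchanged = clipped_surrogate(-1.0, -1.0, advantage=2.0)   # ratio is 1, no clipping
capped = clipped_surrogate(0.0, -1.0, advantage=2.0)       # ratio is e, capped at 1 + clip
```

The clipping only matters when the policy has moved away from the one that collected the data, which is why it is relevant for extra optimization passes over the same batch.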

Controllers

Controllers wrap a PyTorch model and expose:

action = controller.choose(obs)
probs = controller.probabilities(obs)

Available controllers:

  • GreedyController: selects the highest-scoring action.
  • EpsilonGreedyController: wraps another controller and adds annealed random exploration.
  • StochasticController: samples actions from softmax probabilities.
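
For intuition about what the probabilities and sampled actions of a stochastic controller look like, here is a standard-library sketch of softmax sampling. The functions are hypothetical stand-ins and do not reproduce drlab's controller classes.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng=random):
    """Draw an action index with probability given by softmax(logits)."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = softmax([2.0, 1.0, 0.0])

# Sample repeatedly with a seeded generator to see the empirical frequencies.
rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(1000):
    counts[sample_action([2.0, 1.0, 0.0], rng)] += 1
```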

Runner

Runner steps through a Gymnasium environment with a controller and returns:

batch, ep_returns, ep_lengths, last_episode = runner.run(num_steps)

num_steps <= 0 collects one complete episode. Positive values collect up to that many transitions. The returned batch is a TransitionBatch.
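
The num_steps rule can be illustrated with a toy collection loop. This is a sketch under assumptions, not the Runner source: CountdownEnv is a made-up environment that only mimics the return shapes of Gymnasium's reset and step, the collect function is hypothetical, it continues across episode boundaries until the step budget is reached (the README only says "up to"), and it omits the last_episode value because its meaning is not described here.

```python
class CountdownEnv:
    """Toy stand-in that follows the Gymnasium reset/step return shapes."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        return self.t, 1.0, terminated, False, {}


def collect(env, policy, num_steps):
    """Collect transitions; num_steps <= 0 means one complete episode."""
    transitions, ep_returns, ep_lengths = [], [], []
    obs, _ = env.reset()
    ep_return, ep_length = 0.0, 0
    while True:
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        transitions.append((obs, action, reward, done, next_obs))
        ep_return += reward
        ep_length += 1
        obs = next_obs
        if done:
            # Record the finished episode and start a new one.
            ep_returns.append(ep_return)
            ep_lengths.append(ep_length)
            ep_return, ep_length = 0.0, 0
            obs, _ = env.reset()
            if num_steps <= 0:
                break
        if num_steps > 0 and len(transitions) >= num_steps:
            break
    return transitions, ep_returns, ep_lengths


one_episode = collect(CountdownEnv(), lambda obs: 0, num_steps=0)
five_steps = collect(CountdownEnv(), lambda obs: 0, num_steps=5)
```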

Replay

TransitionBatch stores tensors for:

  • states
  • actions
  • rewards
  • dones
  • next_states
  • returns

It provides .to(device) and .cat(other) helpers.
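
The returns field is the one entry that is not observed directly from the environment. Assuming it holds discounted returns-to-go (the README does not spell out the definition), it could be computed from rewards and done flags as in this hypothetical helper:

```python
def discounted_returns(rewards, dones, gamma=0.99):
    """Return-to-go G_t = r_t + gamma * G_{t+1}, restarted at episode ends."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if dones[t]:
            # Step t is the last step of its episode: nothing follows it.
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Three steps of a finished episode followed by one step of an unfinished one.
rets = discounted_returns([1.0, 1.0, 1.0, 2.0], [False, False, True, False], gamma=0.5)
```

The final step belongs to an unfinished episode, so its return here is just its own reward; a bootstrapped estimate would add a value prediction for the following state.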

ReplayBuffer stores fixed-capacity NumPy arrays and returns sampled or full data as TransitionBatch instances:

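from drlab.replay import ReplayBuffer

# env is an existing Gymnasium environment, e.g. gym.make("CartPole-v1")
buffer = ReplayBuffer(capacity=10_000, obs_shape=env.observation_space.shape)
batch = buffer.sample(128)     # sampled minibatch as a TransitionBatch
all_data = buffer.get_all()    # all stored transitions as a TransitionBatch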
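
To show what "fixed-capacity NumPy arrays" implies in practice, here is a stripped-down ring buffer. It is an independent sketch, not the ReplayBuffer source: the class name RingBuffer, its add method, and the uniform sampling are assumptions, and it stores only states, actions, and rewards for brevity.

```python
import numpy as np


class RingBuffer:
    """Fixed-capacity transition store that overwrites the oldest entries."""

    def __init__(self, capacity, obs_shape, seed=0):
        self.capacity = capacity
        self.states = np.zeros((capacity, *obs_shape), dtype=np.float32)
        self.actions = np.zeros(capacity, dtype=np.int64)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.size = 0      # number of valid entries
        self.cursor = 0    # next write position
        self.rng = np.random.default_rng(seed)

    def add(self, state, action, reward):
        self.states[self.cursor] = state
        self.actions[self.cursor] = action
        self.rewards[self.cursor] = reward
        # Wrap around so new data replaces the oldest data once full.
        self.cursor = (self.cursor + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        # Uniform sampling with replacement over the valid entries.
        idx = self.rng.integers(0, self.size, size=batch_size)
        return self.states[idx], self.actions[idx], self.rewards[idx]


buf = RingBuffer(capacity=3, obs_shape=(4,))
for i in range(5):
    buf.add(np.full(4, i), action=i % 2, reward=float(i))

states, actions, rewards = buf.sample(8)
```

After five insertions into a buffer of capacity three, only the three most recent transitions remain available for sampling.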

Experiments

Experiment wrappers combine an environment, a controller, a learner, a runner, replay-buffer handling, a progress bar, and TensorBoard logging into a single training loop.

from drlab.experiments import (
    ActorCriticExperiment,
    ActorCriticExperimentConfig,
    DQNExperiment,
    DQNExperimentConfig,
)

Use DQNExperiment for off-policy DQN training and ActorCriticExperiment for on-policy actor-critic training.

Development

Install development dependencies:

python -m pip install -e ".[dev]"

Run the test suite:

python -m unittest discover -v

Build a wheel:

python -m build --wheel

Download files

Source distribution: drlab-0.1.1.tar.gz (18.8 kB)
Built distribution: drlab-0.1.1-py3-none-any.whl (18.4 kB, Python 3)
