Deep Reinforcement Learning kit for research.

drlab

drlab is a small deep reinforcement learning package for research code and experiments. It provides reusable building blocks for Gymnasium environments:

DQN and actor-critic learners
greedy, epsilon-greedy, and stochastic controllers
a transition runner for collecting environment interaction
replay buffer and transition batch utilities
lightweight experiment wrappers with TensorBoard logging

The package is designed around small, composable pieces: a PyTorch model, controller, runner, learner, and optionally an experiment wrapper.

Installation

From the repository root:

python -m pip install -e .

For experiment and development dependencies:

python -m pip install -e ".[experiments,dev]"

Package Overview

Public classes are available from the package root:

from drlab import (
    ActorCritic,
    ActorCriticConfig,
    ActorCriticExperiment,
    ActorCriticExperimentConfig,
    Controller,
    DQN,
    DQNConfig,
    DQNExperiment,
    DQNExperimentConfig,
    EpsilonGreedyController,
    GreedyController,
    ReplayBuffer,
    Runner,
    StochasticController,
    TransitionBatch,
)

They can also be imported from their subpackages:

Subpackage	Exports	Purpose
`drlab.learners`	`DQN`, `DQNConfig`, `ActorCritic`, `ActorCriticConfig`	Update PyTorch models from transition batches.
`drlab.controllers`	`Controller`, `GreedyController`, `EpsilonGreedyController`, `StochasticController`	Convert model outputs into environment actions.
`drlab.runners`	`Runner`	Collect transitions from a Gymnasium environment.
`drlab.replay`	`ReplayBuffer`, `TransitionBatch`	Store, sample, move, and concatenate transitions.
`drlab.experiments`	`DQNExperiment`, `DQNExperimentConfig`, `ActorCriticExperiment`, `ActorCriticExperimentConfig`	Run training loops with logging and progress bars.

Implemented Algorithms

Algorithm	Type	Implementation Summary
DQN	Off-policy value-based RL	Trains a Q-network with one-step TD targets from `(state, action, reward, done, next_state)` batches. It supports replay-buffer training through `DQNExperiment`, target networks, Double DQN action selection, hard or soft target-network updates, gradient clipping, configurable discounting, and custom regularizers.
Actor-Critic	On-policy policy-gradient RL	Trains a shared policy/value network from transition batches and returns. The policy head is optimized with advantage-weighted log probabilities, while the value head can use TD targets or full returns. It supports bootstrapped advantages, optional baseline subtraction, advantage normalization, entropy regularization with annealing, gradient clipping, custom regularizers, and PPO-style clipped policy updates for extra optimization passes.
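The one-step TD target and the Double DQN action selection named in the table can be sketched in plain Python. This is an illustration of the standard formulas, not drlab's implementation; the function names and the `gamma` default are assumptions made for the example.

```python
from typing import Sequence


def td_target(reward: float, done: bool, next_q_target: Sequence[float],
              gamma: float = 0.99) -> float:
    """One-step target: r + gamma * max_a Q_target(s', a), or r if terminal."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, done: bool,
                      next_q_online: Sequence[float],
                      next_q_target: Sequence[float],
                      gamma: float = 0.99) -> float:
    """Double DQN: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

Decoupling action selection from action evaluation is what reduces the overestimation bias of the plain max-based target.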

The package also includes reusable action-selection controllers:

GreedyController: deterministic argmax action selection from model scores.
EpsilonGreedyController: epsilon-greedy exploration with linear annealing.
StochasticController: samples actions from softmax probabilities.
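As a rough illustration of epsilon-greedy exploration with linear annealing, the sketch below uses only the standard library. The parameter names `max_eps`, `min_eps`, and `anneal_steps` mirror the arguments shown for EpsilonGreedyController, but the exact schedule drlab uses is an assumption here, and both function names are hypothetical.

```python
import random
from typing import Sequence


def linear_epsilon(step: int, max_eps: float = 1.0, min_eps: float = 0.05,
                   anneal_steps: int = 10_000) -> float:
    """Interpolate linearly from max_eps to min_eps, then hold at min_eps."""
    frac = min(max(step, 0) / anneal_steps, 1.0)
    return max_eps + frac * (min_eps - max_eps)


def epsilon_greedy(scores: Sequence[float], eps: float, rng: random.Random) -> int:
    """With probability eps pick a uniform random action, otherwise the argmax."""
    if rng.random() < eps:
        return rng.randrange(len(scores))
    return max(range(len(scores)), key=lambda a: scores[a])


rng = random.Random(0)
action = epsilon_greedy([0.1, 0.9, 0.3], linear_epsilon(step=2_500), rng)
```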

Model Output Convention

Controllers and learners expect the model output to use a shared layout:

DQN models should output at least num_actions columns. The first num_actions columns are treated as action scores.
Actor-critic models should output at least num_actions + 1 columns. The first num_actions columns are policy logits, and the next column is the value estimate.
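To make the column layout concrete, here is a small sketch that splits one actor-critic output row into policy logits and a value estimate, then turns the logits into softmax probabilities. It works on plain Python lists; the helper names are hypothetical and not part of drlab.

```python
import math
from typing import List, Sequence, Tuple


def split_actor_critic_output(row: Sequence[float],
                              num_actions: int) -> Tuple[List[float], float]:
    """First num_actions columns are logits; the next column is the value."""
    if len(row) < num_actions + 1:
        raise ValueError("expected at least num_actions + 1 columns")
    return list(row[:num_actions]), row[num_actions]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single row of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Any columns beyond num_actions + 1 are ignored by this sketch.
logits, value = split_actor_critic_output([0.0, 0.0, 1.5, 9.9], num_actions=2)
probs = softmax(logits)
```

With a batched PyTorch tensor `out`, the same split would be `out[..., :num_actions]` and `out[..., num_actions]`.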

Quick DQN Example

import gymnasium as gym
import torch as th

from drlab import (
    DQN,
    DQNConfig,
    DQNExperiment,
    DQNExperimentConfig,
    EpsilonGreedyController,
    GreedyController,
)

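# CartPole-v1 has a 4-dimensional observation and 2 discrete actions.
env = gym.make("CartPole-v1")

# Q-network: one output column per action.
model = th.nn.Sequential(
    th.nn.Linear(4, 64),
    th.nn.ReLU(),
    th.nn.Linear(64, 2),
)
optimizer = th.optim.Adam(model.parameters(), lr=1e-3)

learner = DQN(model, optimizer, DQNConfig(num_actions=2))
# Wrap the greedy policy with exploration annealed from max_eps to min_eps.
controller = EpsilonGreedyController(
    GreedyController(model, num_actions=2),
    num_actions=2,
    max_eps=1.0,
    min_eps=0.05,
    anneal_steps=10_000,
)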

experiment = DQNExperiment(
    env,
    controller,
    learner,
    DQNExperimentConfig(
        max_steps=20_000,
        run_steps=1,
        batch_size=128,
        log_dir="runs/cartpole_dqn",
    ),
)
experiment.run()

Core Components

Learners

DQN trains a Q-network from batches of (states, actions, rewards, dones, next_states). Its config supports target networks, double Q-learning, hard or soft target updates, gradient clipping, discounting, and custom regularizers.

from drlab.learners import DQN, DQNConfig
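The difference between hard and soft target updates can be shown on flat lists of parameters. This is a generic sketch of the two update rules, not drlab's code; the mixing coefficient name `tau` and both function names are assumptions.

```python
from typing import List, Sequence


def soft_update(target: Sequence[float], online: Sequence[float],
                tau: float) -> List[float]:
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def hard_update(online: Sequence[float]) -> List[float]:
    """Hard update: replace the target parameters with a copy of the online ones."""
    return list(online)


target_params = [0.0, 2.0]
online_params = [1.0, 4.0]
target_params = soft_update(target_params, online_params, tau=0.5)
```

A hard update is typically applied every fixed number of learner steps, while a soft update is applied after every step with a small `tau`.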

ActorCritic trains a policy/value network from transition batches with returns. Its config supports TD or return-based value targets, bootstrapped advantages, PPO-style clipping, entropy regularization, advantage normalization, and custom regularizers.

from drlab.learners import ActorCritic, ActorCriticConfig
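Three of the ingredients listed above (returns, advantage normalization, and PPO-style clipping) follow standard formulas that can be sketched without PyTorch. The functions below are illustrative only; their names, defaults, and the choice to treat the end of the batch as terminal are assumptions, not drlab's behavior.

```python
import math
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], dones: Sequence[bool],
                       gamma: float = 0.99) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}, restarting at every episode boundary."""
    returns = [0.0] * len(rewards)
    running = 0.0  # No bootstrapping: the end of the batch counts as terminal.
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def normalize(advantages: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Shift to zero mean and scale to (approximately) unit standard deviation."""
    mean = sum(advantages) / len(advantages)
    var = sum((a - mean) ** 2 for a in advantages) / len(advantages)
    std = math.sqrt(var)
    return [(a - mean) / (std + eps) for a in advantages]


def ppo_clipped_objective(new_logp: float, old_logp: float, advantage: float,
                          clip: float = 0.2) -> float:
    """Per-sample PPO surrogate: min(ratio * A, clip(ratio, 1 - c, 1 + c) * A)."""
    ratio = math.exp(new_logp - old_logp)
    clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return min(ratio * advantage, clipped * advantage)
```

The clipped objective is maximized, so a training loss would use its negative mean over the batch.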

Controllers

Controllers wrap a PyTorch model and expose:

action = controller.choose(obs)
probs = controller.probabilities(obs)

Available controllers:

GreedyController: selects the highest-scoring action.
EpsilonGreedyController: wraps another controller and adds annealed random exploration.
StochasticController: samples actions from softmax probabilities.

Runner

Runner steps through a Gymnasium environment with a controller and returns:

batch, ep_returns, ep_lengths, last_episode = runner.run(num_steps)

num_steps <= 0 collects one complete episode. Positive values collect up to that many transitions. The returned batch is a TransitionBatch.

Replay

TransitionBatch stores tensors for:

states
actions
rewards
dones
next_states
returns

It provides .to(device) and .cat(other) helpers.

ReplayBuffer stores fixed-capacity NumPy arrays and returns sampled or full data as TransitionBatch instances:

from drlab import ReplayBuffer

buffer = ReplayBuffer(capacity=10_000, obs_shape=env.observation_space.shape)
# Sample once transitions have been stored in the buffer.
batch = buffer.sample(128)
all_data = buffer.get_all()
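For readers unfamiliar with fixed-capacity replay, the idea can be sketched as a ring buffer that overwrites its oldest entry once full. The class below is a hypothetical stand-in written with the standard library; it does not reproduce drlab's NumPy-backed ReplayBuffer or its method signatures beyond the `sample` and `get_all` names.

```python
import random
from typing import Any, List


class TinyReplayBuffer:
    """Fixed-capacity store that overwrites the oldest item when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: List[Any] = [None] * capacity
        self._next = 0   # Slot the next item will be written to.
        self._size = 0   # Number of valid items currently stored.

    def add(self, item: Any) -> None:
        self._items[self._next] = item
        self._next = (self._next + 1) % self.capacity
        self._size = min(self._size + 1, self.capacity)

    def sample(self, batch_size: int, rng: random.Random) -> List[Any]:
        """Uniform sample without replacement from the stored items."""
        if batch_size > self._size:
            raise ValueError("not enough stored items to sample from")
        return rng.sample(self._items[:self._size], batch_size)

    def get_all(self) -> List[Any]:
        """All stored items, in slot order rather than insertion order."""
        return self._items[:self._size]

    def __len__(self) -> int:
        return self._size


buf = TinyReplayBuffer(capacity=3)
for transition in range(5):
    buf.add(transition)
minibatch = buf.sample(2, random.Random(0))
```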

Experiments

Experiment wrappers combine an environment, controller, learner, runner, replay buffer behavior, progress bar, and TensorBoard logging.

from drlab.experiments import (
    ActorCriticExperiment,
    ActorCriticExperimentConfig,
    DQNExperiment,
    DQNExperimentConfig,
)

Use DQNExperiment for off-policy DQN training and ActorCriticExperiment for on-policy actor-critic training.

Development

Install development dependencies:

python -m pip install -e ".[dev]"

Run the test suite:

python -m unittest discover -v

Build a wheel:

python -m build --wheel

Release history

0.1.1 (this version): May 30, 2026

0.1.0: May 30, 2026

Download files

Source Distribution

drlab-0.1.1.tar.gz (18.8 kB), uploaded May 30, 2026, Source

Built Distribution

drlab-0.1.1-py3-none-any.whl (18.4 kB), uploaded May 30, 2026, Python 3

File details

Details for the file drlab-0.1.1.tar.gz.

File metadata

Download URL: drlab-0.1.1.tar.gz
Upload date: May 30, 2026
Size: 18.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.17 (uv publish, Ubuntu 24.04)

File hashes

Hashes for drlab-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`e121c30926cd1ff04309d21606a99c60fea2867b0cd627fa974a24ab10e21fe1`
MD5	`24989444bb20cd1112f146b48ac0e7e1`
BLAKE2b-256	`d0692686680a7a801a75a05ffdd1a6770cefa21c1c4cdbac009350a61844f8eb`

File details

Details for the file drlab-0.1.1-py3-none-any.whl.

File metadata

Download URL: drlab-0.1.1-py3-none-any.whl
Upload date: May 30, 2026
Size: 18.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.17 (uv publish, Ubuntu 24.04)

File hashes

Hashes for drlab-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b87ea03835825332baa90c2071d4f72e0293c20b9cba046624d2ff93747e0818`
MD5	`69e0ed86ef84bc747f73f0905f70c080`
BLAKE2b-256	`e2befbbac3427f64c65f7e57a0a9b339d0fa96ab3d6750c2d6079ef7eabfa7d5`
