torchbringer

A PyTorch library for deep reinforcement learning

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 3 - Alpha
Environment
- GPU :: NVIDIA CUDA
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- POSIX :: Linux
Programming Language
- Python

Project description

TorchBringer is an open-source framework that provides a simple interface for working with pre-implemented deep reinforcement learning algorithms built on top of PyTorch. These interfaces can be used to operate deep RL agents either locally or remotely via gRPC. Currently, TorchBringer supports the following algorithms:
- DQN (Deep Q-Network)

Quickstart

To install TorchBringer, run

pip install --upgrade pip
pip install torchbringer

Local

Here's a simple project for running a TorchBringer agent on gymnasium's CartPole-v1 environment.

import gymnasium as gym
from itertools import count
import torch
from torchbringer.servers.torchbringer_agent import TorchBringerAgent

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

env = gym.make("CartPole-v1")

config = {
    # Fill this in; see "Config formatting" in the Reference section for the expected keys
}

dqn = TorchBringerAgent()
dqn.initialize(config)

num_episodes = 600
for i_episode in range(num_episodes):
    state, info = env.reset()
    # The first step of an episode has no preceding reward and is not terminal
    reward = torch.tensor([0.0], device=device)
    terminal = False

    state = torch.tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
    for t in count():
        # Give the agent the latest transition and act on the action it returns
        action = dqn.step(state, reward, terminal).item()
        observation, reward, terminated, truncated, _ = env.step(action)
        # A terminated episode has no next state
        state = None if terminated else torch.tensor(observation, dtype=torch.float32, device=device).unsqueeze(0)
        reward = torch.tensor([reward], device=device)
        terminal = terminated or truncated

        if terminal:
            # Report the final transition so the agent can learn from it
            dqn.step(state, reward, terminal)
            break

env.close()

Server

To start a TorchBringer server on a particular port, run

python -m torchbringer.servers.grpc.torchbringer_grpc_server <PORT>

You can communicate with this server using the provided Python client (see below). To talk to the server from applications written in other programming languages, build your own client from the files in torchbringer/servers/grpc in this repo.

# The client has the same interface as TorchBringerAgent (initialize() and step())
from torchbringer.servers.grpc.torchbringer_grpc_client import TorchBringerGRPCAgentClient

Reference

cartpole_local_dqn.py provides a simple example of TorchBringer being used on gymnasium's CartPole-v1 environment, and cartpole_grpc_dqn.py shows how to use the gRPC interface to train an agent remotely.

The main class in this framework is TorchBringerAgent, implemented in servers/. The gRPC server exposes a very similar interface.

TorchBringerAgent

Method	Parameters	Explanation
initialize()	config: dict	Initializes the agent according to the config. See the Config formatting section for the expected format
step()	state: Tensor, reward: Tensor, terminal: bool	Performs an optimization step and returns the action selected for this state

gRPC interface

Note that servers/grpc/torchbringer_grpc_client.py implements a client with exactly the same interface as TorchBringerAgent. This reference is mainly meant for building clients in other programming languages.

Method	Parameters	Explanation
initialize()	config: string	Accepts a serialized config dict (see Config formatting)
step()	state: Matrix(dimensions: list[int], value: list[float]), reward: float, terminal: bool	The state must be given as a flattened matrix; the action is returned in the same format
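The table above only names the message fields. As a sketch of what a client has to do, the Python below flattens a nested state into the dimensions/value pair and rebuilds a returned action from it. The plain dicts stand in for the real Matrix message, the helper names are hypothetical, and both row-major ordering and JSON as the config serialization format are assumptions; check the files in torchbringer/servers/grpc for the actual definitions.

```python
import json
import math
from typing import Any, Dict, List


def serialize_config(config: Dict[str, Any]) -> str:
    """Serialize a config dict for initialize(); JSON is an assumption, not confirmed by the docs."""
    return json.dumps(config)


def to_matrix(nested: Any) -> Dict[str, List]:
    """Flatten a rectangular nested list into {"dimensions": ..., "value": ...} in row-major order."""
    dimensions: List[int] = []
    probe = nested
    while isinstance(probe, list):
        dimensions.append(len(probe))
        probe = probe[0] if probe else None

    value: List[float] = []

    def walk(node: Any, depth: int) -> None:
        if depth == len(dimensions):
            if isinstance(node, list):
                raise ValueError("state must be a rectangular nested list")
            value.append(float(node))
            return
        if not isinstance(node, list) or len(node) != dimensions[depth]:
            raise ValueError("state must be a rectangular nested list")
        for child in node:
            walk(child, depth + 1)

    walk(nested, 0)
    return {"dimensions": dimensions, "value": value}


def from_matrix(matrix: Dict[str, List]) -> Any:
    """Rebuild a nested list (e.g. a returned action) from its dimensions and row-major values."""
    dims = list(matrix["dimensions"])
    values = list(matrix["value"])
    if math.prod(dims) != len(values):
        raise ValueError("dimensions do not match the number of values")

    def build(depth: int, offset: int) -> Any:
        if depth == len(dims):
            return values[offset]
        stride = math.prod(dims[depth + 1:])  # number of values per entry at this depth
        return [build(depth + 1, offset + i * stride) for i in range(dims[depth])]

    return build(0, 0)


# A CartPole observation with a leading batch dimension, as in the local quickstart
state = [[0.01, -0.02, 0.03, 0.04]]
print(to_matrix(state))
print(serialize_config({"type": "dqn", "gamma": 0.99}))
```

With a PyTorch tensor t, list(t.shape) and t.flatten().tolist() give the same two fields directly.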

Config formatting

The config is a dictionary that specifies the behavior of the agent. The RL implementation is selected by the value of the key "type", and the other accepted arguments depend on the implementation type.

Currently, the only supported implementation is "dqn".

The following sections list the arguments accepted by each implementation type.

DQN

Argument	Explanation
"action_space": dict	The gym Space that represents the action space of the environment. See the Space table under `Other specifications`
"gamma": float	Discount factor (gamma)
"tau": float	Soft update rate of the target network (tau)
"epsilon": dict	Schedule for the exploration rate epsilon. See the Epsilon table under `Other specifications`
"batch_size": int	Batch size for each optimization step
"grad_clip_value": float	Value at which gradients are clipped. No clipping if not specified
"loss": dict	The loss function. See the Loss table under `Other specifications`
"optimizer": dict	The optimizer. See the Optimizer table under `Other specifications`
"replay_buffer_size": int	Capacity of the replay buffer
"network": list[dict]	List of layer specs for the neural network. See the Layers table under `Other specifications`

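For orientation, the standard DQN formulation with a soft-updated target network (as in the PyTorch DQN tutorial, whose hyperparameter values the example config below mirrors) uses gamma and tau as shown here. Treat this as an assumption about TorchBringer's implementation, not a description of its code. Q_theta is the online network, theta^- the target network's weights, and d_t is 1 when the transition ends the episode by termination and 0 otherwise.

```latex
\begin{aligned}
y_t &= r_t + \gamma \,(1 - d_t)\, \max_{a'} Q_{\theta^-}(s_{t+1}, a') \\
\theta^- &\leftarrow \tau\, \theta + (1 - \tau)\, \theta^-
\end{aligned}
```

The loss compares Q_theta(s_t, a_t) with y_t, and with tau = 0.005 the target weights move 0.5% of the way toward the online weights after each update.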
Other specifications

These are specifications for the dictionaries used inside learner configs. Each one has a "type" argument that selects a corresponding class or function. For classes, all of the class's initialization parameters can be passed as additional keys in the same dictionary. Where specific arguments are expected, they are listed explicitly.
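The "type" convention above (one key selects a class, the remaining keys become constructor arguments) can be sketched as a small registry lookup. build_from_spec and FakeOptimizer are hypothetical names: FakeOptimizer stands in for torch.optim.AdamW so the sketch runs without PyTorch, and this is not necessarily how TorchBringer resolves specs internally.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


def build_from_spec(spec: Mapping[str, Any], registry: Mapping[str, Callable[..., Any]], *args: Any) -> Any:
    """Call registry[spec["type"]] with any positional args, then the remaining keys as keyword args."""
    kwargs = dict(spec)
    if "type" not in kwargs:
        raise ValueError("spec is missing the 'type' key")
    type_name = kwargs.pop("type")
    if type_name not in registry:
        raise ValueError(f"unknown type {type_name!r}; expected one of {sorted(registry)}")
    return registry[type_name](*args, **kwargs)


# Hypothetical stand-in for torch.optim.AdamW(params, lr=..., amsgrad=...)
@dataclass
class FakeOptimizer:
    params: list
    lr: float = 1e-3
    amsgrad: bool = False


OPTIMIZERS = {"adamw": FakeOptimizer}

# Same shape as the "optimizer" entry in the example config; the list plays the role of model parameters
opt = build_from_spec({"type": "adamw", "lr": 1e-4, "amsgrad": True}, OPTIMIZERS, [0.0, 1.0])
print(opt)
```

With PyTorch installed, registering torch.optim.AdamW under "adamw" and passing model.parameters() as the positional argument would build a real optimizer, since AdamW accepts params, lr, and amsgrad.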

Space

Type	Class
discrete	`gym.spaces.Discrete`

Epsilon

See components/epsilon.py for how each of these is implemented.

Type	Arguments	Explanation
exp_decrease	"start": float, "end": float, "steps_to_end": int	Decreases epsilon exponentially over time.
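components/epsilon.py is the authority on the exact formula. As a hedged sketch, the Python below assumes exp_decrease interpolates geometrically from start to end and reaches end exactly at steps_to_end (a reading suggested by the parameter name, not confirmed), and shows how such a schedule typically drives epsilon-greedy action selection. The function names are hypothetical. The PyTorch DQN tutorial, by contrast, uses end + (start - end) * exp(-step / decay), which only approaches end asymptotically.

```python
import random
from typing import Sequence


def exp_decrease(step: int, start: float = 0.9, end: float = 0.05, steps_to_end: int = 1000) -> float:
    """Assumed schedule: start * (end / start) ** (step / steps_to_end), held at end afterwards."""
    if start <= 0 or end <= 0:
        raise ValueError("start and end must be positive for an exponential schedule")
    if steps_to_end <= 0:
        return end
    progress = min(max(step, 0), steps_to_end) / steps_to_end
    return start * (end / start) ** progress


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a random action, otherwise the action with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
for step in (0, 500, 1000, 2000):
    epsilon = exp_decrease(step)
    action = epsilon_greedy([0.1, 0.7], epsilon, rng)
    print(f"step={step} epsilon={epsilon:.4f} action={action}")
```

With the example config's values, epsilon halfway through (step 500) is the geometric mean of start and end, sqrt(0.9 * 0.05), about 0.212.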

Loss

Type	Class
smooth_l1_loss	`torch.nn.SmoothL1Loss`
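To make concrete what smooth_l1_loss computes on TD errors, here is a plain-Python version of the formula behind torch.nn.SmoothL1Loss with its defaults (beta=1.0, reduction="mean"): quadratic for small errors, linear for large ones, which limits the gradient from outliers. It is an illustration only; TorchBringer uses the PyTorch class.

```python
from typing import Sequence


def smooth_l1_loss(predictions: Sequence[float], targets: Sequence[float], beta: float = 1.0) -> float:
    """Mean Smooth L1 loss: 0.5 * e**2 / beta if |e| < beta, else |e| - 0.5 * beta."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("predictions and targets must be non-empty and the same length")
    total = 0.0
    for prediction, target in zip(predictions, targets):
        error = abs(prediction - target)
        if error < beta:
            total += 0.5 * error * error / beta  # quadratic region
        else:
            total += error - 0.5 * beta  # linear region
    return total / len(predictions)


# Q-value predictions vs. TD targets: one small error and one large error
print(smooth_l1_loss([0.5, 2.0], [0.0, 0.0]))
```

The small error contributes 0.125 and the large one 1.5, so the mean is 0.8125.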

Optimizer

Type	Class
adamw	`torch.optim.AdamW`

Layers

Type	Class
linear	`torch.nn.Linear`
relu	`torch.nn.ReLU`
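Because each "linear" entry's in_features must equal the previous linear layer's out_features, and the last layer must output one Q-value per action, a mismatched "network" list is an easy mistake. Below is a hypothetical pre-flight check in plain Python (not part of TorchBringer) that handles only the two layer types listed above; 4 and 2 are CartPole-v1's observation size and action count.

```python
from typing import Any, Dict, List


def check_layer_chain(network: List[Dict[str, Any]], n_observations: int, n_actions: int) -> None:
    """Raise ValueError if the "linear" specs do not chain from n_observations to n_actions."""
    width = n_observations
    for index, layer in enumerate(network):
        layer_type = layer.get("type")
        if layer_type == "linear":
            if layer["in_features"] != width:
                raise ValueError(f"layer {index}: in_features={layer['in_features']}, expected {width}")
            width = layer["out_features"]
        elif layer_type == "relu":
            continue  # activations leave the width unchanged
        else:
            raise ValueError(f"layer {index}: unsupported type {layer_type!r}")
    if width != n_actions:
        raise ValueError(f"network outputs {width} values, expected one Q-value per action ({n_actions})")


network = [
    {"type": "linear", "in_features": 4, "out_features": 128},
    {"type": "relu"},
    {"type": "linear", "in_features": 128, "out_features": 128},
    {"type": "relu"},
    {"type": "linear", "in_features": 128, "out_features": 2},
]
check_layer_chain(network, n_observations=4, n_actions=2)
print("network spec chains correctly")
```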

Example config

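import gymnasium as gym

# Derive the network's input and output sizes from the environment (CartPole-v1 here)
env = gym.make("CartPole-v1")
n_actions = env.action_space.n
state, info = env.reset()
n_observations = len(state)

config = {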
    "type": "dqn",
    "action_space": {
        "type": "discrete",
        "n": 2
    },
    "gamma": 0.99,
    "tau": 0.005,
    "epsilon": {
        "type": "exp_decrease",
        "start": 0.9,
        "end": 0.05,
        "steps_to_end": 1000
    },
    "batch_size": 128,
    "grad_clip_value": 100,
    "loss": {"type": "smooth_l1_loss"},
    "optimizer": {
        "type": "adamw",
        "lr": 1e-4, 
        "amsgrad": True
    },
    "replay_buffer_size": 10000,
    "network": [
        {
            "type": "linear",
            "in_features": int(n_observations),
            "out_features": 128,
        },
        {"type": "relu"},
        {
            "type": "linear",
            "in_features": 128,
            "out_features": 128,
        },
        {"type": "relu"},
        {
            "type": "linear",
            "in_features": 128,
            "out_features": int(n_actions),
        },
    ]
}


Release history

- 0.5.6 (Sep 5, 2024)
- 0.5.5 (Sep 5, 2024)
- 0.5.4 (Sep 5, 2024)
- 0.5.3 (Sep 5, 2024)
- 0.5.2 (Sep 5, 2024)
- 0.5.1 (Sep 5, 2024)
- 0.5.0 (Jul 29, 2024)
- 0.4.0 (Jul 16, 2024)
- 0.3.6 (Jun 26, 2024)
- 0.3.5 (Jun 22, 2024)
- 0.3.4 (Jun 13, 2024)
- 0.3.3 (Jun 11, 2024)
- 0.3.2 (Jun 11, 2024)
- 0.3.1 (Jun 11, 2024)
- 0.3.0 (Jun 5, 2024)
- 0.2.7 (Jun 5, 2024)
- 0.2.6 (May 27, 2024)
- 0.2.5 (May 27, 2024)
- 0.2.4 (May 27, 2024)
- 0.2.3 (May 27, 2024), this version
- 0.2.2 (May 27, 2024)
- 0.2.1 (May 27, 2024)
- 0.2.0 (May 27, 2024)
- 0.1.2 (May 26, 2024)
- 0.1.1 (May 26, 2024)
- 0.1.0 (May 26, 2024)

Download files


Source Distribution

torchbringer-0.2.3.tar.gz (15.3 kB)

Uploaded May 27, 2024 Source

Built Distribution

torchbringer-0.2.3-py2.py3-none-any.whl (437.7 kB)

Uploaded May 27, 2024 (Python 2, Python 3)

Hashes for torchbringer-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`43c960f41b625be04de33ee4b29c0145a673dced927e664a17d662b7a0a5d38f`
MD5	`de41daea5d47c405751d7860da320f07`
BLAKE2b-256	`9fa59567eef73cdd343d1d873a16f12d07fa5dc1dd642604f6cb8852528b7ab5`

Hashes for torchbringer-0.2.3-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`9f9796a53fe76810d21c6790beaf99cad562135991b3dbc2fbdf94a149099048`
MD5	`3a1cddaa910ebbc573a258319ca1cd6d`
BLAKE2b-256	`b05d15247347f45310de772d8fc3fe69e817bd17066e5c69f4a9d93f2ae86406`