
Long-Term Evolution Project of Reinforcement Learning

Project description



RLLTE: Long-Term Evolution Project of Reinforcement Learning

Inspired by the long-term evolution (LTE) standard project in telecommunications, RLLTE aims to provide development components and standards for advancing RL research and applications. Beyond delivering top-notch algorithm implementations, RLLTE also serves as a toolkit for developing algorithms.

Why RLLTE?

  • 🧬 Long-term evolution for providing the latest algorithms and tricks;
  • 🏞️ Complete ecosystem for task design, model training, evaluation, and deployment (TensorRT, CANN, ...);
  • 🧱 Module-oriented design for complete decoupling of RL algorithms;
  • 🚀 Optimized workflow for full hardware acceleration;
  • ⚙️ Support custom environments and modules;
  • 🖥️ Support multiple computing devices like GPU and NPU;
  • 💾 Large number of reusable benchmarks (RLLTE Hub);
  • 🤖 Large language model-empowered copilot (RLLTE Copilot).

⚠️ Since the construction of RLLTE Hub requires massive computing power, we have to upload the training datasets and model weights gradually. A progress report can be found in Issue #30.

For an overview of the project structure and more detailed descriptions of these modules, see the API Documentation.

Quick Start

Installation

  • with pip (recommended)

Open a terminal and install rllte with pip:

conda create -n rllte python=3.8 # create a virtual environment
conda activate rllte # activate the environment
pip install rllte-core # basic installation
pip install rllte-core[envs] # for pre-defined environments
  • with git

Open a terminal and clone the repository from GitHub with git:

git clone https://github.com/RLE-Foundation/rllte.git
cd rllte
pip install -e . # basic installation
pip install -e .[envs] # for pre-defined environments

For more detailed installation instructions, see Getting Started.

Fast Training with Built-in Algorithms

RLLTE provides implementations of well-recognized RL algorithms and a simple interface for building applications.

On NVIDIA GPU

Suppose we want to use DrQ-v2 to solve a task from the DeepMind Control Suite; it suffices to write a train.py like this:

# import the `env` and `agent` modules
from rllte.env import make_dmc_env 
from rllte.agent import DrQv2

if __name__ == "__main__":
    device = "cuda:0"
    # create env, `eval_env` is optional
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    eval_env = make_dmc_env(env_id="cartpole_balance", device=device)
    # create agent
    agent = DrQv2(env=env, eval_env=eval_env, device=device, tag="drqv2_dmc_pixel")
    # start training
    agent.train(num_train_steps=500000, log_interval=1000)

Run train.py and the training progress will be logged to the terminal at the specified interval.

On HUAWEI NPU

Similarly, if we want to train an agent on HUAWEI NPU, it suffices to replace cuda with npu:

device = "cuda:0" -> device = "npu:0"

Three Steps to Create Your RL Agent

Developers only need three steps to implement an RL algorithm with RLLTE. The following example illustrates how to write an Advantage Actor-Critic (A2C) agent to solve Atari games.

  • Firstly, select a prototype:

    from rllte.common.prototype import OnPolicyAgent
  • Secondly, select the necessary modules to build the agent:

    from rllte.xploit.encoder import MnihCnnEncoder
    from rllte.xploit.policy import OnPolicySharedActorCritic
    from rllte.xploit.storage import VanillaRolloutStorage
    from rllte.xplore.distribution import Categorical
    
    • Run the .describe function of the selected policy and you will see the following output:
    OnPolicySharedActorCritic.describe()
    # Output:
    # ================================================================================
    # Name       : OnPolicySharedActorCritic
    # Structure  : self.encoder (shared by actor and critic), self.actor, self.critic
    # Forward    : obs -> self.encoder -> self.actor -> actions
    #            : obs -> self.encoder -> self.critic -> values
    #            : actions -> log_probs
    # Optimizers : self.optimizers['opt'] -> (self.encoder, self.actor, self.critic)
    # ================================================================================
    

    This illustrates the structure of the policy and indicates the optimizable parts.

  • Thirdly, merge these modules and write an .update function:

    from torch import nn
    import torch as th
    
    class A2C(OnPolicyAgent):
        def __init__(self, env, tag, seed, device, num_steps) -> None:
            super().__init__(env=env, tag=tag, seed=seed, device=device, num_steps=num_steps)
            # create modules
            encoder = MnihCnnEncoder(observation_space=env.observation_space, feature_dim=512)
            policy = OnPolicySharedActorCritic(observation_space=env.observation_space,
                                              action_space=env.action_space,
                                              feature_dim=512,
                                              opt_class=th.optim.Adam,
                                              opt_kwargs=dict(lr=2.5e-4, eps=1e-5),
                                              init_fn="xavier_uniform"
                                              )
            storage = VanillaRolloutStorage(observation_space=env.observation_space,
                                            action_space=env.action_space,
                                            device=device,
                                            storage_size=self.num_steps,
                                            num_envs=self.num_envs,
                                            batch_size=256
                                            )
            dist = Categorical()
            # set all the modules
            self.set(encoder=encoder, policy=policy, storage=storage, distribution=dist)
        
        def update(self):
            for _ in range(4):
                for batch in self.storage.sample():
                    # evaluate the sampled actions
                    new_values, new_log_probs, entropy = self.policy.evaluate_actions(obs=batch.observations, actions=batch.actions)
                    # policy loss part
                    policy_loss = - (batch.adv_targ * new_log_probs).mean()
                    # value loss part
                    value_loss = 0.5 * (new_values.flatten() - batch.returns).pow(2).mean()
                    # update
                    self.policy.optimizers['opt'].zero_grad(set_to_none=True)
                    (value_loss * 0.5 + policy_loss - entropy * 0.01).backward()
                    nn.utils.clip_grad_norm_(self.policy.parameters(), 0.5)
                    self.policy.optimizers['opt'].step()
    
  • Finally, train the agent by running:

    from rllte.env import make_atari_env

    if __name__ == "__main__":
        device = "cuda"
        env = make_atari_env("PongNoFrameskip-v4", num_envs=8, seed=0, device=device)
        agent = A2C(env=env, tag="a2c_atari", seed=0, device=device, num_steps=128)
        agent.train(num_train_steps=10000000)

As shown in this example, only a few dozen lines of code are needed to create RL agents with RLLTE.

Algorithm Decoupling and Module Replacement

RLLTE allows developers to replace the modules of implemented algorithms for performance comparison and algorithm improvement, and both built-in and custom modules are supported. Suppose we want to compare the effect of different encoders; it suffices to invoke the .set function:

from rllte.xploit.encoder import EspeholtResidualEncoder
encoder = EspeholtResidualEncoder(...)
agent.set(encoder=encoder)

RLLTE is an extremely open framework that allows developers to try anything. For more detailed tutorials, see Tutorials.
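
For the custom-module side, the snippet below sketches a tiny MLP encoder and plugs it into an existing agent via .set. It is only a sketch: the BaseEncoder prototype class and the (observation_space, feature_dim) constructor convention are assumptions inferred from the built-in encoders above, so check the Tutorials for the authoritative interface.

import torch as th
from torch import nn

# assumption: custom encoders subclass a `BaseEncoder` prototype, as the built-in ones do
from rllte.common.prototype import BaseEncoder

class FlattenMlpEncoder(BaseEncoder):
    """A hypothetical MLP encoder for flat, low-dimensional observations."""

    def __init__(self, observation_space, feature_dim: int = 64) -> None:
        super().__init__(observation_space, feature_dim)
        input_dim = int(th.prod(th.as_tensor(observation_space.shape)).item())
        self.trunk = nn.Sequential(
            nn.Flatten(),
            nn.Linear(input_dim, feature_dim),
            nn.ReLU(),
        )

    def forward(self, obs: th.Tensor) -> th.Tensor:
        return self.trunk(obs)

# then swap it into an existing agent exactly like a built-in module:
# agent.set(encoder=FlattenMlpEncoder(env.observation_space, feature_dim=64))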

Function List (Part)

RL Agents

Type Algo. Box Dis. M.B. M.D. M.P. NPU 💰 🔭
On-Policy A2C ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
On-Policy PPO ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
On-Policy DrAC ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
On-Policy DAAC ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
On-Policy DrDAAC ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
On-Policy PPG ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
Off-Policy DQN ✔️ ✔️ ✔️ ✔️
Off-Policy DDPG ✔️ ✔️ ✔️ ✔️
Off-Policy SAC ✔️ ✔️ ✔️ ✔️
Off-Policy SAC-Discrete ✔️ ✔️ ✔️ ✔️
Off-Policy TD3 ✔️ ✔️ ✔️ ✔️
Off-Policy DrQ-v2 ✔️ ✔️ ✔️ ✔️
Distributed IMPALA ✔️ ✔️ ✔️
  • Dis., M.B., M.D.: Discrete, MultiBinary, and MultiDiscrete action space;
  • M.P.: Multi-processing;
  • 🐌: Developing;
  • 💰: Support intrinsic reward shaping;
  • 🔭: Support observation augmentation.

Intrinsic Reward Modules

  • Count-based: PseudoCounts, RND, E3B
  • Curiosity-driven: ICM, GIRM, RIDE, Disagreement
  • Memory-based: NGU
  • Information theory-based: RE3, RISE, REVD

See Tutorials: Use Intrinsic Reward and Observation Augmentation for usage examples.
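
As a quick preview of that tutorial, the sketch below shows the general pattern for attaching one of these modules to an agent that supports intrinsic reward shaping (the 💰 column above). The rllte.xplore.reward module path, the RE3 constructor arguments, and the reward keyword of .set follow the documented pattern but should be treated as assumptions; see the tutorial for the exact interface.

from rllte.agent import PPO
from rllte.env import make_atari_env
from rllte.xplore.reward import RE3  # intrinsic reward module (path assumed)

if __name__ == "__main__":
    device = "cuda:0"
    env = make_atari_env("PongNoFrameskip-v4", num_envs=8, seed=0, device=device)
    # create the agent as usual
    agent = PPO(env=env, tag="ppo_atari_re3", seed=0, device=device)
    # create the intrinsic reward module (constructor convention assumed to mirror the other modules)
    irs = RE3(observation_space=env.observation_space, action_space=env.action_space, device=device)
    # attach it via `.set`, then train as usual
    agent.set(reward=irs)
    agent.train(num_train_steps=5000)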

RLLTE Ecosystem

Explore the ecosystem of RLLTE to facilitate your project:

  • Hub: Fast training APIs and reusable benchmarks.
  • Evaluation: Reasonable and reliable metrics for algorithm evaluation.
  • Env: Packaged environments for fast invocation.
  • Deployment: Convenient APIs for model deployment.
  • Pre-training: Methods of pre-training in RL.
  • Copilot: Large language model-empowered copilot.

How To Contribute

Contributions to this project are welcome! Before you begin writing code, please read CONTRIBUTING.md for guidance.

Cite the Project

To cite this project in publications:

@article{yuan2023rllte,
  title={RLLTE: Long-Term Evolution Project of Reinforcement Learning}, 
  author={Mingqi Yuan and Zequn Zhang and Yang Xu and Shihao Luo and Bo Li and Xin Jin and Wenjun Zeng},
  year={2023},
  journal={arXiv preprint arXiv:2309.16382}
}

Acknowledgment

This project is supported by The Hong Kong Polytechnic University, Eastern Institute for Advanced Study, and FLW-Foundation. EIAS HPC provides a GPU computing platform, and the HUAWEI Ascend Community provides an NPU computing platform for our testing. Some code in this project is borrowed from or inspired by several excellent projects, and we highly appreciate them. See ACKNOWLEDGMENT.md.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rllte_core-1.0.1.tar.gz (13.9 MB)

Uploaded Source

Built Distribution

rllte_core-1.0.1-py3-none-any.whl (727.5 kB)

Uploaded Python 3

File details

Details for the file rllte_core-1.0.1.tar.gz.

File metadata

  • Download URL: rllte_core-1.0.1.tar.gz
  • Upload date:
  • Size: 13.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for rllte_core-1.0.1.tar.gz
  • SHA256: e5fc270f7e47d09f301ee154cae735940b4084a054727691eb172ae8e7aa4981
  • MD5: 4ac56433a71b35e871507d59b9449bd2
  • BLAKE2b-256: b888f719279305e8ac6d7aaf085eb15bff3e3377ef0dbc6b3b7a0d4dbf453d93

See more details on using hashes here.

File details

Details for the file rllte_core-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: rllte_core-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 727.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for rllte_core-1.0.1-py3-none-any.whl
  • SHA256: 3a0ec99c27eae05def49fee79cf868bca298fbf2f7db9c3b90f9be3fd467d6b3
  • MD5: 00437415ecd0f016e122dfcd19cd52d7
  • BLAKE2b-256: 2d2b77b140abd443a0d222e4e7466949e198e87a161a1f0ff286d998dd2ad000

See more details on using hashes here.
