WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU
WarpDrive is a flexible, lightweight, and easy-to-use open-source reinforcement learning (RL) framework that implements end-to-end multi-agent RL on one or more GPUs (Graphics Processing Units).
Using the extreme parallelization capability of GPUs, WarpDrive enables orders-of-magnitude faster RL compared to CPU simulation + GPU model implementations. It is extremely efficient because it avoids back-and-forth data copying between the CPU and the GPU, and it runs simulations across multiple agents and multiple environment replicas in parallel. WarpDrive also provides auto-scaling tools to achieve the optimal throughput per device (version 1.3) and to perform distributed asynchronous training across multiple GPU devices (version 1.4). Together, these allow users to run thousands of concurrent multi-agent simulations and train on extremely large batches of experience, achieving more than 100x throughput over CPU-based counterparts.
Our current release includes several multi-agent environments based on the game of "Tag", in which taggers chase and try to tag runners. More environments will be added soon!
Below, we show multi-agent RL policies trained for different tagger:runner speed ratios using WarpDrive. These environments can run at millions of steps per second, and train in just a few hours, all on a single GPU!
WarpDrive also provides tools to build and train multi-agent RL systems quickly with just a few lines of code. Here is a short example to train tagger and runner agents:
# Imports: the EnvWrapper and Trainer classes, plus the example Tag
# environment (module paths as in the warp-drive repository).
from example_envs.tag_continuous.tag_continuous import TagContinuous
from warp_drive.env_wrapper import EnvWrapper
from warp_drive.training.trainer import Trainer

# Create a wrapped environment object via the EnvWrapper.
# Ensure that use_cuda is set to True (in order to run on the GPU).
env_wrapper = EnvWrapper(
    TagContinuous(**run_config["env"]),
    num_envs=run_config["trainer"]["num_envs"],
    use_cuda=True,
)

# Agents can share policy models: this dictionary maps policy model names to agent ids.
policy_tag_to_agent_id_map = {
    "tagger": list(env_wrapper.env.taggers),
    "runner": list(env_wrapper.env.runners),
}

# Create the trainer object.
trainer = Trainer(
    env_wrapper=env_wrapper,
    config=run_config,
    policy_tag_to_agent_id_map=policy_tag_to_agent_id_map,
)

# Perform training!
trainer.train()
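Once training completes, the example notebooks pull episode data back to the host to visualize the learned behaviors. Here is a minimal sketch, assuming the method names used in the tutorial notebooks:

# Sketch (method names follow the example notebooks; treat the exact
# signatures as assumptions): fetch per-timestep agent positions from a
# rollout of the trained policies, then release GPU resources.
episode_states = trainer.fetch_episode_states(["loc_x", "loc_y"])
trainer.graceful_close()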
Below, we compare the training speed on an N1 16-CPU node versus a single A100 GPU (using WarpDrive), for the Tag environment with 100 runners and 5 taggers. With the same environment configuration and training parameters, WarpDrive on a GPU is 5× faster. Both scenarios run 60 environment replicas in parallel. Using more environments on the CPU node is infeasible, as data copying becomes too expensive. With WarpDrive, it is possible to scale up the number of environment replicas at least 10-fold, for even faster training.
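Scaling the number of environment replicas is a one-line configuration change. A hypothetical fragment, reusing the run_config dictionary from the example above:

# Hypothetical: raise the number of parallel environment replicas 10-fold.
# The best value depends on GPU memory; WarpDrive's auto-scaling tools
# (version 1.3+) can search for a good setting per device.
run_config["trainer"]["num_envs"] = 600  # 10x the 60 replicas used on the CPU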
Code Structure
WarpDrive provides a CUDA + Python framework and quality-of-life tools, so you can quickly build fast, flexible, and massively distributed multi-agent RL systems. The following figure illustrates a bottom-up overview of the design and components of WarpDrive. The user only needs to write a CUDA step function at the CUDA environment layer; the rest is a pure Python interface. We have step-by-step tutorials for you to master the workflow.
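To give a flavor of that single CUDA step function, below is a minimal sketch for a toy 1-D environment. The kernel name, signature, and dynamics are illustrative assumptions rather than WarpDrive's actual API; the thread layout (one block per environment replica, one thread per agent) follows WarpDrive's design.

// Illustrative sketch only: the step function of a toy 1-D environment.
// WarpDrive maps each environment replica to one CUDA block and each
// agent to one thread; the signature below is a hypothetical example.
extern "C" __global__ void step(
    float *observations,  // shape [num_envs, num_agents], flattened
    const int *actions,   // shape [num_envs, num_agents], flattened
    float *rewards,       // shape [num_envs, num_agents], flattened
    int num_agents
) {
    const int env_id = blockIdx.x;
    const int agent_id = threadIdx.x;
    if (agent_id < num_agents) {
        const int idx = env_id * num_agents + agent_id;
        // Move each agent left (action 0) or right (action 1).
        observations[idx] += (actions[idx] == 1) ? 1.0f : -1.0f;
        // Toy reward: stay close to the origin.
        rewards[idx] = -fabsf(observations[idx]);
    }
}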
White Paper and Citing WarpDrive
You can find more details in our white paper: https://arxiv.org/abs/2108.13976.
If you're using WarpDrive in your research or applications, please cite using this BibTeX:
@misc{lan2021warpdrive,
    title={WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU},
    author={Tian Lan and Sunil Srinivasa and Huan Wang and Caiming Xiong and Silvio Savarese and Stephan Zheng},
    year={2021},
    eprint={2108.13976},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
Tutorials and Quick Start
Familiarize yourself with WarpDrive by running these tutorials on Colab!
- A simple end-to-end RL training example: Explains how to get started with multi-agent RL training with just a few lines of code.
- WarpDrive basics: Explains the basics of the Python APIs on the host that manage the CUDA data and kernel functions on the GPU (see the sketch at the end of this section).
- WarpDrive sampler: Explains Python APIs controlling the GPU action sampler.
- WarpDrive resetter and logger: Explains Python APIs controlling the GPU environment resetter and rollout history logger.
- Create custom environments: Explains how to create your own custom RL environment in CUDA C, and integrate it with WarpDrive.
- Training with WarpDrive: Explains how to train your environment on the GPU.
- Scaling Up training with WarpDrive: Explains how to scale up the training throughput on a single GPU and/or across multiple GPUs.
Note: You may also run these tutorials locally, but you will need a GPU machine with the nvcc compiler installed and a compatible Nvidia GPU driver. You will also need Jupyter; see https://jupyter.readthedocs.io/en/latest/install.html for installation instructions.
You can find full reference documentation here.
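As a taste of the host-side APIs covered in the basics tutorial, here is a minimal sketch of pushing data to the GPU and pulling it back. The class and method names follow the tutorials; the array name and constructor arguments are illustrative assumptions.

import numpy as np

from warp_drive.managers.data_manager import CUDADataManager
from warp_drive.utils.data_feed import DataFeed

# The data manager owns all arrays resident on the GPU (illustrative sizes).
cuda_data_manager = CUDADataManager(num_agents=5, num_envs=2, episode_length=100)

# Push a host-side NumPy array to the device once. Setting
# save_copy_and_apply_at_reset=True lets the GPU resetter restore the array
# at the start of every episode, with no further CPU-GPU copies.
data_feed = DataFeed()
data_feed.add_data(
    name="positions",  # hypothetical array name
    data=np.zeros((2, 5), dtype=np.float32),
    save_copy_and_apply_at_reset=True,
)
cuda_data_manager.push_data_to_device(data_feed)

# ... CUDA kernels update "positions" in place on the GPU ...

# Pull the (updated) array back to the host, e.g., for logging.
positions = cuda_data_manager.pull_data_from_device(name="positions")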
Real-World Problems and Collaborations
- AI Economist Covid Environment with WarpDrive: We build a two-level, multi-agent economic simulation using the AI-Economist Foundation framework and train it using WarpDrive. This example specifically considers the COVID-19 and economy simulation.
- PyTorch Lightning Trainer with WarpDrive: We provide an example of a multi-agent reinforcement learning training loop built with WarpDrive and PyTorch Lightning.
Installation Instructions
To get started, you'll need to have Python 3.7+ and the nvcc compiler installed, along with a compatible Nvidia GPU CUDA driver.
CUDA (which includes nvcc) can be installed by following Nvidia's instructions here: https://developer.nvidia.com/cuda-downloads.
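Once CUDA is installed, you can confirm that nvcc is available on your path:

nvcc --version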
Docker Image
You can refer to the example Dockerfile for a V100 GPU to configure your system. In particular, we suggest you visit Nvidia Docker Hub to download the CUDA and cuDNN images compatible with your system. You should be able to use the command-line utility to monitor the Nvidia GPU devices in your system:

nvidia-smi

and see something like this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:00:04.0 Off | 0 |
| N/A 37C P0 32W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
In this snapshot, you can see we are using a Tesla V100 GPU and CUDA version 11.0.
Installing using Pip
You can install WarpDrive using the Python package manager:
pip install rl_warp_drive
Installing from Source
- Clone this repository to your machine:

  git clone https://github.com/salesforce/warp-drive

- Optional, but recommended for first tries: create a new conda environment (named "warp_drive" below) and activate it:

  conda create --name warp_drive python=3.7 --yes
  conda activate warp_drive

- Install as an editable Python package:

  cd warp_drive
  pip install -e .
Testing your Installation
To test your installation, try running from the root directory:
conda activate warp_drive
cd warp_drive/cuda_includes
make compile-test
Running make compile-test will compile the core service source code into a CUDA binary, place it in a bin folder, and run some unit tests.
Equivalently, you can run the unit tests directly from Python:
python warp_drive/utils/unittest_run.py
Learn More
For more information, please check out our blog, white paper, and code documentation.
If you're interested in extending this framework, or have questions, join the AI Economist Slack channel using this invite link.