MARLlib: An Extensive Multi-agent Reinforcement Learning Library
Multi-agent Reinforcement Learning Library (MARLlib) is a MARL library built on Ray and its toolkit RLlib. It provides the MARL research community with a unified platform for building, training, and evaluating MARL algorithms across a wide range of tasks and environments.
A simple example of MARLlib usage:
from marllib import marl
# prepare env
env = marl.make_env(environment_name="mpe", map_name="simple_spread")
# initialize algorithm with appointed hyper-parameters
mappo = marl.algos.mappo(hyperparam_source='mpe')
# build agent model based on env + algorithms + user preference
model = marl.build_model(env, mappo, {"core_arch": "gru", "encode_layer": "128-256"})
# start training
mappo.fit(env, model, stop={'timesteps_total': 1000000}, share_policy='group')
# ready to control
mappo.render(env, model, share_policy='group', restore_path='path_to_checkpoint')
Why MARLlib?
Here is a comparison of MARLlib with existing MARL libraries.
Library | Supported Env | Algorithm | Parameter Sharing | Model |
---|---|---|---|---|
PyMARL | 1 cooperative | 5 | share | GRU |
PyMARL2 | 2 cooperative | 11 | share | MLP + GRU |
MAPPO Benchmark | 4 cooperative | 1 | share + separate | MLP + GRU |
MAlib | 4 self-play | 10 | share + group + separate | MLP + LSTM |
EPyMARL | 4 cooperative | 9 | share + separate | GRU |
MARLlib | 10 (no task mode restriction) | 18 | share + group + separate + customizable | MLP + CNN + GRU + LSTM |
The original page also compares these libraries by GitHub stars, documentation, open issues, recent activity, and last update using live badges, which are not reproduced here.
Key features
:beginner: What MARLlib brings to the MARL community:
- it unifies diverse algorithm pipelines with agent-level distributed dataflow.
- it supports all task modes: cooperative, collaborative, competitive, and mixed.
- it unifies multi-agent environment interfaces under a single Gym-style interface.
- it provides flexible and customizable parameter-sharing strategies.
:rocket: With MARLlib, you can enjoy advantages including, but not limited to:
- zero MARL knowledge required: 18 out-of-the-box algorithms with an intuitive API!
- all task modes available: supports almost all multi-agent environments!
- customizable model architecture: pick your favorite from the model zoo!
- customizable policy sharing: use MARLlib's grouping or build your own!
- more than a thousand experiments conducted and released!
Installation
Note: MARLlib supports Linux only.
Step-by-step (recommended)
- install dependencies
- install environments
- install patches
1. install dependencies (basic)
First, install the MARLlib dependencies to guarantee basic usage. Then install any environments you need following this guide, and finally apply the patches for RLlib.
$ conda create -n marllib python=3.8
$ conda activate marllib
$ git clone https://github.com/Replicable-MARL/MARLlib.git && cd MARLlib
$ pip install -r requirements.txt
2. install environments (optional)
Please follow this guide.
3. install patches (basic)
Fix known bugs in RLlib by applying the patches with the following commands:
$ cd /Path/To/MARLlib/marl/patch
$ python add_patch.py -y
PyPI
$ pip install --upgrade pip
$ pip install marllib
Getting started
Prepare the configuration
Four parts of the configuration govern the whole training process:
- scenario: specify the environment/task settings
- algorithm: choose the hyperparameters of the algorithm
- model: customize the model architecture
- ray/rllib: change the basic training settings
Before training, ensure all the parameters are set correctly, including those you intend to leave at their defaults.
Note: you can also modify all the pre-set parameters via the MARLlib API.
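As a minimal sketch of how the four parts map onto the API calls used throughout this page (the values are illustrative, not recommended settings):
from marllib import marl
# scenario: choose the environment and task
env = marl.make_env(environment_name="mpe", map_name="simple_spread")
# algorithm: load the preset hyperparameters for that environment family
mappo = marl.algos.mappo(hyperparam_source="mpe")
# model: pick the architecture and encoder layout
model = marl.build_model(env, mappo, {"core_arch": "gru", "encode_layer": "128-256"})
# ray/rllib: basic training settings are passed at fit time
mappo.fit(env, model, stop={"timesteps_total": 1000000}, num_workers=5, share_policy="group")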
Register the environment
Ensure all dependencies are installed for the environment you want to run; otherwise, please refer to the MARLlib documentation.
task mode | api example |
---|---|
cooperative | marl.make_env(environment_name="mpe", map_name="simple_spread", force_coop=True) |
collaborative | marl.make_env(environment_name="mpe", map_name="simple_spread") |
competitive | marl.make_env(environment_name="mpe", map_name="simple_adversary") |
mixed | marl.make_env(environment_name="mpe", map_name="simple_crypto") |
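As the cooperative row suggests, force_coop=True forces an otherwise collaborative scenario into cooperative mode; a minimal sketch based only on the calls above:
from marllib import marl
# collaborative by default
env = marl.make_env(environment_name="mpe", map_name="simple_spread")
# the same scenario forced into cooperative mode
coop_env = marl.make_env(environment_name="mpe", map_name="simple_spread", force_coop=True)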
Most of the popular environments in MARL research are supported by MARLlib:
Env Name | Learning Mode | Observability | Action Space | Observations |
---|---|---|---|---|
LBF | cooperative + collaborative | Both | Discrete | 1D |
RWARE | cooperative | Partial | Discrete | 1D |
MPE | cooperative + collaborative + mixed | Both | Both | 1D |
SMAC | cooperative | Partial | Discrete | 1D |
MetaDrive | collaborative | Partial | Continuous | 1D |
MAgent | collaborative + mixed | Partial | Discrete | 2D |
Pommerman | collaborative + competitive + mixed | Both | Discrete | 2D |
MAMuJoCo | cooperative | Partial | Continuous | 1D |
GRF | collaborative + mixed | Full | Discrete | 2D |
Hanabi | cooperative | Partial | Discrete | 1D |
Each environment has a readme file that serves as the instruction manual for the task, covering environment settings, installation, and important notes.
Initialize the algorithm
running target | api example |
---|---|
train & finetune | marl.algos.mappo(hyperparam_source=$ENV) |
develop & debug | marl.algos.mappo(hyperparam_source="test") |
3rd party env | marl.algos.mappo(hyperparam_source="common") |
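Switching the hyperparameter source lets the same script move from quick debugging to full training; a minimal sketch using only the calls shown above:
from marllib import marl
env = marl.make_env(environment_name="mpe", map_name="simple_spread")
# fast-running preset for developing and debugging
mappo_debug = marl.algos.mappo(hyperparam_source="test")
# environment-tuned preset for training and fine-tuning
mappo_train = marl.algos.mappo(hyperparam_source="mpe")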
Here is a table describing the characteristics of each algorithm:
algorithm | support task mode | discrete action | continuous action | policy type |
---|---|---|---|---|
IQL* | all four | :heavy_check_mark: | | off-policy |
PG | all four | :heavy_check_mark: | :heavy_check_mark: | on-policy |
A2C | all four | :heavy_check_mark: | :heavy_check_mark: | on-policy |
DDPG | all four | | :heavy_check_mark: | off-policy |
TRPO | all four | :heavy_check_mark: | :heavy_check_mark: | on-policy |
PPO | all four | :heavy_check_mark: | :heavy_check_mark: | on-policy |
COMA | all four | :heavy_check_mark: | | on-policy |
MADDPG | all four | | :heavy_check_mark: | off-policy |
MAA2C* | all four | :heavy_check_mark: | :heavy_check_mark: | on-policy |
MATRPO* | all four | :heavy_check_mark: | :heavy_check_mark: | on-policy |
MAPPO | all four | :heavy_check_mark: | :heavy_check_mark: | on-policy |
HATRPO | cooperative | :heavy_check_mark: | :heavy_check_mark: | on-policy |
HAPPO | cooperative | :heavy_check_mark: | :heavy_check_mark: | on-policy |
VDN | cooperative | :heavy_check_mark: | | off-policy |
QMIX | cooperative | :heavy_check_mark: | | off-policy |
FACMAC | cooperative | | :heavy_check_mark: | off-policy |
VDAC | cooperative | :heavy_check_mark: | :heavy_check_mark: | on-policy |
VDPPO* | cooperative | :heavy_check_mark: | :heavy_check_mark: | on-policy |
*all four: cooperative, collaborative, competitive, and mixed
IQL is the multi-agent version of Q-learning. MAA2C and MATRPO are the centralized versions of A2C and TRPO. VDPPO is the value-decomposition version of PPO.
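Assuming each algorithm in the table is exposed under marl.algos by its lowercase name, as shown for MAPPO (please verify against the documentation), initializing a value-decomposition method follows the same pattern; the SMAC map "3m" below is just an example scenario:
from marllib import marl
# QMIX per the table: cooperative only, discrete actions, off-policy
env = marl.make_env(environment_name="smac", map_name="3m")
qmix = marl.algos.qmix(hyperparam_source="smac")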
Build the agent model
An agent model consists of two parts: the `encoder` and the `core arch`. The `encoder` is constructed by MARLlib according to the observation space. Choose `mlp`, `gru`, or `lstm` as the core architecture to build the complete model.
model arch | api example |
---|---|
MLP | marl.build_model(env, algo, {"core_arch": "mlp"}) |
GRU | marl.build_model(env, algo, {"core_arch": "gru"}) |
LSTM | marl.build_model(env, algo, {"core_arch": "lstm"}) |
Encoder Arch | marl.build_model(env, algo, {"core_arch": "gru", "encode_layer": "128-256"}) |
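The core_arch and encode_layer options compose; a minimal sketch (the encoder sizes are illustrative):
from marllib import marl
env = marl.make_env(environment_name="mpe", map_name="simple_spread")
mappo = marl.algos.mappo(hyperparam_source="mpe")
# recurrent LSTM core on top of a 128-128 fully connected encoder
model = marl.build_model(env, mappo, {"core_arch": "lstm", "encode_layer": "128-128"})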
Kick off the training
setting | api example |
---|---|
train | algo.fit(env, model) |
debug | algo.fit(env, model, local_mode=True) |
stop condition | algo.fit(env, model, stop={'episode_reward_mean': 2000, 'timesteps_total': 10000000}) |
policy sharing | algo.fit(env, model, share_policy='all') # or 'group' / 'individual' |
save model | algo.fit(env, model, checkpoint_freq=100, checkpoint_end=True) |
GPU accelerate | algo.fit(env, model, local_mode=False, num_gpus=1) |
CPU accelerate | algo.fit(env, model, local_mode=False, num_workers=5) |
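These settings can be combined in a single call; a minimal sketch reusing env, model, and mappo from the quick-start example (the values are illustrative):
# combine stopping conditions, checkpointing, policy sharing, and hardware settings
mappo.fit(
    env, model,
    stop={"episode_reward_mean": 2000, "timesteps_total": 10000000},
    share_policy="group",
    checkpoint_freq=100,
    checkpoint_end=True,
    local_mode=False,
    num_gpus=1,
    num_workers=5,
)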
Training & rendering API
from marllib import marl
# prepare env
env = marl.make_env(environment_name="mpe", map_name="simple_spread")
# initialize algorithm with appointed hyper-parameters
mappo = marl.algos.mappo(hyperparam_source="mpe")
# build agent model based on env + algorithms + user preference
model = marl.build_model(env, mappo, {"core_arch": "mlp", "encode_layer": "128-256"})
# start training
mappo.fit(
env, model,
stop={"timesteps_total": 1000000},
checkpoint_freq=100,
share_policy="group"
)
# rendering
mappo.render(
env, model,
local_mode=True,
restore_path={'params_path': "checkpoint_000010/params.json",
'model_path': "checkpoint_000010/checkpoint-10"}
)
Benchmark results
All results are listed here.
Quick examples
MARLlib provides some practical examples for you to refer to.
- Detailed API usage: show how to use MARLlib api in detail, e.g. cmd + api combined running.
- Policy sharing customization: define your group policy-sharing strategy as you like based on current tasks.
- Loading model and rendering: render the environment based on the pre-trained model.
- Incorporating new environment: add your new environment following MARLlib's env-agent interaction interface.
- Incorporating new algorithm: add your new algorithm following MARLlib learning pipeline.
Tutorials
Try MPE + MAPPO examples on Google Colaboratory!
More tutorial documentation is available here.
Community
Channel | Link |
---|---|
Issues | GitHub Issues |
Contributing
We are a small team working on multi-agent reinforcement learning, and we will take all the help we can get! If you would like to get involved, here is information on contribution guidelines and how to test the code locally.
You can contribute in multiple ways, e.g., reporting bugs, writing or translating documentation, reviewing or refactoring code, requesting or implementing new features, etc.