Skip to main content

PyTorch implementation of HARL algorithms with the CONCERTO ego-AHT wrapper (HAPPO + Hydra runner + frozen-partner adapter).

Project description

Heterogeneous-Agent Reinforcement Learning — CONCERTO fork

About this fork. This is a CONCERTO fork of PKU-MARL/HARL at upstream commit b1af98b (tagged here as v0.0.0-vendored). The fork adds the ego-AHT (ad-hoc teamwork) wrapper required by CONCERTO — a frozen-partner HAPPO actor, a Hydra-driven runner, and a CONCERTO env adapter — on top of the unmodified upstream tree. The AHT delta is the single commit v0.1.0-aht.

Install (consumer-side). pip install harl-aht pulls the wheel from PyPI; import harl resolves to this fork's tree. The distribution name is harl-aht to disambiguate from unreleased upstream harl and to signal the AHT delta; the import path remains harl for source compatibility with upstream-targeted code.

Licensing. Upstream files retain their MIT licensing (LICENSE). The CONCERTO-added files in the v0.1.0-aht delta are Apache-2.0 licensed (NOTICE + SPDX headers). See ADR-001 in the CONCERTO repo for the fork-vs-build rationale and ADR-002 for the RL-framework decision that motivated the fork.


This repository contains the official implementation of Heterogeneous-Agent Reinforcement Learning (HARL) algorithms, including HAPPO, HATRPO, HAA2C, HADDPG, HATD3, HAD3QN, and HASAC, based on PyTorch. HARL algorithms are markedly different from MAPPO in that they are generally applicable to heterogeneous agents and are supported by rigorous theories, often achieving superior performance. This repository allows researchers and practitioners to easily reproduce our results on seven challenging benchmarks or apply HARL algorithms to their intended applications.

Overview

HARL algorithms are our novel solutions to achieving effective multi-agent cooperation in the general heterogeneous-agent settings, without relying on the restrictive parameter-sharing trick.

Key features

  • HARL algorithms achieve coordinated agent updates by employing the sequential update scheme, different from the simultaneous update scheme utilized by MAPPO and MADDPG.
  • HARL algorithms enjoy theoretical guarantees of monotonic improvement and convergence to equilibrium, ensuring their efficacy in promoting cooperative behavior among agents.
  • Both on-policy and off-policy HARL algorithms, exemplified by HAPPO and HASAC respectively, demonstrate superior performance across a wide range of benchmarks.

The following figure is an illustration of the sequential update scheme

For more details, please refer to our HARL (JMLR 2024) and MEHARL (ICLR 2024 spotlight) papers.

Installation

Install HARL

conda create -n harl python=3.8
conda activate harl
# Install pytorch>=1.9.0 (CUDA>=11.0) manually
git clone https://github.com/PKU-MARL/HARL.git
cd HARL
pip install -e .

Install Environments Dependencies

Along with HARL algorithms, we also implement the interfaces for seven common environments (SMAC, SMACv2, MAMuJoCo, MPE, Google Research Football, Bi-DexterousHands, Light Aircraft Game) and they can be used directly. (We also implement the interface for Gym. Gym is a single-agent environment, which can be seen as a special case of multi-agent environments. It is included mainly for reference purposes.) You may choose to install the dependencies to the environments you want to use. After the installation, please follow the steps in the "Solve Dependencies" section.

Install Dependencies of Bi-DexterousHands

Bi-DexterousHands depend on IsaacGym. The hardware requirements of IsaacGym has to be satisfied. To install IsaacGym, download IsaacGym Preview 4 release from its official website. Then run pip install -e . under its python folder.

Install Light Aircraft Game

Light Aircraft Game (LAG) is a recently developed cooperative-competitive environment for red and blue aircraft games, offering various settings such as single control, 1v1, and 2v2 scenarios. In the context of multi-agent scenarios, LAG currently supports self-play only for 2v2 settings. To address this limitation, we introduce novel cooperative non-weapon and shoot-missile tasks where two agents collaborate to combat two opponents controlled by the built-in AI. In the non-weapon task, agents are trained to fly towards the opponents' tails while maintaining a suitable distance. In the shoot-missile task, agents learn to dodge opponent missiles and launch their own missiles to destroy the opponents.

To install LAG, run the following command:

# Install dependencies
pip install torch pymap3d jsbsim==1.1.6 geographiclib gym==0.21.0 wandb icecream setproctitle
# Initialize submodules(*JSBSim-Team/jsbsim*)
git submodule init
git submodule update

Install Google Research Football

Please follow the official instructions to install Google Research Football.

Install SMAC

Please follow the official instructions to install SMAC. We use StarCraft II version 4.10 on Linux.

Install SMACv2

Please follow the official instructions to install SMACv2.

Install MPE

pip install pettingzoo==1.22.2
pip install supersuit==3.7.0

Install Gym Suite (Except MuJoCo)

# Install gym
pip install gym
# Install classic control
pip install gym[classic_control]
# Install box2d
conda install -c anaconda swig
pip install gym[box2d]
# Install atari
pip install --upgrade pip setuptools wheel
pip install opencv-python
pip install atari-py
pip install gym[atari]
pip install autorom[accept-rom-license]

Install MuJoCo

First, follow the instructions on https://github.com/openai/mujoco-py, https://www.roboti.us/, and https://github.com/deepmind/mujoco to download the right version of mujoco you need.

Second, mkdir ~/.mujoco.

Third, move the .tar.gz or .zip to ~/.mujoco, and extract it using tar -zxvf or unzip.

Fourth, add the following line to the .bashrc:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/<user>/.mujoco/<folder-name, e.g. mujoco210, mujoco-2.2.1>/bin

Fifth, run the following command:

sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3
pip install mujoco
pip install gym[mujoco]
sudo apt-get update -y
sudo apt-get install -y patchelf

Install Dependencies of MAMuJoCo

First follow the instructions above to install MuJoCo. Then run the following commands.

pip install "mujoco-py>=2.1.2.14"
pip install "Jinja2>=3.0.3"
pip install "glfw>=2.5.1"
pip install "Cython>=0.29.28"

Note that mujoco-py is compatible with mujoco210 (see this). So please make sure to download mujoco210 and extract it into the right place.

Solve Dependencies

After the installation above, run the following commands to solve dependencies.

pip install gym==0.21.0
pip install pyglet==1.5.0
pip install importlib-metadata==4.13.0

If you encounter issues when using pip install gym==0.21.0, try using the following command instead:

conda install -c conda-forge gym=0.21.0

Usage

Training on Existing Environments

To train an algorithm on a provided environment, users can modify yaml configuration files of the corresponding algorithm and environment under harl/configs/algos_cfgs and harl/configs/envs_cfgs as they wish, go to examples folder, and then start training with a one-liner python train.py --algo <ALGO> --env <ENV> --exp_name <EXPERIMENT NAME> or python train.py --load_config <CONFIG FILE PATH> --exp_name <EXPERIMENT NAME>, where the latter is mostly used when reproducing an experiment. We provide the tuned configurations for algorithms in each environments under tuned_configs folder. Users can reproduce our results by using python train.py --load_config <TUNED CONFIG PATH> --exp_name <EXPERIMENT NAME> and change <TUNED CONFIG PATH> to the absolute path of the tuned config file on their machine.

During training, users receive continuous logging feedback in the terminal.

After training, users can check the log file, tensorboard output, experiment configuration, and saved models under the generated results folder. Moreover, users can also render the trained models by setting use_render: True, model_dir: <path to trained models> in algorithm configuration file (for football users also need to set render: True in the environment configuration file), and use the same training command as above again. For SMAC and SMACv2, rendering comes in the form of video replay automatically saved to the StarCraftII/Replays folder (more details can be found here).

To enable batch running, we allow users to modify yaml configs in the command line. For each training command, users specify the special parameters in the commands with the same names as in the config files. For example, if you want to run HAPPO on SMAC tasks under three random seeds. You can customize the configs and replace train.sh with the following commands:

for seed in $(seq 1 3)
do
	python train.py --algo happo --env smac --exp_name test --seed $seed
done

Applying to New Environments

If you want to apply HA series algorithms to solve new tasks, you need to implement environment interfaces, following the examples of the seven environments provided. A simplest interface may look like:

class Env:
    def __init__(self, args):
        self.env = ...
        self.n_agents = ...
        self.share_observation_space = ...
        self.observation_space = ...
        self.action_space = ...

    def step(self, actions):
        return obs, state, rewards, dones, info, available_actions

    def reset(self):
        return obs, state, available_actions

    def seed(self, seed):
        pass

    def render(self):
        pass

    def close(self):
        self.env.close()

The purpose of interface is to hide environment-specific details and expose a unified interaction protocol, so that other modules could process data uniformly.

You may also want to produce continuous logging output during training. If you intend to use an on-policy algorithm, you need to implement a logger for your environment. The simplest logger can inherit from BaseLogger in harl/common/base_logger.py and implement the get_task_name function. More customised logging requirements can be fulfilled by extending or overriding more functions. We recommend referring to the existing loggers in each environment directory to get to know how it is written. If an off-policy algorithm is used, you can directly customise logging by modifying the off-policy runner code. In the end, please register the logger (if any), add a yaml config file, and add necessary code to examples/train.py, harl/utils/configs_tool.py, and harl/utils/envs_tool.py. Again, following the existing seven environment examples will be convenient.

After these steps, you can apply the algorithms immediately as above.

Application Scope of Algorithms

Continuous action space Discrete action space Multi Discrete action space
HAPPO
HATRPO
HAA2C
HADDPG
HATD3
HAD3QN
HASAC
MAPPO
MADDPG
MATD3

Performance on Cooperative MARL Benchmarks

MPE

MAMuJoCo

HAPPO, HADDPG, and HATD3 outperform MAPPO, MADDPG, and MATD3; HAPPO and HATD3 are the most effective methods for heterogeneous-agent cooperation tasks.

SMAC & SMACv2

HAPPO and HATRPO are comparable to or better than MAPPO and QMIX in SMAC and SMACv2, demonstrating their capability in mostly homogeneous-agent settings.

GRF

HAPPO consistently outperforms MAPPO and QMIX on GRF, and the performance gap increases as the number and heterogeneity of agents increase.

Bi-DexterousHands

HAPPO consistently outperforms MAPPO, and is also better than the single-agent baseline PPO, while also showing less variance.

The experiment results of HASAC can be found at https://sites.google.com/view/meharl

Citation

This repository is affiliated with Peking University and BIGAI. If you find our paper or this repository helpful in your research or project, please consider citing our works using the following BibTeX citation:

@article{JMLR:v25:23-0488,
  author  = {Yifan Zhong and Jakub Grudzien Kuba and Xidong Feng and Siyi Hu and Jiaming Ji and Yaodong Yang},
  title   = {Heterogeneous-Agent Reinforcement Learning},
  journal = {Journal of Machine Learning Research},
  year    = {2024},
  volume  = {25},
  number  = {32},
  pages   = {1--67},
  url     = {http://jmlr.org/papers/v25/23-0488.html}
}
@inproceedings{
liu2024maximum,
title={Maximum Entropy Heterogeneous-Agent Reinforcement Learning},
author={Jiarong Liu and Yifan Zhong and Siyi Hu and Haobo Fu and QIANG FU and Xiaojun Chang and Yaodong Yang},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=tmqOhBC4a5}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

harl_aht-0.1.0.tar.gz (494.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

harl_aht-0.1.0-py3-none-any.whl (629.1 kB view details)

Uploaded Python 3

File details

Details for the file harl_aht-0.1.0.tar.gz.

File metadata

  • Download URL: harl_aht-0.1.0.tar.gz
  • Upload date:
  • Size: 494.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.20 {"installer":{"name":"uv","version":"0.11.20","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for harl_aht-0.1.0.tar.gz
Algorithm Hash digest
SHA256 6c23f0044e616c2ec5ce83df0f3ed4d997dc7ae9adda8f07158b34d6989e9311
MD5 ba5e2416a3a1e7ce17f9aed33dc65139
BLAKE2b-256 c4be832bfa7587418a1c736f767a889f4c8fd8ce08c9a0ed353b4a07ac656d51

See more details on using hashes here.

File details

Details for the file harl_aht-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: harl_aht-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 629.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.20 {"installer":{"name":"uv","version":"0.11.20","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for harl_aht-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1f6687355bffa34cc5cf80dbde6a9c6afc22c9aee38b73ad66aadd2bde9c7fea
MD5 e56658dd5c74807c27ecd30081d286ef
BLAKE2b-256 f632cdae24e7160b004e2f71341782e115b87b64969e476adb6dae2529bdb4af

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page