A standard API for MORL and a diverse set of reference environments.
MO-Gymnasium: Multi-Objective Reinforcement Learning Environments
Gymnasium environments for multi-objective reinforcement learning (MORL). The environments follow the standard Gymnasium API, but return vectorized rewards as numpy arrays.
For details on multi-objective MDPs (MOMDPs) and other MORL definitions, see A practical guide to multi-objective reinforcement learning and planning.
Install
Via pip:
pip install mo-gymnasium
Alternatively, you can install the newest unreleased version:
git clone https://github.com/Farama-Foundation/MO-Gymnasium
cd MO-Gymnasium
pip install -e .
Usage
import gymnasium as gym
import numpy as np
import mo_gymnasium as mo_gym

env = mo_gym.make('minecart-v0')  # It follows the standard Gymnasium API ...
obs, info = env.reset()
next_obs, vector_reward, terminated, truncated, info = env.step(your_agent.act(obs))  # but vector_reward is a numpy array!

# Optionally, you can scalarize the reward function with the LinearReward wrapper
env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2]))
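For instance, a full interaction loop might look like the following minimal sketch (a random policy stands in for a trained agent; with the LinearReward wrapper applied, the reward returned by step is the weighted sum of the vector reward under the given weights):

import numpy as np
import mo_gymnasium as mo_gym

env = mo_gym.make('minecart-v0')
# One weight per objective: [ore1, ore2, fuel]
env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2]))

obs, info = env.reset(seed=0)
scalar_return = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy standing in for a trained agent
    obs, scalar_reward, terminated, truncated, info = env.step(action)
    scalar_return += scalar_reward
print(scalar_return)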
You can also find more examples in this Colab notebook!
MORL-Baselines is a repository containing implementations of various multi-objective reinforcement learning algorithms. It relies on the MO-Gymnasium API and provides further examples of how to use the wrappers and environments.
Environments
| Env | Obs/Action spaces | Objectives | Description |
|---|---|---|---|
| deep-sea-treasure-v0 | Discrete / Discrete | [treasure, time_penalty] | Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasure values taken from Yang et al. 2019. |
| resource-gathering-v0 | Discrete / Discrete | [enemy, gold, gem] | Agent must collect gold or gems. Enemies have a 10% chance of killing the agent. From Barrett & Narayanan 2008. |
| fishwood-v0 | Discrete / Discrete | [fish_amount, wood_amount] | ESR environment in which the agent must collect fish and wood to light a fire and eat. From Roijers et al. 2018. |
| fruit-tree-v0 | Discrete / Discrete | [nutri1, ..., nutri6] | Full binary tree of depth d = 5, 6, or 7. Every leaf contains a fruit with a value for each of the nutrients Protein, Carbs, Fats, Vitamins, Minerals, and Water. From Yang et al. 2019. |
| breakable-bottles-v0 | Discrete (Dictionary) / Discrete | [time_penalty, bottles_delivered, potential] | Gridworld with 5 cells. The agent must collect bottles from the source location and deliver them to the destination. From Vamplew et al. 2021. |
| four-room-v0 | Discrete / Discrete | [item1, item2, item3] | Agent must collect three different types of items in the map and reach the goal. From Alegre et al. 2022. |
| water-reservoir-v0 | Continuous / Continuous | [cost_flooding, deficit_water] | A water reservoir environment. The agent executes a continuous action corresponding to the amount of water released by the dam. From Pianosi et al. 2013. |
| mo-mountaincar-v0 | Continuous / Discrete | [time_penalty, reverse_penalty, forward_penalty] | Classic Mountain Car env, but with extra penalties for the forward and reverse actions. From Vamplew et al. 2011. |
| mo-MountainCarContinuous-v0 | Continuous / Continuous | [time_penalty, fuel_consumption_penalty] | Continuous Mountain Car env, but with a penalty for fuel consumption. |
| mo-lunar-lander-v2 | Continuous / Discrete or Continuous | [landed, shaped_reward, main_engine_fuel, side_engine_fuel] | MO version of the "LunarLander-v2" environment. Objectives defined similarly as in Hung et al. 2022. |
| mo-reacher-v0 | Continuous / Discrete | [target_1, target_2, target_3, target_4] | Reacher robot from PyBullet, but with 4 different target positions. From Alegre et al. 2022. |
| minecart-v0 | Continuous or Image / Discrete | [ore1, ore2, fuel] | Agent must collect two types of ores while minimizing fuel consumption. From Abels et al. 2019. |
| mo-highway-v0 and mo-highway-fast-v0 | Continuous / Discrete | [speed, right_lane, collision] | The agent's objective is to reach a high speed while avoiding collisions with neighbouring vehicles and staying in the rightmost lane. From highway-env. |
| mo-supermario-v0 | Image / Discrete | [x_pos, time, death, coin, enemy] | Multi-objective version of SuperMarioBrosEnv. Objectives are defined similarly as in Yang et al. 2019. |
| mo-halfcheetah-v4 | Continuous / Continuous | [velocity, energy] | Multi-objective version of the HalfCheetah-v4 env. Similar to Xu et al. 2020. |
| mo-hopper-v4 | Continuous / Continuous | [velocity, height, energy] | Multi-objective version of the Hopper-v4 env. |
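As a quick sanity check (a minimal sketch, assuming only the API shown in the Usage section above), the number of objectives of any environment in the table can be read off the shape of the reward vector it returns:

import mo_gymnasium as mo_gym

env = mo_gym.make('deep-sea-treasure-v0')
obs, info = env.reset(seed=42)
obs, vector_reward, terminated, truncated, info = env.step(env.action_space.sample())
print(vector_reward.shape)  # (2,), matching the [treasure, time_penalty] objectives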
Citing
If you use this repository in your work, please cite:
@inproceedings{Alegre+2022bnaic,
author = {Lucas N. Alegre and Florian Felten and El-Ghazali Talbi and Gr{\'e}goire Danoy and Ann Now{\'e} and Ana L. C. Bazzan and Bruno C. da Silva},
title = {{MO-Gym}: A Library of Multi-Objective Reinforcement Learning Environments},
booktitle = {Proceedings of the 34th Benelux Conference on Artificial Intelligence BNAIC/Benelearn 2022},
year = {2022}
}
Acknowledgments
- The minecart-v0 env is a refactor of https://github.com/axelabels/DynMORL.
- The deep-sea-treasure-v0, fruit-tree-v0 and mo-supermario-v0 envs are based on https://github.com/RunzheYang/MORL.
- The four-room-v0 env is based on https://github.com/mike-gimelfarb/deep-successor-features-for-transfer.
- The fishwood-v0 code was provided by Denis Steckelmacher and Conor F. Hayes.
- The water-reservoir-v0 code was provided by Mathieu Reymond.
Hashes for mo_gymnasium-0.3.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | e2df5dd5f10f08ba4ab6e02e4a96a8b0c6c931095e5ce7631b7d4b7f00a17a82 |
| MD5 | 4e5d07bd54a01a5a674dabbad4b711dd |
| BLAKE2b-256 | 689838d96e703e55f724c7c9b4759b72cc226776710c23d0707ff1f8dceb58aa |