GPU-Accelerated Multi-Objective Playground

Project description

MO-Playground: Massively Parallelized Multi-Objective Reinforcement Learning for Robotics

Neil Janwani, Ellen Novoseller, Vernon Lawhern, Maegan Tucker

https://arxiv.org/abs/2603.09237v1

MO-Playground is a collection of multi-objective environments built in JAX for GPU-accelerated multi-objective RL.

Note that due to double-blind requirements, moplayground's documentation page and pip-installable package are not yet available.

Prerequisites

The code was tested with:

  • Ubuntu 22.04
  • Python 3.12.12
  • CUDA 13.0 (required only for training policies; evaluation runs without a GPU)

Installation

Create a new conda environment from the provided YAML files. If you want to enable GPU-based training, run

conda env create -f environment.yml
conda activate moplayground

If you only want to evaluate policies and explore the code (e.g., when running on a Mac), run

conda env create -f mac_environment.yml
conda activate moplayground

Finally, go to the project root and run

pip3 install -e .

to install the moplayground package.

Evaluation

Create an account at Weights and Biases. The process is free, and you will be asked to paste your API key during setup. Educational accounts also receive some free storage, which can be useful if you're a student.

Next, pick an environment from the list below

  • cheetah
  • hopper
  • walker
  • ant
  • humanoid
  • bruce

and download your desired policy.

python3 -m scripts.download_model --env cheetah  

Note that you can supply a desired save directory via --save_dir; the default is results/wandb-downloads. Finally, you can run the policy via

python3 -m scripts.rollout_policy config_path

where config_path is the config.yaml file saved with your model. It will be located at save_dir/env_name/config.yaml, where save_dir and env_name are as defined above.
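Under the default save directory, the config path for a downloaded cheetah policy can be assembled like this (a minimal sketch; the path layout follows the defaults described above):

```python
from pathlib import Path

# Defaults described above: save_dir from --save_dir, env_name from --env
save_dir = Path("results/wandb-downloads")
env_name = "cheetah"

# The rollout script expects the config.yaml saved next to the model
config_path = save_dir / env_name / "config.yaml"
print(config_path.as_posix())  # results/wandb-downloads/cheetah/config.yaml
```

This is the path you would pass as config_path to scripts.rollout_policy.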

Training

To train a policy on a pre-existing environment, check out the configuration files in config/. These specify everything from the model architecture and MORLAX parameters to reward and environment constants. Choose a config file, edit the parameters to your liking, and run

python3 -m scripts.train config_path

where config_path is the path to the config of your choice. If you previously downloaded a policy, you can also use its config to reproduce an identical training run on your system.
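A training config might look roughly like the following. This is a hypothetical sketch; the actual keys and structure are defined by the files in config/, so consult those for the real schema:

```yaml
# Hypothetical sketch of a training config; see config/ for the real schema.
env_name: cheetah
seed: 0
network:
  hidden_sizes: [256, 256]
training:
  num_envs: 4096
  total_timesteps: 100000000
```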

Creating your own environment

To create a custom environment, check out how the cheetah environment works at src/moplayground/envs/dmcontrol/cheetah.py. Your environment class should inherit from the MultiObjectiveBase class. You will also need to create a config.yaml file for your environment to specify the training parameters.
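As a rough illustration, a custom environment follows this shape. This is a self-contained sketch: MultiObjectiveBase is stubbed out here so the example runs on its own, and the method and field names are hypothetical stand-ins; see cheetah.py for the real interface:

```python
# Sketch of the subclassing pattern. The real base class lives in moplayground;
# the stub below and the method names are hypothetical stand-ins.
class MultiObjectiveBase:  # stand-in for the library's base class
    def rewards(self, state):
        raise NotImplementedError


class MyEnv(MultiObjectiveBase):
    """A toy environment with two competing objectives."""

    def rewards(self, state):
        # Return one scalar per objective, e.g. forward speed and energy cost.
        forward_velocity = state["vx"]
        energy_penalty = -sum(u * u for u in state["actions"])
        return [forward_velocity, energy_penalty]


env = MyEnv()
print(env.rewards({"vx": 2.0, "actions": [0.5, -0.5]}))  # [2.0, -0.5]
```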

Note that support for custom (i.e., non-MuJoCo) dynamics is coming soon.

Classic Environments

Environment  Reward 1    Reward 2
(image)      Max Vx      Max Vy
(image)      Max Energy  Max Run
(image)      Max Height  Max Run
(image)      Max Energy  Max Run
(image)      Max Energy  Max Run

BRUCE Robotics Example

MO-Playground is demonstrated for the BRUCE humanoid robot, developed by Westwood Robotics.

This example features seven possible reward functions. Note that we combine base_xyz_tracking and base_quat_tracking to explore a 6-dimensional objective space.

Reward Name         Description
gait_tracking       Track the reference joint-level trajectory
base_xyz_tracking   Track the base position associated with the reference trajectory
base_quat_tracking  Track the base orientation associated with the reference trajectory
arm_swinging        Maximize the amount of arm-swing
arm_static          Minimize the amount of arm-swing
minimize_energy     Minimize energy consumption
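Multi-objective RL methods commonly collapse a reward vector like the one above into a scalar using a preference weight vector before standard policy optimization. A minimal sketch of linear scalarization in plain Python (not the library's actual API):

```python
def scalarize(reward_vector, weights):
    """Linear scalarization: weighted sum of per-objective rewards."""
    assert len(reward_vector) == len(weights)
    return sum(r * w for r, w in zip(reward_vector, weights))


# Equal preference over two objectives, e.g. gait_tracking and minimize_energy
rewards = [1.0, -0.5]
weights = [0.5, 0.5]
print(scalarize(rewards, weights))  # 0.25
```

Sweeping the weight vector over the simplex is one standard way to trace out a set of Pareto-optimal policies.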

Examples of Multi-Objective Policies

Policy            Result
Balanced Reward   (video)
Max Imitation     (video)
Max Arm Swinging  (video)
Max Smoothness    (video)

Citation

@article{janwani2026mo,
  title={MO-Playground: Massively Parallelized Multi-Objective Reinforcement Learning for Robotics},
  author={Janwani, Neil and Novoseller, Ellen and Lawhern, Vernon J and Tucker, Maegan},
  journal={arXiv preprint arXiv:2603.09237},
  year={2026}
}
