A python package to design and debug RL agents

These details have not been verified by PyPI

Project links

Project description

MDP Playground

A python package to inject low-level dimensions of hardness in RL environments. There are toy environments to design and debug RL agents. And complex environment wrappers for Gym environments (inclduing Atari and Mujoco) to test robustness to these dimensions in complex environments.

Quick Start

Example to inject hardness (reward delay and reward noise) into 2 Gymnasium environments: a toy environment and using the Gymnasium wrapper for an Atari environment:

from mdp_playground.envs import
RLToyEnv, GymEnvWrapper
config = {
’state_space_type’: ’discrete’,
’action_space_size’: 8,
’delay’: 1,
’reward_noise’: 0.25,
}
env = RLToyEnv(**config)
ae = gym.make("QbertNoFrameskip-v4")
env = GymEnvWrapper(ae, **config)

Important Note

We are moving to package management with uv and away from using Ray Rllib, so some experiment / agent running functionality might break. The wrappers and toy environment should still work fine though.

Getting started

There are 4 parts to the package:

Toy Environments: The base toy Environment in mdp_playground/envs/rl_toy_env.py implements the toy environment functionality, including discrete and continuous environments, and is parameterised by a config dict which contains all the information needed to instantiate the required toy MDP. Please see example.py for some simple examples of how to use these. For further details, please refer to the documentation in mdp_playground/envs/rl_toy_env.py.
Complex Environment Wrappers: Similar to the toy environment, this is parameterised by a config dict which contains all the information needed to inject the dimensions into Gym environments (tested with Atari, Mujoco and ProcGen). Please see example.py for some simple examples of how to use these. The generic Gym wrapper (for Atari, ProcGen, etc.) is in mdp_playground/envs/gym_env_wrapper.py and the Mujoco specific wrapper is in mdp_playground/envs/mujoco_env_wrapper.py.
Experiments: Experiments are launched using run_experiments.py. Config files for experiments are located inside the experiments directory. Please read the instructions below for details on how to launch experiments.
Analysis: plot_experiments.ipynb contains code to plot the standard plots from the paper.

Running experiments from the main paper

For reproducing experiments from the main paper, please continue reading.

For general install and usage instructions, please see here.

Installation for running experiments from the main paper

We recommend using conda environments to manage virtual Python environments to run the experiments. Unfortunately, you will have to maintain 2 environments - 1 for the "older" discrete toy experiments and 1 for the "newer" continuous and complex experiments from the paper. As mentioned in Appendix section Tuned Hyperparameters in the paper, this is because of issues with Ray, the library that we used for our baseline agents.

Please follow the following commands to install for the discrete toy experiments:

conda create -n py36_toy_rl_disc_toy python=3.6
conda activate py36_toy_rl_disc_toy
cd mdp-playground
pip install -r requirements.txt
pip install -e .[extras_disc]

Please follow the following commands to install for the continuous and complex experiments. IMPORTANT: In case, you do not have MuJoCo, please ignore any mujoco related installation errors below:

conda create -n py36_toy_rl_cont_comp python=3.6
conda activate py36_toy_rl_cont_comp
cd mdp-playground
pip install -r requirements.txt
pip install -e .[extras_cont]
wget 'https://ray-wheels.s3-us-west-2.amazonaws.com/master/8d0c1b5e068853bf748f72b1e60ec99d240932c6/ray-0.9.0.dev0-cp36-cp36m-manylinux1_x86_64.whl'
pip install ray-0.9.0.dev0-cp36-cp36m-manylinux1_x86_64.whl[rllib,debug]

We list here how the commands for the experiments from the main paper look like:

# For example, for the discrete toy experiments:
conda activate py36_toy_rl_disc_toy
python run_experiments.py -c experiments/dqn_del.py -e dqn_del

# Image representation experiments:
conda activate py36_toy_rl_disc_toy
python run_experiments.py -c experiments/dqn_image_representations.py -e dqn_image_representations
python run_experiments.py -c experiments/dqn_image_representations_sh_quant.py -e dqn_image_representations_sh_quant

# Continuous toy environments:
conda activate py36_toy_rl_cont_comp
python run_experiments.py -c experiments/ddpg_move_to_a_point_time_unit.py -e ddpg_move_to_a_point_time_unit
python run_experiments.py -c experiments/ddpg_move_to_a_point_irr_dims.py -e ddpg_move_to_a_point_irr_dims
# Varying the action range and time unit together for transition_dynamics_order = 2
python run_experiments.py -c experiments/ddpg_move_to_a_point_p_order_2.py -e ddpg_move_to_a_point_p_order_2

# Complex environments:
# The commands below run all configs serially.
# In case, you want to parallelise on a cluster, please provide the CLI argument -n <config_number> at the end of the given commands. Please refer to the documentation for run_experiments.py for this.
conda activate py36_toy_rl_cont_comp
python run_experiments.py -c experiments/dqn_qbert_del.py -e dqn_qbert_del
python run_experiments.py -c experiments/ddpg_halfcheetah_time_unit.py -e ddpg_halfcheetah_time_unit

# For the bsuite debugging experiment, please run the bsuite sonnet dqn agent on our toy environment while varying reward density. Commit https://github.com/deepmind/bsuite/commit/5116216b62ce0005100a6036fb5397e358652530 from the bsuite repo should work fine.

For plotting, please follow the instructions here.

Installation

For reproducing experiments from the main paper, please see here.

For continued usage of MDP Playground as it is in development, please continue reading.

Production use

We recommend using conda to manage environments. After setup of the environment, you can install MDP Playground in two ways:

Manual

To install MDP Playground manually (this might be the preferred way if you want easy access to the included experiments), clone the repository and run:

# If you just want to use the environments as in example.py:
pip install -e .

# If you want to run experiments using Ray, etc.:
pip install -r requirements.txt
pip install -e .[extras]

From PyPI

Alternatively, MDP Playground can also be installed from PyPI. Just run:

# If you just want to use the environments as in example.py:
pip install mdp_playground

# If you want to run experiments using Ray, etc.:
pip install mdp_playground[extras]

Running experiments

You can run experiments using:

run-mdpp-experiments -c <config_file> -e <exp_name> -n <config_num>

The exp_name is a prefix for the filenames of CSV files where stats for the experiments are recorded. The CSV stats files will be saved to the current directory.
The command line arguments also usually have defaults. Please refer to the documentation inside run_experiments.py for further details on the command line arguments. (Or run it with the -h flag to bring up help.)

The config files for experiments from the paper are in the experiments directory.
The name of the file corresponding to an experiment is formed as: <algorithm_name>_<dimension_names>.py for the toy environments
And as: <algorithm_name>_<env>_<dimension_names>.py for the complex environments
Some sample algorithm_names are: dqn, rainbow, a3c, ddpg, td3 and sac
Some sample dimension_names are: seq_del (for delay and sequence length varied together), p_r_noises (for P and R noises varied together), target_radius (for varying target radius) and time_unit (for varying time unit)
For example, for algorithm DQN when varying dimensions delay and sequence length, the corresponding experiment file is dqn_seq_del.py

The CSV stats files will be saved to the current directory and can be analysed in plot_experiments.ipynb.

Plotting

To plot results from experiments, please be sure that you installed MDP Playground for production use manually (please see here) and then run jupyter-notebook and open plot_experiments.ipynb in Jupyter. There are instructions within each of the cells on how to generate and save plots.

Documentation

The documentation can be found at: https://automl.github.io/mdp-playground/

Toy Environments

Complex Environment Wrappers:

Gym

Mujoco

Please see example.py for some simple examples of how to use all of these.

Citing

If you use MDP Playground in your work, please cite the following paper:

@article{rajan2021mdp,
      title={MDP Playground: A Design and Debug Testbed for Reinforcement Learning},
      author={Raghu Rajan and Jessica Lizeth Borja Diaz and Suresh Guttikonda and Fabio Ferreira and André Biedenkapp and Jan Ole von Hartz and Frank Hutter},
      year={2021},
      eprint={1909.07750},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

1.0.0

Apr 2, 2026

0.0.2

Jun 7, 2021

0.0.1

Mar 30, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mdp_playground-1.0.0.tar.gz (1.2 MB view details)

Uploaded Apr 2, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

mdp_playground-1.0.0-py3-none-any.whl (1.8 MB view details)

Uploaded Apr 2, 2026 Python 3

File details

Details for the file mdp_playground-1.0.0.tar.gz.

File metadata

Download URL: mdp_playground-1.0.0.tar.gz
Upload date: Apr 2, 2026
Size: 1.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for mdp_playground-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`1f2e2cce9a826b0c62b3b6917e2eea713a3714780dcffa30ed63d1e2592adc2a`
MD5	`d358a142c40c5d8282e6953d06fc533c`
BLAKE2b-256	`4c0debb7f0df928552f787ab2ae37fb95c547edcea4661f80156ccb48d8838ad`

See more details on using hashes here.

File details

Details for the file mdp_playground-1.0.0-py3-none-any.whl.

File metadata

Download URL: mdp_playground-1.0.0-py3-none-any.whl
Upload date: Apr 2, 2026
Size: 1.8 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.2 {"installer":{"name":"uv","version":"0.10.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for mdp_playground-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`91ea6d0451d74c5ea2f204e0bbb29598b65fe0a6afd0193a43ad3dc990f3ccc9`
MD5	`a9a0c7f3fd9d4ad1a69c32e7383e5d66`
BLAKE2b-256	`d4a3387b4827bf398c9d03c5a3241b43d2efeffe64aa83c74b0568eda7549473`

See more details on using hashes here.

mdp-playground 1.0.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

MDP Playground

Quick Start

Important Note

Getting started

Running experiments from the main paper

Installation for running experiments from the main paper

Installation

Production use

Manual

From PyPI

Running experiments

Plotting

Documentation

Citing

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes