
A simple maze to test dynamic programming and tabular reinforcement learning algorithms

Project description

SimpleMazeMDP:

This repository contains code providing a simple maze environment, used as an example MDP for tabular dynamic programming and reinforcement learning labs.

If you want to do the corresponding labs, you need a Google account. Then you can copy-paste the dynamic programming colab and the reinforcement learning colab.

Documentation

MDPs and mazes

Some code is provided to create mazes, transform them into MDPs, and visualize them together with policies or value functions. It is contained in three files: maze.py, mdp.py, and maze_plotter.py. The following sections give an overview of this code.

Content of the maze.py file

A maze is represented as an object of the Maze class. It is defined as a grid of width x height cells, and some of these cells contain a wall.

The build_maze(width, height, walls, hit=False) function is used to create a Maze, where walls is a list of the indices of the cells that contain a wall. The hit parameter affects the MDP reward function: if hit is True, the agent is penalized each time it tries to move into a wall cell; otherwise, the agent is only rewarded when it reaches a terminal state. In the provided function, the list of terminal states is a singleton containing the last cell the agent can visit.
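As a sketch of the two reward regimes the hit flag selects between (the helper below is hypothetical and for illustration only, not the package's actual implementation):

```python
# Illustrative sketch of the reward logic controlled by `hit`
# (hypothetical helper, not taken from mazemdp).
def reward(next_cell_is_wall, next_state_is_terminal, hit):
    if hit and next_cell_is_wall:
        return -1.0   # penalize bumping into a wall
    if next_state_is_terminal:
        return 1.0    # reward only when reaching the terminal state
    return 0.0

# With hit=False, walking into a wall costs nothing:
assert reward(True, False, hit=False) == 0.0
# With hit=True, the same move is penalized:
assert reward(True, False, hit=True) == -1.0
```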

Apart from representing the two reward functions described above, the Maze class contains a constructor whose only role is to create the MDP corresponding to the maze and the maze plotter used to display simulations. A key point is that only cells without a wall are considered states of the underlying MDP. To facilitate the correspondence between mazes and MDPs, each free cell (i.e., one with no wall) stores the number of its corresponding MDP state.
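The cell-to-state correspondence can be sketched as follows (the numbering scheme below is an assumption for illustration; the package's actual ordering may differ):

```python
# Sketch: number only wall-free cells as MDP states.
# The scan order (cell index order) is an assumed convention.
def cell_to_state(width, height, walls):
    mapping, state = {}, 0
    for cell in range(width * height):
        if cell not in walls:
            mapping[cell] = state
            state += 1
    return mapping

m = cell_to_state(width=3, height=2, walls=[1, 4])
# 6 cells, 2 walls -> 4 MDP states
assert len(m) == 4
assert m[0] == 0 and m[2] == 1  # wall cells get no state number
```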

The maze constructor also builds the action space, the initial state distribution, the transition function, and the reward function of the MDP. Once all these data structures have been created, the resulting MDP is built.
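A deterministic transition table for such a grid could be built along these lines (a minimal sketch; the grid layout and the convention that blocked moves leave the agent in place are assumptions, though the action encoding 0=north, 1=south, 2=east, 3=west matches the description below):

```python
# Sketch of a deterministic transition table for a width x height grid.
# Moves off the grid or into a wall leave the agent in place (assumed).
def build_transitions(width, height, walls):
    moves = {0: (0, -1), 1: (0, 1), 2: (1, 0), 3: (-1, 0)}  # N, S, E, W as (dx, dy)
    P = {}
    for cell in range(width * height):
        if cell in walls:
            continue
        x, y = cell % width, cell // width
        for a, (dx, dy) in moves.items():
            nx, ny = x + dx, y + dy
            nxt = ny * width + nx
            if not (0 <= nx < width and 0 <= ny < height) or nxt in walls:
                nxt = cell  # blocked move: stay put
            P[(cell, a)] = nxt
    return P

P = build_transitions(3, 2, walls=[4])
assert P[(0, 0)] == 0  # north off the grid: stay in place
assert P[(0, 2)] == 1  # east from cell 0 reaches cell 1
assert P[(3, 2)] == 3  # east from cell 3 hits wall cell 4: stay
```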

The build_maze() and create_random_maze() functions are provided to create mazes in the lab notebooks or Python files.

Content of the mdp.py file

The mdp.py file contains the SimpleActionSpace class and the Mdp class.

The SimpleActionSpace class contains the list of actions and a method to sample from this list. In our maze environment, the possible actions for the agent are going north, south, east or west (resp. [0, 1, 2, 3]).
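A minimal action space in the spirit of SimpleActionSpace might look like this (an illustrative sketch; the real class may have a different constructor and attributes):

```python
import random

# Minimal action-space sketch: holds a list of actions and samples one
# uniformly at random (illustrative, not the package's actual class).
class ActionSpace:
    def __init__(self, actions):
        self.actions = list(actions)

    def sample(self):
        return random.choice(self.actions)

space = ActionSpace([0, 1, 2, 3])  # north, south, east, west
assert space.sample() in space.actions
```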

The Mdp class is designed to be compatible with the OpenAI gym interface (https://gym.openai.com/). The main methods are reset(self, uniform=False), which resets the MDP to an initial state drawn from the initial state distribution, and step(self, u, deviation=0), which lets the agent perform a step in the environment: it sends an action and receives the next state, the reward, and a signal indicating whether a terminal state was reached. The render(self) function provides a visual rendering of the current state of the simulation.
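The reset()/step() interaction loop can be sketched on a toy chain MDP (illustrative only; the real Mdp class takes extra parameters such as uniform and deviation, and its step() follows the gym return convention):

```python
# Toy gym-style MDP sketch mirroring the reset()/step() protocol.
# This is a hypothetical stand-in, not the mazemdp Mdp class.
class ToyMdp:
    def __init__(self, n_states=4, terminal=3):
        self.n_states, self.terminal = n_states, terminal
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, u):
        # action 0 moves forward along the chain; anything else stays put
        if u == 0 and self.state < self.n_states - 1:
            self.state += 1
        done = self.state == self.terminal
        reward = 1.0 if done else 0.0
        return self.state, reward, done

env = ToyMdp()
s, done = env.reset(), False
while not done:
    s, r, done = env.step(0)
assert s == 3 and r == 1.0  # reached the terminal state
```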

Content of the maze_plotter.py file

The code to display the effect of the algorithms in these environments is in maze_plotter.py, in the MazePlotter class. To visualize the environment, you first call the new_render() function to initialize the rendering, then render(V, policy, agent_pos) to refresh the maze with either the newly computed state values and the policy, or the state-action values, and optionally the current position of the agent. There is also a render_pi(policy) function which only displays the policy (useful for policy iteration). The save_fig(title) function saves the last render to a file.

You can see examples of calls to these different visualizations in the functions defined in the dynamic programming and reinforcement learning notebooks or Python files.

Toolbox

The toolbox.py file provides a few useful functions such as egreedy(), egreedy_loc(), and softmax(), which are used to perform exploration in reinforcement learning algorithms.
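Epsilon-greedy and softmax exploration can be sketched in plain Python as follows (illustrative implementations; the actual signatures in toolbox.py may differ):

```python
import math
import random

# Epsilon-greedy sketch: explore with probability epsilon,
# otherwise pick the greedy action (highest Q-value).
def egreedy(q_values, epsilon):
    if random.random() < epsilon:
        return random.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])      # exploit

# Softmax sketch: turn Q-values into action probabilities,
# with temperature tau controlling exploration.
def softmax(q_values, tau=1.0):
    exps = [math.exp(q / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

q = [0.1, 0.5, 0.2, 0.2]
assert egreedy(q, epsilon=0.0) == 1              # purely greedy: best action
p = softmax(q)
assert abs(sum(p) - 1.0) < 1e-9 and max(p) == p[1]
```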

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mazemdp-1.2.13.tar.gz (21.4 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mazemdp-1.2.13-py3-none-any.whl (18.8 kB)

Uploaded Python 3

File details

Details for the file mazemdp-1.2.13.tar.gz.

File metadata

  • Download URL: mazemdp-1.2.13.tar.gz
  • Upload date:
  • Size: 21.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for mazemdp-1.2.13.tar.gz

  • SHA256: 503d23a41c5b575541703f0f56ddc36f47603b5767507254b1e6424380fdf408
  • MD5: 03897b81e485a35e4bcda0e26bc87b6a
  • BLAKE2b-256: 781b3f7ceeb4ec157b933c501982e8a780777b7d85b11fb6a8a8e5a3c358792a


File details

Details for the file mazemdp-1.2.13-py3-none-any.whl.

File metadata

  • Download URL: mazemdp-1.2.13-py3-none-any.whl
  • Upload date:
  • Size: 18.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for mazemdp-1.2.13-py3-none-any.whl

  • SHA256: 4d4b7ba2a810834353e5e421aa1817ae9da095a3789d3a82f781bf38d6744a95
  • MD5: 3fc04bfa901c91438e09282bd997b32e
  • BLAKE2b-256: 1865122e8c1af6c085f6999137af80c44781770ab477ffaecdc1199e7a0b1828

