Memory Maze is an environment to benchmark memory abilities of RL agents
memory-maze
Memory Maze environment for RL based on dm_control.
Task
Memory Maze is a task designed to test the memory abilities of RL agents.
The task is based on a game known as Scavenger Hunt (or Treasure Hunt). The agent starts in a randomly generated maze, which contains a number of landmarks of different colors. The agent is prompted to find the target landmark of a specific color, indicated by the border color in the observation image. Once the agent finds and touches the correct landmark, it gets a +1 reward and the next random landmark is chosen as the target. If the agent touches a landmark of the wrong color, there is no effect. Throughout the episode the maze layout and the locations of the landmarks do not change. The episode continues for a fixed amount of time, so the total episode reward is equal to the number of targets the agent can find in the given time.
Memory Maze tests the memory of the agent in a clean and direct way: an agent with perfect memory only has to explore the maze once (which is possible in a time much shorter than the episode length) and can then follow the shortest path to each target, whereas an agent with no memory has to wander randomly through the maze to find each target.
There are 4 size variations of the maze. The largest maze, 15x15, is designed to be challenging but solvable for humans (see benchmark results below), yet out of reach for state-of-the-art RL methods. The smaller sizes are provided as stepping stones, with 9x9 solvable by current RL methods.
Size | Landmarks | Episode steps | env_id |
---|---|---|---|
9x9 | 3 | 1000 | MemoryMaze-9x9-v0 |
11x11 | 4 | 2000 | MemoryMaze-11x11-v0 |
13x13 | 5 | 3000 | MemoryMaze-13x13-v0 |
15x15 | 6 | 4000 | MemoryMaze-15x15-v0 |
Note that the mazes are generated with labmaze, the same algorithm as used by DmLab-30. In particular, 9x9 corresponds to the small variant and 15x15 corresponds to the large variant.
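For sweeps over maze sizes it can be handy to have the table above available as data. The following is a minimal sketch, not part of the package: `MAZE_VARIANTS` and `env_id_for` are hypothetical names, and the values are copied directly from the table.

```python
# Size table from this README, keyed by maze side length.
# MAZE_VARIANTS and env_id_for are illustrative names, not part of memory_maze.
MAZE_VARIANTS = {
    9: {"landmarks": 3, "episode_steps": 1000},
    11: {"landmarks": 4, "episode_steps": 2000},
    13: {"landmarks": 5, "episode_steps": 3000},
    15: {"landmarks": 6, "episode_steps": 4000},
}


def env_id_for(size: int) -> str:
    """Return the env_id listed in the table for a given maze side length."""
    if size not in MAZE_VARIANTS:
        raise ValueError(f"Unsupported maze size {size}; choose from {sorted(MAZE_VARIANTS)}")
    return f"MemoryMaze-{size}x{size}-v0"


for size, info in MAZE_VARIANTS.items():
    print(env_id_for(size), info["landmarks"], info["episode_steps"])
```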
Examples of generated mazes for 4 different sizes.
Installation
The environment is available as a pip package:
pip install git+https://github.com/jurgisp/memory-maze.git#egg=memory-maze
It will automatically install the dm_control and mujoco dependencies.
Gym interface
Once the pip package is installed, the environment can be created using the Gym interface:
!pip install gym
import gym
env = gym.make('memory_maze:MemoryMaze-9x9-v0')
env = gym.make('memory_maze:MemoryMaze-11x11-v0')
env = gym.make('memory_maze:MemoryMaze-13x13-v0')
env = gym.make('memory_maze:MemoryMaze-15x15-v0')
The default environment has a dictionary observation space (TODO: map, targets):
>>> env.observation_space
Dict(image: Box(0, 255, (64, 64, 3), uint8))
To make an environment with a pure image observation, which default RL implementations may expect, add the -Img-v0 suffix to the env id:
env = gym.make('memory_maze:MemoryMaze-9x9-Img-v0')
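As a quick sanity check of either variant, a random-policy rollout can be sketched as below. This is an illustration under assumptions, not code from the package: `run_random_episode` is a hypothetical helper, it assumes a discrete action space (so the number of actions can be read from `env.action_space.n`), and it accepts both the classic Gym return shapes and the newer ones, since the installed Gym version is not specified here.

```python
import random


def run_random_episode(env, num_actions, max_steps=1000, seed=0):
    """Roll out a uniformly random policy; return (total_reward, steps_taken).

    Hypothetical helper. Works with any object exposing Gym-style
    reset() and step(action) methods.
    """
    rng = random.Random(seed)

    reset_out = env.reset()
    # Newer Gym versions return (obs, info) from reset(); older ones return obs.
    obs = reset_out[0] if isinstance(reset_out, tuple) else reset_out

    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = rng.randrange(num_actions)
        out = env.step(action)
        if len(out) == 5:
            # Newer API: (obs, reward, terminated, truncated, info)
            obs, reward, terminated, truncated, _info = out
            done = terminated or truncated
        else:
            # Classic API: (obs, reward, done, info)
            obs, reward, done, _info = out
        total_reward += float(reward)
        steps += 1
        if done:
            break
    return total_reward, steps


# Usage with the real environment (requires gym and memory-maze installed):
#   import gym
#   env = gym.make('memory_maze:MemoryMaze-9x9-v0')
#   print(run_random_episode(env, env.action_space.n))
```

Because the reward is +1 per target reached, the returned total is the number of targets the random policy happened to find.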
There are other helper variations of the environment, see here.
dm_env interface
We also provide a dm_env API implementation:
from memory_maze import tasks
env = tasks.memory_maze_9x9()
env = tasks.memory_maze_11x11()
env = tasks.memory_maze_13x13()
env = tasks.memory_maze_15x15()
The observation is a dictionary, which includes the image observation (TODO: map, targets):
>>> env.observation_spec()
{
'image': BoundedArray(shape=(64, 64, 3), ...)
}
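Stepping through an episode with the dm_env interface follows the usual `reset()` / `step()` / `TimeStep.last()` pattern. The sketch below is an assumption-labeled illustration: `run_dm_env_episode` and `choose_action` are hypothetical names, and the policy is passed in as a callable so that nothing is assumed about the action spec.

```python
def run_dm_env_episode(env, choose_action):
    """Run one episode on a dm_env-style environment.

    Hypothetical helper. `choose_action` maps an observation dict to an action.
    Returns (total_reward, steps_taken).
    """
    timestep = env.reset()
    total_reward, steps = 0.0, 0
    while not timestep.last():
        action = choose_action(timestep.observation)
        timestep = env.step(action)
        # dm_env allows reward to be None (e.g. on the first timestep).
        total_reward += float(timestep.reward or 0.0)
        steps += 1
    return total_reward, steps


# Usage with the real environment (requires memory-maze installed).
# Always picking action 0 assumes a discrete action spec that includes 0:
#   from memory_maze import tasks
#   env = tasks.memory_maze_9x9()
#   print(run_dm_env_episode(env, lambda obs: 0))
```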
The constructor accepts a number of arguments, which can be used to tweak the environment for debugging:
env = tasks.memory_maze_9x9(
    control_freq=4,
    discrete_actions=True,
    target_color_in_image=True,
    image_only_obs=False,
    top_camera=False,
    good_visibility=False,
    camera_resolution=64,
)
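When overriding only a few of these arguments for a debugging session, a small guard against misspelled names can save a failed environment build. The sketch below is illustrative only: `SHOWN_KWARGS` and `debug_kwargs` are hypothetical names, the values are copied from the call above, and whether those values are the constructor's actual defaults is an assumption.

```python
# Argument names and values copied from the constructor call in this README.
# Treating them as the defaults is an assumption, not something stated here.
SHOWN_KWARGS = {
    "control_freq": 4,
    "discrete_actions": True,
    "target_color_in_image": True,
    "image_only_obs": False,
    "top_camera": False,
    "good_visibility": False,
    "camera_resolution": 64,
}


def debug_kwargs(**overrides):
    """Merge overrides into the shown arguments, rejecting unknown names."""
    unknown = set(overrides) - set(SHOWN_KWARGS)
    if unknown:
        raise TypeError(f"Unknown constructor arguments: {sorted(unknown)}")
    return {**SHOWN_KWARGS, **overrides}


# Usage with the real environment (requires memory-maze installed):
#   from memory_maze import tasks
#   env = tasks.memory_maze_9x9(**debug_kwargs(top_camera=True, good_visibility=True))
print(debug_kwargs(top_camera=True))
```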
GUI
There is also a graphical UI, which can be launched as follows:
pip install gym pygame pillow imageio
# The default view that the agent sees
python gui/run_gui.py --fps=6 --env "memory_maze:MemoryMaze-15x15-v0"
# Higher resolution and higher control frequency, nicer for human control
python gui/run_gui.py --fps=60 --env "memory_maze:MemoryMaze-15x15-HiFreq-HD-v0"
Observation space, Action space
Benchmarks
Oracle scores
Human scores