# MazeBase: a sandbox for learning from games
This code is for a simple 2D game environment that can be used in developing reinforcement learning models. It is designed to be compact but flexible, enabling the implementation of a diverse set of games. Furthermore, it offers precise tuning of the game difficulty, facilitating the construction of curricula to aid training. The code is written in Lua+Torch; it allows rapid prototyping of games and is easy to connect to models that control the agent’s behavior. For more details, see our [paper](http://arxiv.org/abs/1511.07401).
## Environment
Each game is played in a 2D rectangular grid. Each location in the grid can be empty, or may contain one or more items such as:
- **Block:** an impassable obstacle that does not allow the agent to move to that grid location.
- **Water:** the agent may move to a grid location with water, but incurs an additional cost for doing so.
- **Switch:** a switch can be in one of M states, which we refer to as colors. The agent can toggle through the states cyclically by a toggle action when it is at the location of the switch.
- **Door:** a door has a color, matched to a particular switch. The agent may only move to the door’s grid location if the state of the switch matches the state of the door.
- **PushableBlock:** this block is impassable, but can be moved with a separate “push” action. The block moves in the direction of the push, and the agent must be located adjacent to the block on the side opposite the direction of the push.
- **Corner:** This item simply marks a corner of the board.
- **Goal:** depending on the game, one or more goals may exist, each named individually.
- **Info:** these items do not have a grid location, but can specify a task or give information necessary for its completion.
The environment is presented to the agent as a list of sentences, each describing an item in the game. For example, an agent might see “Block at [-1,4]. Switch at [+3,0] with blue color. Info: change switch to red.” Note that we use egocentric spatial coordinates, meaning that the environment updates the reported location of each object after every action by the agent. The environments are generated randomly with some distribution over the various items. For example, we usually specify a uniform distribution over height and width, and a percentage of wall blocks and water blocks.
## Games
Currently, there are 10 different games implemented, but it is possible to add new games. The existing games are:
- **Multigoals:** the agent is given an ordered list of goals as “Info”, and needs to visit the goals in that order.
- **Conditional Goals:** the agent must visit a destination goal that is conditional on the state of a switch. The “Info” is of the form “go to goal 4 if the switch is colored red, else go to goal 2”.
- **Exclusion:** the “Info” in this game specifies a list of goals to avoid. The agent should visit all other unmentioned goals.
- **Switches:** there are multiple switches on the board, and the agent has to toggle all switches to the same color.
- **Light Key:** there is a switch and a door in a wall of blocks. The agent should navigate to a goal which may be on the wrong side of the wall, in which case the agent needs to move to and toggle the switch to open the door before going to the goal.
- **Goto:** the agent is given an absolute location on the grid as a target. The game ends when the agent visits this location. Solving this task requires the agent to convert from its own egocentric coordinate representation to absolute coordinates.
- **Goto Hidden:** the agent is given a list of goals with absolute coordinates, and then is told to go to one of the goals by the goal’s name. The agent is not directly given the goal’s location; it must read it from the list of goal locations.
- **Push block:** the agent needs to push a Pushable block so that it lies on top of a switch.
- **Push block cardinal:** the agent needs to push a Pushable block so that it is on a specified edge of the maze, e.g. the left edge. Any location along the edge is acceptable.
- **Blocked door:** the agent should navigate to a goal which may lie on the opposite side of a wall of blocks, as in the Light Key game. However, a PushableBlock blocks the gap in the wall instead of a door.
Examples of each game are shown in this [video](https://youtu.be/kwnp8jFRi5E). The internal parameters of the games are written to a [configuration file](https://github.com/facebook/MazeBase/blob/master/lua/mazebase/config/game_config.lua), which can be easily modified.
## Using Game Environment
First, either install mazebase with `luarocks make *.rockspec` or add the appropriate path:
```lua
package.path = package.path .. ';lua/?/init.lua'
```
To use the game environment as standalone in Torch, first start a local `display` server with
```
$ th -ldisplay.start 8000 0.0.0.0
```
which starts the display server for viewing MazeBase graphics at `http://0.0.0.0:8000`. See the [display repository](https://github.com/szym/display) for more details. Next, include the init file with
```lua
g_mazebase = require('mazebase')
```
Then we have to set which config file to use. Here we use the config file that was used in our [paper](http://arxiv.org/abs/1511.07401):
```lua
g_opts = {games_config_path = 'mazebase/config/game_config.lua'}
```
Next, we call the following function to create a dictionary of all the words used in the game:
```lua
g_mazebase.init_vocab()
```
Finally, initialize the game environment with
```lua
g_mazebase.init_game()
```
Now we can create a new game instance by calling
```lua
g = g_mazebase.new_game()
```
If more than one game is configured, it will randomly pick one. Now, the current game state can be retrieved by calling
```lua
s = g:to_sentence()
```
which returns a tensor containing words (encoded by the `g_vocab` dictionary) describing each item in the game. If you started the display server, you can see the game at `0.0.0.0:8000` in your browser by running
```lua
g_disp = require('display')
g_disp.image(g.map:to_image())
```
![sample_output](readme_images/demo_api.png "Example of display")
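If you want to inspect the encoded state from `to_sentence()` as text, the following is a rough sketch under two assumptions that may not match the actual implementation: that `g_vocab` is a plain word-to-index table, and that the returned tensor is 2D with zero padding for unused slots.
```lua
-- Sketch only: decode the state tensor back into words for inspection.
-- Assumes g_vocab maps word -> index and unused slots are zero.
local ivocab = {}
for word, idx in pairs(g_vocab) do ivocab[idx] = word end

local s = g:to_sentence()
for i = 1, s:size(1) do
    local words = {}
    for j = 1, s:size(2) do
        local idx = s[i][j]
        if idx > 0 and ivocab[idx] then table.insert(words, ivocab[idx]) end
    end
    print(table.concat(words, ' '))
end
```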
Next, an action can be performed by calling
```lua
g:act(action)
```
where `action` is the index of the action. The list of possible actions is in `g.agent.action_names`. When there are multiple agents in the game, we can choose the agent that performs the action by setting
```lua
g.agent = g.agents[i]
```
before calling `g:act()`. After the action is completed, `g:update()` must be called so that the game updates its internal state.
Finally, we can check whether the game is still active (i.e., not yet finished) by calling `g:is_active()`. You can run `demo_api.lua` to see the game being played with random actions.
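Putting these steps together, a minimal random-agent loop (a sketch in the spirit of `demo_api.lua`; the actual script may differ in its details) could look like this, run inside the `th` interpreter:
```lua
-- Minimal sketch: play one episode with uniformly random actions.
package.path = package.path .. ';lua/?/init.lua'
g_mazebase = require('mazebase')

g_opts = {games_config_path = 'mazebase/config/game_config.lua'}
g_mazebase.init_vocab()
g_mazebase.init_game()

g = g_mazebase.new_game()
while g:is_active() do
    -- pick a random action index from the agent's action list
    local action = torch.random(#g.agent.action_names)
    g:act(action)
    g:update()   -- update the internal game state after every action
end
```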
## Creating a new game
Here we demonstrate how a new game can be added. Let us create a very simple game where the agent has to reach a goal. First, we create a file named `SingleGoal.lua`. In it, we define a game class:
```lua
local SingleGoal, parent = torch.class('SingleGoal', 'MazeBase')
```
Next, we have to construct the game items. In this case, we only need a goal item placed at a random location:
```lua
function SingleGoal:__init(opts, vocab)
    parent.__init(self, opts, vocab)
    self:add_default_items()
    self.goal = self:place_item_rand({type = 'goal'})
end
```
The `place_item_rand` function puts the item at a random empty location, but it is also possible to specify the location using the `place_item` function. The argument to this function is a table containing the item's properties, such as its type and name. Here we only set the type of the item to goal, but it is possible to include any number of attributes (e.g. color, name, etc.).
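For instance, to pin the goal to a fixed cell instead of a random one, one could use `place_item`; the coordinate argument order shown here is an assumption and may differ from the actual signature:
```lua
-- assumed signature: place_item(attributes, y, x)
self.goal = self:place_item({type = 'goal'}, 3, 2)
```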
The game should finish when the agent reaches the goal, which can be achieved by overriding the `update` function:
```lua
function SingleGoal:update()
    parent.update(self)
    if not self.finished then
        if self.goal.loc.y == self.agent.loc.y and self.goal.loc.x == self.agent.loc.x then
            self.finished = true
        end
    end
end
```
This checks whether the agent's location is the same as the goal's and sets a flag when it is. Finally, we have to give a proper reward when the goal is reached:
```lua
function SingleGoal:get_reward()
    if self.finished then
        return -self.costs.goal -- this will be set in config file
    else
        return parent.get_reward(self)
    end
end
```
Now, we include our game file in `mazebase/init.lua` by adding the following line
```lua
paths.dofile('SingleGoal.lua')
```
Also, the following lines have to be added inside the `init_game_opts` function:
```lua
games.SingleGoal = SingleGoal
helpers.SingleGoal = OptsHelper
```
Finally, we need a config file for our new game. Let us create a `singlegoal.lua` file in `mazebase/config`. The main parameters of the game are the grid dimensions:
```lua
local mapH = torch.Tensor{5,5,5,10,1}
local mapW = torch.Tensor{5,5,5,10,1}
```
The first two numbers define the lower and upper bounds of the parameter; the actual grid size will be uniformly sampled from this range. The remaining three numbers are for curriculum training: in the easiest (hardest) case, the upper bound is set to the 3rd (4th) number, and the 5th number is the step size for changing the upper bound. In the same way, we define the percentage of grid cells that contain a block or water:
```lua
local blockspct = torch.Tensor{0,.05,0,.2,.01}
local waterpct = torch.Tensor{0,.05,0,.2,.01}
```
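To make the five-number format concrete, the following helper (illustrative only, not part of MazeBase; how the curriculum bound interacts with the static upper bound is left aside) shows one way such a range could be interpreted during curriculum training:
```lua
-- Illustrative only: interpret a {lo, hi, easy_hi, hard_hi, step} range.
-- The current upper bound starts at the easy value and grows by `step`
-- per curriculum increment, capped at the hard value.
local function current_upper(range, ncurriculum_steps)
    return math.min(range[3] + ncurriculum_steps * range[5], range[4])
end

local mapH = torch.Tensor{5, 5, 5, 10, 1}
local hi = current_upper(mapH, 3)     -- after 3 curriculum steps: 8
local h = torch.random(mapH[1], hi)   -- height sampled uniformly from [5, 8]
```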
There are other generic parameters that have to be set; see the actual [config file](https://github.com/facebook/MazeBase/blob/master/lua/mazebase/config/singlegoal.lua) for details. Now we are ready to use the game!
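Assuming the new config file is complete, the game can then be selected via the training script's existing options, for example:
```
th main.lua --model mlp --nlayers 2 --game SingleGoal --games_config_path mazebase/config/singlegoal.lua
```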
## Training an agent using neural networks
We also provide code for training different types of neural models with a policy gradient method. Training runs on CPUs and uses multi-threading for speed-up.
The implemented models are:
1. multi-layer neural network
2. convolutional neural network
3. [end-to-end memory network](http://arxiv.org/abs/1503.08895).
For example, running the following command will train a 2-layer network on MultiGoals.
```
th main.lua --hidsz 20 --model mlp --nlayers 2 --epochs 100 --game MultiGoals --nactions 6 --nworker 2
```
To see all the command line options, run
```
th main.lua -h
--hidsz the size of the internal state vector [20]
--nonlin non-linearity type: tanh | relu | none [tanh]
--model model type: mlp | conv | memnn [memnn]
--init_std STD of initial weights [0.2]
--max_attributes maximum number of attributes of each item [6]
--nlayers the number of layers in MLP [2]
--convdim the number of feature maps in convolutional layers [20]
--conv_sz spatial scope of the input to convnet and MLP [19]
--memsize size of the memory in MemNN [20]
--nhop the number of hops in MemNN [3]
--nagents the number of agents [1]
--nactions the number of agent actions [11]
--max_steps force to end the game after this many steps [20]
--games_config_path configuration file for games [mazebase/config/game_config.lua]
--game can specify a single game []
--optim optimization method: rmsprop | sgd [rmsprop]
--lrate learning rate [0.001]
--max_grad_norm gradient clip value [0]
--alpha coefficient of baseline term in the cost function [0.03]
--epochs the number of training epochs [100]
--nbatches the number of mini-batches in one epoch [100]
--batch_size size of mini-batch (the number of parallel games) in each thread [32]
--nworker the number of threads used for training [16]
--beta parameter of RMSProp [0.97]
--eps parameter of RMSProp [1e-06]
--save file name to save the model []
--load file name to load the model []
```
See the [paper](http://arxiv.org/abs/1511.07401) for more details on training.
## Testing a trained model
After training, you can watch the model play by calling the function `test()`, which will display the game in a browser window. You need the display package to see the game play. For example, if you saved a trained model with the `--save /tmp/model.t7` option, you can load it with `--load /tmp/model.t7 --epochs 0` and then run `test()` to watch it play.
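For example, one plausible sequence (assuming Torch is launched in interactive mode with `th -i` so that the prompt stays open after loading) is:
```
$ th -ldisplay.start 8000 0.0.0.0
$ th -i main.lua --load /tmp/model.t7 --epochs 0
> test()
```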
## Requirements
The whole code is written in Lua, and requires [Torch7](http://torch.ch/) and [nngraph](http://github.com/torch/nngraph) packages.
Training uses multi-threading for speed-up. The display package is necessary for visualizing game play; it can be installed with
```
luarocks install display
```