Super Mario Bros. for OpenAI Gym
gym-super-mario-bros
An OpenAI Gym environment for Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The Nintendo Entertainment System (NES) using the nes-py emulator.
Installation
The preferred installation of gym-super-mario-bros is from pip:
pip install gym-super-mario-bros
Usage
Python
You must import gym_super_mario_bros before trying to make an environment, because gym environments are registered at runtime. By default, gym_super_mario_bros environments use the full NES action space of 256 discrete actions. To constrain this, gym_super_mario_bros.actions provides three action lists (RIGHT_ONLY, SIMPLE_MOVEMENT, and COMPLEX_MOVEMENT) for the nes_py.wrappers.BinarySpaceToDiscreteSpaceEnv wrapper. See gym_super_mario_bros/actions.py for a breakdown of the legal actions in each of these three lists.
from nes_py.wrappers import BinarySpaceToDiscreteSpaceEnv
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT

# importing gym_super_mario_bros registers the environments with gym
env = gym_super_mario_bros.make('SuperMarioBros-v0')
# restrict the 256-action space to the SIMPLE_MOVEMENT action list
env = BinarySpaceToDiscreteSpaceEnv(env, SIMPLE_MOVEMENT)

done = True
for step in range(5000):
    # reset before the first step and whenever an episode ends
    if done:
        state = env.reset()
    state, reward, done, info = env.step(env.action_space.sample())
    env.render()

env.close()
NOTE: gym_super_mario_bros.make is just an alias to gym.make for convenience.
NOTE: remove calls to render in training code for a nontrivial speedup.
Command Line
gym_super_mario_bros features a command line interface for playing environments using either the keyboard or uniform random movement.
gym_super_mario_bros -e <the environment ID to play> -m <`human` or `random`>
NOTE: by default, -e is set to SuperMarioBros-v0 and -m is set to human.
Environments
These environments allow 3 attempts (lives) to make it through the 32 levels of the game. The environments only send reward-able game-play frames to agents; no cut-scenes, loading screens, etc. are sent from the NES emulator to an agent, nor can an agent perform actions during these occurrences. If a cut-scene cannot be skipped by hacking the NES's RAM, the environment will lock the Python process until the emulator is ready for the next action.
Environment | Game | Frameskip | ROM |
---|---|---|---|
SuperMarioBros-v0 | SMB | 4 | standard |
SuperMarioBros-v1 | SMB | 4 | downsample |
SuperMarioBros-v2 | SMB | 4 | pixel |
SuperMarioBros-v3 | SMB | 4 | rectangle |
SuperMarioBrosNoFrameskip-v0 | SMB | 1 | standard |
SuperMarioBrosNoFrameskip-v1 | SMB | 1 | downsample |
SuperMarioBrosNoFrameskip-v2 | SMB | 1 | pixel |
SuperMarioBrosNoFrameskip-v3 | SMB | 1 | rectangle |
SuperMarioBros2-v0 | SMB2 | 4 | standard |
SuperMarioBros2-v1 | SMB2 | 4 | downsample |
SuperMarioBros2NoFrameskip-v0 | SMB2 | 1 | standard |
SuperMarioBros2NoFrameskip-v1 | SMB2 | 1 | downsample |
Individual Levels
These environments allow a single attempt (life) to make it through a single level of the game.
Use the template SuperMarioBros-<world>-<level>-v<version> where:
- <world> is a number in {1, 2, 3, 4, 5, 6, 7, 8} indicating the world
- <level> is a number in {1, 2, 3, 4} indicating the level within a world
- <version> is a number in {0, 1, 2, 3} specifying the ROM mode to use
  - 0: standard ROM
  - 1: downsampled ROM
  - 2: pixel ROM
  - 3: rectangle ROM
- NoFrameskip can be added before the first hyphen to disable frame skip
For example, to play 4-2 on the downsampled ROM, you would use the environment id SuperMarioBros-4-2-v1. To disable frame skip you would use SuperMarioBrosNoFrameskip-4-2-v1.
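The ID template above can be assembled programmatically. The following is a minimal sketch, not part of the package: the helper name make_level_id and its parameters are hypothetical, and it only builds the string described by the template, validating each field against the ranges listed above.

```python
def make_level_id(world, level, version=0, frameskip=True):
    """Build an ID of the form SuperMarioBros-<world>-<level>-v<version>.

    Hypothetical helper for illustration; it is not provided by
    gym_super_mario_bros.
    """
    if world not in range(1, 9):
        raise ValueError('world must be in {1, ..., 8}')
    if level not in range(1, 5):
        raise ValueError('level must be in {1, 2, 3, 4}')
    if version not in range(0, 4):
        raise ValueError('version must be in {0, 1, 2, 3}')
    # NoFrameskip goes before the first hyphen to disable frame skip
    prefix = 'SuperMarioBros' if frameskip else 'SuperMarioBrosNoFrameskip'
    return '{}-{}-{}-v{}'.format(prefix, world, level, version)


# 4-2 on the downsampled ROM, with and without frame skip
print(make_level_id(4, 2, version=1))
print(make_level_id(4, 2, version=1, frameskip=False))
```

The resulting string can then be passed to gym_super_mario_bros.make in place of a literal ID.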
Step
Info about the rewards and info returned by the step method.
Reward Function
The reward function assumes the objective of the game is to move as far right as possible (increase the agent's x value), as fast as possible, without dying. To model this game, three separate variables compose the reward:
- v: the difference in agent x values between states
  - in this case this is instantaneous velocity for the given step
  - moving right ⇔ v > 0
  - moving left ⇔ v < 0
  - no movement ⇔ v = 0
- c: the difference in the game clock between frames
  - this penalty encourages the agent to move quickly
  - no clock change ⇔ c = 0
  - clock decrease ⇔ c < 0
- d: a death penalty that penalizes the agent for dying in a state
  - this penalty encourages the agent to avoid death
  - alive ⇔ d = 0
  - death ⇔ d = -15
r = v + c + d
The reward is clipped into the range (-15, 15).
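As a rough illustration of the formula above, the sketch below computes r = v + c + d for a single step and clips it. The function name step_reward and its arguments are hypothetical, not the environment's internal code, and the clipping bounds are assumed to be inclusive.

```python
def step_reward(x_prev, x_now, clock_prev, clock_now, died):
    """Illustrative reward for one step: r = v + c + d, clipped to [-15, 15].

    Hypothetical helper; the environment computes its reward internally.
    """
    v = x_now - x_prev          # > 0 when moving right, < 0 when moving left
    c = clock_now - clock_prev  # 0 if the clock did not change, < 0 if it ticked
    d = -15 if died else 0      # death penalty
    r = v + c + d
    # assumption: the bounds -15 and 15 are themselves allowed values
    return max(-15, min(15, r))


# moving 3 units right while the clock ticks down by 1
print(step_reward(40, 43, 400, 399, died=False))
```

Note that a death always drives the step reward to or near the lower bound, since d alone already equals -15.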
info dictionary
The info dictionary returned by the step method contains the following keys:
Key | Type | Description |
---|---|---|
flag_get | bool | True if Mario reached a flag or ax |
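One way to use flag_get is to check it when an episode ends. The sketch below is an assumption-laden example: run_episode and policy are hypothetical names, and env is any object with the reset and step methods shown in the usage example, where step returns (state, reward, done, info).

```python
def run_episode(env, policy, max_steps=5000):
    """Run one episode and report (total reward, whether the flag was reached).

    Hypothetical helper; `policy` maps a state to an action.
    """
    state = env.reset()
    total_reward = 0
    info = {}
    for _ in range(max_steps):
        state, reward, done, info = env.step(policy(state))
        total_reward += reward
        if done:
            break
    # flag_get is True if Mario reached a flag or ax on the last step
    return total_reward, bool(info.get('flag_get', False))
```

With a real environment, a random policy could be passed as lambda state: env.action_space.sample().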
Citation
Please cite gym-super-mario-bros if you use it in your research.
@misc{gym-super-mario-bros,
author = {Christian Kauten},
title = {{S}uper {M}ario {B}ros for {O}pen{AI} {G}ym},
year = {2018},
publisher = {GitHub},
howpublished = {\url{https://github.com/Kautenja/gym-super-mario-bros}},
}