Blazing fast SuperMarioBros-Nes environment for RL research.

SuperMarioBros-Nes-turbo is a vectorized Super Mario Bros NES environment for reinforcement-learning research, built for throughput. It uses a custom Rust NES emulator specialized for SuperMarioBros-Nes (mapper 0/NROM). Stepping is vectorized on the Rust side, so Python crosses into Rust once per batched step. Game-specific preprocessing (frame skip, grayscale or RGB rendering, cropping, resizing, frame stacking, reward extraction, termination checks, and observation-buffer writes) happens before data returns to Python. The project follows the same throughput-first direction as stable-retro-turbo, but drops broad stable-retro compatibility so the emulator and batch API can specialize on Super Mario Bros NES.
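
To make the frame-skip part of that preprocessing concrete, here is a minimal pure-Python sketch of the general technique: repeat one action for several emulator frames, sum the rewards, and stop early on termination. It is not the Rust implementation; `step_with_frame_skip`, `step_frame`, and `ToyEmulator` are hypothetical names used only for illustration.

```python
from typing import Callable, Tuple


def step_with_frame_skip(
    step_frame: Callable[[int], Tuple[float, bool]],
    action: int,
    frame_skip: int = 4,
) -> Tuple[float, bool, int]:
    """Repeat one action for up to `frame_skip` emulator frames.

    Returns (summed reward, done flag, frames actually emulated).
    """
    total_reward = 0.0
    frames = 0
    done = False
    for _ in range(frame_skip):
        reward, done = step_frame(action)
        total_reward += reward
        frames += 1
        if done:
            # Stop emulating once the episode has ended.
            break
    return total_reward, done, frames


class ToyEmulator:
    """Hypothetical stand-in: pays 1.0 per frame and ends after `length` frames."""

    def __init__(self, length: int) -> None:
        self.remaining = length

    def step_frame(self, action: int) -> Tuple[float, bool]:
        self.remaining -= 1
        return 1.0, self.remaining <= 0


emu = ToyEmulator(length=6)
first = step_with_frame_skip(emu.step_frame, action=0)
second = step_with_frame_skip(emu.step_frame, action=0)
print(first, second)
```

Doing this loop inside the emulator, rather than in Python, is what lets a batched step cross the language boundary only once.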

Install

git clone https://github.com/tsilva/SuperMarioBros-Nes-turbo.git
cd SuperMarioBros-Nes-turbo
uv sync --extra dev
uv run maturin develop --release

ROM files are not included in this repository. Pass --rom-path to scripts, set SMB_ROM_PATH, or provide rom_path= when constructing environments. Expected SHA-256 for the supported Super Mario Bros NES ROM:

f61548fdf1670cffefcc4f0b7bdcdd9eaba0c226e3b74f8666071496988248de
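
A local ROM file can be checked against this digest with the standard library. The sketch below is one way to do it; the helper names are hypothetical, and the precedence of an explicit path over SMB_ROM_PATH is an assumption of this example rather than documented behavior of the package.

```python
import hashlib
import os
from pathlib import Path
from typing import Optional

EXPECTED_SHA256 = "f61548fdf1670cffefcc4f0b7bdcdd9eaba0c226e3b74f8666071496988248de"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resolve_rom_path(explicit: Optional[str] = None) -> Optional[Path]:
    """Assumed precedence: explicit argument first, then SMB_ROM_PATH."""
    candidate = explicit or os.environ.get("SMB_ROM_PATH")
    return Path(candidate) if candidate else None


def rom_matches(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA-256 equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()


rom = resolve_rom_path()
if rom is not None and rom.is_file():
    print("ROM digest OK" if rom_matches(rom) else "ROM digest mismatch")
else:
    print("No ROM path configured")
```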

Import the package as supermariobrosnes_turbo:

import numpy as np

from supermariobrosnes_turbo import Actions, SuperMarioBrosNesTurboVecEnv

env = SuperMarioBrosNesTurboVecEnv(
    "SuperMarioBros-Nes-v0",
    rom_path="/path/to/SuperMarioBros.nes",
    num_envs=64,
    use_restricted_actions=Actions.ALL,
    frame_skip=4,
    obs_grayscale=True,
    frame_stack=4,
    obs_crop=(32, 0, 0, 0),
    obs_resize=(84, 84),
    obs_layout="chw",
)

obs = env.reset()
actions = np.zeros((env.num_envs, env.num_buttons), dtype=np.uint8)
env.step_async(actions)
obs, rewards, dones, infos = env.step_wait()

step_wait() follows the Stable Baselines3 VecEnv contract: it calls the Rust FastMarioVecEnv once for the whole batch and returns (obs, rewards, dones, infos) from reusable NumPy arrays. Use step_fast() when you do not need per-env info dictionaries, or step_wait_gymnasium() when you need separate terminated and truncated arrays.
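
The step_async()/step_wait() pairing can be driven from a small loop. The sketch below works against any object exposing reset(), step_async(), and step_wait() with the four-tuple described above; `StubVecEnv` and `count_finished_episodes` are hypothetical stand-ins so the snippet runs without a ROM. With the real environment you would pass a NumPy action array instead of a list.

```python
from typing import Any, List, Sequence, Tuple


def count_finished_episodes(env: Any, actions: Sequence[Any], steps: int) -> int:
    """Step a VecEnv-style object and count how many lanes reported done."""
    env.reset()
    finished = 0
    for _ in range(steps):
        env.step_async(actions)
        _obs, _rewards, dones, _infos = env.step_wait()
        finished += sum(1 for d in dones if d)
    return finished


class StubVecEnv:
    """Hypothetical stand-in: lane i ends an episode every (i + 2) steps."""

    def __init__(self, num_envs: int) -> None:
        self.num_envs = num_envs
        self._ticks = [0] * num_envs

    def reset(self) -> List[int]:
        self._ticks = [0] * self.num_envs
        return list(self._ticks)

    def step_async(self, actions: Sequence[Any]) -> None:
        # One action row per lane, as in the batched API.
        assert len(actions) == self.num_envs
        self._pending = actions

    def step_wait(self) -> Tuple[List[int], List[float], List[bool], List[dict]]:
        self._ticks = [t + 1 for t in self._ticks]
        dones = [t % (i + 2) == 0 for i, t in enumerate(self._ticks)]
        rewards = [0.0] * self.num_envs
        infos = [{} for _ in range(self.num_envs)]
        return list(self._ticks), rewards, dones, infos


stub = StubVecEnv(num_envs=3)
total_finished = count_finished_episodes(stub, actions=[0, 0, 0], steps=6)
print(total_finished)
```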

Initial states can be a single stable-retro state, one state per env slot, or a weighted mapping sampled independently for each lane on reset:

env = SuperMarioBrosNesTurboVecEnv(
    "SuperMarioBros-Nes-v0",
    rom_path="/path/to/SuperMarioBros.nes",
    num_envs=16,
    state={"Level1-1": 0.5, "Level1-4": 0.5},
    done_on={
        "life_loss": ("lives", "decrease"),
        "level_change": (("levelHi", "levelLo"), "change"),
    },
)
env.seed(123)

obs = env.reset()
sampled_states = env.active_states()

Commands

uv sync --extra dev                 # install Python dev dependencies
uv run maturin develop --release    # build and install the Rust extension

make test                           # Rust tests + HF policy completion/parity oracle

uv run python scripts/smoke_smb.py --rom-path /path/to/SuperMarioBros.nes  # quick ROM/emulator smoke check
uv run python scripts/benchmark_vec_env.py --rom-path /path/to/SuperMarioBros.nes --num-envs 8 --frame-skip 4 --frame-stack 4
uv run python scripts/benchmark_sps.py --rom-path /path/to/SuperMarioBros.nes --num-envs 16 --steps 500 --repeats 3

uv run python scripts/play.py --rom-path /path/to/SuperMarioBros.nes --mode external      # raw SDL2 play view
uv run python scripts/play.py --rom-path /path/to/SuperMarioBros.nes --mode external --view preprocessed --scale 4
uv run python scripts/play_policy.py https://huggingface.co/tsilva/SuperMarioBros-NES_Level1 --rom-path /path/to/SuperMarioBros.nes

Fixed-host benchmark target

Use stable-retro-turbo==1.0.1.post1 as the Stable Retro PyPI oracle for new benchmarks and comparisons. Rerun the PyPI oracle baseline before quoting a current speedup, so the comparison uses the same SuperMarioBros-Nes-v0 ROM, saved-state set, frame skip, frame stack, grayscale/crop/resize preprocessing, and 16 vector envs on the fixed beast-3 CPU host.

Historical fixed-host results:

| Environment | Version / Ref | Official median env steps/sec | Mean invocation-median env steps/sec | Run-median CV | Notes |
| --- | --- | --- | --- | --- | --- |
| SuperMarioBros-Nes-turbo | main | 47,611.14 | 47,605.89 | 0.28% | Full official fixed-host run; all validity gates passed. |
| stable-retro-turbo PyPI oracle | 1.0.0.post23 | 7,437.65 | 7,440.04 | 0.44% | Historical only; superseded by 1.0.1.post1 for new comparisons. Statistical gates passed, but the post-run host-load gate failed because the 1-minute load was sampled immediately after the benchmark's own CPU-heavy timing. |
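
For readers reproducing these columns from their own runs, the sketch below shows one plausible way to compute a median, a coefficient of variation (CV) over per-run medians, and a speedup ratio. It assumes CV means sample standard deviation divided by the mean; the benchmark scripts may define it differently. The numbers are hypothetical, not measured results.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean, as a fraction."""
    return statistics.stdev(values) / statistics.fmean(values)


def speedup(candidate_sps: float, baseline_sps: float) -> float:
    """Ratio of candidate throughput to baseline throughput."""
    return candidate_sps / baseline_sps


# Hypothetical per-run medians in env steps/sec, for illustration only.
candidate_runs = [1000.0, 1010.0, 990.0]
baseline_runs = [200.0, 198.0, 202.0]

candidate_median = statistics.median(candidate_runs)
baseline_median = statistics.median(baseline_runs)
print(f"CV: {coefficient_of_variation(candidate_runs):.2%}")
print(f"speedup: {speedup(candidate_median, baseline_median):.2f}x")
```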

Local benchmark artifact paths:

  • artifacts/benchmarks/host-results/host-single-2026-07-02-123806-R17c60e1eb88e/aggregate.json
  • artifacts/benchmarks/host-results/pypi-stable-retro-turbo/1.0.0.post23/0bcebd32669e8e46/aggregate.json

Notes

  • Python >=3.9 and a Rust toolchain are required to build the Maturin extension.
  • The current emulator scope is SuperMarioBros-Nes mapper 0 NROM.
  • The Python package exposes SuperMarioBrosNesTurboVecEnv, ACTION_MEANINGS, CORE_ACTION_MEANINGS, and ACTION_SETS. SuperMarioBrosNesTurboVecEnv subclasses Stable Baselines3 VecEnv when SB3 is installed and follows the stable-retro-turbo RetroVecEnv constructor shape.
  • use_restricted_actions=Actions.ALL and Actions.FILTERED consume per-button MultiBinary masks; Actions.DISCRETE consumes Stable Retro's 36-way discrete action encoding.
  • scripts/play_policy.py loads Stable Baselines3 PPO checkpoints from a local .zip, a Hugging Face repo id, or a https://huggingface.co/... URL and displays raw RGB gameplay in the SDL2 GUI while feeding the model its preprocessed observation stack. It defaults to a Stable Retro playback backend so public SB3/Hugging Face checkpoints use the preprocessing they were trained with; pass --view preprocessed to inspect the model input or --backend native when checking this repo's fast-env parity. The SB3, PyTorch, and Hugging Face Hub dependencies are included in the repo's uv dev environment.
  • By default, scripts/benchmark_sps.py starts lanes from Level1-1, Level1-2, Level1-3, and Level1-4 repeated round-robin. Use --state Level1-1 or another stable-retro state to start every lane from one saved level state. Use --states ... to choose a different round-robin state list. In Python, state= accepts a single state name/path/bytes value, a sequence with exactly one state per env, or a weighted mapping such as {"Level1-1": 0.5, "Level1-4": 0.5}. After reset, active_state_indices() and active_states() report the sampled state for each lane. If needed, pass --state-dir or set SUPERMARIOBROSNES_FASTENV_STATE_DIR.
  • For SuperMarioBrosNesTurboVecEnv, done_on accepts named terminal rules like {"life_loss": ("lives", "decrease")}. Supported ops are change, increase, and decrease; keys are drawn from INFO_KEYS. Fired rules are reported in info["done_on_info"] with op, keys, prev, and next.
  • Stable Retro oracle/playback tooling targets stable-retro-turbo==1.0.1.post1 for new benchmarks and comparisons, and constructs the upstream vector env with the current flat keyword names: maxpool_last_two, noop_reset_max, sticky_action_prob, info_filter, obs_copy, and done_on. Runtime fired terminal rules are still read from info["done_on_info"].
  • Benchmark JSON can be written with scripts/benchmark_sps.py --output-json ....
  • Play mode uses the native SDL2 library. If SDL2 is not installed or discoverable, scripts/play.py exits with an SDL backend error.
  • ROM files are not included in the repository; use the SHA-256 digest above to confirm test inputs when needed.
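
The named terminal rules described in the notes can be read as comparisons between the previous and next info values. The sketch below illustrates those semantics in plain Python; it is not the package's implementation. `fired_rules` is a hypothetical helper, and comparing multi-key rules as tuples (lexicographically for increase and decrease) is an assumption of this example.

```python
from typing import Any, Dict, Mapping, Sequence, Tuple, Union

Keys = Union[str, Sequence[str]]
Rule = Tuple[Keys, str]


def _read(info: Mapping[str, Any], keys: Keys) -> Any:
    """Read one info value, or a tuple of values for a multi-key rule."""
    if isinstance(keys, str):
        return info[keys]
    return tuple(info[k] for k in keys)


def fired_rules(
    rules: Mapping[str, Rule],
    prev: Mapping[str, Any],
    nxt: Mapping[str, Any],
) -> Dict[str, dict]:
    """Return the rules whose op holds between the prev and next info."""
    fired: Dict[str, dict] = {}
    for name, (keys, op) in rules.items():
        before, after = _read(prev, keys), _read(nxt, keys)
        if op == "change":
            hit = after != before
        elif op == "increase":
            hit = after > before
        elif op == "decrease":
            hit = after < before
        else:
            raise ValueError(f"unsupported op: {op!r}")
        if hit:
            fired[name] = {"op": op, "keys": keys, "prev": before, "next": after}
    return fired


rules = {
    "life_loss": ("lives", "decrease"),
    "level_change": (("levelHi", "levelLo"), "change"),
}
prev_info = {"lives": 2, "levelHi": 0, "levelLo": 0}
next_info = {"lives": 1, "levelHi": 0, "levelLo": 0}
result = fired_rules(rules, prev_info, next_info)
print(result)
```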

Architecture

SuperMarioBros-Nes-turbo architecture diagram

License

MIT, as declared in pyproject.toml and Cargo.toml.
