Blazing fast SuperMarioBros-Nes environment for RL research.

SuperMarioBros-Nes-turbo is a blazing-fast vectorized Super Mario Bros NES environment for reinforcement-learning research. It uses a custom Rust NES emulator specialized for SuperMarioBros-Nes mapper 0/NROM, with vectorized stepping on the Rust side so Python crosses into Rust once per batched step. Game-specific preprocessing, including frame skip, grayscale or RGB rendering, cropping, resizing, frame stacking, reward extraction, termination checks, and observation-buffer writes, happens before data returns to Python. It follows the same throughput-first direction as stable-retro-turbo, but drops broad stable-retro compatibility so the emulator and batch API can specialize on Super Mario Bros NES.

Why it is fast

Compared with upstream Stable Retro, this package does not run many Python RetroEnv instances through SubprocVecEnv, DummyVecEnv, or wrapper stacks for frame skip, resize, grayscale, frame stack, reward, and termination logic. Compared with stable-retro-turbo, it keeps the same native-vector philosophy but gives up the general Stable Retro compatibility layer, arbitrary game/core support, and generic emulator contracts. The speed comes from these current fast paths:

  • SMB/NROM-only Rust emulator: the core supports the Super Mario Bros NES mapper 0/NROM shape directly instead of routing every access through a general multi-console emulator interface.
  • Fixed cartridge memory paths: PRG/CHR reads use precomputed power-of-two masks, direct PRG ROM instruction fetches, fixed nametable mirroring, and direct CPU memory paths for RAM, PPU registers, controllers, and PRG ROM.
  • One Python call per vector step: reset_into(), step_into(), and info_into() mutate caller-owned NumPy arrays, release the GIL, and avoid creating new observations, rewards, done arrays, and scalar info arrays on every step.
  • Rust-side batch execution: vector lanes step in Rust with Rayon when the batch is large enough, so the Python side only submits action arrays and reads already-filled result buffers.
  • step_fast() info bypass: training and benchmark loops can skip per-env Python info dictionaries and keep x-position, score, lives, level, timer, and scroll values in typed arrays.
  • Fused RL preprocessing: frame skip, optional max-pool, reward accumulation, termination checks, grayscale/RGB rendering, crop, area resize, and frame-stack writes happen in the native step loop before data returns to Python.
  • Observation buffer as frame-stack state: the returned observation buffer is also the persistent stack buffer; old frames shift in place and only the newest processed frame is written into the final stack slot.
  • Direct grayscale renderer: the common pixel path renders SMB background tile rows and sprite overlays directly to grayscale from NES palette values, instead of first materializing RGB and then converting it in Python.
  • Precomputed area resize plan: resize bins are built once per env configuration, then reused for every frame and every lane.
  • Deterministic lane sharing: identical reset lanes, and repeated saved-state groups such as the default Level1-1 through Level1-4 round-robin benchmark, can share one emulator state while actions remain uniform; mixed actions materialize independent lane states before stepping, preserving the public vector-env contract.
  • SMB routine fast-forwards: the emulator recognizes exact Super Mario Bros ROM byte signatures for the idle loop, sprite-0 polling loop, and OAM clear helper, then advances equivalent CPU/PPU cycles without interpreting every repeated 6502 instruction.
  • Rust-side reward and terminal rules: x-position reward, flag completion, life-loss/level-change style done_on_info rules, terminal observation capture, and autoreset bookkeeping stay in the Rust/Python fast-env boundary rather than in wrapper chains.
  • Scoped compatibility paths: RGB, uncropped rendering, Gymnasium/SB3-style info dictionaries, terminal observations, sticky actions, random no-op starts, and multi-state curricula are still available, but the benchmark path keeps them on explicit typed/native routes instead of paying broad Stable Retro overhead unconditionally.
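Two of the fast paths above, the precomputed area resize plan and the observation buffer doubling as frame-stack state, can be illustrated with a small standard-library sketch. This is not the Rust implementation: it works on one-dimensional toy "frames", the spans cover whole source pixels without fractional edge weights, and the names build_area_plan, resize_row, and push_frame are illustrative only.

```python
def build_area_plan(src_len, dst_len):
    """Precompute the source span [start, stop) that feeds each output index.

    Simplification: spans cover whole source pixels, with no fractional edge weights.
    """
    plan = []
    for i in range(dst_len):
        start = (i * src_len) // dst_len
        stop = max(start + 1, ((i + 1) * src_len) // dst_len)
        plan.append((start, stop))
    return plan


def resize_row(row, plan):
    """Average each precomputed span; the same plan is reused for every row and lane."""
    return bytes(sum(row[a:b]) // (b - a) for a, b in plan)


def push_frame(stack, frame):
    """Shift older frames toward the front in place, then write the newest frame last."""
    n = len(frame)
    stack[:-n] = stack[n:]  # drop the oldest frame, keep the buffer object itself
    stack[-n:] = frame      # the newest frame occupies the final stack slot


plan = build_area_plan(src_len=8, dst_len=4)  # built once per configuration
stack = bytearray(4 * 4)                      # 4 stacked frames of 4 pixels each
for raw in (bytes([0, 2, 4, 6, 8, 10, 12, 14]), bytes([16] * 8)):
    push_frame(stack, resize_row(raw, plan))
```

The caller keeps a single buffer for the whole run: each step costs one shift and one write, and no new observation object is allocated.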

Install

git clone https://github.com/tsilva/SuperMarioBros-Nes-turbo.git
cd SuperMarioBros-Nes-turbo
uv sync --extra dev
uv run maturin develop --release

ROM files are not included in this repository. Pass --rom-path to scripts, set SMB_ROM_PATH, or provide rom_path= when constructing environments. Expected SHA-256 for the supported Super Mario Bros NES ROM:

f61548fdf1670cffefcc4f0b7bdcdd9eaba0c226e3b74f8666071496988248de
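A ROM file can be checked against this digest with the standard library before it is passed to the environment. The sketch below assumes the digest covers the whole ROM file as stored on disk; sha256_of_file and is_supported_rom are illustrative helper names, not part of the package.

```python
import hashlib
from pathlib import Path

# Digest quoted above for the supported Super Mario Bros NES ROM.
EXPECTED_SHA256 = "f61548fdf1670cffefcc4f0b7bdcdd9eaba0c226e3b74f8666071496988248de"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_supported_rom(path):
    """Hypothetical helper: True when the file matches the expected digest."""
    return sha256_of_file(path) == EXPECTED_SHA256
```

Calling is_supported_rom("/path/to/SuperMarioBros.nes") before constructing the environment turns a wrong or modified ROM into an early, explicit failure.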

Import the package as supermariobrosnes_turbo:

import numpy as np

from supermariobrosnes_turbo import Actions, SuperMarioBrosNesTurboVecEnv

env = SuperMarioBrosNesTurboVecEnv(
    "SuperMarioBros-Nes-v0",
    rom_path="/path/to/SuperMarioBros.nes",
    num_envs=64,
    use_restricted_actions=Actions.ALL,
    frame_skip=4,
    obs_grayscale=True,
    frame_stack=4,
    obs_crop=(32, 0, 0, 0),
    obs_resize=(84, 84),
    obs_layout="chw",
)

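obs = env.reset()
# With Actions.ALL each row is a per-button mask; all zeros presses no buttons.
actions = np.zeros((env.num_envs, env.num_buttons), dtype=np.uint8)
# One batched step for all lanes, following the SB3 step_async/step_wait pattern.
env.step_async(actions)
obs, rewards, dones, infos = env.step_wait()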

step_wait() follows the Stable Baselines3 VecEnv contract: it calls the Rust SuperMarioBrosNesTurboVecEnv once for the whole batch and returns (obs, rewards, dones, infos) from reusable NumPy arrays. Use step_fast() when you do not need per-env info dictionaries, or step_wait_gymnasium() when you need separate terminated and truncated arrays.
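For readers moving between the two return shapes: the Stable Baselines3 contract reports one done flag per env, while Gymnasium separates terminated from truncated. The sketch below shows the usual relationship between them with plain lists standing in for the NumPy arrays. It assumes this package follows the common convention; merge_done_flags is an illustrative name, and the exact return layout of step_wait_gymnasium() is not shown here.

```python
def merge_done_flags(terminated, truncated):
    """Collapse Gymnasium-style flags into SB3-style dones (terminated or truncated)."""
    if len(terminated) != len(truncated):
        raise ValueError("terminated and truncated must have one entry per env")
    return [bool(t) or bool(u) for t, u in zip(terminated, truncated)]


terminated = [True, False, False]  # e.g. a lane that reached a terminal rule
truncated = [False, False, True]   # e.g. a lane cut off by a step limit
dones = merge_done_flags(terminated, truncated)
print(dones)  # [True, False, True]
```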

Initial states can be a single stable-retro state, one state per env slot, or a weighted mapping sampled independently for each lane on reset:

env = SuperMarioBrosNesTurboVecEnv(
    "SuperMarioBros-Nes-v0",
    rom_path="/path/to/SuperMarioBros.nes",
    num_envs=16,
    state={"Level1-1": 0.5, "Level1-4": 0.5},
    done_on={
        "life_loss": ("lives", "decrease"),
        "level_change": (("levelHi", "levelLo"), "change"),
    },
)
env.seed(123)

obs = env.reset()
sampled_states = env.active_states()
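The weighted mapping can be pictured as one independent draw per lane on each reset. The standard-library sketch below illustrates only that idea; the package's own random number generator and draw order are not documented here, so the same seed will not reproduce its choices. sample_states is an illustrative name.

```python
import random


def sample_states(weights, num_envs, seed):
    """Draw one state name per lane, independently, in proportion to its weight."""
    names = list(weights)
    rng = random.Random(seed)  # seeded so repeated calls give the same lanes
    return rng.choices(names, weights=[weights[n] for n in names], k=num_envs)


lanes = sample_states({"Level1-1": 0.5, "Level1-4": 0.5}, num_envs=16, seed=123)
```

Because each lane is sampled independently, a 0.5/0.5 mapping does not guarantee an even 8/8 split across 16 lanes; only the long-run proportions match the weights.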

Commands

uv sync --extra dev                 # install Python dev dependencies
uv run maturin develop --release    # build and install the Rust extension

make test                           # Rust tests + HF policy completion/parity oracle

uv run python scripts/smoke_smb.py --rom-path /path/to/SuperMarioBros.nes  # quick ROM/emulator smoke check
uv run python scripts/benchmark_sps.py --rom-path /path/to/SuperMarioBros.nes --num-envs 16 --steps 500 --repeats 3

uv run python scripts/play.py --rom-path /path/to/SuperMarioBros.nes --mode external      # raw SDL2 play view
uv run python scripts/play.py --rom-path /path/to/SuperMarioBros.nes --mode external --view preprocessed --scale 4
uv run python scripts/play_policy.py https://huggingface.co/tsilva/SuperMarioBros-NES_Level1 --rom-path /path/to/SuperMarioBros.nes

Release

Release tags drive the GitHub Actions wheel build. From a clean, synced branch with the release environment installed, create the next minor release with:

uv sync --extra dev --group dev
make release

Use scripts/release.py --part patch, --part major, or --to 0.2.0 for other release shapes. The script refuses to run unless the current branch is clean and synced with its upstream. It verifies the target version is not already on PyPI, bumps pyproject.toml and Cargo.toml, refreshes lockfiles, runs local gates, commits Release v<version>, creates the matching tag, and pushes the branch plus tag. The pushed tag triggers the release workflow, which builds, audits, and publishes the wheels to PyPI via trusted publishing.
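The three --part values can be read as the usual semantic-version bumps. The sketch below assumes that convention and plain MAJOR.MINOR.PATCH strings; it is not the logic of scripts/release.py, which is not shown in this document, and bump_version is an illustrative name.

```python
def bump_version(version, part):
    """Return the next version string for a major, minor, or patch bump."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump_version("0.2.2", "minor"))  # 0.3.0
```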

Fixed-host benchmark target

Use stable-retro-turbo==1.0.1.post1 as the Stable Retro PyPI oracle for new benchmarks and comparisons. Rerun the PyPI oracle baseline before quoting a current speedup, so the comparison uses the same SuperMarioBros-Nes-v0 ROM, saved-state set, frame skip, frame stack, grayscale/crop/resize preprocessing, and 16 vector envs on the fixed beast-3 CPU host.

Historical fixed-host results:

| Environment | Version / Ref | Official median env steps/sec | Mean invocation-median env steps/sec | Run-median CV | Notes |
|---|---|---|---|---|---|
| SuperMarioBros-Nes-turbo | main | 47,611.14 | 47,605.89 | 0.28% | Full official fixed-host run; all validity gates passed. |
| stable-retro-turbo PyPI oracle | 1.0.0.post23 | 7,437.65 | 7,440.04 | 0.44% | Historical only; superseded by 1.0.1.post1 for new comparisons. Statistical gates passed, but the post-run host-load gate failed because the 1-minute load was sampled immediately after the benchmark's own CPU-heavy timing. |
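The column names suggest that each benchmark invocation produces several timed runs, which are then summarized by medians and a coefficient of variation (CV). The sketch below shows one plausible way to compute such statistics. The exact definitions used by the benchmark scripts are an assumption here, the input numbers are toy values rather than measurements, and summarize is an illustrative name.

```python
import statistics


def summarize(invocations):
    """Summarize per-run env steps/sec grouped by invocation (assumed layout)."""
    all_runs = [run for inv in invocations for run in inv]
    invocation_medians = [statistics.median(inv) for inv in invocations]
    mean_of_medians = statistics.mean(invocation_medians)
    return {
        "median": statistics.median(all_runs),
        "mean_invocation_median": mean_of_medians,
        # CV: sample standard deviation relative to the mean, as a percentage.
        "cv_percent": 100.0 * statistics.stdev(invocation_medians) / mean_of_medians,
    }


# Toy numbers for illustration only, not benchmark results.
summary = summarize([[100.0, 102.0, 98.0], [101.0, 99.0, 103.0]])
```

A low CV indicates that the invocations agree with each other, which is why the table reports it next to the medians.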

Local benchmark artifact paths:

  • artifacts/benchmarks/host-results/host-single-2026-07-02-123806-R17c60e1eb88e/aggregate.json
  • artifacts/benchmarks/host-results/pypi-stable-retro-turbo/1.0.0.post23/0bcebd32669e8e46/aggregate.json

Notes

  • Python >=3.9 and a Rust toolchain are required to build the Maturin extension.
  • The current emulator scope is SuperMarioBros-Nes mapper 0 NROM.
  • The Python package exposes SuperMarioBrosNesTurboVecEnv, Actions (as imported in the example above), ACTION_MEANINGS, CORE_ACTION_MEANINGS, and ACTION_SETS. SuperMarioBrosNesTurboVecEnv subclasses Stable Baselines3 VecEnv when SB3 is installed and follows the stable-retro-turbo RetroVecEnv constructor shape.
  • use_restricted_actions=Actions.ALL and Actions.FILTERED consume per-button MultiBinary masks; Actions.DISCRETE consumes Stable Retro's 36-way discrete action encoding.
  • scripts/play_policy.py loads Stable Baselines3 PPO checkpoints from a local .zip, a Hugging Face repo id, or a https://huggingface.co/... URL and displays raw RGB gameplay in the SDL2 GUI while feeding the model its preprocessed observation stack. It defaults to a Stable Retro playback backend so public SB3/Hugging Face checkpoints use the preprocessing they were trained with; pass --view preprocessed to inspect the model input or --backend native when checking this repo's fast-env parity. The SB3, PyTorch, and Hugging Face Hub dependencies are included in the repo's uv dev environment.
  • By default, scripts/benchmark_sps.py starts lanes from Level1-1, Level1-2, Level1-3, and Level1-4, repeated round-robin. Use --states ... to choose a different round-robin state list, or --state Level1-1 (or another packaged stable-retro state) to start every lane from one saved level state. This package includes the stable Super Mario Bros NES states from Level1-1 through Level8-4, plus Level1-1-99lives, Level2-1-clouds, and Level2-1-clouds-easy. If needed, pass --state-dir or set SUPERMARIOBROSNES_FASTENV_STATE_DIR to point at a state directory. In Python, state= accepts a single state name/path/bytes value, a sequence with exactly one state per env, or a weighted mapping such as {"Level1-1": 0.5, "Level1-4": 0.5}. After reset, active_state_indices() and active_states() report the sampled state for each lane.
  • For SuperMarioBrosNesTurboVecEnv, done_on_info accepts named terminal rules like {"life_loss": ("lives", "decrease")}. Supported ops are change, increase, and decrease; keys are drawn from INFO_KEYS. Fired rules are reported in info["done_on_info"] with op, keys, prev, and next.
  • Stable Retro oracle/playback tooling targets stable-retro-turbo==1.0.1.post1 for new benchmarks and comparisons, and constructs the upstream vector env with the current flat keyword names: maxpool_last_two, noop_reset_max, sticky_action_prob, info_filter, obs_copy, and done_on. Runtime fired terminal rules are still read from info["done_on_info"].
  • Benchmark JSON can be written with scripts/benchmark_sps.py --output-json ....
  • Play mode uses the native SDL2 library. If SDL2 is not installed or discoverable, scripts/play.py exits with an SDL backend error.
  • ROM files are not included in the repository; use the SHA-256 digest above to confirm test inputs when needed.
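The named terminal rules described in the notes above can be sketched in plain Python. This illustrates the rule format only and is not the package's Rust implementation. It assumes that multi-key rules compare their values as a tuple (lexicographically for increase and decrease); fired_rules is an illustrative name.

```python
def fired_rules(rules, prev_info, next_info):
    """Return {rule_name: {"op", "keys", "prev", "next"}} for every rule that fired."""
    fired = {}
    for name, (keys, op) in rules.items():
        # A rule may watch one key ("lives") or several (("levelHi", "levelLo")).
        key_tuple = (keys,) if isinstance(keys, str) else tuple(keys)
        prev = tuple(prev_info[k] for k in key_tuple)
        nxt = tuple(next_info[k] for k in key_tuple)
        if op == "change":
            hit = nxt != prev
        elif op == "increase":
            hit = nxt > prev  # assumption: tuple comparison for multi-key rules
        elif op == "decrease":
            hit = nxt < prev
        else:
            raise ValueError(f"unsupported op: {op!r}")
        if hit:
            fired[name] = {"op": op, "keys": key_tuple, "prev": prev, "next": nxt}
    return fired


rules = {
    "life_loss": ("lives", "decrease"),
    "level_change": (("levelHi", "levelLo"), "change"),
}
before = {"lives": 2, "levelHi": 0, "levelLo": 0}
after = {"lives": 1, "levelHi": 0, "levelLo": 0}
result = fired_rules(rules, before, after)  # only life_loss fires here
```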

Architecture

SuperMarioBros-Nes-turbo architecture diagram

License

MIT, as declared in pyproject.toml and Cargo.toml.
