Fast Python 3.14 wheels for stable-retro RL workloads

stable-retro-turbo

stable-retro-turbo publishes installable macOS Apple Silicon and Linux wheels for the upstream stable-retro API surface.

Use it when you want stable_retro game environments without building the package and bundled public libretro cores from source yourself.

What changed from upstream

This fork keeps the upstream stable_retro API and adds a small set of RL-throughput features:

  • Python 3.14 wheels for macOS arm64 and Linux x86_64.
  • Bundled Game Boy, NES, SNES, and Genesis/Master System public cores.
  • Worker-local crop, resize, grayscale, frame skip, frame stack, max-pool, no-op reset, sticky actions, and reward clipping.
  • Native C++ screen processing and fused step_repeat_and_process().
  • StableRetroSubprocVecEnv shared-memory observations to reduce IPC copying.
  • Multi-emulator native frontend support inside one process, using per-instance libretro function tables, thread-local callback routing, and isolated core copies when several emulator instances load the same core.
  • StableRetroThreadedVecEnv, an experimental same-process SB3 VecEnv that can run fused emulator steps in Python worker threads or through the native C++ batch stepping entry point.
  • Native C++ step_repeat_and_process_batch() with a persistent worker pool for batched emulator stepping.
  • StableRetroNativeVecEnv, a same-process SB3 VecEnv where C++ owns the emulator pool, frame skip, preprocessing, frame stacking, autoreset, reward and done evaluation, and one contiguous batched observation buffer.
  • STABLE_RETRO_DISABLE_AUDIO=1 for RGB-only agents.
  • scripts/benchmark_vec_env.py for baseline versus optimized throughput runs.

In local Mario benchmarks using SuperMarioBros-Nes-v0, the optimized native vector path is now the fastest measured runtime. The earlier same-process threaded implementation validated multi-emulator execution and removed the old one-emulator-per-process restriction; StableRetroNativeVecEnv moves the remaining hot vector-env state machine into C++ so rollout steps no longer cross Python once per environment.

Recent local direct-ROM fused-preprocessing results on SuperMarioBros-Nes-v0:

envs                     shared_native_fused   native_vec_fused   speedup
8                        4,037 steps/s         7,797 steps/s      1.93x
16                       4,518 steps/s         8,295 steps/s      1.84x
32                       5,029 steps/s         8,632 steps/s      1.72x
32, 16 native threads    5,029 steps/s         8,910 steps/s      1.77x
64, 16 native threads    not sampled cleanly   7,662 steps/s      n/a

The 64-env shared subprocess run was previously interrupted after spending several minutes in worker startup/imports before producing a steady-state measurement; the native vector path avoids that Python worker startup cost.
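
The speedup column can be reproduced from the two throughput columns. The short sketch below copies the measured steps/s figures from the results above and divides native_vec_fused by shared_native_fused; the dictionary labels and the helper name are illustrative only.

```python
# Throughput figures copied from the benchmark results above (steps per second).
# Each entry is (shared_native_fused, native_vec_fused).
results = {
    "8 envs": (4037, 7797),
    "16 envs": (4518, 8295),
    "32 envs": (5029, 8632),
    "32 envs, 16 native threads": (5029, 8910),
}


def speedup(shared_sps, native_sps):
    """Ratio of native-vector throughput to shared-memory subprocess throughput."""
    return native_sps / shared_sps


speedups = {label: round(speedup(*pair), 2) for label, pair in results.items()}
for label, value in speedups.items():
    print(f"{label}: {value:.2f}x")
```

The 64-env row is left out because no steady-state shared-subprocess figure exists to divide by.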

Install

python -m pip install stable-retro-turbo

Use it from Python:

import stable_retro as retro

env = retro.make("Alleyway-GameBoy-v0", render_mode="rgb_array")

RL preprocessing and SB3

For reinforcement-learning loops, image preprocessing can be done inside each environment worker before observations are returned to the caller. This is useful with SubprocVecEnv, where sending smaller observations across process boundaries can be much faster than returning full-size RGB frames and resizing later. This is the main speedup this fork adds over the upstream wrapper stack.

import stable_retro as retro

env = retro.make(
    "SuperMarioBros-Nes-v0",
    render_mode="rgb_array",
    obs_resize=(84, 84),
    obs_resize_algorithm="nearest",  # nearest, bilinear, or area
    obs_grayscale=True,
)
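
To get a feel for why smaller observations help across process boundaries, the sketch below compares per-frame payload sizes. The raw frame shape of (224, 240, 3) is an assumption about a full-size NES RGB frame and is not stated in this README; check env.observation_space.shape for the real value.

```python
import numpy as np

RAW_SHAPE = (224, 240, 3)      # ASSUMED full-size RGB frame (height, width, channels)
PROCESSED_SHAPE = (84, 84, 1)  # obs_resize=(84, 84) with obs_grayscale=True


def nbytes(shape, dtype=np.uint8):
    """Bytes needed for one observation of the given shape and dtype."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize


raw = nbytes(RAW_SHAPE)
processed = nbytes(PROCESSED_SHAPE)
ratio = raw / processed
print(f"raw={raw} B, processed={processed} B, ratio={ratio:.1f}x")
```

Under that assumption each step sends roughly 23 times fewer observation bytes per environment, before any frame stacking.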

Available image kwargs:

  • obs_resize=(height, width): resize image observations before they leave the env.
  • obs_resize_algorithm="nearest": choose nearest, bilinear, or area; nearest is fastest, while area is downscale-only and does more averaging work.
  • obs_grayscale=True: return grayscale observations with shape (height, width, 1).
  • obs_crop=(top, bottom, left, right): crop pixels before grayscale and resize.
  • frame_skip=4: repeat each selected action inside the worker and sum rewards.
  • frame_stack=4: stack recent observations inside the worker before IPC.
  • maxpool_last_two=True: max-pool the last two skipped image frames.
  • noop_reset_max=30: apply a random number of no-op reset steps.
  • sticky_action_prob=0.25: probabilistically repeat the previous action.
  • reward_clip=True: clip rewards to [-1, 1].
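
As a rough mental model of how frame_skip, maxpool_last_two, and reward_clip combine, here is a pure-NumPy sketch. It is not the fork's C++ implementation: the step_fn callable, the stand-in environment, and the choice to clip the summed reward (rather than each per-frame reward) are assumptions made for illustration.

```python
import numpy as np


def fused_step(step_fn, action, frame_skip=4, maxpool_last_two=True, reward_clip=True):
    """Repeat `action`, sum rewards, and optionally max-pool the last two frames.

    `step_fn(action)` is a hypothetical callable returning (frame, reward, done).
    """
    total_reward = 0.0
    frames = []
    done = False
    for _ in range(frame_skip):
        frame, reward, done = step_fn(action)
        frames.append(frame)
        total_reward += reward
        if done:
            break  # stop repeating once the episode ends
    obs = frames[-1]
    if maxpool_last_two and len(frames) >= 2:
        # Element-wise max hides sprites that flicker on alternate frames.
        obs = np.maximum(frames[-2], frames[-1])
    if reward_clip:
        # Assumption: clipping is applied to the summed reward.
        total_reward = float(np.clip(total_reward, -1.0, 1.0))
    return obs, total_reward, done


def make_flicker_env():
    """Hypothetical stand-in env: odd steps draw a sprite (value t), even steps are blank."""
    state = {"t": 0}

    def step_fn(action):
        state["t"] += 1
        t = state["t"]
        frame = np.full((2, 2), t if t % 2 == 1 else 0, dtype=np.uint8)
        return frame, 0.5, False

    return step_fn


obs, reward, done = fused_step(make_flicker_env(), action=0)
print(obs.max(), reward, done)
```

In the stand-in env the fourth frame is blank, so without max-pooling the returned observation would lose the sprite drawn on the third frame.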

Pass the same options through Stable-Baselines3 with env_kwargs:

import stable_retro as retro
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv, VecTransposeImage


def make_mario_env(**kwargs):
    return retro.make(
        "SuperMarioBros-Nes-v0",
        render_mode="rgb_array",
        **kwargs,
    )


env = make_vec_env(
    make_mario_env,
    n_envs=8,
    vec_env_cls=SubprocVecEnv,
    vec_env_kwargs={"start_method": "spawn"},
    env_kwargs={
        "obs_resize": (84, 84),
        "obs_resize_algorithm": "nearest",
        "obs_grayscale": True,
    },
)
env = VecTransposeImage(env)  # (n_envs, 1, 84, 84) for grayscale
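
The shape noted in the comment above comes from moving the channel axis in front of height and width. A minimal NumPy sketch of the same reordering, using a zero-filled stand-in batch rather than real emulator output:

```python
import numpy as np

n_envs, height, width, channels = 8, 84, 84, 1

# Stand-in for a channels-last batch as returned by the vector env.
batch_hwc = np.zeros((n_envs, height, width, channels), dtype=np.uint8)
batch_hwc[0, 10, 20, 0] = 255  # marker pixel at row 10, column 20 of env 0

# Channels-first layout expected by PyTorch-style CNN policies.
batch_chw = np.transpose(batch_hwc, (0, 3, 1, 2))
print(batch_chw.shape)
```

The marker pixel ends up at index [0, 0, 10, 20], which is a quick way to confirm that height and width were not swapped.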

For the fastest SB3-style Mario rollouts, use the native vector env directly:

from stable_retro import StableRetroNativeVecEnv

env = StableRetroNativeVecEnv(
    "SuperMarioBros-Nes-v0",
    num_envs=32,
    num_threads=16,
    render_mode="rgb_array",
    obs_resize=(84, 84),
    obs_grayscale=True,
    frame_skip=4,
    frame_stack=4,
    maxpool_last_two=True,
)

StableRetroNativeVecEnv currently targets homogeneous single-player image rollouts with no movie recording and no screen rotation. It keeps the hot rollout path in C++ and returns a contiguous NumPy observation batch shaped (num_envs, height, width, channels * frame_stack).
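
To make the batched layout concrete, the sketch below allocates a buffer with the documented shape for the configuration above (32 envs, 84x84 grayscale, 4 stacked frames) and shows one plausible way to roll stacked frames along the channel axis. The oldest-to-newest channel ordering and the push_frame helper are assumptions; the README only specifies the overall shape.

```python
import numpy as np

num_envs, height, width, channels, frame_stack = 32, 84, 84, 1, 4

# One contiguous batch: (num_envs, height, width, channels * frame_stack).
obs = np.zeros((num_envs, height, width, channels * frame_stack), dtype=np.uint8)


def push_frame(obs, new_frames):
    """Shift stacked channels toward index 0 and write the newest frame last (in place).

    Assumption: channels are ordered oldest to newest.
    """
    c = new_frames.shape[-1]
    obs[..., :-c] = obs[..., c:].copy()  # copy avoids aliasing within the same buffer
    obs[..., -c:] = new_frames


# Push five constant frames with values 1..5; only the last four remain stacked.
for value in range(1, 6):
    push_frame(obs, np.full((num_envs, height, width, channels), value, dtype=np.uint8))

print(obs.shape, obs.nbytes, obs[0, 0, 0].tolist())
```

At these settings the whole batch is under 1 MB, so a single buffer can be handed to the training loop without per-env concatenation.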

For lower IPC overhead than standard SubprocVecEnv, use the shared-memory vector env:

from stable_retro import StableRetroSubprocVecEnv

env = StableRetroSubprocVecEnv([make_mario_env for _ in range(8)])

The shared-memory vector env keeps observations in a parent-owned shared buffer, so workers only send rewards, done flags, and infos through pipes on each step. For Atari-style image rollouts this pairs well with env-local preprocessing and fused native frame skipping:

env = StableRetroSubprocVecEnv(
    [
        lambda: retro.make(
            "SuperMarioBros-Nes-v0",
            render_mode="rgb_array",
            obs_resize=(84, 84),
            obs_grayscale=True,
            frame_skip=4,
            frame_stack=4,
            maxpool_last_two=True,
        )
        for _ in range(16)
    ],
)
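
The idea of a parent-owned shared observation buffer can be sketched with the standard library alone. This is not the fork's implementation: the helper names are hypothetical, and the "worker" here attaches from the same process purely to keep the example short.

```python
import numpy as np
from multiprocessing import shared_memory


def create_obs_buffer(num_envs, obs_shape, dtype=np.uint8):
    """Parent side: allocate one shared block holding every env's observation."""
    size = num_envs * int(np.prod(obs_shape)) * np.dtype(dtype).itemsize
    shm = shared_memory.SharedMemory(create=True, size=size)
    obs = np.ndarray((num_envs, *obs_shape), dtype=dtype, buffer=shm.buf)
    obs.fill(0)
    return shm, obs


def worker_write(shm_name, num_envs, obs_shape, env_index, frame, dtype=np.uint8):
    """Worker side: attach by name and write one observation into its own slot."""
    shm = shared_memory.SharedMemory(name=shm_name)
    view = np.ndarray((num_envs, *obs_shape), dtype=dtype, buffer=shm.buf)
    view[env_index] = frame
    del view      # release the buffer export before closing the mapping
    shm.close()


shape = (84, 84, 4)
shm, obs = create_obs_buffer(num_envs=2, obs_shape=shape)
try:
    worker_write(shm.name, 2, shape, env_index=1,
                 frame=np.full(shape, 255, dtype=np.uint8))
    # The parent sees the worker's frame without any pipe transfer.
    print(obs[0].max(), obs[1].min())
finally:
    del obs
    shm.close()
    shm.unlink()
```

With this layout only small scalars such as rewards and done flags need to travel through pipes, which matches the behaviour described above.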

When possible, image preprocessing and repeated-step processing use native C++ helpers instead of Python image loops. The native path is selected automatically for single-player image observations with no rotation or movie recording. Set STABLE_RETRO_DISABLE_NATIVE_IMAGEOPS=1 or STABLE_RETRO_DISABLE_NATIVE_FUSED_STEP=1 to force the Python fallback while debugging or benchmarking.

StableRetroChunkedSubprocVecEnv is also available as an experimental generic Gymnasium vector env that puts multiple envs in each worker process:

from stable_retro import StableRetroChunkedSubprocVecEnv

# env_fns is assumed to be a list of zero-argument env factories, as in the examples above.
env = StableRetroChunkedSubprocVecEnv(env_fns, chunk_size=4)
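
One way to picture chunk_size is as a partition of env indices across worker processes. The helper below is a hypothetical illustration of contiguous chunking; the README does not say how the fork actually assigns envs to workers.

```python
def chunk_indices(num_envs, chunk_size):
    """Split env indices 0..num_envs-1 into contiguous chunks, one chunk per worker."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        list(range(start, min(start + chunk_size, num_envs)))
        for start in range(0, num_envs, chunk_size)
    ]


for worker_id, indices in enumerate(chunk_indices(10, 4)):
    print(f"worker {worker_id}: envs {indices}")
```

Under this scheme 16 envs with chunk_size=4 would need 4 worker processes instead of 16.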

StableRetroThreadedVecEnv is still available for experimental same-process execution where you need Python RetroEnv objects:

from stable_retro import StableRetroThreadedVecEnv

env = StableRetroThreadedVecEnv(
    [
        lambda: retro.make(
            "SuperMarioBros-Nes-v0",
            render_mode="rgb_array",
            obs_resize=(84, 84),
            obs_grayscale=True,
            frame_skip=4,
            frame_stack=4,
            maxpool_last_two=True,
        )
        for _ in range(8)
    ],
)

By default, StableRetroThreadedVecEnv uses the native C++ batch stepping entry point when the envs produce single-player image observations with no movie playback or rotation. Set STABLE_RETRO_DISABLE_NATIVE_BATCH_STEP=1 or pass use_native_batch=False to compare against the persistent Python thread-pool path. For current Mario/SB3-style rollouts, StableRetroSubprocVecEnv is still slower than StableRetroNativeVecEnv but remains more general.

If your agent does not use audio, set STABLE_RETRO_DISABLE_AUDIO=1 before creating environments. This keeps RGB observations enabled while skipping audio capture and supported core-side audio generation.
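
The environment flags mentioned in this section can also be set from Python. The sketch below sets them through os.environ; the set_flags helper is hypothetical, and because the README does not say when the flags are read, the sketch sets them before stable_retro is imported to be safe.

```python
import os

# Flag names taken from this README.
KNOWN_FLAGS = (
    "STABLE_RETRO_DISABLE_AUDIO",
    "STABLE_RETRO_DISABLE_NATIVE_IMAGEOPS",
    "STABLE_RETRO_DISABLE_NATIVE_FUSED_STEP",
    "STABLE_RETRO_DISABLE_NATIVE_BATCH_STEP",
)


def set_flags(*names):
    """Set each named flag to "1" for this process and any child processes."""
    for name in names:
        if name not in KNOWN_FLAGS:
            raise ValueError(f"unknown flag: {name}")
        os.environ[name] = "1"


# Typical training setup for an agent that does not use audio.
set_flags("STABLE_RETRO_DISABLE_AUDIO")

# Import and create environments only after the flags are in place, e.g.:
# import stable_retro as retro
# env = retro.make("SuperMarioBros-Nes-v0", render_mode="rgb_array")
```

Child processes started by a subprocess vector env inherit the parent's environment, so setting the flags once in the parent before creating the envs should normally be enough.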

The deprecated compatibility import still works:

import retro

For local development:

git clone https://github.com/tsilva/stable-retro-turbo.git
cd stable-retro-turbo
brew install cmake pkg-config lua@5.4 libzip
python -m pip install -U pip build cibuildwheel pytest pre-commit
python -m pip install -e .

Commands

python -m pip install stable-retro-turbo          # install the published package
python -m pip install -e .                        # build and install this checkout
python -m build --wheel                           # build a local wheel
python -m cibuildwheel . --output-dir wheelhouse  # build release-style wheels
pytest                                            # run Python tests
pre-commit run --all-files                        # run repository hooks
cmake . && make -j2 && make -j2 -f tests/Makefile && ctest --progress --verbose
python scripts/benchmark_vec_env.py --game SuperMarioBros-Nes-v0 --num-envs 8

Notes

  • Published wheels target Apple Silicon arm64 on macOS 14.0+ and x86_64 on Linux, for Python 3.14.
  • Package versions follow the upstream stable-retro base version with this fork's patch number as a PEP 440 post-release suffix, for example 1.0.0.post1.
  • The public wheel build includes Game Boy, NES, SNES, and Sega Genesis/Master System cores: gambatte, fceumm, snes9x, and genesis_plus_gx.
  • CapnProto is disabled in the public wheel build path.
  • SNES on Apple Silicon uses an automatic Rosetta helper because the native arm64 snes9x path is not stable across the bundled integrations.
  • If Rosetta is not installed yet, install it once:
softwareupdate --install-rosetta --agree-to-license
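
For the version scheme described in the notes above, a small standard-library sketch can separate the upstream base version from this fork's post-release number. It handles only the normalized form shown in the example (such as 1.0.0.post1); the function name is illustrative.

```python
import re

_VERSION_RE = re.compile(r"(?P<base>\d+(?:\.\d+)*)(?:\.post(?P<post>\d+))?")


def split_version(version):
    """Return (upstream_base_version, fork_patch_number or None)."""
    match = _VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    post = match.group("post")
    return match.group("base"), int(post) if post is not None else None


print(split_version("1.0.0.post1"))
```

A pin such as stable-retro-turbo==1.0.0.post1 therefore selects a specific fork patch on top of upstream 1.0.0.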

Architecture

[Figure: stable-retro-turbo architecture diagram]

License

MIT. Bundled third-party notices are listed in LICENSES.md.