stable-retro-turbo

Fast Python 3.14 wheels for stable-retro RL workloads

These details have been verified by PyPI

Maintainers

engtiagosilva

Project description

stable-retro-turbo publishes installable macOS Apple Silicon and Linux wheels for the upstream stable-retro API surface.

Use it when you want stable_retro game environments without building the package and bundled public libretro cores from source yourself.

What changed from upstream

This fork keeps the upstream stable_retro API and adds a small set of RL-throughput features:

Python 3.14 wheels for macOS arm64 and Linux x86_64.
Bundled Game Boy, NES, SNES, and Genesis/Master System public cores.
Worker-local crop, resize, grayscale, frame skip, frame stack, max-pool, no-op reset, sticky actions, and reward clipping.
Native C++ screen processing and fused step_repeat_and_process().
StableRetroSubprocVecEnv shared-memory observations to reduce IPC copying.
Multi-emulator native frontend support inside one process, using per-instance libretro function tables, thread-local callback routing, and isolated core copies when several emulator instances load the same core.
StableRetroThreadedVecEnv, an experimental same-process SB3 VecEnv that can run fused emulator steps in Python worker threads or through the native C++ batch stepping entry point.
Native C++ step_repeat_and_process_batch() with a persistent worker pool for batched emulator stepping.
StableRetroNativeVecEnv, a same-process SB3 VecEnv where C++ owns the emulator pool, frame skip, preprocessing, frame stacking, autoreset, reward and done evaluation, and one contiguous batched observation buffer.
STABLE_RETRO_DISABLE_AUDIO=1 for RGB-only agents.
scripts/benchmark_vec_env.py for baseline versus optimized throughput runs.

In local Mario benchmarks using SuperMarioBros-Nes-v0, the optimized native vector path is now the fastest measured runtime. The earlier same-process threaded implementation validated multi-emulator execution and removed the old one-emulator-per-process restriction; StableRetroNativeVecEnv moves the remaining hot vector-env state machine into C++ so rollout steps no longer cross Python once per environment.

Recent local direct-ROM fused-preprocessing results on SuperMarioBros-Nes-v0:

envs	shared_native_fused	native_vec_fused	speedup
8	4,037 steps/s	7,797 steps/s	1.93x
16	4,518 steps/s	8,295 steps/s	1.84x
32	5,029 steps/s	8,632 steps/s	1.72x
32, 16 native threads	5,029 steps/s	8,910 steps/s	1.77x
64, 16 native threads	not sampled cleanly	7,662 steps/s	n/a
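
The speedup column is the ratio of the two throughput columns. A quick check of that arithmetic, using only the figures from the table above (the variable and function names are illustrative, not part of the package):

```python
# Throughput figures copied from the table above, in steps per second:
# (label, shared_native_fused, native_vec_fused)
RESULTS = [
    ("8", 4037, 7797),
    ("16", 4518, 8295),
    ("32", 5029, 8632),
    ("32, 16 native threads", 5029, 8910),
]


def speedup(baseline: float, optimized: float) -> float:
    """Return the optimized throughput as a multiple of the baseline."""
    return optimized / baseline


speedups = {label: round(speedup(base, fast), 2) for label, base, fast in RESULTS}

for label, value in speedups.items():
    print(f"{label:>22} envs: {value:.2f}x")
```

The 64-env row is left out because it has no baseline measurement to divide by.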

The 64-env shared subprocess run was previously interrupted after spending several minutes in worker startup/imports before producing a steady-state measurement; the native vector path avoids that Python worker startup cost.

Install

python -m pip install stable-retro-turbo

Use it from Python:

import stable_retro as retro

env = retro.make("Alleyway-GameBoy-v0", render_mode="rgb_array")

RL preprocessing and SB3

For reinforcement-learning loops, image preprocessing can be done inside each environment worker before observations are returned to the caller. This is useful with SubprocVecEnv, where sending smaller observations across process boundaries can be much faster than returning full-size RGB frames and resizing later. This is the main speedup this fork adds over the upstream wrapper stack.

import stable_retro as retro

env = retro.make(
    "SuperMarioBros-Nes-v0",
    render_mode="rgb_array",
    obs_resize=(84, 84),
    obs_resize_algorithm="nearest",  # nearest, bilinear, or area
    obs_grayscale=True,
)
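
To see why worker-side preprocessing shrinks IPC traffic, it helps to compare bytes per observation. This is a back-of-the-envelope sketch: the 224x240 RGB raw frame size is an assumption about the NES core's output rather than a figure from this page, and `frame_bytes` is an illustrative helper.

```python
def frame_bytes(height: int, width: int, channels: int) -> int:
    """Size in bytes of one uint8 image observation."""
    return height * width * channels


# Assumption: a raw NES frame of 224x240 RGB pixels, one byte per channel.
raw = frame_bytes(224, 240, 3)
# After obs_resize=(84, 84) and obs_grayscale=True the shape is (84, 84, 1).
small = frame_bytes(84, 84, 1)
ratio = round(raw / small, 1)

print(raw, small, ratio)
```

Under that assumption, each step moves roughly 23 times fewer observation bytes per environment.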

Available image kwargs:

obs_resize=(height, width): resize image observations before they leave the env.
obs_resize_algorithm="nearest": choose nearest, bilinear, or area; nearest is fastest, while area is downscale-only and does more averaging work.
obs_grayscale=True: return grayscale observations with shape (height, width, 1).
obs_crop=(top, bottom, left, right): crop pixels before grayscale and resize.
frame_skip=4: repeat each selected action inside the worker and sum rewards.
frame_stack=4: stack recent observations inside the worker before IPC.
maxpool_last_two=True: max-pool the last two skipped image frames.
noop_reset_max=30: apply a random number of no-op reset steps.
sticky_action_prob=0.25: probabilistically repeat the previous action.
reward_clip=True: clip rewards to [-1, 1].
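
As a rough illustration of how frame_skip and maxpool_last_two combine, here is a pure-Python reference sketch. It is not the fork's C++ implementation: `step_repeat`, `max_pool`, and the toy emulator are hypothetical, and stopping early when the episode ends is an assumption.

```python
from typing import Callable, List, Tuple

Frame = List[List[int]]  # grayscale image as rows of pixel values
StepFn = Callable[[int], Tuple[Frame, float, bool]]


def max_pool(a: Frame, b: Frame) -> Frame:
    """Element-wise maximum of two equally sized frames."""
    return [[max(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def step_repeat(step: StepFn, action: int, frame_skip: int = 4,
                maxpool_last_two: bool = True) -> Tuple[Frame, float, bool]:
    """Repeat `action` for up to `frame_skip` frames and sum the rewards."""
    frames: List[Frame] = []
    total_reward = 0.0
    done = False
    for _ in range(frame_skip):
        frame, reward, done = step(action)
        frames.append(frame)
        total_reward += reward
        if done:  # assumption: stop repeating once the episode ends
            break
    obs = frames[-1]
    if maxpool_last_two and len(frames) >= 2:
        obs = max_pool(frames[-2], frames[-1])
    return obs, total_reward, done


# Toy emulator: pixel (0, 0) is lit only on odd frames; reward is 1.0 per frame.
counter = {"t": 0}


def toy_step(action: int) -> Tuple[Frame, float, bool]:
    counter["t"] += 1
    lit = 255 if counter["t"] % 2 else 0
    return [[lit, 10], [10, 10]], 1.0, False


obs, reward, done = step_repeat(toy_step, action=0)
print(obs, reward, done)
```

Max-pooling the last two frames keeps the flickering pixel visible even though it is dark on the final skipped frame.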

Pass the same options through Stable-Baselines3 with env_kwargs:

import stable_retro as retro
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv, VecTransposeImage


def make_mario_env(**kwargs):
    return retro.make(
        "SuperMarioBros-Nes-v0",
        render_mode="rgb_array",
        **kwargs,
    )


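# "spawn" re-imports the launching module in each worker, so in a script
# run this call under an `if __name__ == "__main__":` guard.
env = make_vec_env(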
    make_mario_env,
    n_envs=8,
    vec_env_cls=SubprocVecEnv,
    vec_env_kwargs={"start_method": "spawn"},
    env_kwargs={
        "obs_resize": (84, 84),
        "obs_resize_algorithm": "nearest",
        "obs_grayscale": True,
    },
)
env = VecTransposeImage(env)  # (n_envs, 1, 84, 84) for grayscale

For the fastest SB3-style Mario rollouts, use the native vector env directly:

from stable_retro import StableRetroNativeVecEnv

env = StableRetroNativeVecEnv(
    "SuperMarioBros-Nes-v0",
    num_envs=32,
    num_threads=16,
    render_mode="rgb_array",
    obs_resize=(84, 84),
    obs_grayscale=True,
    frame_skip=4,
    frame_stack=4,
    maxpool_last_two=True,
)

StableRetroNativeVecEnv currently targets homogeneous single-player image rollouts with no movie recording and no screen rotation. It keeps the hot rollout path in C++ and returns a contiguous NumPy observation batch shaped (num_envs, height, width, channels * frame_stack).
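
The batch shape stated above can be worked out ahead of time, for example to size a replay buffer. A minimal sketch, assuming uint8 pixels and one channel for grayscale versus three for RGB; `batched_obs_shape` is an illustrative helper, not a function the package exposes.

```python
from math import prod
from typing import Tuple


def batched_obs_shape(num_envs: int, height: int, width: int,
                      grayscale: bool, frame_stack: int) -> Tuple[int, int, int, int]:
    """Shape (num_envs, height, width, channels * frame_stack)."""
    channels = 1 if grayscale else 3  # assumption: RGB has 3 channels
    return (num_envs, height, width, channels * frame_stack)


# Matches the constructor call above: 32 envs, 84x84 grayscale, 4 stacked frames.
shape = batched_obs_shape(32, 84, 84, grayscale=True, frame_stack=4)
batch_bytes = prod(shape)  # assumption: one byte (uint8) per element

print(shape, batch_bytes)
```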

For lower IPC overhead than standard SubprocVecEnv, use the shared-memory vector env:

from stable_retro import StableRetroSubprocVecEnv

env = StableRetroSubprocVecEnv([make_mario_env for _ in range(8)])

The shared-memory vector env keeps observations in a parent-owned shared buffer, so workers only send rewards, done flags, and infos through pipes on each step. For Atari-style image rollouts this pairs well with env-local preprocessing and fused native frame skipping:

env = StableRetroSubprocVecEnv(
    [
        lambda: retro.make(
            "SuperMarioBros-Nes-v0",
            render_mode="rgb_array",
            obs_resize=(84, 84),
            obs_grayscale=True,
            frame_skip=4,
            frame_stack=4,
            maxpool_last_two=True,
        )
        for _ in range(16)
    ],
)
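
The shared-buffer idea can be sketched with the standard library's multiprocessing.shared_memory. This is an illustration of the general technique only: the slot layout, the message fields, and simulating the worker inside the same process are all assumptions, not the fork's actual implementation.

```python
from multiprocessing import shared_memory

NUM_ENVS = 4
OBS_BYTES = 84 * 84  # one grayscale 84x84 uint8 observation per env

# Parent: allocate one buffer with a fixed slot per environment.
parent = shared_memory.SharedMemory(create=True, size=NUM_ENVS * OBS_BYTES)
try:
    # Worker side (simulated in-process here): attach by name and fill slot 2.
    worker = shared_memory.SharedMemory(name=parent.name)
    env_index = 2
    start = env_index * OBS_BYTES
    worker.buf[start:start + OBS_BYTES] = bytes([7]) * OBS_BYTES
    worker.close()

    # Only small per-step values would need to travel through the pipe.
    pipe_message = {"reward": 1.0, "done": False, "info": {}}

    # Parent: read the observation straight out of the shared buffer.
    obs = bytes(parent.buf[start:start + OBS_BYTES])
finally:
    parent.close()
    parent.unlink()  # the owner releases the segment

print(len(obs), pipe_message)
```

In a real vector env the worker would run in its own process and receive the segment name at startup; the pixels themselves would never be pickled.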

When possible, image preprocessing and repeated-step processing use native C++ helpers instead of Python image loops. The native path is selected automatically for single-player image observations with no rotation or movie recording. Set STABLE_RETRO_DISABLE_NATIVE_IMAGEOPS=1 or STABLE_RETRO_DISABLE_NATIVE_FUSED_STEP=1 to force the Python fallback while debugging or benchmarking.
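
One way to compare the native and fallback paths is to launch each benchmark run in a fresh interpreter with the flags set in its environment. The sketch below only builds the command and environment; it assumes nothing about when the library reads the flags, which is why a separate process is used. `benchmark_invocation` is a hypothetical helper, and the script path and options are the ones shown in the Commands section.

```python
import os
import sys

# Flags named in the paragraph above; "1" forces the Python fallback.
FALLBACK_FLAGS = {
    "STABLE_RETRO_DISABLE_NATIVE_IMAGEOPS": "1",
    "STABLE_RETRO_DISABLE_NATIVE_FUSED_STEP": "1",
}


def benchmark_invocation(force_fallback: bool, num_envs: int = 8):
    """Build (argv, env) for one benchmark run in a fresh interpreter."""
    env = dict(os.environ)
    for name in FALLBACK_FLAGS:
        env.pop(name, None)  # start from a clean slate for both variants
    if force_fallback:
        env.update(FALLBACK_FLAGS)
    argv = [
        sys.executable, "scripts/benchmark_vec_env.py",
        "--game", "SuperMarioBros-Nes-v0",
        "--num-envs", str(num_envs),
    ]
    return argv, env


native_argv, native_env = benchmark_invocation(force_fallback=False)
fallback_argv, fallback_env = benchmark_invocation(force_fallback=True)
# Each pair can then be passed to subprocess.run(argv, env=env, check=True).
```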

StableRetroChunkedSubprocVecEnv is also available as an experimental generic Gymnasium vector env that puts multiple envs in each worker process:

from stable_retro import StableRetroChunkedSubprocVecEnv

# env_fns: a list of zero-argument callables that each build one env
env = StableRetroChunkedSubprocVecEnv(env_fns, chunk_size=4)
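
To picture what chunk_size means for worker count, here is a sketch of one plausible partitioning. Contiguous assignment and a smaller final chunk are assumptions; the page does not say how the class actually distributes envs, and `chunk_indices` is an illustrative helper.

```python
from typing import List


def chunk_indices(num_envs: int, chunk_size: int) -> List[List[int]]:
    """Split env indices 0..num_envs-1 into contiguous groups, one per worker."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        list(range(start, min(start + chunk_size, num_envs)))
        for start in range(0, num_envs, chunk_size)
    ]


chunks = chunk_indices(num_envs=10, chunk_size=4)
print(len(chunks), chunks)
```

With 10 envs and chunk_size=4 this layout needs three worker processes instead of ten.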

StableRetroThreadedVecEnv is still available for experimental same-process execution where you need Python RetroEnv objects:

from stable_retro import StableRetroThreadedVecEnv

env = StableRetroThreadedVecEnv(
    [
        lambda: retro.make(
            "SuperMarioBros-Nes-v0",
            render_mode="rgb_array",
            obs_resize=(84, 84),
            obs_grayscale=True,
            frame_skip=4,
            frame_stack=4,
            maxpool_last_two=True,
        )
        for _ in range(8)
    ],
)

By default it uses the native C++ batch stepping entry point when the envs are single-player image observations with no movie playback or rotation. Set STABLE_RETRO_DISABLE_NATIVE_BATCH_STEP=1 or pass use_native_batch=False to compare against the persistent Python thread-pool path. For current Mario/SB3-style rollouts, StableRetroSubprocVecEnv is still slower than StableRetroNativeVecEnv but remains more general.
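
The "persistent thread pool" pattern mentioned above can be sketched with concurrent.futures. `ToyEnv` and `ThreadedStepper` are stand-ins invented for this example, not stable_retro classes, and any real parallel speedup depends on the emulator step releasing the GIL, which this page does not spell out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


class ToyEnv:
    """Stand-in for an emulator env; NOT a stable_retro class."""

    def __init__(self, env_id: int) -> None:
        self.env_id = env_id
        self.steps = 0

    def step(self, action: int) -> Tuple[Tuple[int, int], float]:
        self.steps += 1
        return (self.env_id, self.steps), float(action)


def _step_one(env: ToyEnv, action: int):
    return env.step(action)


class ThreadedStepper:
    """Step many envs with one long-lived pool instead of a new pool per call."""

    def __init__(self, envs: List[ToyEnv], max_workers: int) -> None:
        self.envs = envs
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    def step(self, actions: List[int]):
        # Executor.map preserves input order, so results line up with envs.
        return list(self.pool.map(_step_one, self.envs, actions))

    def close(self) -> None:
        self.pool.shutdown(wait=True)


stepper = ThreadedStepper([ToyEnv(i) for i in range(4)], max_workers=2)
first = stepper.step([0, 1, 2, 3])
second = stepper.step([1, 1, 1, 1])
stepper.close()
print(first, second)
```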

If your agent does not use audio, set STABLE_RETRO_DISABLE_AUDIO=1 before creating environments. This keeps RGB observations enabled while skipping audio capture and supported core-side audio generation.

The deprecated compatibility import still works:

import retro

For local development:

git clone https://github.com/tsilva/stable-retro-turbo.git
cd stable-retro-turbo
brew install cmake pkg-config lua@5.4 libzip  # macOS build dependencies (Homebrew)
python -m pip install -U pip build cibuildwheel pytest pre-commit
python -m pip install -e .

Commands

python -m pip install stable-retro-turbo          # install the published package
python -m pip install -e .                        # build and install this checkout
python -m build --wheel                           # build a local wheel
python -m cibuildwheel . --output-dir wheelhouse  # build release-style wheels
pytest                                            # run Python tests
pre-commit run --all-files                        # run repository hooks
cmake . && make -j2 && make -j2 -f tests/Makefile && ctest --progress --verbose  # build and run native tests
python scripts/benchmark_vec_env.py --game SuperMarioBros-Nes-v0 --num-envs 8  # benchmark vector-env throughput

Notes

Published wheels target Apple Silicon arm64 on macOS 14.0+ and x86_64 on Linux, for Python 3.14.
Package versions follow the upstream stable-retro base version with this fork's patch number as a PEP 440 post-release suffix, for example 1.0.0.post1.
The public wheel build includes Game Boy, NES, SNES, and Sega Genesis/Master System cores: gambatte, fceumm, snes9x, and genesis_plus_gx.
CapnProto is disabled in the public wheel build path.
SNES on Apple Silicon uses an automatic Rosetta helper because the native arm64 snes9x path is not stable across the bundled integrations.
If Rosetta is not installed yet, install it once:

softwareupdate --install-rosetta --agree-to-license

Release automation builds macOS arm64 and Linux x86_64 wheels, publishes them to PyPI, and attaches matching wheel files to GitHub Releases.
See PUBLISHING.md for the release checklist.
Upstream API and integration docs are still useful: docs/supported_emulators.md, docs/supported_games.md, and docs/macos_installation.md.
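
The versioning note above (upstream base version plus a PEP 440 post-release suffix) can be unpacked with a small parser. This sketch handles only the plain `N.N.N.postM` shape used by this project's releases, not the full PEP 440 grammar; `split_post_release` is an illustrative name.

```python
import re
from typing import Tuple

# Matches a dotted numeric base followed by ".postM", e.g. "1.0.0.post2".
POST_RELEASE = re.compile(r"^(?P<base>\d+(?:\.\d+)*)\.post(?P<post>\d+)$")


def split_post_release(version: str) -> Tuple[str, int]:
    """Return (upstream_base_version, fork_patch_number)."""
    match = POST_RELEASE.match(version)
    if match is None:
        raise ValueError(f"not a simple post-release version: {version!r}")
    return match.group("base"), int(match.group("post"))


base, post = split_post_release("1.0.0.post2")
releases = ["1.0.0.post3", "1.0.0.post1", "1.0.0.post4", "1.0.0.post2"]
# Sort numerically by the fork's patch number, newest first.
newest_first = sorted(releases, key=lambda v: split_post_release(v)[1], reverse=True)

print(base, post, newest_first)
```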

Architecture

[Figure: stable-retro-turbo architecture diagram]

License

MIT. Bundled third-party notices are listed in LICENSES.md.

Release history

1.0.0.post4, Jun 13, 2026
1.0.0.post3, Jun 13, 2026
1.0.0.post2, Jun 13, 2026 (this version)
1.0.0.post1, Jun 13, 2026

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

stable_retro_turbo-1.0.0.post2-cp314-cp314-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl (103.1 MB)
Uploaded Jun 13, 2026. Tags: CPython 3.14; manylinux: glibc 2.26+ x86-64; manylinux: glibc 2.28+ x86-64

stable_retro_turbo-1.0.0.post2-cp314-cp314-macosx_14_0_arm64.whl (101.9 MB)
Uploaded Jun 13, 2026. Tags: CPython 3.14; macOS 14.0+ ARM64

File details

Details for the file stable_retro_turbo-1.0.0.post2-cp314-cp314-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl.

File metadata

Download URL: stable_retro_turbo-1.0.0.post2-cp314-cp314-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl
Upload date: Jun 13, 2026
Size: 103.1 MB
Tags: CPython 3.14, manylinux: glibc 2.26+ x86-64, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for stable_retro_turbo-1.0.0.post2-cp314-cp314-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`e0d241e8d74a89c18b0feb4de123e9c44ffe272a28b03873267324ad0365bada`
MD5	`a111ac7a1f7d9aa11b33a77ef31f96aa`
BLAKE2b-256	`c68b283ad77097acfe670a0fe41d811cf7573d8818adfabf8f76a0a5a0213259`
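
A downloaded wheel can be checked against the published SHA256 digest with the standard library. A minimal sketch: `sha256_of` and `verify_wheel` are illustrative helpers, and the temporary sample file only demonstrates the mechanics, it is not the wheel.

```python
import hashlib
import tempfile
from pathlib import Path

# SHA256 published above for the manylinux wheel.
EXPECTED_SHA256 = "e0d241e8d74a89c18b0feb4de123e9c44ffe272a28b03873267324ad0365bada"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a ~100 MB wheel is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_wheel(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Demonstration on a throwaway file; a real check would pass the wheel path.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    sample_digest = sha256_of(sample)
    sample_matches = verify_wheel(sample)

print(sample_digest, sample_matches)
```

pip can perform the same check itself when requirements are pinned with `--hash` entries and installed with `--require-hashes`.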

Provenance

The following attestation bundles were made for stable_retro_turbo-1.0.0.post2-cp314-cp314-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl:

Publisher: release.yml on tsilva/stable-retro-turbo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stable_retro_turbo-1.0.0.post2-cp314-cp314-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl
- Subject digest: e0d241e8d74a89c18b0feb4de123e9c44ffe272a28b03873267324ad0365bada
- Sigstore transparency entry: 1808805196
- Sigstore integration time: Jun 13, 2026
Source repository:
- Permalink: tsilva/stable-retro-turbo@263aaf829de20819ee2b990c5bfc6c5b0823c4ad
- Branch / Tag: refs/tags/v1.0.0.post2
- Owner: https://github.com/tsilva
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@263aaf829de20819ee2b990c5bfc6c5b0823c4ad
- Trigger Event: release

File details

Details for the file stable_retro_turbo-1.0.0.post2-cp314-cp314-macosx_14_0_arm64.whl.

File metadata

Download URL: stable_retro_turbo-1.0.0.post2-cp314-cp314-macosx_14_0_arm64.whl
Upload date: Jun 13, 2026
Size: 101.9 MB
Tags: CPython 3.14, macOS 14.0+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for stable_retro_turbo-1.0.0.post2-cp314-cp314-macosx_14_0_arm64.whl
Algorithm	Hash digest
SHA256	`1fb398fd83270ea759147493f95c7f7df3ee06df5b0cc4c0490a3a0337f9ef7f`
MD5	`81dd3db07c86efab18e97820f833145f`
BLAKE2b-256	`4492df1b1c9f8f3026fe9adf2c0b9a2cefb4b72d4af0c6a771c0f8472560869c`

Provenance

The following attestation bundles were made for stable_retro_turbo-1.0.0.post2-cp314-cp314-macosx_14_0_arm64.whl:

Publisher: release.yml on tsilva/stable-retro-turbo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stable_retro_turbo-1.0.0.post2-cp314-cp314-macosx_14_0_arm64.whl
- Subject digest: 1fb398fd83270ea759147493f95c7f7df3ee06df5b0cc4c0490a3a0337f9ef7f
- Sigstore transparency entry: 1808805253
- Sigstore integration time: Jun 13, 2026
Source repository:
- Permalink: tsilva/stable-retro-turbo@263aaf829de20819ee2b990c5bfc6c5b0823c4ad
- Branch / Tag: refs/tags/v1.0.0.post2
- Owner: https://github.com/tsilva
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@263aaf829de20819ee2b990c5bfc6c5b0823c4ad
- Trigger Event: release
