Fast Python 3.14 wheels for stable-retro RL workloads
stable-retro-turbo publishes installable macOS Apple Silicon and Linux wheels for the upstream stable-retro API surface.
Use it when you want stable_retro game environments without building the package and bundled public libretro cores from source yourself.
What changed from upstream
This fork keeps the upstream stable_retro API and adds a small set of
RL-throughput features:
- Python 3.14 wheels for macOS arm64 and Linux x86_64.
- Bundled Game Boy, NES, SNES, and Genesis/Master System public cores.
- Worker-local crop, resize, grayscale, frame skip, frame stack, max-pool, no-op reset, sticky actions, and reward clipping.
- Native C++ screen processing and fused step_repeat_and_process().
- StableRetroSubprocVecEnv shared-memory observations to reduce IPC copying.
- Multi-emulator native frontend support inside one process, using per-instance libretro function tables, thread-local callback routing, and isolated core copies when several emulator instances load the same core.
- StableRetroThreadedVecEnv, an experimental same-process SB3 VecEnv that can run fused emulator steps in Python worker threads or through the native C++ batch stepping entry point.
- Native C++ step_repeat_and_process_batch() with a persistent worker pool for batched emulator stepping.
- StableRetroNativeVecEnv, a same-process SB3 VecEnv where C++ owns the emulator pool, frame skip, preprocessing, frame stacking, autoreset, reward and done evaluation, and one contiguous batched observation buffer.
- STABLE_RETRO_DISABLE_AUDIO=1 for RGB-only agents.
- scripts/benchmark_vec_env.py for baseline versus optimized throughput runs.
In local Mario benchmarks using SuperMarioBros-Nes-v0, the optimized
native vector path is now the fastest measured runtime. The earlier
same-process threaded implementation validated multi-emulator execution and
removed the old one-emulator-per-process restriction; StableRetroNativeVecEnv
moves the remaining hot vector-env state machine into C++ so rollout steps no
longer cross Python once per environment.
Recent local direct-ROM fused-preprocessing results on
SuperMarioBros-Nes-v0:
| envs | shared_native_fused | native_vec_fused | speedup |
|---|---|---|---|
| 8 | 4,037 steps/s | 7,797 steps/s | 1.93x |
| 16 | 4,518 steps/s | 8,295 steps/s | 1.84x |
| 32 | 5,029 steps/s | 8,632 steps/s | 1.72x |
| 32, 16 native threads | 5,029 steps/s | 8,910 steps/s | 1.77x |
| 64, 16 native threads | not sampled cleanly | 7,662 steps/s | n/a |
The 64-env shared subprocess run was previously interrupted after spending several minutes in worker startup/imports before producing a steady-state measurement; the native vector path avoids that Python worker startup cost.
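The speedup column is the ratio of the two throughput columns. A minimal sketch that recomputes it from the rows above; the variable and function names are illustrative and are not part of the benchmark script:

```python
# Throughput figures (steps/s) copied from the table above:
# (row label, shared_native_fused, native_vec_fused)
rows = [
    ("8", 4037, 7797),
    ("16", 4518, 8295),
    ("32", 5029, 8632),
    ("32, 16 native threads", 5029, 8910),
]


def speedup(baseline_sps, optimized_sps):
    """Return how many times faster the optimized path is than the baseline."""
    return optimized_sps / baseline_sps


# Format each ratio the same way the table does, e.g. "1.93x".
speedups = {label: f"{speedup(base, opt):.2f}x" for label, base, opt in rows}

for label, value in speedups.items():
    print(f"{label}: {value}")
```

The 64-env row is left out because the table has no clean baseline measurement for it.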
Install
python -m pip install stable-retro-turbo
Use it from Python:
import stable_retro as retro
env = retro.make("Alleyway-GameBoy-v0", render_mode="rgb_array")
RL preprocessing and SB3
For reinforcement-learning loops, image preprocessing can be done inside each
environment worker before observations are returned to the caller. This is useful
with SubprocVecEnv, where sending smaller observations across process
boundaries can be much faster than returning full-size RGB frames and resizing
later. This is the main speedup this fork adds over the upstream wrapper stack.
import stable_retro as retro

env = retro.make(
    "SuperMarioBros-Nes-v0",
    render_mode="rgb_array",
    obs_resize=(84, 84),
    obs_resize_algorithm="nearest",  # nearest, bilinear, or area
    obs_grayscale=True,
)
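To get a feel for why shrinking observations inside the worker helps, here is a rough back-of-the-envelope estimate of the per-step payload. It assumes a full-size NES observation of 224x240 RGB pixels stored as uint8; that shape is an assumption for illustration, so check env.observation_space for the real value:

```python
def obs_bytes(height, width, channels, bytes_per_value=1):
    """Size in bytes of one uint8 image observation."""
    return height * width * channels * bytes_per_value


# Assumed full-size RGB frame (illustrative shape, not taken from this README).
raw_bytes = obs_bytes(224, 240, 3)

# Observation after obs_resize=(84, 84) and obs_grayscale=True.
small_bytes = obs_bytes(84, 84, 1)

# How many times less data crosses the process boundary per env step.
reduction = raw_bytes / small_bytes

print(f"raw={raw_bytes} B, preprocessed={small_bytes} B, reduction={reduction:.1f}x")
```

Under these assumptions each step sends roughly 23 times less image data per environment, before any frame stacking is applied.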
Available image kwargs:
- obs_resize=(height, width): resize image observations before they leave the env.
- obs_resize_algorithm="nearest": choose nearest, bilinear, or area; nearest is fastest, while area is downscale-only and does more averaging work.
- obs_grayscale=True: return grayscale observations with shape (height, width, 1).
- obs_crop=(top, bottom, left, right): crop pixels before grayscale and resize.
- frame_skip=4: repeat each selected action inside the worker and sum rewards.
- frame_stack=4: stack recent observations inside the worker before IPC.
- maxpool_last_two=True: max-pool the last two skipped image frames.
- noop_reset_max=30: apply a random number of no-op reset steps.
- sticky_action_prob=0.25: probabilistically repeat the previous action.
- reward_clip=True: clip rewards to [-1, 1].
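The step-related options above can be read as a small loop around a single-frame step. The following pure-Python sketch illustrates those semantics with a toy emulator; it is not the fork's native implementation. The helper names, the ToyEmulator class, and the choice to clip the summed reward (rather than each per-frame reward) are all assumptions made for illustration:

```python
import random


def clip_reward(reward, low=-1.0, high=1.0):
    """reward_clip: clamp a reward into [low, high]."""
    return max(low, min(high, reward))


def max_pool_two(frame_a, frame_b):
    """maxpool_last_two: element-wise maximum of two flat frames."""
    return [max(a, b) for a, b in zip(frame_a, frame_b)]


def sticky_action(previous, requested, prob, rng):
    """sticky_action_prob: keep the previous action with probability prob."""
    return previous if rng.random() < prob else requested


def step_repeat(step_fn, action, frame_skip=4, maxpool_last_two=True, reward_clip=True):
    """frame_skip: repeat one action, sum rewards, stop early when done.

    step_fn is a hypothetical single-frame step returning (frame, reward, done).
    """
    total_reward = 0.0
    frames = []
    done = False
    for _ in range(frame_skip):
        frame, reward, done = step_fn(action)
        frames.append(frame)
        total_reward += reward
        if done:
            break
    if maxpool_last_two and len(frames) >= 2:
        obs = max_pool_two(frames[-2], frames[-1])
    else:
        obs = frames[-1]
    if reward_clip:
        # Assumption: clipping is applied to the summed reward.
        total_reward = clip_reward(total_reward)
    return obs, total_reward, done


class ToyEmulator:
    """Stand-in emulator: two-pixel frames that change every tick."""

    def __init__(self):
        self.tick = 0

    def step(self, action):
        self.tick += 1
        frame = [self.tick, 10 - self.tick]
        return frame, 0.5, False


emulator = ToyEmulator()
obs, reward, done = step_repeat(emulator.step, action=0)
print(obs, reward, done)
```

With the toy emulator, four skipped frames each pay 0.5, so the summed reward of 2.0 is clipped to 1.0, and the observation is the element-wise maximum of the third and fourth frames.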
Pass the same options through Stable-Baselines3 with env_kwargs:
import stable_retro as retro
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv, VecTransposeImage

def make_mario_env(**kwargs):
    # Forward preprocessing kwargs (obs_resize, obs_grayscale, ...) to retro.make.
    return retro.make(
        "SuperMarioBros-Nes-v0",
        render_mode="rgb_array",
        **kwargs,
    )

env = make_vec_env(
    make_mario_env,
    n_envs=8,
    vec_env_cls=SubprocVecEnv,
    vec_env_kwargs={"start_method": "spawn"},
    env_kwargs={
        "obs_resize": (84, 84),
        "obs_resize_algorithm": "nearest",
        "obs_grayscale": True,
    },
)
env = VecTransposeImage(env)  # (n_envs, 1, 84, 84) for grayscale
For the fastest SB3-style Mario rollouts, use the native vector env directly:
from stable_retro import StableRetroNativeVecEnv

env = StableRetroNativeVecEnv(
    "SuperMarioBros-Nes-v0",
    num_envs=32,
    num_threads=16,
    render_mode="rgb_array",
    obs_resize=(84, 84),
    obs_grayscale=True,
    frame_skip=4,
    frame_stack=4,
    maxpool_last_two=True,
)
StableRetroNativeVecEnv currently targets homogeneous single-player image
rollouts with no movie recording and no screen rotation. It keeps the hot
rollout path in C++ and returns a contiguous NumPy observation batch shaped
(num_envs, height, width, channels * frame_stack).
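As a quick check of the batch layout described above, this sketch computes the expected observation shape and buffer size for the configuration in the example. The helper name is hypothetical, and the one-byte-per-value (uint8) size is an assumption:

```python
import math


def native_batch_shape(num_envs, height, width, grayscale, frame_stack):
    """Shape of the batched observation: (num_envs, height, width, channels * frame_stack)."""
    channels = 1 if grayscale else 3
    return (num_envs, height, width, channels * frame_stack)


# Values from the StableRetroNativeVecEnv example above.
shape = native_batch_shape(32, 84, 84, grayscale=True, frame_stack=4)

# Assuming uint8 pixels, the buffer size in bytes equals the element count.
batch_bytes = math.prod(shape)

print(shape, batch_bytes)
```

For SB3 policies that expect channel-first images, the last axis would still need to move to position 1, as VecTransposeImage does in the earlier SubprocVecEnv example.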
For lower IPC overhead than standard SubprocVecEnv, use the shared-memory
vector env:
from stable_retro import StableRetroSubprocVecEnv
env = StableRetroSubprocVecEnv([make_mario_env for _ in range(8)])
The shared-memory vector env keeps observations in a parent-owned shared buffer, so workers only send rewards, done flags, and infos through pipes on each step. For Atari-style image rollouts this pairs well with env-local preprocessing and fused native frame skipping:
env = StableRetroSubprocVecEnv(
    [
        lambda: retro.make(
            "SuperMarioBros-Nes-v0",
            render_mode="rgb_array",
            obs_resize=(84, 84),
            obs_grayscale=True,
            frame_skip=4,
            frame_stack=4,
            maxpool_last_two=True,
        )
        for _ in range(16)
    ],
)
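The idea of a parent-owned observation buffer can be illustrated with Python's standard multiprocessing.shared_memory module. This is a generic sketch of the technique, not the layout StableRetroSubprocVecEnv actually uses; the slot layout, sizes, and function names are assumptions, and the "worker" runs in the same process to keep the example short:

```python
from multiprocessing import shared_memory

NUM_ENVS, HEIGHT, WIDTH, CHANNELS = 4, 84, 84, 1
OBS_BYTES = HEIGHT * WIDTH * CHANNELS  # one uint8 observation slot per env


def worker_step(shm_name, env_index, frame):
    """Write one observation into this env's slot; return only small scalars."""
    shm = shared_memory.SharedMemory(name=shm_name)  # attach to the parent's buffer
    try:
        start = env_index * OBS_BYTES
        shm.buf[start:start + OBS_BYTES] = frame
    finally:
        shm.close()
    # Only the reward and done flag would travel through the pipe.
    return 1.0, False


# The parent allocates one buffer that holds every env's observation.
parent = shared_memory.SharedMemory(create=True, size=NUM_ENVS * OBS_BYTES)
try:
    frame = bytes([7]) * OBS_BYTES
    reward, done = worker_step(parent.name, 2, frame)
    start = 2 * OBS_BYTES
    # The parent reads the observation straight from shared memory.
    slot = bytes(parent.buf[start:start + OBS_BYTES])
finally:
    parent.close()
    parent.unlink()

print(len(slot), reward, done)
```

In a real vector env the worker would live in a separate process and the parent would typically wrap the buffer in a NumPy array instead of copying it out with bytes().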
When possible, image preprocessing and repeated-step processing use native C++
helpers instead of Python image loops. The native path is selected automatically
for single-player image observations with no rotation or movie recording. Set
STABLE_RETRO_DISABLE_NATIVE_IMAGEOPS=1 or
STABLE_RETRO_DISABLE_NATIVE_FUSED_STEP=1 to force the Python fallback while
debugging or benchmarking.
StableRetroChunkedSubprocVecEnv is also available as an experimental generic
Gymnasium vector env that puts multiple envs in each worker process:
from stable_retro import StableRetroChunkedSubprocVecEnv
env = StableRetroChunkedSubprocVecEnv(env_fns, chunk_size=4)
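To picture what chunk_size means, the sketch below splits a list of env factories into contiguous groups, one group per worker process. Whether StableRetroChunkedSubprocVecEnv partitions contiguously like this is an assumption; the helper is hypothetical and only illustrates the idea:

```python
def chunk_env_fns(env_fns, chunk_size):
    """Split env factories into contiguous chunks of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [env_fns[i:i + chunk_size] for i in range(0, len(env_fns), chunk_size)]


# Integers stand in for env factory callables.
chunks = chunk_env_fns(list(range(10)), chunk_size=4)
num_workers = len(chunks)

print(chunks, num_workers)
```

Under this scheme ten envs with chunk_size=4 need three worker processes instead of ten, at the cost of stepping the envs within a chunk one after another.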
StableRetroThreadedVecEnv is still available for experimental same-process
execution where you need Python RetroEnv objects:
from stable_retro import StableRetroThreadedVecEnv
env = StableRetroThreadedVecEnv(
    [
        lambda: retro.make(
            "SuperMarioBros-Nes-v0",
            render_mode="rgb_array",
            obs_resize=(84, 84),
            obs_grayscale=True,
            frame_skip=4,
            frame_stack=4,
            maxpool_last_two=True,
        )
        for _ in range(8)
    ],
)
By default it uses the native C++ batch stepping entry point when the envs are
single-player image observations with no movie playback or rotation. Set
STABLE_RETRO_DISABLE_NATIVE_BATCH_STEP=1 or pass
use_native_batch=False to compare against the persistent Python thread-pool
path. For current Mario/SB3-style rollouts, StableRetroSubprocVecEnv is still
slower than StableRetroNativeVecEnv but remains more general.
If your agent does not use audio, set STABLE_RETRO_DISABLE_AUDIO=1 before
creating environments. This keeps RGB observations enabled while skipping audio
capture and supported core-side audio generation.
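The environment flags mentioned in this section are plain environment variables set to 1. The README does not say at which point they are read, so one cautious approach is to set them in the environment of a fresh Python process before it starts. A minimal sketch; the helper name is hypothetical and the flag list is copied from the text above:

```python
import os

# Flags named in this README.
KNOWN_FLAGS = (
    "STABLE_RETRO_DISABLE_AUDIO",
    "STABLE_RETRO_DISABLE_NATIVE_IMAGEOPS",
    "STABLE_RETRO_DISABLE_NATIVE_FUSED_STEP",
    "STABLE_RETRO_DISABLE_NATIVE_BATCH_STEP",
)


def env_with_flags(*flags, base=None):
    """Return a copy of the environment with the given flags set to "1"."""
    env = dict(os.environ if base is None else base)
    for flag in flags:
        if flag not in KNOWN_FLAGS:
            raise ValueError(f"unknown flag: {flag}")
        env[flag] = "1"
    return env


# Example: environment for an RGB-only run that also forces the Python image path.
child_env = env_with_flags(
    "STABLE_RETRO_DISABLE_AUDIO",
    "STABLE_RETRO_DISABLE_NATIVE_IMAGEOPS",
    base={},
)
print(child_env)
```

The resulting mapping can be passed as the env argument of subprocess.run when launching a training or benchmark script, which keeps the flags out of the parent shell.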
The deprecated compatibility import still works:
import retro
For local development:
git clone https://github.com/tsilva/stable-retro-turbo.git
cd stable-retro-turbo
brew install cmake pkg-config lua@5.4 libzip
python -m pip install -U pip build cibuildwheel pytest pre-commit
python -m pip install -e .
Commands
python -m pip install stable-retro-turbo # install the published package
python -m pip install -e . # build and install this checkout
python -m build --wheel # build a local wheel
python -m cibuildwheel . --output-dir wheelhouse # build release-style wheels
pytest # run Python tests
pre-commit run --all-files # run repository hooks
cmake . && make -j2 && make -j2 -f tests/Makefile && ctest --progress --verbose
python scripts/benchmark_vec_env.py --game SuperMarioBros-Nes-v0 --num-envs 8
Notes
- Published wheels target Apple Silicon arm64 on macOS 14.0+ and x86_64 on Linux, for Python 3.14.
- Package versions follow the upstream stable-retro base version with this fork's patch number as a PEP 440 post-release suffix, for example 1.0.0.post1.
- The public wheel build includes Game Boy, NES, SNES, and Sega Master System cores: gambatte, fceumm, snes9x, and genesis_plus_gx.
- CapnProto is disabled in the public wheel build path.
- SNES on Apple Silicon uses an automatic Rosetta helper because the native arm64 snes9x path is not stable across the bundled integrations.
- If Rosetta is not installed yet, install it once:
    softwareupdate --install-rosetta --agree-to-license
- Release automation builds macOS arm64 and Linux x86_64 wheels, publishes them to PyPI, and attaches matching wheel files to GitHub Releases.
- See PUBLISHING.md for the release checklist.
- Upstream API and integration docs are still useful: docs/supported_emulators.md, docs/supported_games.md, and docs/macos_installation.md.
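The version scheme in the notes above can be split into the upstream base version and this fork's patch number. The sketch below handles only the simple N.N.N.postM form used in the example and is not a full PEP 440 parser; the function name is hypothetical:

```python
import re

# Matches versions such as "1.0.0.post1": a dotted numeric base plus ".postN".
_POST_RELEASE = re.compile(r"^(?P<base>\d+(?:\.\d+)*)\.post(?P<post>\d+)$")


def split_post_release(version):
    """Return (upstream_base, fork_patch); fork_patch is None without a .postN suffix."""
    match = _POST_RELEASE.match(version)
    if match is None:
        return version, None
    return match.group("base"), int(match.group("post"))


base, patch = split_post_release("1.0.0.post1")
print(base, patch)
```

For anything beyond this simple form, a dedicated version parser such as the one in the packaging library is a safer choice.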
License
MIT. Bundled third-party notices are listed in LICENSES.md.