POMDP Arcade Environments on the GPU

POPGym Arcade - GPU-Accelerated POMDPs

POPGym Arcade contains 7 pixel-based POMDPs in the style of the Arcade Learning Environment. Each environment provides:

  • 3 difficulty settings
  • A common observation and action space shared across all envs
  • Fully observable (MDP) and partially observable (POMDP) configurations
  • Fast and easy GPU vectorization using jax.vmap and jax.jit

Gradient Visualization

We also provide tools to visualize how policies use memory.

See Memory Introspection Tools below for instructions.

Throughput

You can expect millions of frames per second on a consumer-grade GPU. With obs_size=128, most policies converge within 30-60 minutes of training.
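
To check throughput on your own hardware, a measurement can be sketched as below. This is a minimal sketch, not the benchmark behind the numbers above: `toy_step` and `measure_fps` are hypothetical names, and `toy_step` is a trivial stand-in for an environment step, so the value it prints says nothing about POPGym Arcade itself. The points it illustrates are the warm-up call, which keeps compilation time out of the measurement, and `block_until_ready`, which is needed because JAX dispatches work asynchronously.

```python
import time

import jax
import jax.numpy as jnp


# Hypothetical stand-in for an environment step (not a POPGym Arcade function)
def toy_step(frame, action):
    return frame * 0.5 + action.astype(frame.dtype)


def measure_fps(step_fn, n_envs=64, obs_size=128, n_steps=20):
    """Return frames per second for a jitted, vmapped step function."""
    batched_step = jax.jit(jax.vmap(step_fn))
    frames = jnp.zeros((n_envs, obs_size, obs_size, 3), dtype=jnp.float32)
    actions = jnp.ones((n_envs,), dtype=jnp.int32)
    # Warm up once so compilation time is excluded from the measurement
    frames = batched_step(frames, actions).block_until_ready()
    start = time.perf_counter()
    for _ in range(n_steps):
        frames = batched_step(frames, actions)
    # Wait for the last result before stopping the clock
    frames.block_until_ready()
    elapsed = time.perf_counter() - start
    return n_envs * n_steps / elapsed


fps = measure_fps(toy_step)
print(f"{fps:,.0f} frames per second")
```

To time a real environment, the same pattern should apply to the vectorized step function shown under Creating and Stepping Environments, with keys, states, and params passed in place of the toy arguments.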

Getting Started

Installation

To install the environments, run

pip install popgym-arcade

If you plan to use our training scripts, install the baselines as well

pip install 'popgym-arcade[baselines]'
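
Because the speedups depend on JAX seeing an accelerator, it can help to check which devices are visible after installing. The snippet below is a small sketch that uses only standard JAX calls (`jax.devices` and `jax.default_backend`); it is not part of POPGym Arcade.

```python
import jax

# List the devices JAX can see and the platforms they belong to
devices = jax.devices()
platforms = sorted({d.platform for d in devices})
print(f"JAX backend: {jax.default_backend()}, devices: {devices}")

# The environments should still run without an accelerator, only more slowly
if jax.default_backend() not in ("gpu", "tpu"):
    print("No accelerator found, JAX will run on the CPU.")
```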

Human Play

The best way to understand the environments is to play them yourself. The play script lets you control each game with the arrow keys and spacebar.

popgym-arcade-play NoisyCartPoleEasy        # play MDP 256 pixel version
popgym-arcade-play BattleShipEasy -p -o 128 # play POMDP 128 pixel version

Creating and Stepping Environments

Our envs are gymnax envs, so you can reuse wrappers and code written for gymnax. The following example demonstrates how to integrate POPGym Arcade into your code.

import popgym_arcade
import jax

# Create both POMDP and MDP env variants
pomdp, pomdp_params = popgym_arcade.make("BattleShipEasy", partial_obs=True)
mdp, mdp_params = popgym_arcade.make("BattleShipEasy", partial_obs=False)

# Let's vectorize and compile the envs
# Note when you are training a policy, it is better to compile your policy_update rather than the env_step
pomdp_reset = jax.jit(jax.vmap(pomdp.reset, in_axes=(0, None)))
pomdp_step = jax.jit(jax.vmap(pomdp.step, in_axes=(0, 0, 0, None)))
mdp_reset = jax.jit(jax.vmap(mdp.reset, in_axes=(0, None)))
mdp_step = jax.jit(jax.vmap(mdp.step, in_axes=(0, 0, 0, None)))
    
# Initialize four vectorized environments
n_envs = 4
# Initialize PRNG keys, keeping one key aside for the rollout
key, reset_key = jax.random.split(jax.random.key(0))
reset_keys = jax.random.split(reset_key, n_envs)

# Reset environments
observation, env_state = pomdp_reset(reset_keys, pomdp_params)

# Step the POMDPs
for t in range(10):
    # Propagate some randomness without reusing any key
    key, action_key, step_key = jax.random.split(key, 3)
    action_keys = jax.random.split(action_key, n_envs)
    step_keys = jax.random.split(step_key, n_envs)
    # Pick actions at random
    actions = jax.vmap(pomdp.action_space(pomdp_params).sample)(action_keys)
    # Step the env to the next state
    # No need to reset, gymnax automatically resets when done
    observation, env_state, reward, done, info = pomdp_step(step_keys, env_state, actions, pomdp_params)

# POMDP and MDP variants share states
# We can plug the POMDP states into the MDP and continue playing
key, action_key, step_key = jax.random.split(key, 3)
action_keys = jax.random.split(action_key, n_envs)
step_keys = jax.random.split(step_key, n_envs)
# Sample fresh actions for the MDP step
actions = jax.vmap(mdp.action_space(mdp_params).sample)(action_keys)
markov_state, env_state, reward, done, info = mdp_step(step_keys, env_state, actions, mdp_params)
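
The Python loop above returns to the host after every step. When the whole rollout should be compiled as one unit, one option is `jax.lax.scan`. The following is a minimal sketch under stated assumptions: `toy_step` is a hypothetical stand-in that only mimics the gymnax-style `step(key, state, action, params)` signature, `rollout` is an illustrative helper and not part of POPGym Arcade, and the random actions take the place of a policy.

```python
import jax
import jax.numpy as jnp


# Hypothetical stand-in with the gymnax-style step signature.
# It returns (obs, state, reward, done, info) like the envs above.
def toy_step(key, state, action, params):
    noise = jax.random.normal(key, ())
    next_state = state + action + params * noise
    reward = -jnp.abs(next_state)
    done = jnp.abs(next_state) > 10.0
    return next_state, next_state, reward, done, {}


def rollout(key, init_state, params, n_steps=10, n_actions=5):
    """Step every env n_steps times with random actions inside one scan."""
    n_envs = init_state.shape[0]
    batched_step = jax.vmap(toy_step, in_axes=(0, 0, 0, None))

    def body(carry, _):
        key, state = carry
        # Split fresh keys each step so no key is used twice
        key, action_key, step_key = jax.random.split(key, 3)
        actions = jax.random.randint(action_key, (n_envs,), 0, n_actions)
        step_keys = jax.random.split(step_key, n_envs)
        obs, state, reward, done, info = batched_step(step_keys, state, actions, params)
        return (key, state), (reward, done)

    (_, final_state), (rewards, dones) = jax.lax.scan(
        body, (key, init_state), None, length=n_steps
    )
    return final_state, rewards, dones


jitted_rollout = jax.jit(rollout, static_argnames=("n_steps", "n_actions"))
final_state, rewards, dones = jitted_rollout(jax.random.key(0), jnp.zeros(4), 0.1)
print(rewards.shape)  # (n_steps, n_envs)
```

Swapping `toy_step` for `pomdp.step` and the random actions for your policy should follow the same pattern, since the step signature matches the one used above.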

Memory Introspection Tools

We implement visualization tools to probe which pixels persist in agent memory and how they affect Q-value predictions. Try the code below, or the vis example, to visualize the memory your agent uses.

from popgym_arcade.baselines.model.builder import QNetworkRNN
from popgym_arcade.baselines.utils import get_saliency_maps, vis_fn
import equinox as eqx
import jax

config = {
    # Env string
    "ENV_NAME": "NavigatorEasy",
    # Whether to use full or partial observability
    "PARTIAL": True,
    # Memory model type (see models directory)
    "MEMORY_TYPE": "lru",
    # Evaluation episode seed
    "SEED": 0,
    # Observation size in pixels (128 or 256)
    "OBS_SIZE": 128,
}

# Initialize the random key
rng = jax.random.PRNGKey(config["SEED"])

# Initialize the model
network = QNetworkRNN(rng, rnn_type=config["MEMORY_TYPE"], obs_size=config["OBS_SIZE"])
# Load the model
model = eqx.tree_deserialise_leaves("PATH_TO_YOUR_MODEL_WEIGHTS.pkl", network)
# Compute the saliency maps
grads, obs_seq, grad_accumulator = get_saliency_maps(rng, model, config)
# Visualize the saliency maps
# If you have latex installed, set use_latex=True
vis_fn(grads, obs_seq, config, use_latex=False)
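
For intuition about what a gradient-based saliency map measures, the idea can be sketched on a toy model. This is not the implementation behind `get_saliency_maps`: the network, its sizes, and the names `init_params` and `final_q_value` are all hypothetical. The sketch differentiates the final Q-value with respect to every past observation, so large gradient magnitudes mark the pixels and timesteps the recurrent state still depends on.

```python
import jax
import jax.numpy as jnp


# Hypothetical toy recurrent Q-network: a linear encoder feeding a tanh RNN
def init_params(key, obs_dim=16, hidden_dim=8, n_actions=5):
    k1, k2, k3 = jax.random.split(key, 3)
    return {
        "encode": jax.random.normal(k1, (obs_dim, hidden_dim)) * 0.1,
        "recur": jax.random.normal(k2, (hidden_dim, hidden_dim)) * 0.1,
        "q_head": jax.random.normal(k3, (hidden_dim, n_actions)) * 0.1,
    }


def final_q_value(obs_seq, params):
    """Run the RNN over obs_seq and return the largest Q-value at the last step."""

    def step(hidden, obs):
        hidden = jnp.tanh(obs @ params["encode"] + hidden @ params["recur"])
        return hidden, None

    hidden0 = jnp.zeros(params["recur"].shape[0])
    hidden, _ = jax.lax.scan(step, hidden0, obs_seq)
    return jnp.max(hidden @ params["q_head"])


params_key, obs_key = jax.random.split(jax.random.key(0))
params = init_params(params_key)
# 6 timesteps of flattened "pixels"
obs_seq = jax.random.uniform(obs_key, (6, 16))

# Gradient of the final Q-value with respect to every pixel of every observation
saliency = jnp.abs(jax.grad(final_q_value)(obs_seq, params))
# Summing over pixels gives one influence score per timestep
per_step_influence = saliency.sum(axis=1)
print(per_step_influence)
```

A model that forgets quickly would show influence concentrated on the last few timesteps, while one that retains information would show nonzero influence on early observations.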

Other Useful Libraries

  • gymnax - The (deprecated) jax-capable gymnasium API
  • stable-gymnax - A maintained and patched version of gymnax
  • popgym - The original collection of POMDPs, implemented in numpy
  • popjaxrl - A jax version of popgym
  • popjym - A more readable version of popjaxrl environments that served as a basis for our work

Citation

@article{wang2025popgym,
  title={POPGym Arcade: Parallel Pixelated POMDPs},
  author={Wang, Zekang and He, Zhe and Zhang, Borong and Toledo, Edan and Morad, Steven},
  journal={arXiv preprint arXiv:2503.01450},
  year={2025}
}
