Multi-Agent Partially Observable gridworlds in JAX

MAPOX

MAPOX is a small collection of JAX-native, multi-agent, partially-observable gridworld environments with a shared observation/action format and a simple pygame renderer.

The environments are functional (state in / state out), designed to work well with jax.jit/jax.vmap, and expose an action_mask for per-environment action subsets.
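As a rough illustration of the state-in/state-out style described above, the sketch below defines a toy environment (not part of MAPOX) and rolls it out under jax.jit with jax.lax.scan. ToyState, toy_reset, toy_step, and the movement deltas are all hypothetical names and conventions invented for this example.

```python
from functools import partial
from typing import NamedTuple

import jax
import jax.numpy as jnp


class ToyState(NamedTuple):
    # Hypothetical state: one (x, y) position per agent plus a step counter.
    pos: jax.Array  # (num_agents, 2) int32
    t: jax.Array    # scalar int32


# Five illustrative actions: four moves and "stay". The sign convention
# is an assumption made for this toy example only.
DELTAS = jnp.array([[0, -1], [1, 0], [0, 1], [-1, 0], [0, 0]], dtype=jnp.int32)


def toy_reset(key, num_agents=2, size=8):
    pos = jax.random.randint(key, (num_agents, 2), 0, size, dtype=jnp.int32)
    return ToyState(pos=pos, t=jnp.zeros((), jnp.int32))


def toy_step(state, actions, size=8):
    # Pure function: returns a new state instead of mutating the old one.
    pos = jnp.clip(state.pos + DELTAS[actions], 0, size - 1)
    return ToyState(pos=pos, t=state.t + 1)


@partial(jax.jit, static_argnames=("num_steps",))
def rollout(key, num_steps=16):
    reset_key, action_key = jax.random.split(key)
    state = toy_reset(reset_key)

    def body(state, step_key):
        # Random action per agent; a real policy would go here.
        actions = jax.random.randint(step_key, (state.pos.shape[0],), 0, 5)
        state = toy_step(state, actions)
        return state, state.pos

    step_keys = jax.random.split(action_key, num_steps)
    return jax.lax.scan(body, state, step_keys)


final_state, trajectory = rollout(jax.random.PRNGKey(0))
```

Because the step function has no side effects, the whole rollout compiles into a single XLA program, and the same function could be wrapped in jax.vmap to run many seeds in parallel.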

These environments were originally developed as part of https://github.com/gabe00122/jaxrl/tree/mapox, and that repository may serve as a good example of policies trained on them.

Installation

MAPOX requires Python 3.11+.

uv add mapox

Notes:

  • Video export uses python-ffmpeg and requires the ffmpeg binary available on your system PATH.

Quick start

Environments implement a small interface:

  • reset(rng_key) -> (state, timestep)
  • step(state, actions, rng_key) -> (state, timestep)
  • timestep.obs: (num_agents, view_w, view_h, 4) int8
  • timestep.action_mask: (num_agents, num_actions) bool

import jax
import jax.numpy as jnp
from mapox import EnvironmentFactory, FindReturnConfig

factory = EnvironmentFactory()
env, _ = factory.create_env(FindReturnConfig(num_agents=2), length=512)

rng = jax.random.PRNGKey(0)
state, ts = env.reset(rng)

# Sample a random *legal* action per agent using the action mask
rng, akey, skey = jax.random.split(rng, 3)
logits = jax.random.uniform(akey, (env.num_agents, env.action_spec.n))
actions = jnp.argmax(jnp.where(ts.action_mask, logits, -1e9), axis=-1)

state, ts = env.step(state, actions, skey)
print(ts.reward, ts.terminated)
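The masked argmax above can also be written as a small reusable helper. The sketch below is one possible variant that uses jax.random.categorical on masked logits; sample_legal_actions is a hypothetical name, and the hand-written mask merely stands in for ts.action_mask so the snippet runs without an environment.

```python
import jax
import jax.numpy as jnp


def sample_legal_actions(key, action_mask):
    """Sample one action per agent, uniformly over its legal actions.

    action_mask: (num_agents, num_actions) bool, True where an action is legal.
    """
    # Illegal actions get -inf logits, so they are never selected
    # as long as each agent has at least one legal action.
    logits = jnp.where(action_mask, 0.0, -jnp.inf)
    return jax.random.categorical(key, logits, axis=-1)


# Stand-in mask with the documented (num_agents, num_actions) layout.
mask = jnp.array([
    [True, True, True, True, True, False, False],
    [False, False, False, False, True, True, False],
])
actions = sample_legal_actions(jax.random.PRNGKey(0), mask)
```

In a real loop the mask would come from the most recent timestep, since the set of legal actions may differ between environments.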

One-hot encoding observations

Observations are compact categorical channels. You can expand them into one-hot features:

from mapox import concat_one_hot

# sizes is typically (NUM_TILE_TYPES, 5, 3, 3)
sizes = env.observation_spec.max_value
x = concat_one_hot(ts.obs, sizes)  # (..., sum(sizes))
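To make the resulting layout concrete, here is a minimal sketch of what a per-channel one-hot expansion might look like. It is not the library's implementation: concat_one_hot_sketch is a hypothetical name, it assumes each entry of sizes is the number of categories in that channel, and the tile-type count of 8 is made up for the example.

```python
import jax
import jax.numpy as jnp


def concat_one_hot_sketch(obs, sizes):
    """One-hot encode each categorical channel and concatenate the results.

    obs:   (..., num_channels) integer array
    sizes: number of categories per channel (assumed meaning of `sizes`)
    """
    parts = [
        # Cast to int32 before encoding so small integer dtypes are safe.
        jax.nn.one_hot(obs[..., i].astype(jnp.int32), n, dtype=jnp.float32)
        for i, n in enumerate(sizes)
    ]
    return jnp.concatenate(parts, axis=-1)


# Stand-in observation: 2 agents, 5x5 view, 4 channels.
obs = jnp.zeros((2, 5, 5, 4), dtype=jnp.int8)
obs = obs.at[0, 2, 2].set(jnp.array([3, 1, 2, 1], dtype=jnp.int8))

sizes = (8, 5, 3, 3)  # 8 tile types is an illustrative value
x = concat_one_hot_sketch(obs, sizes)  # shape (2, 5, 5, 19)
```

Each cell ends up with exactly one active feature per channel, so every feature vector sums to the number of channels.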

Interactive play / rendering

A simple interactive runner is provided:

uv run -m mapox.play --env king_hill

Controls (depending on the environment):

  • Movement: WASD or arrow keys
  • n: cycle focused agent

The renderer can show the full map or the focused agent’s POV (a local crop). See mapox/play.py for an example of using GridworldClient and recording video.

Observation & action format

All environments share a unified discrete encoding defined in mapox/envs/constance.py.

Actions

The action space is always DiscreteActionSpec(n=7) with IDs:

id  action
0   move up
1   move right
2   move down
3   move left
4   stay
5   primary action
6   dig action

Not every environment uses every action. Always consult timestep.action_mask before sampling.
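For readability in user code, the IDs above could be mirrored in an enum. The sketch below is only a convenience pattern: the Action class and its member names are hypothetical and not taken from the library, and the mask row is a stand-in for one row of timestep.action_mask.

```python
from enum import IntEnum

import jax.numpy as jnp


class Action(IntEnum):
    # Member names are illustrative; the integer IDs follow the table above.
    MOVE_UP = 0
    MOVE_RIGHT = 1
    MOVE_DOWN = 2
    MOVE_LEFT = 3
    STAY = 4
    PRIMARY = 5
    DIG = 6


def legal_action_names(mask_row):
    """Return the names of the actions enabled in one agent's mask row."""
    return [a.name for a in Action if bool(mask_row[int(a)])]


# Stand-in mask row for an environment that only allows movement and stay.
mask_row = jnp.array([True, True, True, True, True, False, False])
names = legal_action_names(mask_row)
print(names)
```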

Observation

Each agent receives a local crop centered on itself: (view_width, view_height, 4) with channels:

  1. tile_id (terrain + agent types)
  2. direction (0 = none, 1..4 = cardinal direction)
  3. team_id (0 = none/neutral, 1 = red, 2 = blue)
  4. health (0..2)
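The channel layout above can be unpacked with plain indexing. The following sketch assumes zero-based channel indices in the listed order; split_channels and count_team_cells are hypothetical helpers, and the observation is a hand-built stand-in rather than real environment output.

```python
import jax.numpy as jnp

# Zero-based channel indices, in the order listed above (an assumption).
TILE_ID, DIRECTION, TEAM_ID, HEALTH = range(4)


def split_channels(obs):
    """Split (..., view_w, view_h, 4) observations into named planes."""
    return {
        "tile_id": obs[..., TILE_ID],
        "direction": obs[..., DIRECTION],
        "team_id": obs[..., TEAM_ID],
        "health": obs[..., HEALTH],
    }


def count_team_cells(obs, team_id):
    """Count, per agent, the visible cells whose team channel equals team_id."""
    return jnp.sum(obs[..., TEAM_ID] == team_id, axis=(-2, -1))


# Stand-in observation: 2 agents, 5x5 view.
obs = jnp.zeros((2, 5, 5, 4), dtype=jnp.int8)
obs = obs.at[0, 2, 2, TEAM_ID].set(1)  # red cell at agent 0's center
obs = obs.at[0, 0, 4, TEAM_ID].set(2)  # blue cell in agent 0's view
obs = obs.at[1, 2, 2, TEAM_ID].set(2)  # blue cell at agent 1's center

planes = split_channels(obs)
blue_counts = count_team_cells(obs, team_id=2)
```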

Environments

All environment configs are Pydantic models and can be created through EnvironmentFactory.

  • Find & Return (FindReturnEnv, FindReturnConfig)
    Search for goal tiles in a procedurally-generated map. When an agent finds a flag it is rewarded and respawned elsewhere.

https://github.com/user-attachments/assets/98cc3318-67ca-44c0-ac6e-e4537bd30ed1

  • Scouts (ScoutsEnv, ScoutsConfig)
    Two roles: Harvesters “unlock” treasure tiles; Scouts collect unlocked treasures.

https://github.com/user-attachments/assets/d566840e-1837-4fc1-8c78-439677f358a8

  • Traveling Salesman (TravelingSalesmanEnv, TravelingSalesmanConfig)
    Multiple flags are scattered. Each agent is rewarded the first time it reaches each flag; flags reset after all are collected.

https://github.com/user-attachments/assets/af009d24-c65e-4195-99af-0a4e703652cd

  • King of the Hill (KingHillEnv, KingHillConfig)
    Two-team competitive environment with knights/archers, destructible walls, control points, and team-shared rewards.

https://github.com/user-attachments/assets/3483745f-7c53-46e9-b838-3cc76b9e3ee4

Wrappers

  • VectorWrapper(env, vec_count)
    Runs vec_count independent copies of an environment via jax.vmap and flattens the (vec, agent) dimensions into a single agent dimension.

  • MultiTaskWrapper((env1, env2, ...), (name1, name2, ...))
    Combines multiple environments into one by concatenating their agents. Adds a per-agent task_ids field to the TimeStep via TaskIdWrapper.
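The vectorize-then-flatten idea behind VectorWrapper can be sketched with jax.vmap and a reshape. This is an illustration of the general pattern, not the wrapper's actual code: toy_observe stands in for a single environment, and all function names are hypothetical.

```python
import jax
import jax.numpy as jnp


def toy_observe(key, num_agents=3, view=5):
    """Hypothetical single-environment function returning per-agent observations."""
    obs = jax.random.randint(key, (num_agents, view, view, 4), 0, 3)
    return obs.astype(jnp.int8)


def vectorized_observe(key, vec_count):
    """Run vec_count independent copies and merge the (vec, agent) axes."""
    keys = jax.random.split(key, vec_count)
    obs = jax.vmap(toy_observe)(keys)          # (vec, agent, view, view, 4)
    return obs.reshape((-1,) + obs.shape[2:])  # (vec * agent, view, view, 4)


def unflatten_actions(actions, vec_count):
    """Reshape a flat (vec * agent,) action vector back to (vec, agent)."""
    return actions.reshape(vec_count, -1)


flat_obs = vectorized_observe(jax.random.PRNGKey(0), vec_count=4)
flat_actions = jnp.zeros(flat_obs.shape[0], dtype=jnp.int32)
per_env_actions = unflatten_actions(flat_actions, vec_count=4)
```

Flattening means a policy sees one large batch of agents and does not need to know how many environment copies are running.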

The EnvironmentFactory also supports a MultiTaskConfig that can build a multitask environment (and optionally vectorize each task).
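Conceptually, a multitask environment concatenates agents from several tasks and tags each agent with the task it came from. The sketch below shows that bookkeeping on stand-in arrays; combine_tasks and split_actions are hypothetical helpers, and it assumes every task uses the same view size so observations can be stacked.

```python
import jax.numpy as jnp


def combine_tasks(obs_per_task):
    """Concatenate per-task observations along the agent axis and label agents."""
    obs = jnp.concatenate(obs_per_task, axis=0)
    task_ids = jnp.concatenate([
        jnp.full((o.shape[0],), i, dtype=jnp.int32)
        for i, o in enumerate(obs_per_task)
    ])
    return obs, task_ids


def split_actions(actions, agents_per_task):
    """Route the combined action vector back to each task's agents."""
    out, start = [], 0
    for n in agents_per_task:
        out.append(actions[start:start + n])
        start += n
    return out


# Two stand-in tasks with 2 and 3 agents and a shared 5x5 view.
obs_a = jnp.zeros((2, 5, 5, 4), dtype=jnp.int8)
obs_b = jnp.ones((3, 5, 5, 4), dtype=jnp.int8)

obs, task_ids = combine_tasks([obs_a, obs_b])
acts_a, acts_b = split_actions(jnp.arange(5), agents_per_task=(2, 3))
```

A per-agent task ID like this lets a single policy condition on the task, for example through a learned task embedding.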

Acknowledgements

Rendering uses the Urizen Onebit Tileset by Vurmux.
