Multi-Agent Partially Observable gridworlds in JAX

MAPOX

MAPOX is a small collection of JAX-native, multi-agent, partially-observable gridworld environments with a shared observation/action format and a simple pygame renderer.

The environments are functional (state in / state out), designed to work well with jax.jit/jax.vmap, and expose an action_mask for per-environment action subsets.
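As a rough illustration of what the functional style allows, the sketch below uses a toy one-dimensional walker in place of a real MAPOX environment. `toy_reset` and `toy_step` are hypothetical stand-ins, not part of the mapox API; they only mirror the state in / state out shape described above.

```python
import jax
import jax.numpy as jnp

# Toy stand-in for a functional environment (hypothetical, not the mapox API):
# the state is just each agent's position on a line of 8 cells.
def toy_reset(rng_key, num_agents=2):
    return jax.random.randint(rng_key, (num_agents,), 0, 8)

def toy_step(state, actions, rng_key):
    # actions: 0 = left, 1 = right. rng_key is unused here but kept to
    # mirror a step(state, actions, rng_key) style signature.
    delta = jnp.where(actions == 1, 1, -1)
    return jnp.clip(state + delta, 0, 7)

# A pure step function can be jit-compiled directly...
jit_step = jax.jit(toy_step)

# ...and vmapped over a batch of independent states, actions and keys.
batch_step = jax.jit(jax.vmap(toy_step))

keys = jax.random.split(jax.random.PRNGKey(0), 4)
states = jax.vmap(toy_reset)(keys)            # shape (4, 2)
actions = jnp.ones((4, 2), dtype=jnp.int32)   # every agent moves right
next_states = batch_step(states, actions, keys)
```

Because no state is hidden inside an object, the same function serves a single environment and a batch of them without modification.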

These environments were originally developed as part of https://github.com/gabe00122/jaxrl/tree/mapox, which may serve as a useful example of policies trained on them.

Installation

MAPOX requires Python 3.11+.

uv add mapox

Notes:

  • Video export uses python-ffmpeg and requires the ffmpeg binary available on your system PATH.
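One way to check for the ffmpeg binary before attempting a video export is a small standard-library probe. This is only a sketch; `ffmpeg_available` is a hypothetical helper and not something mapox provides.

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None

if not ffmpeg_available():
    print("ffmpeg not found on PATH; video export will likely fail.")
```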

Quick start

Environments implement a small interface:

  • reset(rng_key) -> (state, timestep)
  • step(state, actions, rng_key) -> (state, timestep)
  • timestep.obs: (num_agents, view_w, view_h, 4) int8
  • timestep.action_mask: (num_agents, num_actions) bool

import jax
import jax.numpy as jnp
from mapox import EnvironmentFactory, FindReturnConfig

factory = EnvironmentFactory()
env, _ = factory.create_env(FindReturnConfig(num_agents=2), length=512)

rng = jax.random.PRNGKey(0)
state, ts = env.reset(rng)

# Sample a random *legal* action per agent using the action mask
rng, akey, skey = jax.random.split(rng, 3)
logits = jax.random.uniform(akey, (env.num_agents, env.action_spec.n))
actions = jnp.argmax(jnp.where(ts.action_mask, logits, -1e9), axis=-1)

state, ts = env.step(state, actions, skey)
print(ts.reward, ts.terminated)

One-hot encoding observations

Observations are compact categorical channels. You can expand them into one-hot features:

from mapox import concat_one_hot

# sizes is typically (NUM_TILE_TYPES, 5, 3, 3)
sizes = env.observation_spec.max_value
x = concat_one_hot(ts.obs, sizes)  # (..., sum(sizes))
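To make the expansion concrete, here is a minimal sketch of how such a per-channel one-hot concatenation could be written in plain JAX. `concat_one_hot_sketch` is a hypothetical re-implementation for illustration and may differ from the library's `concat_one_hot`; the tile count of 6 is made up.

```python
import jax
import jax.numpy as jnp

def concat_one_hot_sketch(obs, sizes):
    """One-hot each categorical channel of `obs` and concatenate the results.

    obs:   integer array of shape (..., C)
    sizes: tuple of C ints, the number of categories in each channel
    """
    parts = [
        jax.nn.one_hot(obs[..., i], n, dtype=jnp.float32)
        for i, n in enumerate(sizes)
    ]
    return jnp.concatenate(parts, axis=-1)

# Illustrative sizes only; 6 tile types is an assumption for this example.
sizes = (6, 5, 3, 3)
obs = jnp.zeros((2, 5, 5, 4), dtype=jnp.int8)   # (num_agents, view_w, view_h, 4)
obs = obs.at[0, 2, 2].set(jnp.array([3, 1, 2, 2], dtype=jnp.int8))
x = concat_one_hot_sketch(obs, sizes)           # shape (2, 5, 5, 17)
```

Each cell ends up with exactly one active entry per channel, so the feature vector at any cell sums to the number of channels.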

Interactive play / rendering

A simple interactive runner is provided:

uv run -m mapox.play --env king_hill

Controls (depending on the environment):

  • Movement: WASD or arrow keys
  • n: cycle focused agent

The renderer can show the full map or the focused agent’s POV (a local crop). See mapox/play.py for an example of using GridworldClient and recording video.

Observation & action format

All environments share a unified discrete encoding defined in mapox/envs/constance.py.

Actions

The action space is always DiscreteActionSpec(n=7) with IDs:

id  action
0   move up
1   move right
2   move down
3   move left
4   stay
5   primary action
6   dig action

Not every environment uses every action. Always consult timestep.action_mask before sampling.
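A common way to respect the mask when sampling from a policy is to push illegal logits to negative infinity before drawing. The sketch below assumes a mask of shape (num_agents, 7) with at least one legal action per agent; `sample_legal_actions` and the example mask are hypothetical and not part of mapox.

```python
import jax
import jax.numpy as jnp

NUM_ACTIONS = 7  # matches DiscreteActionSpec(n=7) above

def sample_legal_actions(rng_key, logits, action_mask):
    """Sample one action per agent, never picking a masked-out action.

    logits:      (num_agents, NUM_ACTIONS) float policy scores
    action_mask: (num_agents, NUM_ACTIONS) bool, True where legal
    Assumes every agent has at least one legal action.
    """
    masked = jnp.where(action_mask, logits, -jnp.inf)
    return jax.random.categorical(rng_key, masked, axis=-1)

# Made-up mask: agent 0 may only move (ids 0..3), agent 1 may only stay (id 4).
mask = jnp.array([
    [True, True, True, True, False, False, False],
    [False, False, False, False, True, False, False],
])
logits = jnp.zeros((2, NUM_ACTIONS))  # uniform policy over legal actions
actions = sample_legal_actions(jax.random.PRNGKey(0), logits, mask)
```

Unlike taking the argmax of masked noise, this draws from the policy's own distribution restricted to the legal actions.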

Observation

Each agent receives a local crop centered on itself: (view_width, view_height, 4) with channels:

  1. tile_id (terrain + agent types)
  2. direction (0 = none, 1..4 = cardinal direction)
  3. team_id (0 = none/neutral, 1 = red, 2 = blue)
  4. health (0..2)
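As an example of reading individual channels, the sketch below counts how many cells in each agent's view belong to a given team. It assumes the last axis is laid out in the order listed above, indexed from zero; the constant names and `count_team_cells` are hypothetical.

```python
import jax.numpy as jnp

# Zero-based channel indices, assuming the order listed above.
TILE, DIRECTION, TEAM, HEALTH = 0, 1, 2, 3

def count_team_cells(obs, team_id):
    """Count cells in each agent's view whose team channel equals `team_id`.

    obs: (num_agents, view_w, view_h, 4) integer array
    """
    return jnp.sum(obs[..., TEAM] == team_id, axis=(1, 2))

obs = jnp.zeros((2, 5, 5, 4), dtype=jnp.int8)
obs = obs.at[0, 2, 2, TEAM].set(1)   # agent 0 sees one red cell
obs = obs.at[1, 0, 4, TEAM].set(2)   # agent 1 sees one blue cell
obs = obs.at[1, 2, 2, TEAM].set(2)   # and a second blue cell at its center
red = count_team_cells(obs, 1)
blue = count_team_cells(obs, 2)
```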

Environments

All environment configs are Pydantic models and can be created through EnvironmentFactory.

  • Find & Return (FindReturnEnv, FindReturnConfig)
    Search for goal tiles in a procedurally-generated map. When an agent finds a flag it is rewarded and respawned elsewhere.

https://github.com/user-attachments/assets/98cc3318-67ca-44c0-ac6e-e4537bd30ed1

  • Scouts (ScoutsEnv, ScoutsConfig)
    Two roles: Harvesters “unlock” treasure tiles; Scouts collect unlocked treasures.

https://github.com/user-attachments/assets/d566840e-1837-4fc1-8c78-439677f358a8

  • Traveling Salesman (TravelingSalesmanEnv, TravelingSalesmanConfig)
    Multiple flags are scattered. Each agent is rewarded the first time it reaches each flag; flags reset after all are collected.

https://github.com/user-attachments/assets/af009d24-c65e-4195-99af-0a4e703652cd

  • King of the Hill (KingHillEnv, KingHillConfig)
    Two-team competitive environment with knights/archers, destructible walls, control points, and team-shared rewards.

https://github.com/user-attachments/assets/3483745f-7c53-46e9-b838-3cc76b9e3ee4

Wrappers

  • VectorWrapper(env, vec_count)
    Runs vec_count independent copies of an environment via jax.vmap and flattens the (vec, agent) dimensions into a single agent dimension.

  • MultiTaskWrapper((env1, env2, ...), (name1, name2, ...))
    Combines multiple environments into one by concatenating their agents. Adds a per-agent task_ids field to the TimeStep via TaskIdWrapper.
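The vmap-and-flatten idea behind a vectorizing wrapper can be sketched with a toy step function. This is not the VectorWrapper implementation; `toy_step` and `vectorized_step` are hypothetical and only show how a flat agent dimension maps to (vec, agent) and back.

```python
import jax
import jax.numpy as jnp

def toy_step(positions, actions):
    """Hypothetical per-environment step: (num_agents,) -> (num_agents,)."""
    return positions + actions

def vectorized_step(positions, flat_actions, vec_count):
    """Step `vec_count` copies and expose a single flat agent dimension.

    positions:    (vec_count, num_agents)
    flat_actions: (vec_count * num_agents,)
    """
    num_agents = positions.shape[1]
    actions = flat_actions.reshape(vec_count, num_agents)      # unflatten
    new_positions = jax.vmap(toy_step)(positions, actions)     # batched step
    flat_obs = new_positions.reshape(vec_count * num_agents)   # flatten back
    return new_positions, flat_obs

positions = jnp.zeros((3, 2), dtype=jnp.int32)
flat_actions = jnp.arange(6, dtype=jnp.int32)
new_positions, flat_obs = vectorized_step(positions, flat_actions, vec_count=3)
```

With this layout, a policy sees 6 agents and never needs to know they live in 3 separate environments.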

The EnvironmentFactory also supports a MultiTaskConfig that can build a multitask environment (and optionally vectorize each task).
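Concatenating agents across tasks and tagging each with a task id can be sketched as follows. `concat_tasks` is a hypothetical helper for illustration, not the wrapper's actual code, and it assumes all tasks share the same observation shape after the agent axis.

```python
import jax.numpy as jnp

def concat_tasks(per_task_obs):
    """Concatenate per-task observations along the agent axis and tag each
    agent with the index of the task it came from.

    per_task_obs: sequence of arrays shaped (num_agents_i, ...) whose
                  trailing dimensions agree.
    """
    obs = jnp.concatenate(per_task_obs, axis=0)
    task_ids = jnp.concatenate([
        jnp.full((o.shape[0],), i, dtype=jnp.int32)
        for i, o in enumerate(per_task_obs)
    ])
    return obs, task_ids

obs_a = jnp.zeros((2, 5, 5, 4), dtype=jnp.int8)  # task 0: 2 agents
obs_b = jnp.ones((3, 5, 5, 4), dtype=jnp.int8)   # task 1: 3 agents
obs, task_ids = concat_tasks([obs_a, obs_b])
```

A per-agent task id like this lets a single shared policy condition on which task each agent is playing.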

Acknowledgements

Rendering uses the Urizen Onebit Tileset by Vurmux.

