Accelerated gridworld navigation with JAX for deep reinforcement learning

NAVIX: minigrid in JAX

What is NAVIX?

NAVIX is a JAX-powered reimplementation of Minigrid. Key features:

Performance Boost: NAVIX runs roughly 1000x faster (or more) than the original Minigrid, enabling faster experimentation and scaling. A preliminary performance comparison is available in the NAVIX repository.
XLA Compilation: NAVIX computations are compiled with XLA and optimized for your hardware (CPU, GPU, TPU).
Autograd Support: Differentiate through environment transitions, opening up new possibilities such as learned world models.
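To illustrate why a pure-functional environment compiles well, here is a minimal sketch in plain JAX. It is not the NAVIX API: the one-dimensional corridor, `CORRIDOR_LENGTH` and `step` are hypothetical stand-ins. Because `step` is a pure function of arrays, `jax.vmap` can batch it over many environments and `jax.jit` can compile the batched version with XLA for whatever backend is available.

```python
import jax
import jax.numpy as jnp

CORRIDOR_LENGTH = 8  # hypothetical toy environment size


def step(position, action):
    """Pure transition for a toy corridor: action 1 moves right, anything else moves left."""
    move = jnp.where(action == 1, 1, -1)
    # Clip so the agent cannot leave the corridor
    new_position = jnp.clip(position + move, 0, CORRIDOR_LENGTH - 1)
    # Reward 1 only when the agent reaches the right-most cell
    reward = (new_position == CORRIDOR_LENGTH - 1).astype(jnp.float32)
    return new_position, reward


# Batch the step over many environments, then compile the batched step with XLA
batched_step = jax.jit(jax.vmap(step))

positions = jnp.zeros(4, dtype=jnp.int32)
actions = jnp.array([1, 0, 1, 1])
new_positions, rewards = batched_step(positions, actions)
```

No Python loop over environments is involved: the whole batch is one compiled XLA computation.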

The library is in active development, and we are working on adding more environments and features. If you want to join the development and contribute, please open a discussion and let's have a chat!

Installation

Install JAX

Follow the official installation guide for your OS and preferred accelerator: https://github.com/google/jax#installation.
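After installing JAX, a quick way to confirm which backend XLA will compile for is to query JAX directly. This sketch uses only standard JAX calls; on a CPU-only install the backend is typically reported as `cpu`.

```python
import jax

# The platform XLA compiles for by default (for example "cpu", "gpu" or "tpu")
backend = jax.default_backend()
# All devices JAX can see on that platform
devices = jax.devices()
print(backend, devices)
```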

Install NAVIX

pip install navix

Or, for the latest version from source:

pip install git+https://github.com/epignatelli/navix
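To check which NAVIX release ended up installed (for example, to compare it with the release history on this page), the standard library's `importlib.metadata` can read the installed distribution's version without importing the package. The helper name `installed_version` is hypothetical, not part of NAVIX.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="navix"):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version())
```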

Examples

Compiling a collection step

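import jax
import navix as nx
import jax.numpy as jnp

N_TIMESTEPS = 1000  # number of environment steps per run (example value)


def run(seed):
    env = nx.make('MiniGrid-Empty-8x8-v0')  # Create the environment
    key = jax.random.PRNGKey(seed)
    # Use independent keys for the reset and for sampling the actions
    reset_key, action_key = jax.random.split(key)
    timestep = env.reset(reset_key)
    # Pre-sample a random action sequence (maxval is exclusive)
    actions = jax.random.randint(action_key, (N_TIMESTEPS,), 0, env.action_space.n)

    def body_fun(timestep, action):
        timestep = env.step(timestep, action)  # Advance the environment by one step
        return timestep, ()

    # lax.scan unrolls the loop inside XLA, so the whole rollout is compiled
    return jax.lax.scan(body_fun, timestep, actions)[0]

# Vectorise the rollout over 1000 seeds and compile it as a single program
final_timestep = jax.jit(jax.vmap(run))(jnp.arange(1000))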
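In the example above, each vmapped run derives its randomness from its own integer seed. When one seed feeds both the environment reset and the action sampling, a common JAX pattern is to split the key first so the two draws are independent. A minimal sketch in plain JAX, where `sample_actions`, `N_TIMESTEPS` and `N_ACTIONS` are hypothetical names with example values, not NAVIX code:

```python
import jax
import jax.numpy as jnp

N_TIMESTEPS = 16  # hypothetical rollout length
N_ACTIONS = 7     # hypothetical size of the discrete action space


def sample_actions(seed):
    key = jax.random.PRNGKey(seed)
    # One subkey for the environment reset, one for the action sequence
    reset_key, action_key = jax.random.split(key)
    # maxval is exclusive, so this samples integers in [0, N_ACTIONS)
    actions = jax.random.randint(action_key, (N_TIMESTEPS,), 0, N_ACTIONS)
    return reset_key, actions


# Different seeds give independent keys and action sequences
reset_keys, actions = jax.jit(jax.vmap(sample_actions))(jnp.arange(4))
```

Reusing a single key for both draws would not crash, but it correlates the reset with the actions, which is usually unintended.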

Compiling a full training run

import jax
import navix as nx
import jax.numpy as jnp

N_STEPS = 100  # fixed rollout length (example value)


def run_episode(seed, env, policy, params):
    """Simulates a fixed-length rollout of `policy` and returns the total reward"""
    timestep = env.reset(jax.random.PRNGKey(seed))

    def body_fun(timestep, _):
        action = policy(params, timestep.observation)
        timestep = env.step(timestep, action)
        return timestep, timestep.reward

    # A Python `while not done` loop cannot be traced by jax.jit, so the
    # rollout is unrolled for a fixed number of steps with lax.scan instead
    _, rewards = jax.lax.scan(body_fun, timestep, None, length=N_STEPS)
    return rewards.sum()


def train_policy(policy, params, num_envs, num_iterations):
    """Trains a policy over multiple parallel rollouts"""
    env = nx.make('MiniGrid-Empty-8x8-v0')

    # Compile the batched rollout once with XLA: vmap over seeds, share params
    compiled_train = jax.jit(jax.vmap(
        lambda seed, params: run_episode(seed, env, policy, params),
        in_axes=(0, None),
    ))

    for i in range(num_iterations):
        seeds = jnp.arange(num_envs) + i * num_envs  # fresh seeds every iteration
        rewards = compiled_train(seeds, params)
        # ... Update the policy parameters based on rewards ...
    return params

# Hypothetical policy function: replace with your own (e.g. a neural network)
def policy(params, observation):
    # ... your policy logic ...
    return jnp.asarray(0)  # placeholder: always take action 0

# Start the training
train_policy(policy, params=None, num_envs=100, num_iterations=10)
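Inside `jax.jit`, a Python `while not done` loop cannot branch on a traced `done` flag, so compiled rollouts typically run for a fixed number of steps. If the per-episode return is what you need, one common workaround is to carry a `done` flag through `jax.lax.scan` and mask out rewards after the first termination. The sketch below shows this in plain JAX; the reward and termination arrays are hypothetical stand-ins for environment output, and `episode_return` is not a NAVIX function.

```python
import jax
import jax.numpy as jnp


def episode_return(rewards, terminations):
    """Sum rewards up to and including the first termination; ignore the rest."""

    def body_fun(carry, inputs):
        total, done = carry
        reward, terminated = inputs
        # Only accumulate while the episode is still running
        total = total + jnp.where(done, jnp.zeros_like(reward), reward)
        done = jnp.logical_or(done, terminated)
        return (total, done), None

    init = (jnp.zeros((), dtype=rewards.dtype), jnp.zeros((), dtype=bool))
    (total, _), _ = jax.lax.scan(body_fun, init, (rewards, terminations))
    return total


rewards = jnp.array([0.0, 0.0, 1.0, 5.0, 5.0])
terminations = jnp.array([False, False, True, False, False])
# Sums 0 + 0 + 1 and ignores the rewards that follow the termination
total = jax.jit(episode_return)(rewards, terminations)
```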

Backpropagation through the environment

import jax
import navix as nx
import jax.numpy as jnp
import flax.linen as nn
from jax import grad


class Model(nn.Module):
  @nn.compact
  def __call__(self, x):
    # ... your NN here. As a placeholder, a single dense layer maps the
    # flattened observation to a prediction with the same shape
    x = x.astype(jnp.float32)
    y = nn.Dense(x.size)(x.reshape(-1))
    return y.reshape(x.shape)

model = Model()
env = nx.make('MiniGrid-Empty-8x8-v0')

def loss(params, timestep):
  action = jnp.asarray(0)
  pred_obs = model.apply(params, timestep.observation)  # predicted next observation
  timestep = env.step(timestep, action)  # environment transition
  return jnp.square(timestep.observation - pred_obs).mean()

key = jax.random.PRNGKey(0)
timestep = env.reset(key)
params = model.init(key, timestep.observation)

gradients = grad(loss)(params, timestep)
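The pattern in the example above, a gradient of a prediction loss whose target comes from an environment transition, can be sketched end to end in plain JAX. Here a hypothetical linear "world model" predicts the next state of a toy deterministic `transition` function (a stand-in for `env.step`, not NAVIX code), `jax.grad` returns a gradient for every leaf of the parameter dictionary, and one gradient-descent step should lower the loss. The transition, shapes and learning rate of 0.1 are illustrative assumptions.

```python
import jax
import jax.numpy as jnp


def transition(state):
    """Hypothetical deterministic environment transition (stand-in for env.step)."""
    return 0.5 * state + 1.0


def predict(params, state):
    """Linear world model: predicted next state = W @ state + b."""
    return params["W"] @ state + params["b"]


def loss(params, state):
    next_state = transition(state)
    return jnp.mean(jnp.square(next_state - predict(params, state)))


state = jnp.array([1.0, -2.0, 3.0])
params = {"W": jnp.zeros((3, 3)), "b": jnp.zeros(3)}

grads = jax.grad(loss)(params, state)
# One plain gradient-descent step applied to every leaf of the parameter pytree
new_params = jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)
```

As in the NAVIX example, the gradient flows through the model's prediction, while the transition only supplies the target.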

Join Us!

NAVIX is actively developed. If you'd like to contribute to this open-source project, we welcome your involvement! Start a discussion or open a pull request.

Cite

If you use NAVIX, please cite it as:

@misc{pignatelli2023navix,
  author = {Pignatelli, Eduardo},
  title = {Navix: Accelerated gridworld navigation with JAX},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/epignatelli/navix}}
  }

Project details

Release history

Version	Release date
0.7.0	Aug 1, 2024
0.6.19	Jul 16, 2024
0.6.18	Jul 8, 2024
0.6.17	Jul 8, 2024
0.6.16	Jul 8, 2024
0.6.15	Jul 5, 2024
0.6.14	Jul 4, 2024
0.6.13	Jul 4, 2024
0.6.12	Jun 26, 2024
0.6.11	Jun 21, 2024
0.6.10	Jun 21, 2024
0.6.9	Jun 20, 2024
0.6.8	Jun 20, 2024
0.6.7	Jun 19, 2024
0.6.6	Jun 13, 2024
0.6.5	Jun 11, 2024
0.6.4	Jun 10, 2024
0.6.3	Jun 8, 2024
0.6.2	Jun 7, 2024
0.6.1	Jun 6, 2024
0.6.0	Jun 6, 2024 (this version)
0.5.0	May 29, 2024
0.4.0	Mar 8, 2024
0.3.14	Jan 13, 2024
0.3.13	Jan 13, 2024
0.3.12	Sep 23, 2023
0.3.11	Sep 23, 2023
0.3.10	Aug 12, 2023
0.3.9	Jul 19, 2023
0.3.8	Jul 19, 2023
0.3.7	Jul 13, 2023
0.3.6	Jul 13, 2023
0.3.5	Jul 1, 2023
0.2.1	Jun 19, 2023
0.2.0	Jun 16, 2023
0.1.3	Jun 13, 2023
0.1.2	Jun 13, 2023
0.1.1	Jun 13, 2023
0.1.0	Jun 13, 2023
0.0.0	Jun 20, 2023

Download files

Source distribution: navix-0.6.0.tar.gz (68.5 kB), uploaded Jun 6, 2024

Built distribution: Navix-0.6.0-py2.py3-none-any.whl (86.2 kB), uploaded Jun 6, 2024, for Python 2 and Python 3

Hashes for navix-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`4816db651cf944447a64fe37cb65c4cbd100ae350e0bcdb71e5d3fd5ca81d6c9`
MD5	`a1fd8f19d8b9f256cf7b882b12bb4b08`
BLAKE2b-256	`af84fabe1e1589c2208c11804cd5be4f4b7e872c19aedf980f3063117af0bf5e`

Hashes for Navix-0.6.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`3303b5795692c327275fa19b4f546570359fdaf2b046230cf7bcf10afb2dfd23`
MD5	`fbc76324992fd66d3aaca336c770d41d`
BLAKE2b-256	`431fff731957ae5a6d1da8e2263b796f9927b8e5d1c09e0c9ce536189da0962f`
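To check a downloaded file against the published SHA256 digest, the standard library's `hashlib` is enough. A minimal sketch; the helper name `sha256_matches` is hypothetical, and the path in the usage line assumes the source distribution was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_matches(path, expected_hex):
    """Stream the file in 64 KiB chunks and compare its SHA256 digest to `expected_hex`."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


# Usage, assuming navix-0.6.0.tar.gz was downloaded to the current directory:
# sha256_matches("navix-0.6.0.tar.gz",
#                "4816db651cf944447a64fe37cb65c4cbd100ae350e0bcdb71e5d3fd5ca81d6c9")
```

Reading in chunks keeps memory use constant regardless of the archive size.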