An API for JAX-based Reinforcement Learning Environments

Vamos

Vamos is a JAX-native Reinforcement Learning environment API designed for high-performance parallel execution. It offers a Gymnasium-like interface, rebuilt from the ground up to leverage JAX's functional programming paradigm and automatic vectorization.

Key Features

  • Stateless, Functional Design: Unlike Gymnasium, where state is stored inside the environment object, Vamos passes state explicitly as function arguments. This enables seamless composition with JAX transformations (jit, vmap, grad).

  • Gymnasium-Familiar API: If you know Gymnasium, you'll feel at home. Vamos uses similar concepts (spaces, wrappers, step/reset) adapted for JAX's functional style. Many popular Gymnasium environments and wrappers are built in, along with a highly extensible make function.
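To make the first point concrete, here is a minimal sketch of why explicit state composes with JAX transformations. It does not use Vamos at all: toy_step and reward_of_action are hypothetical names for a made-up 1-D point mass, chosen only so that the same pure function can be passed to jit, grad, and vmap.

```python
import jax
import jax.numpy as jnp


def toy_step(state, action):
    """Pure transition for a hypothetical 1-D point mass (not a Vamos environment)."""
    position, velocity = state
    velocity = velocity + 0.1 * action
    position = position + 0.1 * velocity
    reward = -(position ** 2)  # reward is highest at the origin
    return (position, velocity), reward


# toy_step has no hidden state, so it can be compiled as-is.
jit_step = jax.jit(toy_step)


def reward_of_action(action):
    # Start one unit to the right of the origin, at rest.
    _, reward = toy_step((jnp.float32(1.0), jnp.float32(0.0)), action)
    return reward


# Gradient of the reward with respect to the action, taken through the dynamics.
reward_grad = jax.grad(reward_of_action)(jnp.float32(0.0))

# vmap evaluates the same pure function for a batch of candidate actions.
batched_rewards = jax.vmap(reward_of_action)(jnp.array([-1.0, 0.0, 1.0]))
```

The gradient is negative here because pushing to the right moves the mass further from the origin. None of this would work if the position and velocity lived in mutable attributes of an environment object.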

Installation

pip install vamos-rl

Quick Start

import jax
import vamos

env, params = vamos.make("CartPole-v1")

# Initialize, using a separate key for each random operation
rng = jax.random.PRNGKey(0)
rng, reset_rng, action_rng, step_rng = jax.random.split(rng, 4)
timestep, state = env.reset(params, reset_rng)
# The timestep is a dataclass containing your step data (observation, reward, etc.)

# Take a step
action = env.action_space.sample(action_rng)
timestep, state = env.step(state, action, params, step_rng)

print(f"Observation: {timestep.obs}")
print(f"Reward: {timestep.reward}")
print(f"Episode Over: {timestep.episode_over}")  # this is equal to computing `termination or truncation`

Vectorized Environments

Run multiple environments in parallel with VMapVectorEnv:

import jax
import vamos

vec_env, params = vamos.make_vec("CartPole-v1", num_envs=1024)

rng = jax.random.PRNGKey(0)
rng, reset_rng, action_rng, step_rng = jax.random.split(rng, 4)
timestep, state = vec_env.reset(params, reset_rng)  # Reset timestep and state for all 1024 environments

# Step all 1024 environments simultaneously
actions = vec_env.action_space.sample(action_rng)  # Shape: (1024,)
timestep, state = vec_env.step(state, actions, params, step_rng)
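The pattern underneath a vmap-based vector environment can be sketched with plain JAX. This is an illustration under assumptions, not the VMapVectorEnv implementation: toy_reset and toy_step are hypothetical functions, and whether Vamos splits keys per sub-environment internally is not stated here.

```python
import jax
import jax.numpy as jnp


def toy_reset(rng):
    """Hypothetical reset: draw a start position in [-1, 1)."""
    return jax.random.uniform(rng, (), minval=-1.0, maxval=1.0)


def toy_step(state, action, rng):
    """Hypothetical step: move by the action plus a little noise."""
    noise = 0.01 * jax.random.normal(rng, ())
    return state + action + noise


num_envs = 8
rng = jax.random.PRNGKey(0)
rng, reset_rng, step_rng = jax.random.split(rng, 3)

# One independent key per sub-environment, so they do not share randomness.
reset_keys = jax.random.split(reset_rng, num_envs)
step_keys = jax.random.split(step_rng, num_envs)

states = jax.vmap(toy_reset)(reset_keys)  # shape: (8,)
actions = jnp.zeros(num_envs)
next_states = jax.jit(jax.vmap(toy_step))(states, actions, step_keys)  # shape: (8,)
```

Because the whole batch is a single array computation, the batched step can be compiled once with jit and run on an accelerator without a Python loop over environments.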

Vamos offers three strategies for automatically resetting sub-environments when episodes end:

  • COMPLETE: Generate N reset states every step (maximum diversity)
  • OPTIMISTIC: Generate M << N states, reuse when needed (balanced)
  • PRECOMPUTED: Pre-generate a pool before training (zero overhead)

See the vector environment documentation for details on autoreset modes and strategies.
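One plausible reading of the COMPLETE and OPTIMISTIC strategies is sketched below. This is an assumption-laden illustration, not Vamos's actual code: toy_reset, complete_autoreset, optimistic_autoreset, and the slot-assignment rule are all hypothetical. A PRECOMPUTED variant would, under the same assumptions, draw the pool once before training rather than on every step.

```python
import jax
import jax.numpy as jnp


def toy_reset(rng):
    """Hypothetical reset for a scalar-state environment."""
    return jax.random.uniform(rng, (), minval=-1.0, maxval=1.0)


def complete_autoreset(stepped_states, done, rng):
    """Draw one fresh reset state per environment; use it only where done is True."""
    keys = jax.random.split(rng, stepped_states.shape[0])
    reset_states = jax.vmap(toy_reset)(keys)
    return jnp.where(done, reset_states, stepped_states)


def optimistic_autoreset(stepped_states, done, rng, num_resets):
    """Draw only num_resets fresh states and share them among finished environments."""
    keys = jax.random.split(rng, num_resets)
    pool = jax.vmap(toy_reset)(keys)
    # The k-th finished environment takes pool entry k mod num_resets.
    slot = (jnp.cumsum(done) - 1) % num_resets
    return jnp.where(done, pool[slot], stepped_states)


stepped = jnp.array([0.5, 2.0, -0.3, 3.0])
done = jnp.array([False, True, False, True])
rng = jax.random.PRNGKey(0)

after_complete = complete_autoreset(stepped, done, rng)
after_optimistic = optimistic_autoreset(stepped, done, rng, num_resets=1)
```

With num_resets=1, both finished environments restart from the same state, which shows the trade-off: fewer reset computations per step in exchange for less diverse starting states.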

Gymnasium vs Vamos

Aspect                Gymnasium                      Vamos
State management      Internal (mutable)             Explicit (functional)
Vectorization         SyncVectorEnv (Python loops)   vmap (hardware-accelerated)
JIT compilation       Not supported                  Native support
Autodiff through env  Not possible                   Supported via JAX
Parallelism           Process-based                  Array-based (GPU/TPU)
Randomness            Modifiable at episode resets   Selectable at every timestep

Gymnasium style (stateful):

obs, info = env.reset()
obs, reward, term, trunc, info = env.step(action)

Vamos style (functional):

timestep, state = env.reset(params, rng)
timestep, state = env.step(state, action, params, rng)

Core Concepts

Timestep

All environment outputs are bundled in a Timestep dataclass:

@struct.dataclass
class Timestep:
    obs: ArrayTree          # Current observation
    reward: float           # Reward from last action
    termination: bool       # Episode ended (goal/failure)
    truncation: bool        # Episode cut off (time limit)
    info: dict              # Additional information

    @property
    def episode_over(self):
        return self.termination or self.truncation
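Python's `or` works for plain booleans, but it raises an error on batched JAX arrays and on values traced under jit, where an element-wise operation is needed instead. The sketch below shows that variant on a stand-in type. ToyTimestep and count_finished are hypothetical names, and how Vamos itself handles this case is not shown here.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class ToyTimestep(NamedTuple):
    """Illustrative stand-in for Timestep; NamedTuples are JAX pytrees."""
    reward: jnp.ndarray
    termination: jnp.ndarray
    truncation: jnp.ndarray

    @property
    def episode_over(self):
        # Element-wise OR works for scalars, batches, and traced values under jit.
        return jnp.logical_or(self.termination, self.truncation)


batch = ToyTimestep(
    reward=jnp.array([1.0, 1.0, 0.0]),
    termination=jnp.array([False, False, True]),
    truncation=jnp.array([False, True, False]),
)


@jax.jit
def count_finished(timestep):
    # The whole timestep passes through jit as a pytree of arrays.
    return jnp.sum(timestep.episode_over)


finished = count_finished(batch)
```

Keeping termination and truncation as separate fields matters for learning algorithms that bootstrap the value estimate after a time-limit cutoff but not after a true terminal state.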

Spaces

Spaces define valid actions and observations. Vamos supports a much smaller set of spaces than Gymnasium, just three: Scalar for individual values such as a discrete set of actions, Array for a vector or matrix of data such as an image, and Dict for composing multiple spaces together.

from vamos.spaces import Scalar, Array, Dict

# Discrete action (0, 1, 2, 3, 4)
action_space = Scalar(5)

# Continuous bounded values
obs_space = Array(low=[-1.0, -1.0], high=[1.0, 1.0])

# Composite spaces
space = Dict({"position": Array(...), "velocity": Array(...)})
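To illustrate what a bounded array space has to do in a functional setting, here is a small sketch. ToyBox and its sample and contains methods are hypothetical and are not the vamos.spaces.Array implementation. The only point is that sampling takes an explicit key, matching the action_space.sample(rng) calls in the Quick Start.

```python
import jax
import jax.numpy as jnp


class ToyBox:
    """Hypothetical bounded array space; not the vamos.spaces.Array implementation."""

    def __init__(self, low, high):
        self.low = jnp.asarray(low, dtype=jnp.float32)
        self.high = jnp.asarray(high, dtype=jnp.float32)

    def sample(self, rng):
        # Sampling takes an explicit key instead of relying on hidden RNG state.
        return jax.random.uniform(
            rng, self.low.shape, minval=self.low, maxval=self.high
        )

    def contains(self, value):
        # True only if every element lies within its bounds.
        return jnp.all((value >= self.low) & (value <= self.high))


space = ToyBox(low=[-1.0, -1.0], high=[1.0, 1.0])
sample = space.sample(jax.random.PRNGKey(0))
```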

Wrappers

Compose environment modifications:

from vamos.wrappers.time_limit import TimeLimit

env, params = CartPoleEnv.new()
env, params = TimeLimit.wrap(env, params, max_episode_steps=500)
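In a stateless design, a wrapper cannot keep a step counter on itself, so the counter has to travel inside the state. The sketch below shows that idea on plain functions. with_time_limit and toy_step are hypothetical and say nothing about how TimeLimit is implemented in Vamos.

```python
import jax
import jax.numpy as jnp


def toy_step(state, action):
    """Hypothetical base step: returns (next_state, reward, termination)."""
    next_state = state + action
    return next_state, jnp.float32(1.0), jnp.abs(next_state) > 10.0


def with_time_limit(step_fn, max_episode_steps):
    """Wrap step_fn so its state carries a step counter and it reports truncation."""
    def wrapped_step(wrapped_state, action):
        inner_state, steps = wrapped_state
        inner_state, reward, termination = step_fn(inner_state, action)
        steps = steps + 1
        truncation = steps >= max_episode_steps
        return (inner_state, steps), reward, termination, truncation
    return wrapped_step


limited_step = jax.jit(with_time_limit(toy_step, max_episode_steps=3))

# The wrapped state pairs the inner state with the number of steps taken so far.
state = (jnp.float32(0.0), jnp.int32(0))
truncations = []
for _ in range(3):
    state, reward, termination, truncation = limited_step(state, jnp.float32(1.0))
    truncations.append(bool(truncation))
```

Because the wrapped step is still a pure function of its inputs, it remains compatible with jit and vmap, and several such wrappers can be stacked by nesting their states.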

License

MIT License
