An API for JAX-based Reinforcement Learning Environments
Vamos
Vamos is a JAX-native reinforcement learning environment API designed for high-performance parallel execution. It offers a Gymnasium-like interface, rebuilt from the ground up to leverage JAX's functional programming paradigm and automatic vectorization.
Key Features
- Stateless, Functional Design: Unlike Gymnasium, where state is stored internally, Vamos passes state explicitly as function parameters. This enables seamless composition with JAX transformations (jit, vmap, grad).
- Gymnasium-Familiar API: If you know Gymnasium, you'll feel at home. Vamos uses similar concepts (spaces, wrappers, step/reset) adapted for JAX's functional style. Many popular Gymnasium environments and wrappers are built in, along with a make function that is highly extensible.
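To make the stateless design concrete, here is a minimal sketch in plain JAX. It does not use Vamos itself: `toy_reset` and `toy_step` are hypothetical stand-ins for an environment whose state is passed in and returned rather than stored on an object.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy environment (not part of Vamos): a point on a line
# that is rewarded for being close to position 1.0.
def toy_reset(rng):
    # The state is an ordinary array handed back to the caller.
    return jax.random.uniform(rng, (), minval=-1.0, maxval=1.0)

def toy_step(state, action):
    next_state = state + 0.1 * action
    reward = -jnp.abs(1.0 - next_state)
    return next_state, reward

# Pure functions compose directly with JAX transformations.
fast_step = jax.jit(toy_step)
batched_step = jax.jit(jax.vmap(toy_step))

keys = jax.random.split(jax.random.PRNGKey(0), 8)
states = jax.vmap(toy_reset)(keys)                    # shape (8,)
actions = jnp.ones(8)
next_states, rewards = batched_step(states, actions)  # 8 environments at once
single_next, single_reward = fast_step(states[0], 1.0)
```

Because nothing is mutated, the same `toy_step` serves both the single-environment and the batched case without modification.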
Installation
pip install vamos-rl
Quick Start
import jax
import vamos
env, params = vamos.make("CartPole-v1")
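# Initialize; split the key so each random operation gets its own
rng = jax.random.PRNGKey(0)
rng, reset_rng, action_rng, step_rng = jax.random.split(rng, 4)
timestep, state = env.reset(params, reset_rng)
# the timestep is a dataclass containing your step data (observation, reward, etc.)
# Take a step
action = env.action_space.sample(action_rng)
timestep, state = env.step(state, action, params, step_rng)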
print(f"Observation: {timestep.obs}")
print(f"Reward: {timestep.reward}")
print(f"Episode Over: {timestep.episode_over}") # this is equal to computing `termination or truncation`
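A single step generalizes to a full rollout by threading the state through a loop. The sketch below shows one way this might look with `jax.lax.scan`. It uses a hypothetical `toy_step` function in place of a real Vamos environment, so the signature here is an assumption rather than the library's API.

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in for an environment step:
# (state, action, rng) -> (next_state, reward).
def toy_step(state, action, rng):
    noise = 0.01 * jax.random.normal(rng, ())
    next_state = state + 0.1 * action + noise
    return next_state, -jnp.abs(next_state)

def rollout(rng, init_state, num_steps):
    def body(state, step_rng):
        # Separate keys for action sampling and environment noise.
        action_rng, env_rng = jax.random.split(step_rng)
        action = jax.random.uniform(action_rng, (), minval=-1.0, maxval=1.0)
        state, reward = toy_step(state, action, env_rng)
        return state, reward

    step_rngs = jax.random.split(rng, num_steps)  # one key per step
    return jax.lax.scan(body, init_state, step_rngs)

jitted_rollout = jax.jit(rollout, static_argnums=2)
final_state, rewards = jitted_rollout(jax.random.PRNGKey(0), jnp.zeros(()), 100)
```

The whole 100-step loop compiles into a single program, which is only possible because the step function carries no hidden state.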
Vectorized Environments
Run multiple environments in parallel with VMapVectorEnv:
import jax
import vamos
vec_env, params = vamos.make_vec("CartPole-v1", num_envs=1024)
rng = jax.random.PRNGKey(0)
rng, reset_rng, action_rng, step_rng = jax.random.split(rng, 4)
timestep, state = vec_env.reset(params, reset_rng)  # Reset observations and states for all 1024 environments
# Step all 1024 environments simultaneously
actions = vec_env.action_space.sample(action_rng)  # Shape: (1024,)
timestep, state = vec_env.step(state, actions, params, step_rng)
Vamos offers three strategies for automatically resetting sub-environments when episodes end:
- COMPLETE: Generate N reset states every step (maximum diversity)
- OPTIMISTIC: Generate M << N states, reuse when needed (balanced)
- PRECOMPUTED: Pre-generate a pool before training (zero overhead)
See the vector environment documentation for details on autoreset modes and strategies.
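As a rough illustration of the idea behind the COMPLETE strategy, the sketch below generates a fresh reset state for every sub-environment and keeps it only where an episode has ended. This is one reading of the description above, written in plain JAX with hypothetical helpers (`reset_state`, `autoreset`); the actual Vamos implementation may differ.

```python
import jax
import jax.numpy as jnp

def reset_state(rng):
    # Hypothetical reset: sample a start position near zero.
    return jax.random.uniform(rng, (), minval=-0.05, maxval=0.05)

def autoreset(states, done, rng):
    # Generate one fresh state per sub-environment (N in total), then
    # select it only for the sub-environments whose episode is over.
    num_envs = states.shape[0]
    fresh = jax.vmap(reset_state)(jax.random.split(rng, num_envs))
    return jnp.where(done, fresh, states)

states = jnp.array([0.9, -0.3, 1.2, 0.1])
done = jnp.array([False, False, True, False])
new_states = jax.jit(autoreset)(states, done, jax.random.PRNGKey(1))
```

Using `jnp.where` instead of a Python `if` keeps the function compatible with `jit` and `vmap`, at the cost of computing reset states that are mostly discarded. That cost is presumably what the OPTIMISTIC and PRECOMPUTED strategies aim to reduce.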
Gymnasium vs Vamos
| Aspect | Gymnasium | Vamos |
|---|---|---|
| State management | Internal (mutable) | Explicit (functional) |
| Vectorization | SyncVectorEnv (Python loops) | vmap (hardware-accelerated) |
| JIT compilation | Not supported | Native support |
| Autodiff through env | Not possible | Supported via JAX |
| Parallelism | Process-based | Array-based (GPU/TPU) |
| Randomness | Modifiable at episode resets | Selectable at every timestep |
Gymnasium style (stateful):
obs, info = env.reset()
obs, reward, term, trunc, info = env.step(action)
Vamos style (functional):
timestep, state = env.reset(params, rng)
timestep, state = env.step(state, action, params, rng)
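The "Autodiff through env" row can be illustrated with a small sketch. It assumes a differentiable environment, represented here by a hypothetical `toy_step` rather than a real Vamos environment, and differentiates the total reward of a short rollout with respect to the actions.

```python
import jax
import jax.numpy as jnp

def toy_step(state, action):
    # Hypothetical differentiable dynamics with a target at 1.0.
    next_state = state + 0.1 * action
    reward = -(next_state - 1.0) ** 2
    return next_state, reward

def total_reward(actions, init_state):
    # scan threads the state through every step and stacks the rewards.
    _, rewards = jax.lax.scan(toy_step, init_state, actions)
    return rewards.sum()

actions = jnp.zeros(5)
grads = jax.grad(total_reward)(actions, jnp.zeros(()))
```

Earlier actions affect more future states, so their gradient entries are larger. A stateful, Python-side environment gives JAX nothing to trace, which is why this is not possible in the Gymnasium style.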
Core Concepts
Timestep
All environment outputs are bundled in a Timestep dataclass:
@struct.dataclass
class Timestep:
    obs: ArrayTree      # Current observation
    reward: float       # Reward from last action
    termination: bool   # Episode ended (goal/failure)
    truncation: bool    # Episode cut off (time limit)
    info: dict          # Additional information

    @property
    def episode_over(self):
        return self.termination or self.truncation
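When the fields hold JAX arrays (a batch of environments, or traced values inside `jit`), Python's `or` cannot be used because it needs a single concrete boolean. The sketch below shows the elementwise equivalent with `jnp.logical_or`. `ToyTimestep` is a hypothetical, reduced stand-in built on `NamedTuple`, not the Vamos class.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp

class ToyTimestep(NamedTuple):
    # Hypothetical stand-in for Timestep; NamedTuples are valid JAX pytrees.
    reward: jax.Array
    termination: jax.Array
    truncation: jax.Array

def episode_over(ts):
    # Elementwise OR works for scalars, batches, and traced values.
    return jnp.logical_or(ts.termination, ts.truncation)

batch = ToyTimestep(
    reward=jnp.array([1.0, 1.0, 0.0]),
    termination=jnp.array([False, False, True]),
    truncation=jnp.array([False, True, False]),
)
over = jax.jit(episode_over)(batch)
```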
Spaces
Define valid actions and observations:
Vamos supports a much smaller set of spaces than Gymnasium, just three: Scalar for individual values such as a discrete set of actions, Array for a vector or matrix of data such as an image, and Dict for composing multiple spaces together.
from vamos.spaces import Scalar, Array, Dict
# Discrete action (0, 1, 2, 3, 4)
action_space = Scalar(5)
# Continuous bounded values
obs_space = Array(low=[-1.0, -1.0], high=[1.0, 1.0])
# Composite spaces
space = Dict({"position": Array(...), "velocity": Array(...)})
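For intuition about what sampling from such spaces involves, here is a minimal sketch in plain JAX. The helpers `sample_scalar`, `sample_array`, and `contains` are hypothetical and are not the `vamos.spaces` implementation; they only show how a discrete value and a bounded array could be drawn from an explicit PRNG key.

```python
import jax
import jax.numpy as jnp

def sample_scalar(rng, n):
    # Hypothetical: one integer in [0, n), like a discrete action.
    return jax.random.randint(rng, (), 0, n)

def sample_array(rng, low, high):
    # Hypothetical: uniform sample within per-element bounds.
    low, high = jnp.asarray(low), jnp.asarray(high)
    return jax.random.uniform(rng, low.shape, minval=low, maxval=high)

def contains(x, low, high):
    # True when every element lies within its bounds.
    return jnp.all((x >= jnp.asarray(low)) & (x <= jnp.asarray(high)))

action_rng, obs_rng = jax.random.split(jax.random.PRNGKey(0))
action = sample_scalar(action_rng, 5)
obs = sample_array(obs_rng, [-1.0, -1.0], [1.0, 1.0])
```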
Wrappers
Compose environment modifications:
from vamos.wrappers.time_limit import TimeLimit
env, params = CartPoleEnv.new()
env, params = TimeLimit.wrap(env, params, max_episode_steps=500)
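In a functional setting, a wrapper cannot keep a step counter on `self`, so the counter has to travel with the state. The sketch below illustrates that idea with a hypothetical `with_time_limit` helper around a toy step function. It is not the `TimeLimit` implementation from Vamos, whose internals are not shown here.

```python
import jax
import jax.numpy as jnp

def toy_step(state, action):
    # Hypothetical base step: returns (next_state, reward, termination).
    next_state = state + action
    return next_state, jnp.float32(1.0), jnp.abs(next_state) > 10.0

def with_time_limit(step_fn, max_episode_steps):
    # The step counter lives in the state tuple instead of on an object.
    def wrapped(wrapped_state, action):
        inner_state, t = wrapped_state
        inner_state, reward, termination = step_fn(inner_state, action)
        t = t + 1
        truncation = t >= max_episode_steps
        return (inner_state, t), reward, termination, truncation
    return wrapped

step = jax.jit(with_time_limit(toy_step, max_episode_steps=3))
state = (jnp.float32(0.0), jnp.int32(0))
truncations = []
for _ in range(3):
    state, reward, termination, truncation = step(state, jnp.float32(1.0))
    truncations.append(bool(truncation))
```

The wrapped function is still pure, so it can be passed to `jit`, `vmap`, or `scan` like the function it wraps.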
License
MIT License
Project details
Download files
File details
Details for the file vamos_rl-0.1.0.tar.gz.
File metadata
- Download URL: vamos_rl-0.1.0.tar.gz
- Upload date:
- Size: 34.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c41fdd779c0f711a837e871b05ea7ea91b617180c6947c3b93a59db64fec5408 |
| MD5 | cb99d3d5f5b211a5de6c4ad0e4a1ac59 |
| BLAKE2b-256 | 9930e98f417075a0d8a4b41f5d5bf745180f4898cd26c88f181e0799bf9a4eee |
File details
Details for the file vamos_rl-0.1.0-py3-none-any.whl.
File metadata
- Download URL: vamos_rl-0.1.0-py3-none-any.whl
- Upload date:
- Size: 35.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 49b2bcfe55cc9210a2bf609f1112b4570c80d4f4ea12022a6f3053b9f1911009 |
| MD5 | 41b38939c96a0cf0a5d7d7a141fa4c84 |
| BLAKE2b-256 | 1e38c7c25c44b1a83b903e14fc60c3773782d3b48fd2f3d5415f44ee14a3d1b6 |