ap_gym: Active Perception Gym
Extension of Gymnasium for active perception tasks.
Installation
This package can be installed using pip:
pip install ap_gym[OPTIONS]
where OPTIONS can be empty or examples, which installs dependencies for the examples.
Basic Usage
ap_gym adds functionality for active perception tasks to Gymnasium. This guide assumes that you are familiar with Gymnasium; otherwise, please check out their documentation.
Active Perception
In the active perception domain, an agent's main objective is to gather information and make predictions about a desired property of the environment. Examples of such properties could be the location of an object in the case of a search task or the class of an object in the case of a classification task. To gather information, the agent must interact with the environment, e.g. by moving a glimpse around in the case of the CircleSquare and MNIST tasks.
ap_gym models active perception tasks as episodic processes in a way that is fully compatible with Gymnasium. Each task is defined as a Gymnasium environment, in which the agent is additionally provided with a differentiable loss function. The purpose of the loss function is to provide the agent with a generalizable notion of the distance between its current property prediction and the ground truth property.
In every episode, the agent may take a task-dependent number of steps to gather information. Just like in Gymnasium, in every step the environment provides the agent with an observation, typically consisting of scalar and/or image data. In return, the agent must provide the environment with an action and a property prediction in every step. Based on the action and prediction of the agent, the environment computes a reward in every step, which is the sum of a regular RL reward (the base reward) and the negative value of the environment's loss function. Hence, the agent has to make a prediction in every step, encouraging it to gather information quickly to maximize its prediction reward early on.
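The step loop described above can be sketched as follows. Note that `ToyEnv`, its glimpse-free dynamics, and the hard-coded cross-entropy are hypothetical stand-ins for illustration, not part of ap_gym; only the composite action dictionary and the reward decomposition mirror the scheme described here:

```python
import numpy as np

class ToyEnv:
    """Hypothetical 3-class active perception environment (not part of ap_gym).

    The hidden prediction target is fixed per episode; in every step the
    agent submits a base action together with class-logit predictions.
    """

    def __init__(self):
        self.target = 2  # hidden prediction target (class index)

    def step(self, action):
        # Split the composite action into the base action and the prediction.
        base_action = action["action"]
        prediction = action["prediction"]  # class logits

        # Base reward: here, a small action regularization penalty.
        base_reward = -0.01 * float(np.sum(base_action**2))

        # Prediction loss: cross-entropy between the logits and the target.
        log_probs = prediction - np.log(np.sum(np.exp(prediction)))
        loss = float(-log_probs[self.target])

        # Total reward = base reward minus the prediction loss.
        reward = base_reward - loss
        info = {
            "base_reward": base_reward,
            "prediction": {"loss": loss, "target": self.target},
        }
        return np.zeros(2), reward, False, False, info

env = ToyEnv()
obs, reward, terminated, truncated, info = env.step(
    {"action": np.zeros(2), "prediction": np.array([0.0, 0.0, 5.0])}
)
```

As the logits concentrate on the correct class, the prediction loss shrinks and the per-step reward rises, which is what encourages the agent to gather information early.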
Formal Problem Statement
Active perception problems are a special case of Partially Observable Markov Decision Processes (POMDPs). POMDPs are defined by the tuple $(S, A, T, R, \Omega, O, \gamma)$, where
- $S$ is the set of (hidden) states,
- $A$ is the set of actions,
- $T: S \times A \times S \to [0, 1]$ is the transition function,
- $R: S \times A \to \mathbb{R}$ is the reward function,
- $\Omega$ is the set of observations,
- $O: S \times A \times \Omega \to [0, 1]$ is the observation function, and
- $\gamma \in [0, 1]$ is the discount factor.
The objective of the agent in a POMDP is to maximize the expected cumulative reward over time by selecting actions based on its belief about the underlying state. Since the agent does not have direct access to the true state, it maintains a belief distribution over states, updating it using observations and the observation function. The environment evolves according to the transition function, where taking an action leads to a probabilistic transition to a new state, which in turn generates an observation based on the observation function. For further details on POMDPs, refer to the POMDP Wikipedia page.
In the case of active perception problems, we assume that the hidden state $S$, the action $A$, the reward function $R$, and the transition function $T$ have specific structures. First, we assume that the target property the agent is tasked to predict is part of the hidden state. Hence, $S$ is defined as $S = S_{\text{base}} \times \overset{\ast}{Y}$, where $S_{\text{base}}$ is the set of base (hidden) states of the environment and $\overset{\ast}{Y}$ is the set of prediction targets. E.g., $\overset{\ast}{Y}$ could be the set of classes in a classification task or the set of possible locations in a localization task, while $S_{\text{base}}$ contains all the other hidden state information.
To allow the agent to make predictions, the action space $A$ is defined as $A_{\text{base}} \times Y$, where $A_{\text{base}}$ is the base action space and $Y$ is the prediction space. The base action space $A_{\text{base}}$ contains all the actions the agent can take to interact with the environment, while $Y$ is the set of possible predictions the agent can make. Crucially, environments are defined in a way that the agent's prediction never influences the hidden state of the environment. Thus, the transition function $T$ is defined as $$T(s, a, s') = T(s, (a_\text{base}, y), s') = T_{\text{base}}(s, a_\text{base}, s').$$ An example of a base action could be the movement of a glimpse in an image classification task, while the prediction could be the logits of the agent's current class prediction.
Finally, the reward function is defined as $$R(s, a) = R((s_{\text{base}}, \overset{\ast}{y}), (a_{\text{base}}, y)) = R_{\text{base}}(s_{\text{base}}, a_{\text{base}}) - \ell(\overset{\ast}{y}, y),$$ where $R_{\text{base}}$ is the base reward function and $\ell$ is a differentiable loss function. An example of a base reward could be an action regularization term, while the loss function $\ell$ could be a cross-entropy loss in a classification task.
Environment Base Classes
Every task in ap_gym is modeled as a subclass of ap_gym.ActivePerceptionEnv or ap_gym.ActivePerceptionVectorEnv.
ap_gym.ActivePerceptionEnv and ap_gym.ActivePerceptionVectorEnv subclass gymnasium.Env and
gymnasium.vector.VectorEnv, respectively.
Both subclasses extend their Gymnasium interfaces by four fields:
- loss_fn: The loss function of the environment. See Loss Functions.
- prediction_space: A gymnasium.spaces.Space defining the set of valid prediction values.
- prediction_target_space: A gymnasium.spaces.Space defining the set of valid prediction target values.
- inner_action_space: A gymnasium.spaces.Space defining the set of valid inner action values.
Additionally, ap_gym.ActivePerceptionVectorEnv adds single variants of the prediction and inner action spaces: single_prediction_space and single_inner_action_space.
ap_gym.ActivePerceptionEnv and ap_gym.ActivePerceptionVectorEnv further enforce the agent's action space to be of
the following form:
{
"action": action,
"prediction": prediction
}
where the set of valid action values is defined by the inner_action_space field of the respective environment, and
the set of valid prediction values is defined by the prediction_space field.
The info dictionary returned by the reset and step functions always contains the current prediction target in
info["prediction"]["target"].
Additionally, the info dictionary returned by the step function contains the base reward (the reward without the
prediction loss) in info["base_reward"] and the prediction loss in info["prediction"]["loss"].
To get an understanding of how this class is used, refer to the examples in the examples directory and to the environments defined by ap_gym.
Loss Functions
The ap_gym.LossFn base class provides a differentiable implementation of the loss function for PyTorch and JAX.
ap_gym.LossFn has three functions: numpy, torch, and jax.
Each of these functions is the respective implementation of the loss function in Numpy, PyTorch, and JAX.
Note that only the PyTorch and JAX variants provide gradients, as NumPy does not support autograd.
The signature of each framework-specific function is
def fn(
prediction: ArrayType, target: ArrayType, batch_shape: Tuple[int, ...] = ()
) -> ArrayType: ...
where ArrayType is one of np.ndarray, torch.Tensor, or jax.Array.
batch_shape is used to specify the batch dimensions in case of a batched evaluation of the loss function, e.g.:
loss = ap_gym.CrossEntropyLossFn()(
np.zeros((3, 7, 10)), np.zeros((3, 7), dtype=np.int_), (3, 7)
)
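To make the shape conventions concrete, here is a minimal NumPy implementation following this signature. This is an illustrative sketch, not ap_gym's actual code; `batch_shape` is accepted only to mirror the signature, since the shapes can be inferred from the arguments:

```python
from typing import Tuple

import numpy as np

def cross_entropy_numpy(
    prediction: np.ndarray, target: np.ndarray, batch_shape: Tuple[int, ...] = ()
) -> np.ndarray:
    """Cross-entropy over the last axis of `prediction`.

    `prediction` has shape batch_shape + (num_classes,), `target` has shape
    batch_shape and contains integer class indices. Returns an array of
    shape batch_shape.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = prediction - prediction.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the target class for every batch element.
    return -np.take_along_axis(log_probs, target[..., None], axis=-1)[..., 0]

loss = cross_entropy_numpy(
    np.zeros((3, 7, 10)), np.zeros((3, 7), dtype=np.int_), (3, 7)
)
# Uniform logits over 10 classes yield a loss of ln(10) per batch element.
```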
Representation of Image Observations
To help the agent differentiate between scalar and image observations, ap_gym introduces a new type of Gymnasium
space: ap_gym.ImageSpace.
ap_gym.ImageSpace is a subclass of gymnasium.spaces.Box with some image specific convenience properties like
width, height, and channels.
Its main purpose, though, is to let the agent know that it has to interpret this part of the observation space as an
image.
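Conceptually, such a space only needs to carry the image shape and expose it through named properties. The following stand-alone sketch illustrates the idea; it assumes an HWC channel layout and deliberately does not use the real gymnasium.spaces.Box base class that ap_gym.ImageSpace subclasses:

```python
class ToyImageSpace:
    """Illustrative stand-in for an image observation space (HWC layout)."""

    def __init__(self, width: int, height: int, channels: int = 3):
        # A Box-like space would also store low/high bounds and a dtype;
        # here we keep only the shape, which the properties below use.
        self.shape = (height, width, channels)

    @property
    def width(self) -> int:
        return self.shape[1]

    @property
    def height(self) -> int:
        return self.shape[0]

    @property
    def channels(self) -> int:
        return self.shape[2]

space = ToyImageSpace(width=64, height=48)
```

An agent can then branch on the space type (e.g. via isinstance) to decide whether to feed that part of the observation through a convolutional encoder.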
Using Gymnasium Wrappers
ap_gym provides a method for using regular Gymnasium wrappers on ap_gym.ActivePerceptionEnv and
ap_gym.ActivePerceptionVectorEnv instances.
The issue with using Gymnasium wrappers naively is that the special fields loss_fn, prediction_space,
prediction_target_space, and inner_action_space do not get mapped through.
Hence,
gymnasium.wrappers.TimeLimit(ap_gym.make("CircleSquare-v0"), 8).loss_fn
throws
AttributeError: 'TimeLimit' object has no attribute 'loss_fn'
To address this issue, ap_gym.ActivePerceptionRestoreWrapper and ap_gym.ActivePerceptionVectorRestoreWrapper can be
used:
ap_gym.ActivePerceptionRestoreWrapper(
gymnasium.wrappers.TimeLimit(ap_gym.make("CircleSquare-v0"), 8)
).loss_fn
ap_gym.ActivePerceptionRestoreWrapper and ap_gym.ActivePerceptionVectorRestoreWrapper recursively traverse wrappers
until they find an active perception environment and map the special fields through.
Additionally, aside from Gymnasium wrappers, ap_gym.ActivePerceptionVectorRestoreWrapper also supports
gymnasium.vector.SyncVectorEnv and gymnasium.vector.AsyncVectorEnv and will restore proper vector versions of all
spaces if active perception environments are vectorized this way.
Environments
ap_gym currently comes with three classes of environments: image classification, 2D localization, and image localization. Each class contains multiple environments of varying difficulty and complexity. To learn more about the environments, refer to their respective documentations linked below.
Image Classification
In this class of environments, the agent has to classify images into a set of classes. However, it does not have access to the entire image at once but rather has to move a small glimpse around to gather information. Find detailed documentation of the image classification environments here.
- CircleSquare-v0
- MNIST-v0
- TinyImageNet-v0
- CIFAR10-v0
2D Localization
In 2D localization environments, the agent has to localize itself in a 2D environment. There are currently two types of 2D localization environments: a light-dark environment and LIDAR-based environments. In the light-dark environment, the agent must learn to navigate towards a light source to localize itself. In the LIDAR-based environments, it must localize itself using LIDAR sensor readings.
- LightDark-v0
- LIDARLocRooms-v0
- LIDARLocMaze-v0
Image Localization
In image localization environments, the agent must localize a given glimpse in a natural image. Similar to the image classification class of tasks, the agent must explore the image by moving a glimpse around. Find detailed documentation of the image localization environments here.
- TinyImageNetLoc-v0
- CIFAR10Loc-v0
Converting Regular Gymnasium Environments to Active Perception Environments
It is possible to convert regular Gymnasium environments and vector environments into pseudo active perception environments with
ap_gym.PseudoActivePerceptionWrapper and ap_gym.PseudoActivePerceptionVectorWrapper, respectively:
env = gymnasium.make("CartPole-v1")
ap_env = ap_gym.PseudoActivePerceptionWrapper(env)
ap_gym.PseudoActivePerceptionWrapper and ap_gym.PseudoActivePerceptionVectorWrapper take the environment and add a
constant zero loss function as well as empty prediction and prediction target spaces.
The purpose of this conversion is to simplify testing of ap_gym compatible algorithms on regular Gymnasium tasks.
If you want to support arbitrary Gymnasium and ap_gym environments, use the ap_gym.ensure_active_perception_env and
ap_gym.ensure_active_perception_vector_env functions:
ap_env_1 = ap_gym.ensure_active_perception_env(gymnasium.make("CartPole-v1"))
ap_env_2 = ap_gym.ensure_active_perception_env(ap_gym.make("CircleSquare-v0"))
ap_env_3 = ap_gym.ensure_active_perception_env(
gymnasium.wrappers.TimeLimit(ap_gym.make("CircleSquare-v0"), 8)
)
These functions automatically detect whether to do nothing, apply a restoration wrapper, or perform pseudo active perception environment conversion.
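The dispatch can be sketched roughly as follows. The stub classes, the string return values, and the `loss_fn`-attribute heuristic are all illustrative assumptions, not ap_gym's actual implementation (which returns wrapped environments rather than labels):

```python
class StubActivePerceptionEnv:
    """Stand-in for an ap_gym environment: exposes a loss_fn field."""
    loss_fn = object()

class StubGymEnv:
    """Stand-in for a plain Gymnasium environment."""

class StubWrapper:
    """Stand-in for a Gymnasium wrapper holding an inner env."""
    def __init__(self, env):
        self.env = env

def ensure_active_perception_env_sketch(env) -> str:
    """Illustrative three-way dispatch, returning a label per case."""
    # Case 1: already an active perception environment: no-op.
    if hasattr(env, "loss_fn"):
        return "unchanged"
    # Case 2: an active perception env hidden behind wrappers:
    # a restoration wrapper would map the special fields through.
    inner = env
    while hasattr(inner, "env"):
        inner = inner.env
        if hasattr(inner, "loss_fn"):
            return "restore-wrapped"
    # Case 3: a plain Gymnasium env: pseudo active perception conversion.
    return "pseudo-converted"
```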
Advanced Usage
For more advanced usage, i.e., defining custom environments or wrappers, refer to the advanced usage documentation.
License
The project is licensed under the MIT license.
Contributing
If you wish to contribute to this project, you are welcome to create a pull request. Please run the pre-commit hooks before submitting your pull request. To set up the pre-commit hooks:
- Install pre-commit
- Install the Git hooks by running pre-commit install
Alternatively, run pre-commit run --all-files manually.