
Gym-like memory-intensive environments for robotic tabletop manipulation


MIKASA-Robo

A benchmark of memory-intensive tasks for robotic tabletop manipulation


Example tasks from the MIKASA-Robo benchmark

Overview

MIKASA-Robo is a comprehensive benchmark suite for memory-intensive robotic manipulation tasks, part of the MIKASA (Memory-Intensive Skills Assessment Suite for Agents) framework. It features:

  • 12 distinct task types with varying difficulty levels
  • 32 total tasks covering different memory aspects
  • First benchmark specifically designed for testing agent memory in robotic manipulation

Key Features

  • Diverse Memory Testing: Covers four fundamental memory types:

    • Object Memory
    • Spatial Memory
    • Sequential Memory
    • Memory Capacity
  • Built on ManiSkill3: Leverages the powerful ManiSkill3 framework, providing:

    • GPU parallelization
    • User-friendly interface
    • Customizable environments

List of Tasks

| Memory Task | Mode | Brief Description | T | Memory Task Type |
| --- | --- | --- | --- | --- |
| ShellGame[Mode]-v0 | Touch / Push / Pick | Memorize the position of the ball after it has been covered by the cups, then interact with the cup the ball is under. | 90 | Object |
| Intercept[Mode]-v0 | Slow / Medium / Fast | Memorize the positions of the rolling ball, estimate its velocity from those positions, and then aim the ball at the target. | 90 | Spatial |
| InterceptGrab[Mode]-v0 | Slow / Medium / Fast | Memorize the positions of the rolling ball, estimate its velocity from those positions, then catch the ball with the gripper and lift it up. | 90 | Spatial |
| RotateLenient[Mode]-v0 | Pos / PosNeg | Memorize the initial position of the peg and rotate it by a given angle. | 90 | Spatial |
| RotateStrict[Mode]-v0 | Pos / PosNeg | Memorize the initial position of the peg and rotate it to a given angle without shifting its center. | 90 | Object |
| TakeItBack-v0 | --- | Memorize the initial position of the cube, move it to the target region, and then return it to its initial position. | 180 | Spatial |
| RememberColor[Mode]-v0 | 3 / 5 / 9 | Memorize the color of the cube and choose it among other colors. | 60 | Object |
| RememberShape[Mode]-v0 | 3 / 5 / 9 | Memorize the shape of the cube and choose it among other shapes. | 60 | Object |
| RememberShapeAndColor[Mode]-v0 | 3×2 / 3×3 / 5×3 | Memorize the shape and color of the cube and choose it among other shapes and colors. | 60 | Object |
| BunchOfColors[Mode]-v0 | 3 / 5 / 7 | Remember the colors of a set of cubes shown simultaneously, then touch them in any order. | 120 | Capacity |
| SeqOfColors[Mode]-v0 | 3 / 5 / 7 | Remember the colors of a set of cubes shown sequentially, then select them in any order. | 120 | Capacity |
| ChainOfColors[Mode]-v0 | 3 / 5 / 7 | Remember the colors of a set of cubes shown sequentially, then select them in the same order. | 120 | Sequential |

Total: 32 memory-intensive tabletop robotic manipulation tasks in 12 groups. T is the episode timeout in steps.
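Each task ID combines the group name with a mode suffix, as in `ShellGameTouch-v0` and `RememberColor9-v0`. A small sketch that enumerates the full task list from the table above; the `3x2`-style spellings for the RememberShapeAndColor modes are an assumption, not confirmed by the package:

```python
# Build the full MIKASA-Robo task list from the table above.
# Mode suffixes are appended directly to the group name, as in
# "ShellGameTouch-v0" and "RememberColor9-v0"; the "3x2"-style
# suffixes for RememberShapeAndColor are an assumed spelling.
TASK_GROUPS = {
    "ShellGame": ["Touch", "Push", "Pick"],
    "Intercept": ["Slow", "Medium", "Fast"],
    "InterceptGrab": ["Slow", "Medium", "Fast"],
    "RotateLenient": ["Pos", "PosNeg"],
    "RotateStrict": ["Pos", "PosNeg"],
    "TakeItBack": [""],
    "RememberColor": ["3", "5", "9"],
    "RememberShape": ["3", "5", "9"],
    "RememberShapeAndColor": ["3x2", "3x3", "5x3"],
    "BunchOfColors": ["3", "5", "7"],
    "SeqOfColors": ["3", "5", "7"],
    "ChainOfColors": ["3", "5", "7"],
}

ALL_TASKS = [f"{group}{mode}-v0"
             for group, modes in TASK_GROUPS.items()
             for mode in modes]

print(len(TASK_GROUPS), len(ALL_TASKS))  # 12 32
```

Summing the mode counts per group recovers the totals stated above: 12 groups and 32 tasks.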

Quick Start

Installation

git clone git@github.com:CognitiveAISystems/MIKASA-Robo.git
cd MIKASA-Robo
pip install -r requirements.txt

Basic Usage

import gymnasium as gym
import torch
from tqdm import tqdm

import mikasa_robo
from mikasa_robo_suite.utils.wrappers import *

num_envs, seed = 512, 123
# Create the environment via gym.make():
# obs_mode="rgb" for the "RGB", "RGB+joint", "RGB+oracle", etc. modes
# obs_mode="state" for the "state" mode
env = gym.make("RememberColor9-v0", num_envs=num_envs,
               obs_mode="rgb", render_mode="all")

env = StateOnlyTensorToDictWrapper(env)  # [always] generate obs keys

obs, _ = env.reset(seed=seed)
for i in tqdm(range(89)):
    action = torch.from_numpy(env.action_space.sample())
    obs, reward, terminated, truncated, info = env.step(action)
env.close()

Advanced Usage: Debug Wrappers

import gymnasium as gym
import torch
from tqdm import tqdm

import mikasa_robo
from mikasa_robo_suite.utils.wrappers import *
from mani_skill.utils.wrappers import RecordEpisode

num_envs, seed = 512, 123
env = gym.make("RememberColor9-v0", num_envs=num_envs,
               obs_mode="rgb", render_mode="all")

env = StateOnlyTensorToDictWrapper(env)  # [always] generate obs keys
env = RememberColorInfoWrapper(env)      # [debug] show task info
env = RenderStepInfoWrapper(env)         # [debug] show env step
env = RenderRewardInfoWrapper(env)       # [debug] show total reward
env = DebugRewardWrapper(env)            # [debug] show reward info
env = RecordEpisode(env, "./videos/demo_remember-color-9")

obs, _ = env.reset(seed=seed)
for i in tqdm(range(89)):
    action = torch.from_numpy(env.action_space.sample())
    obs, reward, terminated, truncated, info = env.step(action)
env.close()

from IPython.display import Video
Video("./videos/demo_remember-color-9/0.mp4", embed=True)
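Each debug wrapper above forwards `step()` to the inner environment and layers extra information on top without changing the interface. The chaining pattern can be sketched with plain classes; these are illustrative stand-ins, not the actual wrappers from `mikasa_robo_suite.utils.wrappers`:

```python
# Minimal sketch of the wrapper-chaining pattern used above: each
# wrapper calls the inner environment's step() and augments the
# returned info dict. Names and behavior here are illustrative.
class DummyEnv:
    def step(self, action):
        obs, reward, terminated, truncated, info = {}, 1.0, False, False, {}
        return obs, reward, terminated, truncated, info

class StepInfoWrapper:
    """Adds the current step index to info, like the step-debug wrapper."""
    def __init__(self, env):
        self.env = env
        self._step = 0
    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        info["step"] = self._step
        self._step += 1
        return obs, reward, terminated, truncated, info

class RewardInfoWrapper:
    """Accumulates the episode's total reward into info."""
    def __init__(self, env):
        self.env = env
        self._total = 0.0
    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._total += reward
        info["total_reward"] = self._total
        return obs, reward, terminated, truncated, info

env = RewardInfoWrapper(StepInfoWrapper(DummyEnv()))
for _ in range(3):
    *_, info = env.step(None)
print(info)  # {'step': 2, 'total_reward': 3.0}
```

Because each wrapper only touches `info`, any subset of them can be stacked in any order, which is why the snippet above freely composes five of them.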

Training

MIKASA-Robo supports multiple training configurations:

PPO with MLP (State-Based)

python3 baselines/ppo/ppo_memtasks.py \
    --env_id=RememberColor9-v0 \
    --exp-name=remember-color-9-v0 \
    --num-steps=60 \
    --num_eval_steps=180 \
    --include-state

PPO with MLP (RGB + Joint)

python3 baselines/ppo/ppo_memtasks.py \
    --env_id=RememberColor9-v0 \
    --exp-name=remember-color-9-v0 \
    --num-steps=60 \
    --num_eval_steps=180 \
    --include-rgb \
    --include-joints

PPO with LSTM (RGB + Joint)

python3 baselines/ppo/ppo_memtasks_lstm.py \
    --env_id=RememberColor9-v0 \
    --exp-name=remember-color-9-v0 \
    --num-steps=60 \
    --num_eval_steps=180 \
    --include-rgb \
    --include-joints

To train with sparse rewards, add --reward-mode=sparse.

MIKASA-Robo Ideology

The agent's memory capabilities can be assessed only when the environment demands memory and the observations are provided in the appropriate format. Currently, we have implemented several training modes:

  • state: In this mode, the agent receives comprehensive, vectorized information about the environment, the joints, and the TCP pose, along with the oracle data that is essential for solving memory-intensive tasks. When trained this way, the agent solves an MDP and does not require memory.

  • RGB+joints: Here, the agent receives image data from a camera mounted above the table and from the manipulator's gripper, along with the positions and velocities of its joints. This mode provides no oracle information, so the agent must learn to store and recall the relevant data itself. It is designed to test the agent's memory capabilities.

These training modes are selected with the corresponding flags:

# To train in `state` mode:
--include-state

# To train in `RGB+joints` mode:
--include-rgb \
--include-joints

# Additionally, for debugging you can add oracle information to the observation:
--include-oracle
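The flags above decide which components end up in the agent's observation. A hedged sketch of that selection logic follows; the key names and the `build_observation` helper are illustrative, not the package's actual API:

```python
def build_observation(raw, include_state=False, include_rgb=False,
                      include_joints=False, include_oracle=False):
    """Assemble the observation dict according to the training-mode flags.

    Key names are illustrative, not the package's actual schema.
    """
    obs = {}
    if include_state:
        obs["state"] = raw["state"]    # full vectorized state (MDP mode)
    if include_rgb:
        obs["rgb"] = raw["rgb"]        # top-mounted + gripper camera images
    if include_joints:
        obs["joints"] = raw["joints"]  # joint positions and velocities
    if include_oracle:
        obs["oracle"] = raw["oracle"]  # debug-only memory hints
    return obs

raw = {"state": [0.0], "rgb": [[0]], "joints": [0.1], "oracle": [1]}
# `RGB+joints` mode: no oracle data, so the task stays memory-intensive.
obs = build_observation(raw, include_rgb=True, include_joints=True)
print(sorted(obs))  # ['joints', 'rgb']
```

The key point is that `RGB+joints` omits the oracle entry entirely, so anything the oracle would reveal must instead be memorized from past frames.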

Collecting datasets for Offline RL

  1. Train PPO-MLP on MIKASA-Robo tasks in the state mode (i.e. in MDP mode with oracle information):

# For a single task:
python3 dataset_collectors/get_dataset_collectors_ckpt.py --env_id=ShellGameTouch-v0

# For all tasks:
python3 dataset_collectors/parallel_training_manager.py

  2. Collect datasets using the oracle checkpoints:

# For a single task:
python3 dataset_collectors/get_mikasa_robo_datasets.py --env-id=ShellGameTouch-v0 --path-to-save-data="data" --ckpt-dir="."

# For all tasks:
python3 dataset_collectors/parallel_dataset_collection_manager.py --path-to-save-data="data" --ckpt-dir="."
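The collectors above roll out the oracle policy and store whole trajectories. A minimal sketch of what one stored trajectory might look like; the field names and the `Trajectory` container are assumptions, not the collector's actual schema:

```python
# Hypothetical trajectory container for an offline-RL dataset; the
# real collectors in dataset_collectors/ may use a different schema.
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Trajectory:
    observations: List[Any] = field(default_factory=list)
    actions: List[Any] = field(default_factory=list)
    rewards: List[float] = field(default_factory=list)

    def append(self, obs, action, reward):
        self.observations.append(obs)
        self.actions.append(action)
        self.rewards.append(reward)

traj = Trajectory()
for t in range(90):  # e.g. one ShellGame episode (timeout T = 90)
    traj.append(obs=[t], action=[0.0], reward=1.0 if t == 89 else 0.0)

print(len(traj.actions), sum(traj.rewards))  # 90 1.0
```

One such record per episode, serialized per task, is enough for standard offline-RL pipelines to consume.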

Citation

If you find our work useful, please cite our paper:

@misc{cherepanov2025mikasa,
      title={Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning}, 
      author={Egor Cherepanov and Nikita Kachaev and Alexey K. Kovalev and Aleksandr I. Panov},
      year={2025},
      eprint={2502.10550},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.10550}, 
}

Download files

Download the file for your platform.

Source Distribution

mikasa_robo_suite-0.0.2.tar.gz (77.2 kB)

Uploaded Source

Built Distribution


mikasa_robo_suite-0.0.2-py3-none-any.whl (112.0 kB)

Uploaded Python 3

File details

Details for the file mikasa_robo_suite-0.0.2.tar.gz.

File metadata

  • Download URL: mikasa_robo_suite-0.0.2.tar.gz
  • Size: 77.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.20

File hashes

Hashes for mikasa_robo_suite-0.0.2.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | d1d7726897343ff6673ab18952f49f510cedc7c814557ca6aa55c40fc05aae4c |
| MD5 | 2ec227b48ecdaa50370a518ccc62b580 |
| BLAKE2b-256 | b7276ec11c012ce614a9311dc535d64b4796cf8d2e1ebb09387ef81b7f7c8fbd |


File details

Details for the file mikasa_robo_suite-0.0.2-py3-none-any.whl.

File metadata

File hashes

Hashes for mikasa_robo_suite-0.0.2-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 8affa3f5dca568aa2495d9e0058746d37f8e11a3bbe3a7efdc90e487b99b565b |
| MD5 | 0a6ecd96107f4e50ccc809309d59fe6d |
| BLAKE2b-256 | 7b133aa46bb91f40f6723182ddd5ef70fc5f899904831db3a33047832722f4d6 |

