RT-1, RT-1-X, Octo Robotics Transformer Model Inference

These details have not been verified by PyPI

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Library for Robotics Transformers: RT-1, RT-1-X, and Octo.

Available Models

Model Type	Variants	Observation Space	Action Space	Author
RT-1	rt1main, rt1multirobot, rt1simreal	text + head camera	end effector pose delta	Google Research, 2022
RT-1-X	rt1x	text + head camera	end effector pose delta	Google Research et al., 2023
Octo	octo-base, octo-small	text + head camera + Optional[wrist camera]	end effector pose delta	Octo Model Team et al., 2023

Installation

Requirements: Python >= 3.9

From Source

Clone this repo:

git clone https://github.com/sebbyjp/mbodied.git

Install requirements:

python -m pip install --upgrade pip

cd mbodied && pip install -r requirements.txt

Run Octo Inference on Demo Images

python -m mbodied.demo

Run RT-1 Inference on Demo Images

python -m mbodied.models.rt1.inference

See usage

You can specify a custom checkpoint path, or one of the model_keys for the three models mentioned in the RT-1 paper or for RT-1-X:

python -m mbodied.models.rt1.inference --help

Run Inference Server

The inference server manages all internal state, so you only need to supply an instruction and an image.

from mbodied.inference_server import InferenceServer
import numpy as np

# Somewhere in your robot control stack code...

instruction = "pick block"
img = np.random.randint(0, 256, size=(256, 320, 3), dtype=np.uint8)  # Height, Width, RGB (uint8, as in the image spec)
inference = InferenceServer()

action = inference(instruction, img)
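The image spec under Data Types lists frames as uint8 arrays of shape (256, 320, 3). As a minimal sketch, assuming your camera delivers an RGB array of a different size, the hypothetical helper `to_model_image` below (not part of mbodied) resizes a frame with simple nearest-neighbour style sampling in plain NumPy. Whether InferenceServer resizes its input on its own is not stated here, so treat this as optional preprocessing.

```python
import numpy as np

TARGET_HEIGHT, TARGET_WIDTH = 256, 320  # from the 'image' spec: shape=(256, 320, 3)


def to_model_image(frame: np.ndarray) -> np.ndarray:
    """Hypothetical helper: resize an RGB frame to (256, 320, 3) uint8."""
    if frame.ndim != 3 or frame.shape[2] != 3:
        raise ValueError(f"expected an (H, W, 3) array, got shape {frame.shape}")
    src_h, src_w = frame.shape[:2]
    # Map every target row/column back to a source row/column (floor sampling).
    rows = np.arange(TARGET_HEIGHT) * src_h // TARGET_HEIGHT
    cols = np.arange(TARGET_WIDTH) * src_w // TARGET_WIDTH
    resized = frame[rows[:, None], cols[None, :]]
    # Assumption: float frames lie in [0, 1], integer frames in [0, 255].
    if np.issubdtype(resized.dtype, np.floating):
        resized = resized * 255.0
    return np.clip(resized, 0, 255).astype(np.uint8)


camera_frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
img = to_model_image(camera_frame)
print(img.shape, img.dtype)  # (256, 320, 3) uint8
```

The resulting `img` can then be passed to the inference call in place of the random array.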

Data Types

action, next_policy_state = model.act(time_step, curr_policy_state)

policy_state is the internal state of the network.

In this case it is a 6-frame window of past observations and actions, plus the index in time.

{'action_tokens': ArraySpec(shape=(6, 11, 1, 1), dtype=dtype('int32'), name='action_tokens'),
 'image': ArraySpec(shape=(6, 256, 320, 3), dtype=dtype('uint8'), name='image'),
 'step_num': ArraySpec(shape=(1, 1, 1, 1), dtype=dtype('int32'), name='step_num'),
 't': ArraySpec(shape=(1, 1, 1, 1), dtype=dtype('int32'), name='t')}
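To make the window concrete, here is a minimal NumPy sketch of a 6-frame rolling image buffer with a time index. It only illustrates the idea: `init_state` and `push_frame` are hypothetical names, `t` is assumed to be the time index, and the real state update (including how `action_tokens` and `step_num` are filled) happens inside the model and may differ.

```python
import numpy as np

WINDOW = 6  # frames kept, per the 'image' spec shape (6, 256, 320, 3)


def init_state() -> dict:
    """Hypothetical helper: zero-filled state using shapes from the spec."""
    return {
        "image": np.zeros((WINDOW, 256, 320, 3), dtype=np.uint8),
        "action_tokens": np.zeros((WINDOW, 11, 1, 1), dtype=np.int32),
        "t": np.zeros((1, 1, 1, 1), dtype=np.int32),  # assumed time index
    }


def push_frame(state: dict, frame: np.ndarray) -> dict:
    """Return a new state with the oldest frame dropped and `frame` appended."""
    images = np.roll(state["image"], shift=-1, axis=0)  # np.roll returns a copy
    images[-1] = frame
    return {**state, "image": images, "t": state["t"] + 1}


state = init_state()
for step in range(8):
    # Each fake frame is filled with its step number so the window is easy to read.
    frame = np.full((256, 320, 3), step, dtype=np.uint8)
    state = push_frame(state, frame)

# After 8 pushes the window holds frames 2..7, oldest first.
print(state["image"][:, 0, 0, 0].tolist(), int(state["t"].item()))
```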

time_step is the input from the environment.

{'discount': BoundedArraySpec(shape=(), dtype=dtype('float32'), name='discount', minimum=0.0, maximum=1.0),
 'observation': {'base_pose_tool_reached': ArraySpec(shape=(7,), dtype=dtype('float32'), name='base_pose_tool_reached'),
                 'gripper_closed': ArraySpec(shape=(1,), dtype=dtype('float32'), name='gripper_closed'),
                 'gripper_closedness_commanded': ArraySpec(shape=(1,), dtype=dtype('float32'), name='gripper_closedness_commanded'),
                 'height_to_bottom': ArraySpec(shape=(1,), dtype=dtype('float32'), name='height_to_bottom'),
                 'image': ArraySpec(shape=(256, 320, 3), dtype=dtype('uint8'), name='image'),
                 'natural_language_embedding': ArraySpec(shape=(512,), dtype=dtype('float32'), name='natural_language_embedding'),
                 'natural_language_instruction': ArraySpec(shape=(), dtype=dtype('O'), name='natural_language_instruction'),
                 'orientation_box': ArraySpec(shape=(2, 3), dtype=dtype('float32'), name='orientation_box'),
                 'orientation_start': ArraySpec(shape=(4,), dtype=dtype('float32'), name='orientation_in_camera_space'),
                 'robot_orientation_positions_box': ArraySpec(shape=(3, 3), dtype=dtype('float32'), name='robot_orientation_positions_box'),
                 'rotation_delta_to_go': ArraySpec(shape=(3,), dtype=dtype('float32'), name='rotation_delta_to_go'),
                 'src_rotation': ArraySpec(shape=(4,), dtype=dtype('float32'), name='transform_camera_robot'),
                 'vector_to_go': ArraySpec(shape=(3,), dtype=dtype('float32'), name='vector_to_go'),
                 'workspace_bounds': ArraySpec(shape=(3, 3), dtype=dtype('float32'), name='workspace_bounds')},
 'reward': ArraySpec(shape=(), dtype=dtype('float32'), name='reward'),
 'step_type': ArraySpec(shape=(), dtype=dtype('int32'), name='step_type')}
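When testing a control stack without a robot, a zero-filled observation with the shapes listed above can be handy. The sketch below uses only NumPy; `FLOAT_FIELDS` and `dummy_observation` are hypothetical names, and this page does not say how the model behaves on all-zero inputs, so it is meant for shape checks rather than meaningful inference.

```python
import numpy as np

# Shapes of the float32 observation fields, copied from the spec above.
FLOAT_FIELDS = {
    "base_pose_tool_reached": (7,),
    "gripper_closed": (1,),
    "gripper_closedness_commanded": (1,),
    "height_to_bottom": (1,),
    "natural_language_embedding": (512,),
    "orientation_box": (2, 3),
    "orientation_start": (4,),
    "robot_orientation_positions_box": (3, 3),
    "rotation_delta_to_go": (3,),
    "src_rotation": (4,),
    "vector_to_go": (3,),
    "workspace_bounds": (3, 3),
}


def dummy_observation(instruction: str) -> dict:
    """Hypothetical helper: zero-filled observation matching the listed shapes."""
    obs = {
        name: np.zeros(shape, dtype=np.float32)
        for name, shape in FLOAT_FIELDS.items()
    }
    obs["image"] = np.zeros((256, 320, 3), dtype=np.uint8)
    # The spec lists the instruction as a scalar with object dtype.
    obs["natural_language_instruction"] = instruction
    return obs


obs = dummy_observation("pick block")
print(len(obs), obs["image"].shape)  # 14 (256, 320, 3)
```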

action is the first value returned by model.act

{'base_displacement_vector': BoundedArraySpec(shape=(2,), dtype=dtype('float32'), name='base_displacement_vector', minimum=-1.0, maximum=1.0),
 'base_displacement_vertical_rotation': BoundedArraySpec(shape=(1,), dtype=dtype('float32'), name='base_displacement_vertical_rotation', minimum=-3.1415927410125732, maximum=3.1415927410125732),
 'gripper_closedness_action': BoundedArraySpec(shape=(1,), dtype=dtype('float32'), name='gripper_closedness_action', minimum=-1.0, maximum=1.0),
 'rotation_delta': BoundedArraySpec(shape=(3,), dtype=dtype('float32'), name='rotation_delta', minimum=-1.5707963705062866, maximum=1.5707963705062866),
 'terminate_episode': BoundedArraySpec(shape=(3,), dtype=dtype('int32'), name='terminate_episode', minimum=0, maximum=1),
 'world_vector': BoundedArraySpec(shape=(3,), dtype=dtype('float32'), name='world_vector', minimum=-1.0, maximum=1.0)}
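Because every float field in the action spec is bounded, it can be useful to clamp values before sending them to a robot. The following is a minimal sketch, not part of the library: `ACTION_BOUNDS` and `clip_action` are hypothetical names, and the bounds are copied from the spec above (the spec prints pi and pi/2 rounded to float32).

```python
import numpy as np

# (minimum, maximum) per float32 action field, copied from the spec above.
ACTION_BOUNDS = {
    "base_displacement_vector": (-1.0, 1.0),
    "base_displacement_vertical_rotation": (-np.pi, np.pi),
    "gripper_closedness_action": (-1.0, 1.0),
    "rotation_delta": (-np.pi / 2, np.pi / 2),
    "world_vector": (-1.0, 1.0),
}


def clip_action(action: dict) -> dict:
    """Hypothetical helper: clamp each bounded float field to its listed range."""
    clipped = dict(action)  # shallow copy; unbounded fields pass through unchanged
    for name, (low, high) in ACTION_BOUNDS.items():
        if name in clipped:
            values = np.asarray(clipped[name], dtype=np.float32)
            clipped[name] = np.clip(values, low, high)
    return clipped


raw = {
    "world_vector": np.array([0.2, -1.7, 0.0], dtype=np.float32),
    "rotation_delta": np.array([0.0, 2.0, -0.1], dtype=np.float32),
    "gripper_closedness_action": np.array([1.5], dtype=np.float32),
    "terminate_episode": np.array([0, 1, 0], dtype=np.int32),
}
safe = clip_action(raw)
print(safe["world_vector"], safe["gripper_closedness_action"])
```

Clamping is only a safety net here; an action that needs clipping usually signals a problem upstream.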

TODO

Render the action, policy_state, and observation specs in a more readable form, such as a pandas DataFrame.

Project details

Release history

0.1.0 (Jun 27, 2024)
0.0.6 (Jun 24, 2024)
0.0.1 (Mar 4, 2024), this version

Download files

Source Distribution

mbodied-0.0.1.tar.gz (90.4 MB)

Uploaded Mar 4, 2024 Source

Built Distribution

mbodied-0.0.1-py3-none-any.whl (90.4 MB)

Uploaded Mar 4, 2024 Python 3

Hashes for mbodied-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`3ca75a9f4b8c65bec81e8bc44ba3b042d7b418eca9cf92df721be03e9b13ca06`
MD5	`de660fb861fd5344180a8ba648733d2c`
BLAKE2b-256	`1d478457216064192dd2b59e544aad8eef588f5eca7582d0f88f32d1776d806a`

Hashes for mbodied-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`14ba2a5c618d13c0bb10863d63c0288dd4dd0a795181ac2373fcc9a53e7f1031`
MD5	`35111e8b516e9a8654b7b1c3e744aa97`
BLAKE2b-256	`df048e1a6a6da4fb751f50939836924948c1320b356af797396b7bd815d20f15`