Embodied AI

Project description

Mbodied Agents

Mbodied Agents Logo

Welcome to Mbodied Agents, a toolkit for integrating various state-of-the-art transformers into robotics stacks. Mbodied Agents is designed to provide a consistent interface for calling different AI models, handling multimodal data, using and creating datasets collected on different robots, and working with arbitrary observation and action spaces. See Getting Started.

Architecture Diagram

Each time you interact with a robot, the data is automatically recorded into a dataset, which can be augmented and used for model training. We are actively developing tools for processing the dataset, augmenting the data, and finetuning foundation models. If you'd like to learn more or provide feedback, please fill out this form.

Demo GIF

We welcome any questions, issues, or PRs!

Please join our Discord for interesting discussions! ⭐ Give us a star on GitHub if you like us!

What is Mbodied Agents for?

Mbodied Agents simplifies the integration of advanced AI models in robotics. It offers a unified platform for controlling various robots using state-of-the-art transformers and multimodal data processing. This toolkit enables experimentation with AI models, dataset collection and augmentation, and model training or finetuning for specific tasks. The goal is to develop intelligent, adaptable robots that learn from interactions and perform complex tasks in dynamic environments.

Overview

Mbodied Agents offers the following features:

  • Configurability: Define your desired Observation and Action spaces and read data into the format that works best for your system.
  • Natural Language Control: Use verbal prompts to correct a cognitive agent's actions and calibrate its behavior to a new environment.
  • Modularity: Easily swap out different backends, transformers, and hardware interfaces. For even better results, run multiple agents in separate threads.
  • Validation: Ensure that your data is in the correct format and that your actions are within the correct bounds before sending them to the robot (see the sketch below).
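
Because observations and actions are strongly typed (see the Sample class below), bound checks can be expressed as ordinary pydantic field constraints. A minimal sketch, assuming Sample behaves like a pydantic model; the GripperAction class and its field are made up purely for illustration:

# Hypothetical action type used only to illustrate validation; not part of the library.
from pydantic import Field
from mbodied_agents.base.sample import Sample

class GripperAction(Sample):
    """Gripper openness constrained to [0, 1]."""
    openness: float = Field(default=0.0, ge=0.0, le=1.0)

GripperAction(openness=0.5)  # passes validation
GripperAction(openness=2.0)  # raises a ValidationError before anything reaches the robot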

Support Matrix

If you would like to integrate a new backend, sense, or motion control, it is very easy to do so. Please refer to the contributing guide for more information.

  • OpenAI
  • Anthropic
  • Mbodi (Coming Soon)
  • HuggingFace (Coming Soon)
  • Gemini (Coming Soon)

In Beta

For access (or just to say hey 😊), don't hesitate to fill out this form or reach out to us at info@mbodi.ai.

  • Conductor: A service for processing and managing datasets, and automatically training your models on your own data.
  • Conductor Dashboard: See how GPT-4o, Claude Opus, or your custom models are performing on your datasets and open benchmarks.
  • Data Augmentation: Build invariance to different environments by augmenting your dataset with Mbodi's diffusion-based data augmentation to achieve better generalization.
  • Mbodied SVLM: A new Spatial Vision Language Model trained specifically for spatial reasoning and robotics control.

Idea

The core idea behind Mbodied Agents is end-to-end continual learning. We believe that the best way to train a robot is to have it learn from its own experiences.

Installation

pip install mbodied-agents

Dev Environment Setup

  1. Clone this repo:

    git clone https://github.com/MbodiAI/mbodied-agents.git
    
  2. Install system dependencies:

    source install.bash
    
  3. Then for each new terminal, run:

    hatch shell
    

Getting Started

Please refer to examples/simple_robot_agent.py or use the Colab for a minimal example.

To run simple_robot_agent.py with OpenAI as your backend, for example:

export OPENAI_API_KEY=your_api_key
python examples/simple_robot_agent.py --backend=openai

Glossary

  • Agent: A unit of intelligent computation that takes in an Observation and outputs an Action. This can involve multiple sub-agents.

  • Backend: The system that embodied agents query. This typically involves a vision-language model or other special-purpose models.

  • Control: An atomic action that is “handed off” to other processes outside the scope of consideration. An example is HandControl, which includes x, y, z, roll, pitch, yaw, and grasp. This is a motion control used to manage the position, orientation, and hand-openness of an end-effector. Typically, this is passed to lower-level hardware interfaces or libraries.

Building Blocks

The Sample class

The Sample class is a base model for serializing, recording, and manipulating arbitrary data. It is designed to be extensible, flexible, and strongly typed. By wrapping your observation or action objects in the Sample class, you'll be able to convert to and from the following with ease:

  • a gym space for creating a new gym environment
  • a flattened list, array, or tensor for plugging into an ML model
  • a HuggingFace dataset with semantic search capabilities
  • a pydantic BaseModel for reliable and quick json serialization/deserialization.

Example Usage:

>>> from mbodied_agents.base.sample import Sample
>>> from pprint import pprint
>>> s = Sample(observation=[1,2,3], action=[4,5,6])
>>> s
Sample(observation=[1, 2, 3], action=[4, 5, 6])
>>> s.to('dict')
{'observation': [1, 2, 3], 'action': [4, 5, 6]}
>>> s.to('hf')
Dataset({
    features: ['observation', 'action'],
    num_rows: 3
})
>>> pprint(s.schema())
{'description': 'A base model class for serializing, recording, and '
                'manipulating arbitray data.',
 'properties': {'action': {'items': {'type': 'integer'}, 'type': 'array'},
                'observation': {'items': {'type': 'integer'}, 'type': 'array'}},
 'title': 'Sample',
 'type': 'object'}
>>> s.flatten('np') # Can also flatten to a dict, torch tensor, and list.
array([1, 2, 3, 4, 5, 6])
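
The conversion list above also mentions gym spaces. The Recorder example later in this document calls .space() on Sample-derived objects (e.g. HandControl().space()), so presumably the same call applies here; a one-line sketch, with the method name assumed from that example:

space = s.space()  # a gym space mirroring the Sample's structure, e.g. for building a gym environment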

Message

The Message class represents a single completion sample space. Its content can be text, an image, a list of text and/or images, a Sample, or another modality. The Message class is designed to handle various types of content and supports different roles such as user, assistant, or system.

You can create a Message in a variety of ways, all of which can be understood by mbodi's backend.

Message(role="user", content="example text")
Message(role="user", content=["example text", Image("example.jpg"), Image("example2.jpg")])
Message(role="user", content=[Sample("Hello")])

Backend

The Backend class is an abstract base class for Backend implementations. It provides the basic structure and methods required for interacting with different backend services, such as API calls for generating completions based on given messages. See the backend directory for how various backends are implemented.
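
Purely as an illustration of the pattern (the class and method names below are assumptions, not the library's actual API), a backend boils down to an object that turns a list of messages into a completion; a real implementation would subclass the Backend ABC and call an external service:

# Illustrative sketch only; consult the backend directory for the real interface.
class EchoBackend:
    """Toy stand-in that echoes the last user message instead of calling an API."""
    def create_completion(self, messages, **kwargs):  # hypothetical method name
        return messages[-1].content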

Cognitive Agent

The Cognitive Agent is the main entry point for intelligent robot agents. It can connect to different backends or transformers of your choice. It includes methods for recording conversations, managing context, looking up messages, forgetting messages, storing context, and acting based on an instruction and an image.

Currently supported API services are OpenAI and Anthropic. Upcoming API services include Mbodi, Ollama, and HuggingFace. Stay tuned for our Mbodi backend service!

For example, to use OpenAI for your robot backend:

robot_agent = CognitiveAgent(context=context_prompt, api_service="openai")

context can be either a string or a list, for example:

context_prompt = "you are a robot"
# OR
context_prompt = [
    Message(role="system", content="you are a robot"),
    Message(role="user", content=["example text", Image("example.jpg")]),
    Message(role="assistant", content="Understood."),
]

To execute an instruction:

response = robot_agent.act(instruction, image)[0]
# You can also pass an arbitrary number of texts and images to the agent:
response = robot_agent.act([instruction1, image1, instruction2, image2])[0]

Controls

The controls module defines various motions to control a robot as Pydantic models. These controls cover a range of actions, from simple joint movements to complex poses and full robot control.
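
As a rough illustration (the layout of the real models in the controls module may differ), a hand/end-effector control like the HandControl mentioned in the glossary could be expressed as nested pydantic models along these lines:

# Illustrative sketch of a motion control as a pydantic model; names are assumptions.
from pydantic import BaseModel

class Pose6D(BaseModel):
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    roll: float = 0.0
    pitch: float = 0.0
    yaw: float = 0.0

class HandControlSketch(BaseModel):
    pose: Pose6D = Pose6D()
    grasp: float = 0.0  # convention assumed: 0 = open, 1 = closed

action = HandControlSketch(pose=Pose6D(x=0.1, z=0.05), grasp=1.0)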

Hardware Interface

Mapping robot actions from any model to any embodiment is very easy. In our example script, we use a mock hardware interface. We also have an XArm interface as an example.
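
The idea is simply an object that accepts a motion control and executes it on the embodiment. A minimal mock along these lines (purely illustrative; the real mock and XArm interfaces live under src/mbodied/hardware/, and the method name here is an assumption):

# Illustrative mock hardware interface; the method name is an assumption.
class PrintingHardwareInterface:
    """Logs motion controls instead of sending them to a real robot."""
    def do(self, hand_control) -> None:
        print(f"Would execute end-effector command: {hand_control}")

hardware = PrintingHardwareInterface()
# hardware.do(hand_control)  # e.g., a HandControl parsed from the agent's response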

Upcoming: a remote hardware interface with a communication protocol. This will be very convenient for controlling robots that have a computer attached, e.g., LoCoBot.

Recorder

The dataset Recorder can record your conversation and the robot's actions to a dataset as you interact with or teach the robot. You can define any observation space and action space for the Recorder.

Here's an example of recording the observation, instruction, and the output HandControl (x, y, z, roll, pitch, yaw, grasp).

# spaces comes from gym or gymnasium (e.g. from gymnasium import spaces);
# Image, HandControl, and Recorder are provided by mbodied_agents.
observation_space = spaces.Dict({
    'image': Image(size=(224, 224)).space(),
    'instruction': spaces.Text(1000)
})
action_space = HandControl().space()
recorder = Recorder('example_recorder', out_dir='saved_datasets', observation_space=observation_space, action_space=action_space)

# Every time the robot holds a conversation or performs an action:
recorder.record(observation={'image': image, 'instruction': instruction}, action=hand_control)

The dataset is saved to ./saved_datasets. Learn more about augmenting and finetuning with this dataset by filling out this form.

Directory Structure

├─ assets/ ............. Images, icons, and other static assets
├─ examples/ ........... Example scripts and usage demonstrations
├─ resources/ .......... Additional resources for examples
├─ src/
│  └─ mbodied/
│     ├─ agents/ ....... Modules for robot agents
│     │  ├─ backends/ .. Backend implementations for different API services
│     │  ├─ language/ .. Language-based agent modules
│     │  └─ sense/ ..... Sensory (e.g. audio) processing modules
│     ├─ base/ ......... Base classes and core infra modules
│     ├─ data/ ......... Data handling and processing
│     ├─ hardware/ ..... Hardware interface and interaction
│     └─ types/ ........ Common types and definitions
└─ tests/ .............. Unit tests

Contributing

We believe in the power of collaboration and open-source development. This platform is designed to be shared, extended, and improved by the community. See the contributing guide for more information.

Feel free to report any issues, ask questions, ask for features, or submit PRs.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mbodied_agents-0.0.4.tar.gz (26.3 MB)

Built Distribution

mbodied_agents-0.0.4-py3-none-any.whl (52.3 kB)
