
Embodied Agent Interface (EAgent): Benchmarking LLMs for Embodied Decision Making

Website | Download the EmbodiedAgentInterface Dataset from Hugging Face | Docker | Docs | License: MIT

Manling Li, Shiyu Zhao, Qineng Wang, Kangrui Wang, Yu Zhou, Sanjana Srivastava, Cem Gokmen, Tony Lee, Li Erran Li, Ruohan Zhang, Weiyu Liu, Percy Liang, Li Fei-Fei, Jiayuan Mao, Jiajun Wu

Stanford Vision and Learning Lab, Stanford University

EAgent

Dataset Highlights

  • Standardized goal specifications.
  • Standardized modules and interfaces.
  • Broad coverage of evaluation and fine-grained metrics.

Overview

We aim to evaluate Large Language Models (LLMs) for embodied decision-making. While many works leverage LLMs for decision-making in embodied environments, a systematic understanding of their performance is still lacking. These models are applied in different domains, for various purposes, and with diverse inputs and outputs. Current evaluations tend to rely on final success rates alone, making it difficult to pinpoint where LLMs fall short and how to leverage them effectively in embodied AI systems.

To address this gap, we propose the Embodied Agent Interface (EAgent), which unifies:

  1. A broad set of embodied decision-making tasks involving both state and temporally extended goals.
  2. Four commonly used LLM-based modules: goal interpretation, subgoal decomposition, action sequencing, and transition modeling.
  3. Fine-grained evaluation metrics, identifying errors such as hallucinations, affordance issues, and planning mistakes.

Our benchmark provides a comprehensive assessment of LLM performance across different subtasks, identifying their strengths and weaknesses in embodied decision-making contexts.
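The fine-grained metrics report error categories rather than a single success rate. As a rough illustration of the idea (the per-task annotations and the `summarize` helper below are hypothetical, not part of the benchmark's API; the category names follow the list above):

```python
from collections import Counter

# Hypothetical per-task error annotations, as a fine-grained
# evaluation might produce instead of a single success flag.
task_errors = [
    [],                          # task solved
    ["hallucination"],           # referenced a nonexistent object
    ["affordance"],              # action not applicable to the object
    ["planning", "affordance"],  # wrong ordering plus an affordance issue
]

def summarize(errors_per_task):
    """Tally error categories and compute the overall success rate."""
    counts = Counter(e for errors in errors_per_task for e in errors)
    success_rate = sum(not errors for errors in errors_per_task) / len(errors_per_task)
    return counts, success_rate

counts, rate = summarize(task_errors)
print(counts, rate)
```

A final-success-rate-only evaluation would collapse the last three tasks into "failed"; the per-category tally is what lets the benchmark pinpoint *why* an LLM module fell short.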

Installation

  1. Create and Activate a Conda Environment:

    conda create -n eagent python=3.8 -y 
    conda activate eagent
    
  2. Install eagent-eval:

    You can install it from pip:

    pip install eagent-eval
    

    Or, install from source:

    git clone https://github.com/embodied-agent-interface/embodied-agent-interface.git
    cd embodied-agent-interface
    pip install -e .
    
  3. (Optional) Install iGibson for behavior evaluation:

    If you need to use behavior_eval, install iGibson. Follow these steps to minimize installation issues:

    • Make sure you are using Python 3.8 and meet the minimum system requirements in the iGibson installation guide.

    • Install CMake using Conda (do not use pip):

      conda install cmake
      
    • Install iGibson: We provide an installation script:

      python -m behavior_eval.utils.install_igibson_utils
      

      Alternatively, install it manually:

      git clone https://github.com/embodied-agent-interface/iGibson.git --recursive
      cd iGibson
      pip install -e .
      
    • Download assets:

      python -m behavior_eval.utils.download_utils
      

    We have successfully tested installation on Linux, Windows 10+, and macOS.
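Since the iGibson path above is pinned to Python 3.8, it can help to verify the interpreter before installing. A minimal sanity check (the `python_ok` helper is hypothetical, not part of the package):

```python
import sys

def python_ok(version_info=sys.version_info):
    """Return True when the interpreter is in the required 3.8 series."""
    return tuple(version_info[:2]) == (3, 8)

if not python_ok():
    print(f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} "
          "detected; the iGibson installation steps above expect 3.8.")
```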

Quick Start

  1. Arguments:

    eagent-eval \
      --dataset {virtualhome,behavior} \
      --mode {generate_prompts,evaluate_results} \
      --eval-type {action_sequencing,transition_modeling,goal_interpretation,subgoal_decomposition} \
      --llm-response-path <path_to_responses> \
      --output-dir <output_directory> \
      --num-workers <number_of_workers>
    

    Run the following command for further information:

    eagent-eval --help
    
  2. Examples:

  • Evaluate Results

    If you don't want to specify <path_to_responses>, download our results first:


    python -m eagent_eval.utils.download_utils
    

    Then, run the commands below:

    eagent-eval --dataset virtualhome --eval-type action_sequencing --mode evaluate_results
    eagent-eval --dataset virtualhome --eval-type transition_modeling --mode evaluate_results
    eagent-eval --dataset virtualhome --eval-type goal_interpretation --mode evaluate_results
    eagent-eval --dataset virtualhome --eval-type subgoal_decomposition --mode evaluate_results
    eagent-eval --dataset behavior --eval-type action_sequencing --mode evaluate_results
    eagent-eval --dataset behavior --eval-type transition_modeling --mode evaluate_results
    eagent-eval --dataset behavior --eval-type goal_interpretation --mode evaluate_results
    eagent-eval --dataset behavior --eval-type subgoal_decomposition --mode evaluate_results
    
  • Generate Prompts

    To generate prompts, you can run:

    eagent-eval --dataset virtualhome --eval-type action_sequencing --mode generate_prompts
    eagent-eval --dataset virtualhome --eval-type transition_modeling --mode generate_prompts
    eagent-eval --dataset virtualhome --eval-type goal_interpretation --mode generate_prompts
    eagent-eval --dataset virtualhome --eval-type subgoal_decomposition --mode generate_prompts
    eagent-eval --dataset behavior --eval-type action_sequencing --mode generate_prompts
    eagent-eval --dataset behavior --eval-type transition_modeling --mode generate_prompts
    eagent-eval --dataset behavior --eval-type goal_interpretation --mode generate_prompts
    eagent-eval --dataset behavior --eval-type subgoal_decomposition --mode generate_prompts
    

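The invocations above differ only in their --dataset and --eval-type values. A small Python helper (illustrative only, not part of the package) can generate all eight commands for a given mode:

```python
from itertools import product

DATASETS = ["virtualhome", "behavior"]
EVAL_TYPES = [
    "action_sequencing",
    "transition_modeling",
    "goal_interpretation",
    "subgoal_decomposition",
]

def eagent_commands(mode):
    """Yield one eagent-eval command string per dataset/eval-type pair."""
    for dataset, eval_type in product(DATASETS, EVAL_TYPES):
        yield (f"eagent-eval --dataset {dataset} "
               f"--eval-type {eval_type} --mode {mode}")

for cmd in eagent_commands("evaluate_results"):
    print(cmd)
```

The same helper covers prompt generation by passing `"generate_prompts"` as the mode.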
Docker

We provide a ready-to-use Docker image for easy installation and usage.

First, pull the Docker image from Docker Hub:

docker pull jameskrw/eagent-eval

Next, run the Docker container interactively:

docker run -it jameskrw/eagent-eval

When inside the container, make sure you remain in the /opt/iGibson directory (do not change to other directories).

To check the available arguments for the eagent-eval CLI, use the following command:

python3 -m eagent_eval.cli --help

You can run:

python3 -m eagent_eval.cli

By default, this will start generating prompts for goal interpretation in Behavior.

The command python3 -m eagent_eval.cli is equivalent to eagent-eval as introduced above, although currently only python3 -m eagent_eval.cli is supported inside the Docker container.
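Because only the module form works inside Docker while the eagent-eval entry point works elsewhere, a script driving the CLI may want to assemble either invocation from the same flags. A hypothetical helper (the argument list could then be passed to subprocess.run):

```python
def build_cli(dataset, eval_type, mode, use_module_form=False):
    """Assemble an argument list for either CLI entry point.

    use_module_form=True selects `python3 -m eagent_eval.cli`,
    which is the form required inside the Docker container.
    """
    entry = (["python3", "-m", "eagent_eval.cli"] if use_module_form
             else ["eagent-eval"])
    return entry + ["--dataset", dataset, "--eval-type", eval_type,
                    "--mode", mode]
```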

Project details

Release 0.0.8 is available on PyPI as a source distribution (eagent_eval-0.0.8.tar.gz, 22.3 MB) and a Python 3 wheel (eagent_eval-0.0.8-py3-none-any.whl, 26.3 MB), both uploaded via twine/5.1.1 on CPython 3.8.19.
