
Embodied agent interface evaluation for VirtualHome

Project description

Installation and Usage Guide for virtualhome-eval

Install dependencies

pip install virtualhome_eval

Usage

To run virtualhome_eval, call agent_evaluation with the arguments below; each bracketed value is a placeholder for one of the options listed under Parameters:

from virtualhome_eval.agent_eval import agent_evaluation
agent_evaluation(mode=[MODE], eval_type=[EVAL_TYPE], llm_response_path=[YOUR LLM OUTPUT DIR])

Parameters

  • mode: Specifies whether to generate prompts or to evaluate results. Options are:
    • generate_prompts
    • evaluate_results
  • eval_type: Specifies the evaluation task type. Options are:
    • goal_interpretation
    • action_sequence
    • subgoal_decomposition
    • transition_model
  • llm_response_path: Path to the directory of LLM outputs to be evaluated. It is "" by default, which uses the existing outputs under virtualhome_eval/llm_response/. All LLM outputs found under the directory are evaluated.
  • dataset: The dataset type. Options are:
    • virtualhome
    • behavior
  • output_dir: The directory in which to store the output results. By default, it is output/ under the current working directory.
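The option sets above can be mirrored in a small guard when driving agent_evaluation from scripts. The allowed values below are taken from this page; the helper itself (check_args) is hypothetical, not part of the package:

```python
# Hypothetical helper: validate keyword arguments against the option sets
# documented above before passing them on to agent_evaluation.
VALID_OPTIONS = {
    "mode": {"generate_prompts", "evaluate_results"},
    "eval_type": {"goal_interpretation", "action_sequence",
                  "subgoal_decomposition", "transition_model"},
    "dataset": {"virtualhome", "behavior"},
}

def check_args(**kwargs):
    """Raise ValueError for any argument outside its documented option set."""
    for name, value in kwargs.items():
        allowed = VALID_OPTIONS.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(
                f"{name}={value!r}: expected one of {sorted(allowed)}")
    return kwargs
```

With this in place, check_args(mode='generate_prompts', eval_type='action_sequence') passes the arguments through unchanged, while a typo such as eval_type='transition_modeling' raises immediately instead of failing inside the evaluation.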

Example usage

  1. To generate prompts for goal_interpretation:
agent_evaluation(mode='generate_prompts', eval_type='goal_interpretation')
  2. To evaluate LLM outputs for goal_interpretation:
results = agent_evaluation(mode='evaluate_results', eval_type='goal_interpretation')
  3. To generate prompts for action_sequence:
agent_evaluation(mode='generate_prompts', eval_type='action_sequence')
  4. To evaluate LLM outputs for action_sequence:
results = agent_evaluation(mode='evaluate_results', eval_type='action_sequence')
  5. To generate VirtualHome prompts for transition_model:
agent_evaluation(mode='generate_prompts', eval_type='transition_model')
  6. To evaluate LLM outputs on VirtualHome for transition_model:
results = agent_evaluation(mode='evaluate_results', eval_type='transition_model')
  7. To generate prompts for subgoal_decomposition:
agent_evaluation(mode='generate_prompts', eval_type='subgoal_decomposition')
  8. To evaluate LLM outputs for subgoal_decomposition:
results = agent_evaluation(mode='evaluate_results', eval_type='subgoal_decomposition')
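The per-task calls above can be collapsed into one loop. The sketch below uses a stand-in agent_evaluation with the signature documented here so it runs even without the package installed; with virtualhome_eval installed, drop the stub and import the real function instead:

```python
# Stand-in with the documented signature; replace with
#   from virtualhome_eval.agent_eval import agent_evaluation
# once the package is installed. The real function returns the evaluation
# results; this stub merely echoes its arguments so the loop shape is visible.
def agent_evaluation(mode, eval_type, dataset="virtualhome",
                     llm_response_path="", output_dir="output/"):
    return {"mode": mode, "eval_type": eval_type, "dataset": dataset}

EVAL_TYPES = ("goal_interpretation", "action_sequence",
              "subgoal_decomposition", "transition_model")

# One evaluate_results call per task, keyed by task name.
all_results = {task: agent_evaluation(mode="evaluate_results", eval_type=task)
               for task in EVAL_TYPES}
```

The same loop with mode='generate_prompts' covers the prompt-generation half of the examples above.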


Download files

Download the file for your platform.

Source Distribution

virtualhome_eval-0.1.0.tar.gz (22.9 MB)

Uploaded Source

Built Distribution


virtualhome_eval-0.1.0-py3-none-any.whl (27.2 MB)

Uploaded Python 3

File details

Details for the file virtualhome_eval-0.1.0.tar.gz.

File metadata

  • Download URL: virtualhome_eval-0.1.0.tar.gz
  • Upload date:
  • Size: 22.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.19

File hashes

Hashes for virtualhome_eval-0.1.0.tar.gz
  • SHA256: 028dfd7c187fc4e8a6ffdad0184674d08a26ad1825ac52423e24f61083f7c5c1
  • MD5: f174bdd657273ae698a06566baef0ee1
  • BLAKE2b-256: ac23e2fc9d98a2884e07c9e3e76d72c41b68bd6bc443318b4fd66945e30f587c


File details

Details for the file virtualhome_eval-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for virtualhome_eval-0.1.0-py3-none-any.whl
  • SHA256: 37683de789b1bbec62d62308c4fb0e21d3d8c7ee3204c7bfc404a1fc8be49131
  • MD5: d39991f6d46284f40937fd9e437afa54
  • BLAKE2b-256: fd65068f3e23002be0644d0a31b0dba1c1d644f9d17c36272fcfd297aefc03df

