Embodied agent interface evaluation for VirtualHome

Installation and Usage Guide for virtualhome-eval

Install dependencies

pip install virtualhome_eval

Usage

To run virtualhome_eval, either call it from Python or use the command line. The bracketed lists below show the accepted options; pass one value per argument.

  1. Use in Python
from virtualhome_eval.agent_eval import agent_evaluation
agent_evaluation(mode=[generate_prompts, evaluate_results], eval_type=[goal_interpretation, action_sequence, subgoal_decomposition, transition_model], llm_response_path=[YOUR LLM OUTPUT DIR])
  2. Use directly in the command line
virtualhome-eval --mode [generate_prompts, evaluate_results] --eval-type [goal_interpretation, action_sequence] --llm-response-path [YOUR LLM OUTPUT DIR] --output-dir [YOUR EVAL OUTPUT DIR]
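
When the command-line form is driven from a script, it can help to assemble the arguments as a list rather than a single string. The following is a minimal sketch that uses only the flags shown above; the helper name build_cli_command and the directory my_llm_outputs are hypothetical, and nothing here is part of the package itself.

```python
import shlex


def build_cli_command(mode, eval_type, llm_response_path, output_dir):
    """Return the virtualhome-eval invocation as a list of arguments."""
    return [
        "virtualhome-eval",
        "--mode", mode,
        "--eval-type", eval_type,
        "--llm-response-path", llm_response_path,
        "--output-dir", output_dir,
    ]


command = build_cli_command(
    mode="evaluate_results",
    eval_type="action_sequence",
    llm_response_path="my_llm_outputs",  # hypothetical directory name
    output_dir="output",
)

# Show the command as it would be typed in a shell.
print(shlex.join(command))
```

With the package installed, the list could then be passed to subprocess.run(command, check=True) to launch the evaluation.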

Parameters

  • mode: Specifies whether to generate prompts or evaluate results. Options are:
    • generate_prompts
    • evaluate_results
  • eval_type: Specifies the evaluation task type. Options are:
    • goal_interpretation
    • action_sequence
    • subgoal_decomposition
    • transition_model
  • llm_response_path: Path to the directory of LLM outputs to evaluate. Defaults to "", in which case the existing outputs under virtualhome_eval/llm_response/ are used. All LLM outputs under the directory are evaluated.
  • dataset: The dataset type. Options:
    • virtualhome
    • behavior
  • output_dir: The directory in which to store the output results. Defaults to output/ under the current path.
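
Because mode, eval_type, and dataset each accept a fixed set of values, a small check before calling the package can catch typos early. The sketch below is an illustration only: check_eval_args is a hypothetical helper, the option lists are copied from the parameters above, and it is an assumption that agent_evaluation accepts the resulting dictionary as keyword arguments.

```python
VALID_MODES = ("generate_prompts", "evaluate_results")
VALID_EVAL_TYPES = (
    "goal_interpretation",
    "action_sequence",
    "subgoal_decomposition",
    "transition_model",
)
VALID_DATASETS = ("virtualhome", "behavior")


def check_eval_args(mode, eval_type, dataset):
    """Validate the enumerated parameters and return them as keyword arguments."""
    checks = (
        ("mode", mode, VALID_MODES),
        ("eval_type", eval_type, VALID_EVAL_TYPES),
        ("dataset", dataset, VALID_DATASETS),
    )
    for name, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {allowed}")
    return {"mode": mode, "eval_type": eval_type, "dataset": dataset}


kwargs = check_eval_args("generate_prompts", "goal_interpretation", "virtualhome")
print(kwargs)
```

With the package installed, the validated arguments could be forwarded as agent_evaluation(**kwargs), assuming the function takes these names as keywords.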

Example usage in python

  1. To generate prompts for goal_interpretation:
agent_evaluation(mode='generate_prompts', eval_type='goal_interpretation')
  2. To evaluate LLM outputs for goal_interpretation:
results = agent_evaluation(mode='evaluate_results', eval_type='goal_interpretation')
  3. To generate prompts for action_sequence:
agent_evaluation(mode='generate_prompts', eval_type='action_sequence')
  4. To evaluate LLM outputs for action_sequence:
results = agent_evaluation(mode='evaluate_results', eval_type='action_sequence')
  5. To generate VirtualHome prompts for transition_model:
agent_evaluation(mode='generate_prompts', eval_type='transition_model')
  6. To evaluate LLM outputs on VirtualHome for transition_model:
results = agent_evaluation(mode='evaluate_results', eval_type='transition_model')
  7. To generate prompts for subgoal_decomposition:
agent_evaluation(mode='generate_prompts', eval_type='subgoal_decomposition')
  8. To evaluate LLM outputs for subgoal_decomposition:
results = agent_evaluation(mode='evaluate_results', eval_type='subgoal_decomposition')
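
The calls above differ only in mode and eval_type, so they can be driven from a loop. The following sketch assumes nothing about what agent_evaluation returns; run_all and fake_evaluation are hypothetical names, and the stand-in function is there only so the example runs without the package installed.

```python
EVAL_TYPES = [
    "goal_interpretation",
    "action_sequence",
    "subgoal_decomposition",
    "transition_model",
]


def run_all(evaluate, mode):
    """Call evaluate(mode=..., eval_type=...) once per task type and collect the results."""
    return {
        eval_type: evaluate(mode=mode, eval_type=eval_type)
        for eval_type in EVAL_TYPES
    }


def fake_evaluation(mode, eval_type):
    # Stand-in for agent_evaluation so this sketch runs on its own.
    return f"{mode}:{eval_type}"


results = run_all(fake_evaluation, mode="evaluate_results")
for eval_type, result in results.items():
    print(eval_type, result)
```

With the package installed, agent_evaluation could be passed in place of fake_evaluation, for example run_all(agent_evaluation, mode='generate_prompts').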
