Embodied agent interface evaluation for VirtualHome
Project description
Installation and Usage Guide for virtualhome-eval
Install dependencies

```shell
pip install virtualhome_eval
```
Usage

To run `virtualhome_eval`:

- Use in Python:

  ```python
  from virtualhome_eval.agent_eval import agent_evaluation

  agent_evaluation(
      mode=...,               # "generate_prompts" or "evaluate_results"
      eval_type=...,          # "goal_interpretation", "action_sequence",
                              # "subgoal_decomposition", or "transition_model"
      llm_response_path=...,  # [YOUR LLM OUTPUT DIR]
  )
  ```

- Use directly from the command line:

  ```shell
  virtualhome-eval --mode [generate_prompts, evaluate_results] --eval-type [goal_interpretation, action_sequence] --llm-response-path [YOUR LLM OUTPUT DIR] --output-dir [YOUR EVAL OUTPUT DIR]
  ```
Parameters

- `mode`: Whether to generate prompts or evaluate results. Options: `generate_prompts`, `evaluate_results`.
- `eval_type`: The evaluation task type. Options: `goal_interpretation`, `action_sequence`, `subgoal_decomposition`, `transition_model`.
- `llm_response_path`: Path to the directory of LLM outputs to evaluate. Defaults to `""`, which uses the existing outputs under `virtualhome_eval/llm_response/`. All LLM outputs under the directory are evaluated.
- `dataset`: The dataset type. Options: `virtualhome`, `behavior`.
- `output_dir`: The directory in which to store the results. Defaults to `output/` under the current path.
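As noted above, every LLM output beneath `llm_response_path` is picked up for evaluation. A minimal sketch of what that discovery could look like; the directory layout and file names below are assumptions for illustration only, not the package's required format:

```python
# Illustrative sketch (not part of virtualhome_eval): building a
# hypothetical llm_response directory and walking it recursively.
import json
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp()) / "llm_response"
for task in ("goal_interpretation", "action_sequence"):
    task_dir = root / task
    task_dir.mkdir(parents=True)
    # One output file per evaluated LLM (model names are hypothetical).
    (task_dir / "model_a_outputs.json").write_text(json.dumps([]))
    (task_dir / "model_b_outputs.json").write_text(json.dumps([]))

# "All LLM outputs under the directory are evaluated": a recursive
# glob finds every JSON file regardless of nesting depth.
found = sorted(p.name for p in root.rglob("*.json"))
print(found)
```

Pointing `llm_response_path` at a directory like `root` above would then evaluate all four files in one call.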
Example usage in python
- To generate prompts for
goal_interpretation:
agent_evaluation(mode='generate_prompts', eval_type='goal_interpretation')
- To evaluate LLM outputs for
goal_interpretation:
results = agent_evaluation(mode='evaluate_results', eval_type='goal_interpretation')
- To generate prompts for
action_sequence:
agent_evaluation(mode='generate_prompts', eval_type='action_sequence')
- To evaluate LLM outputs for
action_sequence:
results = agent_evaluation(mode='evaluate_results', eval_type='action_sequence')
- To generate Virtualhome prompts for
transition_model:
agent_evaluation(mode='generate_prompts', eval_type='transition_model')
- To evaluate LLM outputs on Virtualhome for
transition_model:
results = agent_evaluation(mode='evaluate_results', eval_type='transition_model')
- To generate prompts for
subgoal_decomposition:
agent_evaluation(mode='generate_prompts', eval_type='subgoal_decomposition')
- To evaluate LLM outputs for
subgoal_decomposition:
results = agent_evaluation(mode='evaluate_results', eval_type='subgoal_decomposition')
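Taken together, the calls above cover each task in both modes, so they can be driven from a single loop. A minimal sketch that only builds the call plan, so it runs even without `virtualhome_eval` installed; uncomment the final loop to execute the plan for real:

```python
# Sketch: drive all four evaluation tasks through both modes.
# Only the call plan is constructed here; the actual agent_evaluation
# calls are left commented out so this snippet stands alone.
EVAL_TYPES = [
    "goal_interpretation",
    "action_sequence",
    "subgoal_decomposition",
    "transition_model",
]

plan = [
    {"mode": mode, "eval_type": eval_type}
    for eval_type in EVAL_TYPES
    for mode in ("generate_prompts", "evaluate_results")
]

# from virtualhome_eval.agent_eval import agent_evaluation
# for kwargs in plan:
#     agent_evaluation(**kwargs)

print(len(plan))  # 8: two modes for each of the four tasks
```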
Download files
Source Distribution
virtualhome_eval-0.1.1.tar.gz (22.9 MB)
Built Distribution
virtualhome_eval-0.1.1-py3-none-any.whl (27.2 MB)
File details
Details for the file virtualhome_eval-0.1.1.tar.gz.
File metadata
- Download URL: virtualhome_eval-0.1.1.tar.gz
- Upload date:
- Size: 22.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.8.19
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a790c85f241a67e718d2f3d05d3daaea38acb7db5e16de118ced089dbf9eddd7
|
|
| MD5 |
0c74005981dee466b7075960ea9deb2f
|
|
| BLAKE2b-256 |
b7046e5a7062faa2e038667a7c5bf8b2f57c9810482ab42f235b98141b9d4a10
|
File details
Details for the file virtualhome_eval-0.1.1-py3-none-any.whl.
File metadata
- Download URL: virtualhome_eval-0.1.1-py3-none-any.whl
- Upload date:
- Size: 27.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.8.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a4e244f423edfc0da2ec7b7daf48f2fca8af00afe6f2af07b8f1e53d8c79de37` |
| MD5 | `2b7a9dac3ca4f4da859c01e59ac0c774` |
| BLAKE2b-256 | `b96f63a7e29f0c00f10c2a8fd80d7697f77bf40c9a4ec0c813cdbf198f70551e` |