The package provides a desktop environment for setting up and evaluating desktop automation tasks.
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Updates
- 2024-04-04: We released our paper, environment, benchmark, and project page. Check it out!
Installation
On Your Desktop or Server (Non-Virtualized Platform)
If you are running on a non-virtualized system, that is, not inside a virtualized environment such as AWS, Azure, or k8s, follow the instructions below. If you are on a virtualized platform, refer to the virtualized platform section instead.
- First, clone this repository and cd into it. Then, install the dependencies listed in requirements.txt. We recommend using the latest version of Conda to manage the environment, but you can also install the dependencies manually. Make sure your Python version is >= 3.9.
# Clone the OSWorld repository
git clone https://github.com/xlang-ai/OSWorld
# Change directory into the cloned repository
cd OSWorld
# Optional: Create a Conda environment for OSWorld
# conda create -n osworld python=3.9
# conda activate osworld
# Install required dependencies
pip install -r requirements.txt
- Install VMware Workstation Pro (on systems with Apple chips, install VMware Fusion instead) and make sure the vmrun command is available on your PATH. Verify the installation by running:
vmrun -T ws list
If the installation succeeded and the environment variable is set correctly, the command prints a list of the currently running virtual machines.
- Obtain the virtual machine image. If you are using Linux or Windows with an x86_64 CPU, download the virtual machine image, start it, and take an initial snapshot named "init_state" by executing the following commands. Remove the nogui parameter if you wish to view the activity inside the virtual machine.
gdown https://drive.google.com/drive/folders/1HX5gcf7UeyR-2UmiA15Q9U-Wr6E6Gio8 -O Ubuntu --folder
vmrun -T ws start "Ubuntu/Ubuntu.vmx" nogui
vmrun -T ws snapshot "Ubuntu/Ubuntu.vmx" "init_state"
For macOS with Apple chips, download the specially prepared virtual machine image by executing the following commands:
gdown https://drive.google.com/drive/folders/1wT0vwpuEFTIPik9Tjn4DWoZ2oHCD7tM0 -O Ubuntu --folder
vmrun -T fusion start "Ubuntu/Ubuntu.vmx"
vmrun -T fusion snapshot "Ubuntu/Ubuntu.vmx" "init_state"
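The setup steps above can be collected into a small pre-flight script. The sketch below is not part of OSWorld: the helper names (python_version_ok, vmrun_host_type, setup_commands, preflight, run_setup) are hypothetical, and it assumes that Apple-chip Macs report platform.machine() as "arm64" (true for native Python, not under Rosetta). It checks the Python >= 3.9 requirement, looks for vmrun on the PATH, and rebuilds the same start and snapshot commands shown above.

```python
import platform
import shutil
import subprocess
import sys
from typing import List, Optional, Sequence


def python_version_ok(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if the interpreter meets the Python >= 3.9 requirement."""
    v = sys.version_info if version is None else version
    return tuple(v[:2]) >= (3, 9)


def vmrun_host_type(system: str, machine: str) -> str:
    """Pick the vmrun -T value: 'fusion' on Apple-chip Macs, 'ws' otherwise (assumption)."""
    if system == "Darwin" and machine == "arm64":
        return "fusion"
    return "ws"


def setup_commands(vmx_path: str, host_type: str, headless: bool = True) -> List[List[str]]:
    """Build the start and snapshot commands from the instructions above as argv lists."""
    start = ["vmrun", "-T", host_type, "start", vmx_path]
    if headless:
        start.append("nogui")
    snapshot = ["vmrun", "-T", host_type, "snapshot", vmx_path, "init_state"]
    return [start, snapshot]


def preflight() -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if not python_version_ok():
        problems.append("Python >= 3.9 is required")
    if shutil.which("vmrun") is None:
        problems.append("vmrun not found on PATH; check your VMware installation")
    return problems


def run_setup(vmx_path: str = "Ubuntu/Ubuntu.vmx") -> None:
    """Start the VM and take the 'init_state' snapshot (raises if a command fails)."""
    host = vmrun_host_type(platform.system(), platform.machine())
    # The Fusion commands above omit nogui, so only request headless mode on Workstation.
    for cmd in setup_commands(vmx_path, host, headless=(host == "ws")):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

A typical use is to call preflight() first, print any problems it returns, and only call run_setup() when the list is empty. Passing argv lists to subprocess.run avoids shell quoting issues with .vmx paths that contain spaces.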
On AWS or Azure (Virtualized platform)
We are working on supporting it 👷. Please hold tight!
Quick Start
Run the following minimal example to interact with the environment:
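from desktop_env.envs.desktop_env import DesktopEnv

# A task bundles a natural-language instruction, setup steps run in the VM, and an evaluator.
example = {
    "id": "94d95f96-9699-4208-98ba-3c3119edf9c2",
    "instruction": "I want to install Spotify on my current system. Could you please help me?",
    # Setup: execute a command inside the VM before the task starts (here, a pyautogui click).
    "config": [
        {
            "type": "execute",
            "parameters": {
                "command": [
                    "python",
                    "-c",
                    "import pyautogui; import time; pyautogui.click(960, 540); time.sleep(0.5);"
                ]
            }
        }
    ],
    # Evaluation: run `which spotify` in the VM; the output must include "spotify"
    # and must not include "not found".
    "evaluator": {
        "func": "check_include_exclude",
        "result": {
            "type": "vm_command_line",
            "command": "which spotify"
        },
        "expected": {
            "type": "rule",
            "rules": {
                "include": ["spotify"],
                "exclude": ["not found"]
            }
        }
    }
}

# Point path_to_vm at the .vmx file of the virtual machine you set up.
env = DesktopEnv(
    path_to_vm=r"Ubuntu/DesktopEnv-Ubuntu 64-bit Arm.vmx",
    action_space="pyautogui"
)

obs = env.reset(task_config=example)  # reset the environment and run the task's setup steps
obs, reward, done, info = env.step("pyautogui.rightClick()")  # execute one pyautogui action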
If everything works, the logs show the environment being created, the task setup completing, and the action executing successfully. Finally, you should see a right-click on the virtual machine's screen, which means you are ready to go.
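To make the evaluator in the example concrete, here is a minimal re-implementation sketch of what an include/exclude rule is expected to check. It is written for illustration only and is not the library's own check_include_exclude; the 1.0/0.0 scoring and the handling of a missing result are assumptions.

```python
from typing import Dict, List, Optional


def check_include_exclude(result: Optional[str], rules: Dict[str, List[str]]) -> float:
    """Score 1.0 if every 'include' string appears in result and no 'exclude' string does."""
    if result is None:
        # No command output to inspect: treat the task as failed.
        return 0.0
    include = rules.get("include", [])
    exclude = rules.get("exclude", [])
    if all(s in result for s in include) and not any(s in result for s in exclude):
        return 1.0
    return 0.0
```

For the Spotify task, output such as "/snap/bin/spotify" passes, while empty output (which prints nothing when the program is missing) or a shell message containing "not found" fails.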
Experiments
Agent Baselines
To run the baseline agent used in our paper, for example in the GPT-4V screenshot-only setting, execute:
python run.py --path_to_vm Ubuntu/Ubuntu.vmx --headless --observation_type screenshot --model gpt-4-vision-preview --result_dir ./results
The results, including screenshots, actions, and video recordings of the agent completing each task, are saved in the ./results directory in this case. To summarize the results, run:
python show_result.py
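show_result.py is the supported way to summarize a run. If you want to post-process results yourself, the sketch below assumes that each finished task leaves a result.txt file containing a single numeric score somewhere under ./results; that layout is an assumption, so check what run.py actually writes before relying on it. The function names collect_scores and success_rate are hypothetical.

```python
import os
from typing import Dict


def collect_scores(result_dir: str) -> Dict[str, float]:
    """Map each task directory (relative to result_dir) to the score in its result.txt."""
    scores = {}
    for root, _dirs, files in os.walk(result_dir):
        if "result.txt" in files:
            with open(os.path.join(root, "result.txt")) as f:
                try:
                    scores[os.path.relpath(root, result_dir)] = float(f.read().strip())
                except ValueError:
                    # Skip files that do not contain a single number.
                    continue
    return scores


def success_rate(scores: Dict[str, float]) -> float:
    """Average score across tasks as a percentage; 0.0 when no results were found."""
    if not scores:
        return 0.0
    return 100.0 * sum(scores.values()) / len(scores)
```

For example, success_rate(collect_scores("./results")) returns the mean score over every task directory that contains a parseable result.txt.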
Evaluation
Start by reading through the agent interface and the environment interface.
Implement the agent interface correctly and import your customized agent in the run.py file.
Then run a command similar to the one in the previous section to benchmark your agent.
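As a rough picture of how an agent and the environment interact during an episode, the sketch below alternates agent predictions with env.step calls, using the reset/step signatures from the Quick Start. The MyAgent class and its predict(instruction, obs) method are hypothetical stand-ins; implement the agent interface as documented, not this stub. The loop also makes no claim about how the benchmark computes its final score: it only reports how many actions ran and whether the environment signaled done.

```python
from typing import Any, Dict, List, Protocol, Tuple


class Env(Protocol):
    """Structural type matching the reset/step calls used in the Quick Start."""

    def reset(self, task_config: Dict[str, Any]) -> Any: ...

    def step(self, action: str) -> Tuple[Any, float, bool, Dict[str, Any]]: ...


class MyAgent:
    """Hypothetical agent that replays a fixed list of pyautogui action strings."""

    def __init__(self, script: List[str]):
        self.script = list(script)

    def predict(self, instruction: str, obs: Any) -> List[str]:
        # A real agent would query a model with the instruction and observation here.
        if not self.script:
            return []
        return [self.script.pop(0)]


def run_episode(env: Env, agent: MyAgent, task: Dict[str, Any], max_steps: int = 15) -> Tuple[int, bool]:
    """Run until the env reports done, the agent has no actions left, or max_steps is hit."""
    obs = env.reset(task_config=task)
    steps = 0
    while steps < max_steps:
        actions = agent.predict(task["instruction"], obs)
        if not actions:
            return steps, False
        for action in actions:
            obs, _reward, done, _info = env.step(action)
            steps += 1
            if done:
                return steps, True
            if steps >= max_steps:
                break
    return steps, False
```

With a real environment, env would be the DesktopEnv instance and task a task dictionary such as example from the Quick Start.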
Citation
If you find this environment useful, please consider citing our work:
@article{OSWorld,
title={},
author={},
journal={arXiv preprint arXiv:xxxx.xxxx},
year={2024}
}