desktop-env

This package provides a desktop environment for setting up and evaluating desktop automation tasks.

License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

📢 Updates

2024-04-11: We released our paper, environment and benchmark, and project page. Check them out!

💾 Installation

On Your Desktop or Server (Non-Virtualized Platform)

Follow these instructions if you are running on a non-virtualized system, that is, not inside a virtualized environment such as AWS, Azure, or k8s. If you are on a virtualized platform, refer to the virtualized platform section instead.

First, clone this repository and cd into it, then install the dependencies listed in requirements.txt. We recommend managing the environment with the latest version of Conda, although you can also install the dependencies manually. Python 3.9 or later is required.

# Clone the OSWorld repository
git clone https://github.com/xlang-ai/OSWorld

# Change directory into the cloned repository
cd OSWorld

# Optional: Create a Conda environment for OSWorld
# conda create -n osworld python=3.9
# conda activate osworld

# Install required dependencies
pip install -r requirements.txt

Alternatively, you can install the environment without any benchmark tasks:

pip install desktop-env

Install VMware Workstation Pro (on Apple silicon Macs, install VMware Fusion instead) and make sure the vmrun command is available on your PATH. Verify the installation by running the following (with VMware Fusion, use -T fusion in place of -T ws):

vmrun -T ws list

If the installation succeeded and the environment variable is set correctly, the command prints the list of currently running virtual machines.
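The same check can be scripted before launching a long run. The following is a minimal sketch, not part of desktop-env: the helper names `find_vmrun` and `list_running_vms` are hypothetical, and the only external command it calls is the `vmrun -T ws list` invocation shown above.

```python
import shutil
import subprocess
from typing import List, Optional


def find_vmrun(executable: str = "vmrun") -> Optional[str]:
    """Return the full path of the vmrun executable, or None if it is not on PATH."""
    return shutil.which(executable)


def list_running_vms(host_type: str = "ws", executable: str = "vmrun") -> List[str]:
    """Run `vmrun -T <host_type> list` and return its output lines.

    host_type is "ws" for VMware Workstation (as in the command above);
    pass "fusion" when using VMware Fusion.
    """
    vmrun = find_vmrun(executable)
    if vmrun is None:
        raise FileNotFoundError(
            f"{executable} not found on PATH; check your VMware installation"
        )
    completed = subprocess.run(
        [vmrun, "-T", host_type, "list"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if vmrun exits with an error
    )
    return completed.stdout.splitlines()
```

Calling `list_running_vms()` on a correctly configured machine returns the lines that the command would print in a terminal. A `FileNotFoundError` indicates that vmrun is not on your PATH.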

All set! Our setup script will automatically download the necessary virtual machines and configure the environment for you.

On AWS or Azure (Virtualized platform)

We are working on supporting it 👷. Please hold tight!

🚀 Quick Start

Run the following minimal example to interact with the environment:

from desktop_env.envs.desktop_env import DesktopEnv

example = {
    "id": "94d95f96-9699-4208-98ba-3c3119edf9c2",
    "instruction": "I want to install Spotify on my current system. Could you please help me?",
    "config": [
        {
            "type": "execute",
            "parameters": {
                "command": [
                    "python",
                    "-c",
                    "import pyautogui; import time; pyautogui.click(960, 540); time.sleep(0.5);"
                ]
            }
        }
    ],
    "evaluator": {
        "func": "check_include_exclude",
        "result": {
            "type": "vm_command_line",
            "command": "which spotify"
        },
        "expected": {
            "type": "rule",
            "rules": {
                "include": ["spotify"],
                "exclude": ["not found"]
            }
        }
    }
}

env = DesktopEnv(action_space="pyautogui")

obs = env.reset(task_config=example)
obs, reward, done, info = env.step("pyautogui.rightClick()")

The logs should show the environment being created, the setup steps completing, and the action executing. If you then see a right-click performed on the screen, the environment is working and you are ready to go.
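To make the "evaluator" block of the example easier to read, here is a rough sketch of how an include/exclude rule could be scored. It assumes that the check passes only when every "include" string appears in the command output and no "exclude" string does. The function `include_exclude_score` is a hypothetical stand-in written for illustration, not the library's `check_include_exclude` implementation.

```python
from typing import Dict, List


def include_exclude_score(result: str, rules: Dict[str, List[str]]) -> float:
    """Return 1.0 if `result` satisfies the rules, else 0.0 (assumed semantics)."""
    include = rules.get("include", [])
    exclude = rules.get("exclude", [])
    has_all_included = all(text in result for text in include)
    has_no_excluded = all(text not in result for text in exclude)
    return 1.0 if has_all_included and has_no_excluded else 0.0


# Rules copied from the example task above.
rules = {"include": ["spotify"], "exclude": ["not found"]}

# Illustrative outputs of `which spotify`; the exact strings are assumptions.
score_installed = include_exclude_score("/usr/bin/spotify", rules)
score_missing = include_exclude_score("spotify not found", rules)
print(score_installed, score_missing)
```

Under these assumptions, an output containing a path to spotify scores 1.0, while an output containing "not found" scores 0.0.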

🧪 Experiments

Agent Baselines

To run the baseline agent used in our paper, execute a command like the following, which uses the GPT-4V pure-screenshot setting as an example:

python run.py --path_to_vm Ubuntu/Ubuntu.vmx --headless --observation_type screenshot --model gpt-4-vision-preview --result_dir ./results

The results, which include screenshots, actions, and video recordings of the agent's task completion, are saved in the ./results directory. You can then run the following command to view the results:

python show_result.py

Evaluation

Please start by reading through the agent interface and the environment interface. Correctly implement the agent interface and import your customized version in the run.py file. Afterward, you can execute a command similar to the one in the previous section to run the benchmark on your agent.

❓ FAQ

What are the running times and costs under different settings?

Setting	Expected Time*	Budget Cost (Full Test Set/Small Test Set)
GPT-4V (screenshot)	10h	$100 ($10)
Gemini-ProV (screenshot)	15h	$0 ($0)
Claude-3 Opus (screenshot)	15h	$150 ($15)
GPT-4V (a11y tree, SoM, etc.)	30h	$500 ($50)

*No environment parallelism. Calculated in April 2024.
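For budgeting, the table can be turned into an approximate hourly spend. The sketch below only divides the full-test-set cost by the expected time, using the figures listed above. The `SETTINGS` dictionary and the `hourly_cost` helper are illustrative names, and actual spend depends on current API pricing.

```python
from typing import Dict, Tuple

# (expected hours, full test set cost in USD, small test set cost in USD),
# copied from the table above (April 2024 figures, no environment parallelism).
SETTINGS: Dict[str, Tuple[float, float, float]] = {
    "GPT-4V (screenshot)": (10, 100, 10),
    "Gemini-ProV (screenshot)": (15, 0, 0),
    "Claude-3 Opus (screenshot)": (15, 150, 15),
    "GPT-4V (a11y tree, SoM, etc.)": (30, 500, 50),
}


def hourly_cost(hours: float, full_cost: float) -> float:
    """Average spend per hour of a full-test-set run."""
    return full_cost / hours


hourly = {
    name: hourly_cost(hours, full_cost)
    for name, (hours, full_cost, _small_cost) in SETTINGS.items()
}

for name, rate in hourly.items():
    print(f"{name}: about ${rate:.2f} per hour")
```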

📄 Citation

If you find this environment useful, please consider citing our work:

@misc{OSWorld,
      title={OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments}, 
      author={Tianbao Xie and Danyang Zhang and Jixuan Chen and Xiaochuan Li and Siheng Zhao and Ruisheng Cao and Toh Jing Hua and Zhoujun Cheng and Dongchan Shin and Fangyu Lei and Yitao Liu and Yiheng Xu and Shuyan Zhou and Silvio Savarese and Caiming Xiong and Victor Zhong and Tao Yu},
      year={2024},
      eprint={2404.07972},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

Release history

1.0.2

Oct 23, 2025

1.0.1

Oct 23, 2025

1.0.0

Oct 13, 2025

0.1.22

Oct 25, 2024

0.1.21

Oct 9, 2024

0.1.20

Oct 9, 2024

0.1.19

Aug 4, 2024

0.1.18

Aug 3, 2024

0.1.17

Jul 14, 2024

0.1.15

Jun 1, 2024

0.1.14

May 20, 2024

This version

0.1.13

May 19, 2024

0.1.12

May 10, 2024

0.1.11

May 10, 2024

0.1.9

May 10, 2024

0.1.8

May 10, 2024

0.1.7

Apr 25, 2024

0.1.6

Apr 22, 2024

0.1.5

Apr 13, 2024

0.1.4

Apr 13, 2024

0.1.3

Apr 7, 2024

0.1.2

Feb 24, 2024

0.1.1

Feb 24, 2024

Download files

Source Distribution

desktop_env-0.1.13.tar.gz (90.4 kB)

Uploaded May 19, 2024 Source

Built Distribution

desktop_env-0.1.13-py3-none-any.whl (99.1 kB)

Uploaded May 19, 2024 Python 3

File details

Details for the file desktop_env-0.1.13.tar.gz.

File metadata

Download URL: desktop_env-0.1.13.tar.gz
Upload date: May 19, 2024
Size: 90.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.9.18

File hashes

Hashes for desktop_env-0.1.13.tar.gz
Algorithm	Hash digest
SHA256	`70c33dc731191d55495adaa0e95cf82e86fb32e982937fb7a3d2a418e5a58b0e`
MD5	`70d47eca13453fcdd2c503df2ceb9e33`
BLAKE2b-256	`03219fac327872f6cf55ad0e337a10cd0c42f6147c7ee3af897dd3b05ed00362`
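A downloaded archive can be checked against the SHA256 digest listed above. This is a minimal sketch using Python's standard hashlib module. The helper names are hypothetical, and it assumes the file sits in the current directory under the name shown on this page.

```python
import hashlib

# SHA256 digest of desktop_env-0.1.13.tar.gz, copied from the table above.
EXPECTED_SHA256 = "70c33dc731191d55495adaa0e95cf82e86fb32e982937fb7a3d2a418e5a58b0e"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("desktop_env-0.1.13.tar.gz", EXPECTED_SHA256)` should return True for an intact download and False if the file was corrupted or altered.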

File details

Details for the file desktop_env-0.1.13-py3-none-any.whl.

File metadata

Download URL: desktop_env-0.1.13-py3-none-any.whl
Upload date: May 19, 2024
Size: 99.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.9.18

File hashes

Hashes for desktop_env-0.1.13-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d91bc6f9739731a655c888719688e60d166d8a02cf99c87ca6fe338a519c0315`
MD5	`d907e6b34709fabbc2965115e0a98011`
BLAKE2b-256	`2cbade39582c78f2ac81b777731787c763de2a33382b5d08b2d1584676b7fb0c`
