A platform for creating verifiable computer-use environments and training VLM agents to use them.
cua-bench
A framework for machine learning on computer automation. It features an HTML-based desktop environment with a semantic design system that can visually emulate macOS, Windows 11, Windows 10, iOS, Android, and more.
Installation
uv pip install -e .
playwright install chromium
Docker Setup (for batch processing)
Build the cua-bench Docker image:
docker build -t cua-bench:latest .
Quick Start
Create an environment:
td create-task tasks/my_env
Run the environment:
td interact tasks/my_env
CLI Usage
Install an environment
td install tasks/click_env
List tasks
# List all environments
td tasks
# List tasks in specific environment
td tasks tasks/click_env
Interact with a task
Interact with a task in the browser. This is useful for debugging and testing.
td interact tasks/click_env --task-id 0 --solve --screenshot output.png
Run tasks with batch processing
Run a batch of cua-bench tasks locally (in Docker) or on GCP Batch. Use td dump-solution for multi-step trajectories and td dump-setup for single-step trajectories.
# Build Docker image first (required for local batch)
docker build -t cua-bench:latest .
# Local (Docker) - Run 4 tasks from click_env (setup + solve + evaluate)
td dump-solution tasks/click_env 4 --local
# Local (Docker) - Run 4 tasks from click_env (setup + evaluate)
td dump-setup tasks/click_env 4 --local --output-dir ./outputs
# GCP Batch - Run 16 tasks from click_env (setup + solve + evaluate)
td dump-solution tasks/click_env 16 --parallelism 8
# GCP Batch - Run 16 tasks from click_env (setup + evaluate)
td dump-setup tasks/click_env 16 --parallelism 8 --output-dir ./outputs
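When these batch commands are launched from a script, it can help to assemble the argument list in one place. The sketch below is only an illustration: `build_dump_command` is a hypothetical helper, not part of cua-bench, and it uses only the subcommands and flags shown above. Whether flag combinations other than the ones listed above are accepted by `td` is an assumption that should be checked against the CLI itself.

```python
import shlex
from typing import List, Optional


def build_dump_command(
    subcommand: str,
    env_path: str,
    num_tasks: int,
    local: bool = False,
    parallelism: Optional[int] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `td dump-solution` or `td dump-setup`.

    Hypothetical helper: it only mirrors the flags shown in the README
    examples and does not validate them against the real CLI.
    """
    if subcommand not in ("dump-solution", "dump-setup"):
        raise ValueError(f"unsupported subcommand: {subcommand}")
    cmd = ["td", subcommand, env_path, str(num_tasks)]
    if local:
        cmd.append("--local")  # run in local Docker instead of GCP Batch
    if parallelism is not None:
        cmd += ["--parallelism", str(parallelism)]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_dump_command("dump-solution", "tasks/click_env", 4, local=True)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the image from the Docker setup step has been built.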
Process snapshots into a training dataset for UI grounding
Given a directory of snapshots, cua-bench can process them into a UI grounding dataset using action augmentation.
# Process 5 snapshots using 'aguvis' action augmentation
td process ./outputs 5
# Process all snapshots and push to Hugging Face Hub
td process ./outputs --push-to-hub --repo-id username/repo
Programmatic Interface
import cua_bench as cb
# Create an environment
env = cb.make("tasks/click_env")
# Setup and get initial screenshot
screenshot, task_cfg = env.setup() # optionally pass task_id
# Execute a step
screenshot = env.step('page.click("#submit")')
# Run the solution
screenshot = env.solve()
# Evaluate the result
rewards = env.evaluate()
# Clean up
env.close()
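The calls above can be wrapped so that the environment is always closed, even if a step fails. This is a minimal sketch, not an official cua-bench utility: `run_episode` is a hypothetical name, and it assumes only the methods shown above (`setup`, `step`, `solve`, `evaluate`, `close`) and that `close()` is safe to call after an error.

```python
from typing import Any, Iterable, Optional


def run_episode(env: Any, actions: Optional[Iterable[str]] = None) -> Any:
    """Run one episode and return whatever env.evaluate() returns.

    Hypothetical helper. If `actions` is None, the environment's own
    solution is run via env.solve(); otherwise each action string is
    passed to env.step() in order.
    """
    try:
        env.setup()
        if actions is None:
            env.solve()
        else:
            for action in actions:
                env.step(action)
        return env.evaluate()
    finally:
        # Always release the environment, even when a step raises.
        env.close()


# Usage, assuming cua-bench is installed and tasks/click_env exists:
#   import cua_bench as cb
#   rewards = run_episode(cb.make("tasks/click_env"), ['page.click("#submit")'])
```

Keeping the cleanup in a `finally` block means a failing action does not leave the environment open between runs.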
File details
Details for the file cua_bench-0.1.0.tar.gz.
File metadata
- Download URL: cua_bench-0.1.0.tar.gz
- Upload date:
- Size: 87.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9b7b36511f4ea55c996a909821017e7b0fbfecf1276b16a1d69829ad47ae25b5 |
| MD5 | 0c2e621527261d9406bd32046eb17c2b |
| BLAKE2b-256 | a13360eafc9e492c3b47006d4e4e88be40249ff867b0442332235d9a1e041471 |
File details
Details for the file cua_bench-0.1.0-py3-none-any.whl.
File metadata
- Download URL: cua_bench-0.1.0-py3-none-any.whl
- Upload date:
- Size: 2.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 549cb606a5ebe6796e278abacb038cd6ca44a0dbdfa94f95aca95dea7644e84c |
| MD5 | 41f982006bdf9504eeb97b66bc5ea691 |
| BLAKE2b-256 | 3ec3589c90a3746e07ab221b73cfee313cfa120c036cc8b8e3c5c434bafd95a5 |
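A downloaded file can be checked against the SHA256 digests listed in the file hashes. The sketch below uses only the Python standard library; the function names are illustrative and the local file path is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Chunked reads keep memory use flat for large archives.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the wheel was saved to the current directory:
#   matches_published_digest(
#       "cua_bench-0.1.0-py3-none-any.whl",
#       "549cb606a5ebe6796e278abacb038cd6ca44a0dbdfa94f95aca95dea7644e84c",
#   )
```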