Platform for creating computer-use verifiable environments and training VLM agents to use them.

cua-bench

A set of tools for creating verifiable environments for computer automation tasks, evaluation, and training. It supports real Windows, Linux, macOS, and Android VM environments, as well as HTML-based webtop environments that visually emulate macos, win11, win10, ios, android, and more.

Installation

uv tool install -e .
playwright install chromium

Docker Setup (for batch jobs and dataset processing)

Build the cua-bench Docker image:

docker build -t cua-bench:latest .

Quick Start

Create an environment

cb create-task tasks/my_env

Run the environment:

cb interact tasks/my_env

CLI Usage

Install an environment

cb install tasks/click_env

List tasks

# List all environments
cb tasks

# List tasks in specific environment
cb tasks tasks/click_env

Interact with a task

Interact with a task in the browser. This is useful for debugging and testing.

cb interact tasks/click_env --task-id 0 --solve --screenshot output.png

Evaluate agents on tasks

# Evaluate agent on tasks/click_env
cb eval tasks/click_env --model anthropic/claude-3-5-sonnet-20240620

Run tasks with batch processing

Run a batch of cua-bench tasks locally (in Docker) or on GCP Batch. Use cb dump-solution for multi-step trajectories (setup + solve + evaluate) and cb dump-setup for single-step trajectories (setup + evaluate).

# Build Docker image first (required for local batch)
docker build -t cua-bench:latest .

# Local (Docker) - Run 4 tasks from click_env (setup + solve + evaluate)
cb dump-solution tasks/click_env 4 --local

# Local (Docker) - Run 4 tasks from click_env (setup + evaluate)
cb dump-setup tasks/click_env 4 --local --output-dir ./outputs

# GCP Batch - Run 16 tasks from click_env (setup + solve + evaluate)
cb dump-solution tasks/click_env 16 --parallelism 8

# GCP Batch - Run 16 tasks from click_env (setup + evaluate)
cb dump-setup tasks/click_env 16 --parallelism 8 --output-dir ./outputs
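
When scripting batch runs, the choice between the two commands and the local/GCP switch can be captured in a small helper. The sketch below only assembles an argument list from the flags shown in the examples above. The function name batch_command and its parameters are hypothetical, and it assumes that omitting --local selects GCP Batch, as those examples suggest.

```python
import shlex
from typing import List, Optional


def batch_command(
    env_path: str,
    num_tasks: int,
    multi_step: bool = True,
    local: bool = False,
    parallelism: Optional[int] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Assemble a `cb dump-solution` / `cb dump-setup` argument list.

    Hypothetical helper: it mirrors only the flags shown in the examples
    above and does not validate them against the real CLI.
    """
    # Multi-step trajectories use dump-solution; single-step use dump-setup.
    subcommand = "dump-solution" if multi_step else "dump-setup"
    argv = ["cb", subcommand, env_path, str(num_tasks)]
    if local:
        argv.append("--local")
    if parallelism is not None:
        argv += ["--parallelism", str(parallelism)]
    if output_dir is not None:
        argv += ["--output-dir", output_dir]
    return argv


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected first.
    print(shlex.join(batch_command("tasks/click_env", 4, local=True)))
    print(shlex.join(batch_command(
        "tasks/click_env", 16, multi_step=False, parallelism=8, output_dir="./outputs"
    )))
```

The returned list can be passed directly to subprocess.run(argv, check=True) once the printed command looks right.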

Process snapshots into a training dataset for UI grounding

Given a directory of snapshots, cb process converts them into a dataset for UI grounding using action augmentation.

# Process 5 snapshots using 'aguvis' action augmentation
cb process ./outputs 5

# Process all snapshots and push to Hugging Face Hub
cb process ./outputs --push-to-hub --repo-id username/repo

Programmatic Interface

import cua_bench as cb

# Create an environment
env = cb.make("tasks/click_env")

# Setup and get initial screenshot
screenshot, task_cfg = env.reset()  # optionally pass task_id

# Execute a step
screenshot = env.step('page.click("#submit")')

# Run the solution
screenshot = env.solve()

# Evaluate the result
rewards = env.evaluate()

# Clean up
env.close()
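
For scripted runs it can help to wrap the calls above so that the environment is always closed, even when a step fails. The sketch below is one way to do that. run_episode is a hypothetical helper, not part of cua-bench, and it assumes only the methods shown above (reset, step, evaluate, close) with the return shapes used in that example.

```python
from typing import Any, Iterable, List, Tuple


def run_episode(env: Any, actions: Iterable[str]) -> Tuple[List[Any], Any]:
    """Reset the environment, apply each action string in order, then evaluate.

    Returns (screenshots, rewards). The environment is closed on exit,
    whether or not a step raised an exception.
    """
    screenshots: List[Any] = []
    try:
        # reset() is assumed to return (screenshot, task_cfg), as in the example above.
        screenshot, _task_cfg = env.reset()
        screenshots.append(screenshot)
        for action in actions:
            # step() is assumed to return the screenshot taken after the action.
            screenshots.append(env.step(action))
        rewards = env.evaluate()
    finally:
        env.close()
    return screenshots, rewards
```

With the package installed, this would be called as run_episode(cb.make("tasks/click_env"), ['page.click("#submit")']).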

Project details

Release history

0.2.10 (Apr 15, 2026)
0.2.8 (Mar 27, 2026)
0.2.7 (Mar 23, 2026)
0.2.6 (Mar 4, 2026)
0.2.5 (Mar 4, 2026)
0.2.4 (Feb 10, 2026)
0.2.3 (Jan 12, 2026)
0.2.0 (Oct 27, 2025), this version
0.1.0 (Oct 21, 2025)

Download files

Source Distribution

cua_bench-0.2.0.tar.gz (88.1 MB)

Uploaded Oct 27, 2025 Source

Built Distribution

cua_bench-0.2.0-py3-none-any.whl (87.7 MB)

Uploaded Oct 27, 2025 Python 3

File details

Details for the file cua_bench-0.2.0.tar.gz.

File metadata

Download URL: cua_bench-0.2.0.tar.gz
Upload date: Oct 27, 2025
Size: 88.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.20

File hashes

Hashes for cua_bench-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`4fb9745b210bef148163c8cde5cbad01c41159015644a7fc0802ec47fc331be8`
MD5	`cafb482cbbdb0e88ffa45380978dad6b`
BLAKE2b-256	`14caf89fa41f4a251e51fcd0726cbf8f54d1e6aedef97e05fec460a26adc44db`

File details

Details for the file cua_bench-0.2.0-py3-none-any.whl.

File metadata

Download URL: cua_bench-0.2.0-py3-none-any.whl
Upload date: Oct 27, 2025
Size: 87.7 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.20

File hashes

Hashes for cua_bench-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5033bc86054baae43822a787b42c8c9cf419daa1e4fa9c3fe6e19cf9eca2f533`
MD5	`8d56ad54ed29fefbd799172a9ad2a8a5`
BLAKE2b-256	`36500035b559c08e8bdc09f3f6a1f5a153638e694ede727af0a406f4eeedffe6`
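
After downloading either file, its SHA256 digest can be compared with the value listed above. A minimal sketch using only the Python standard library follows. The helper names sha256_of and sha256_matches are made up for illustration, and the path is assumed to be wherever the download was saved.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, sha256_matches("cua_bench-0.2.0.tar.gz", "4fb9745b210bef148163c8cde5cbad01c41159015644a7fc0802ec47fc331be8") should return True for an intact copy of the source distribution.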
