Platform for creating verifiable computer-use environments and training VLM agents to use them.

Project description

cua-bench

A framework for machine learning on computer automation. It features an HTML-based desktop environment with a semantic design system that can visually emulate macOS, Windows 11, Windows 10, iOS, Android, and more.

Installation

uv pip install -e .
playwright install chromium

Docker Setup (for batch processing)

Build the cua-bench Docker image:

docker build -t cua-bench:latest .

Quick Start

Create an environment

td create-task tasks/my_env

Run the environment:

td interact tasks/my_env

CLI Usage

Install an environment

td install tasks/click_env

List tasks

# List all environments
td tasks

# List tasks in specific environment
td tasks tasks/click_env

Interact with a task

Interact with a task in the browser. This is useful for debugging and testing.

td interact tasks/click_env --task-id 0 --solve --screenshot output.png

Run tasks with batch processing

Run a batch of cua-bench tasks on GCP Batch or locally in Docker. Use td dump-solution for multi-step trajectories (setup + solve + evaluate) and td dump-setup for single-step trajectories (setup + evaluate).

# Build Docker image first (required for local batch)
docker build -t cua-bench:latest .

# Local (Docker) - Run 4 tasks from click_env (setup + solve + evaluate)
td dump-solution tasks/click_env 4 --local

# Local (Docker) - Run 4 tasks from click_env (setup + evaluate)
td dump-setup tasks/click_env 4 --local --output-dir ./outputs

# GCP Batch - Run 16 tasks from click_env (setup + solve + evaluate)
td dump-solution tasks/click_env 16 --parallelism 8

# GCP Batch - Run 16 tasks from click_env (setup + evaluate)
td dump-setup tasks/click_env 16 --parallelism 8 --output-dir ./outputs
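
When these batch runs are launched from a script, it can help to assemble the command line in one place. The sketch below is a hypothetical helper (build_batch_command is not part of cua-bench) that builds the argument list using only the subcommands and flags shown above. It does not run anything by itself and does not check the arguments against the real CLI.

```python
from typing import List, Optional


def build_batch_command(
    mode: str,
    env_path: str,
    num_tasks: int,
    local: bool = False,
    parallelism: Optional[int] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `td dump-solution` or `td dump-setup`.

    Hypothetical helper: it only knows the subcommands and flags that
    appear in the examples above.
    """
    if mode not in ("dump-solution", "dump-setup"):
        raise ValueError(f"unknown mode: {mode!r}")
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")

    argv = ["td", mode, env_path, str(num_tasks)]
    if local:
        argv.append("--local")
    if parallelism is not None:
        argv += ["--parallelism", str(parallelism)]
    if output_dir is not None:
        argv += ["--output-dir", output_dir]
    return argv


if __name__ == "__main__":
    # Print the command instead of running it.
    cmd = build_batch_command(
        "dump-setup", "tasks/click_env", 4, local=True, output_dir="./outputs"
    )
    print(" ".join(cmd))
```

To actually launch the run, the returned list can be passed to subprocess.run(cmd, check=True).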

Process snapshots into a training dataset for UI grounding

Given a directory of snapshots, cua-bench offers a simple way to process them into a dataset for UI grounding using action augmentation.

# Process 5 snapshots using 'aguvis' action augmentation
td process ./outputs 5

# Process all snapshots and push to Hugging Face Hub
td process ./outputs --push-to-hub --repo-id username/repo

Programmatic Interface

import cua_bench as cb

# Create an environment
env = cb.make("tasks/click_env")

# Setup and get initial screenshot
screenshot, task_cfg = env.setup()  # optionally pass task_id

# Execute a step
screenshot = env.step('page.click("#submit")')

# Run the solution
screenshot = env.solve()

# Evaluate the result
rewards = env.evaluate()

# Clean up
env.close()
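
If a step raises in the snippet above, env.close() is never reached. A minimal sketch of one way to guard against that, assuming only the methods shown above (setup, step, evaluate, close). The names run_episode and make_env are hypothetical and not part of cua-bench.

```python
from typing import Any, Callable, Iterable


def run_episode(make_env: Callable[[], Any], actions: Iterable[str]) -> Any:
    """Set up an environment, run each action, evaluate, and always close.

    `make_env` is any zero-argument callable returning an object with the
    setup/step/evaluate/close methods used above.
    """
    env = make_env()
    try:
        env.setup()
        for action in actions:
            env.step(action)
        return env.evaluate()
    finally:
        # Runs even if setup, a step, or evaluation raises.
        env.close()
```

With cua-bench installed, a call might look like run_episode(lambda: cb.make("tasks/click_env"), ['page.click("#submit")']).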

Project details

Release history

0.2.10: Apr 15, 2026
0.2.8: Mar 27, 2026
0.2.7: Mar 23, 2026
0.2.6: Mar 4, 2026
0.2.5: Mar 4, 2026
0.2.4: Feb 10, 2026
0.2.3: Jan 12, 2026
0.2.0: Oct 27, 2025
0.1.0: Oct 21, 2025 (this version)

Download files

Source Distribution

cua_bench-0.1.0.tar.gz (87.2 MB)

Uploaded Oct 21, 2025 Source

Built Distribution

cua_bench-0.1.0-py3-none-any.whl (2.3 kB)

Uploaded Oct 21, 2025 Python 3

File details

Details for the file cua_bench-0.1.0.tar.gz.

File metadata

Download URL: cua_bench-0.1.0.tar.gz
Upload date: Oct 21, 2025
Size: 87.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.20

File hashes

Hashes for cua_bench-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`9b7b36511f4ea55c996a909821017e7b0fbfecf1276b16a1d69829ad47ae25b5`
MD5	`0c2e621527261d9406bd32046eb17c2b`
BLAKE2b-256	`a13360eafc9e492c3b47006d4e4e88be40249ff867b0442332235d9a1e041471`
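
A downloaded file can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using only the Python standard library. The function names sha256_of and matches_digest are illustrative and not part of cua-bench.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so large archives are not loaded at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, the check would be matches_digest("cua_bench-0.1.0.tar.gz", "9b7b36511f4ea55c996a909821017e7b0fbfecf1276b16a1d69829ad47ae25b5"), assuming the archive sits in the current directory.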

File details

Details for the file cua_bench-0.1.0-py3-none-any.whl.

File metadata

Download URL: cua_bench-0.1.0-py3-none-any.whl
Upload date: Oct 21, 2025
Size: 2.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.20

File hashes

Hashes for cua_bench-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`549cb606a5ebe6796e278abacb038cd6ca44a0dbdfa94f95aca95dea7644e84c`
MD5	`41f982006bdf9504eeb97b66bc5ea691`
BLAKE2b-256	`3ec3589c90a3746e07ab221b73cfee313cfa120c036cc8b8e3c5c434bafd95a5`
