
CUA (Computer Use) Agent for AI-driven computer interaction

Project description

cua-agent is a general Computer-Use framework for running multi-app agentic workflows targeting macOS and Linux sandboxes created with Cua, with support for local models (via Ollama) and cloud model providers (OpenAI, Anthropic, Groq, DeepSeek, Qwen).

Get started with Agent

Install

pip install "cua-agent[all]"

# or install specific loop providers
pip install "cua-agent[openai]" # OpenAI Cua Loop
pip install "cua-agent[anthropic]" # Anthropic Cua Loop
pip install "cua-agent[uitars]"    # UI-Tars support
pip install "cua-agent[omni]" # Cua Loop based on OmniParser (includes Ollama for local models)
pip install "cua-agent[ui]" # Gradio UI for the agent
pip install "cua-agent[uitars-mlx]" # MLX UI-Tars support

Run

import asyncio

from computer import Computer
from agent import ComputerAgent, AgentLoop, LLM, LLMProvider


async def main():
    async with Computer() as macos_computer:
        # Create agent with loop and provider
        agent = ComputerAgent(
            computer=macos_computer,
            loop=AgentLoop.OPENAI,
            model=LLM(provider=LLMProvider.OPENAI)
            # or
            # loop=AgentLoop.ANTHROPIC,
            # model=LLM(provider=LLMProvider.ANTHROPIC)
            # or
            # loop=AgentLoop.OMNI,
            # model=LLM(provider=LLMProvider.OLLAMA, name="gemma3")
            # or
            # loop=AgentLoop.UITARS,
            # model=LLM(provider=LLMProvider.OAICOMPAT, name="ByteDance-Seed/UI-TARS-1.5-7B", provider_base_url="https://**************.us-east-1.aws.endpoints.huggingface.cloud/v1")
        )

        tasks = [
            "Look for a repository named trycua/cua on GitHub.",
            "Check the open issues, open the most recent one and read it.",
            "Clone the repository in users/lume/projects if it doesn't exist yet.",
            "Open the repository with an app named Cursor (on the dock, black background and white cube icon).",
            "From Cursor, open Composer if not already open.",
            "Focus on the Composer text area, then write and submit a task to help resolve the GitHub issue.",
        ]

        for i, task in enumerate(tasks, start=1):
            print(f"\nExecuting task {i}/{len(tasks)}: {task}")
            async for result in agent.run(task):
                print(result)

            print(f"\n✅ Task {i}/{len(tasks)} completed: {task}")


asyncio.run(main())

Refer to the project's notebooks for step-by-step guides on how to use the Computer-Use Agent (CUA).

Using the Gradio UI

The agent includes a Gradio-based user interface for easier interaction.

To use it:

# Install with Gradio support
pip install "cua-agent[ui]"

Then create a simple launcher script:

# launch_ui.py
from agent.ui.gradio.app import create_gradio_ui

app = create_gradio_ui()
app.launch(share=False)

Setting up API Keys

For the Gradio UI to show available models, you need to set API keys as environment variables:

# For OpenAI models
export OPENAI_API_KEY=your_openai_key_here

# For Anthropic models
export ANTHROPIC_API_KEY=your_anthropic_key_here

# Launch with both keys set
OPENAI_API_KEY=your_key ANTHROPIC_API_KEY=your_key python launch_ui.py

Without these environment variables, the UI will show "No models available" for the corresponding providers, but you can still use local models with the OMNI loop provider.

Using Local Models

You can use local models with the OMNI loop provider by selecting "Custom model..." from the dropdown. The default provider URL is set to http://localhost:1234/v1, which works with LM Studio.

If you're using a different local model server, point the provider URL at its OpenAI-compatible endpoint instead (a configuration sketch follows this list):

  • vLLM: http://localhost:8000/v1
  • LocalAI: http://localhost:8080/v1
  • Ollama with OpenAI compat API: http://localhost:11434/v1
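
These local servers can also be used directly from code rather than through the UI. The snippet below is a minimal sketch, not taken from the upstream docs: it assumes the OAICOMPAT provider accepts a provider_base_url exactly as in the UI-TARS line of the Run snippet above, and that your server exposes a model named gemma3 (substitute whatever model your server actually serves).

from computer import Computer
from agent import ComputerAgent, AgentLoop, LLM, LLMProvider

async def run_local_task(task: str):
    # Sketch: point the OMNI loop at a local OpenAI-compatible server.
    # The base URL and model name are assumptions -- adjust them to match
    # your server (LM Studio, vLLM, LocalAI, or Ollama).
    async with Computer() as macos_computer:
        agent = ComputerAgent(
            computer=macos_computer,
            loop=AgentLoop.OMNI,
            model=LLM(
                provider=LLMProvider.OAICOMPAT,
                name="gemma3",  # whatever model your local server serves
                provider_base_url="http://localhost:1234/v1",  # LM Studio default
            ),
        )
        async for result in agent.run(task):
            print(result)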

The Gradio UI provides:

  • Selection of different agent loops (OpenAI, Anthropic, OMNI)
  • Model selection for each provider
  • Configuration of agent parameters
  • Chat interface for interacting with the agent

Using UI-TARS

The UI-TARS models are available in two forms:

  1. MLX UI-TARS models (default): These models run locally using the MLXVLM provider

    • mlx-community/UI-TARS-1.5-7B-4bit (default) - 4-bit quantized version
    • mlx-community/UI-TARS-1.5-7B-6bit - 6-bit quantized version for higher quality
    agent = ComputerAgent(
        computer=macos_computer,
        loop=AgentLoop.UITARS,
        model=LLM(provider=LLMProvider.MLXVLM, name="mlx-community/UI-TARS-1.5-7B-4bit")
    )
    
  2. OpenAI-compatible UI-TARS: For using the original ByteDance model

    • If you want to use the original ByteDance UI-TARS model via an OpenAI-compatible API, follow the deployment guide
    • This will give you a provider URL like https://**************.us-east-1.aws.endpoints.huggingface.cloud/v1 which you can use in the code or Gradio UI:
    agent = ComputerAgent(
        computer=macos_computer,
        loop=AgentLoop.UITARS,
        model=LLM(provider=LLMProvider.OAICOMPAT, name="tgi", 
                 provider_base_url="https://**************.us-east-1.aws.endpoints.huggingface.cloud/v1")
    )
    

Agent Loops

The cua-agent package provides four agent loop variations, based on different CUA model providers and techniques:

  • AgentLoop.OPENAI
    Supported models: computer_use_preview
    Description: Uses the OpenAI Operator CUA model
    Set-of-Marks: Not required

  • AgentLoop.ANTHROPIC
    Supported models: claude-3-5-sonnet-20240620, claude-3-7-sonnet-20250219
    Description: Uses Anthropic Computer-Use
    Set-of-Marks: Not required

  • AgentLoop.UITARS
    Supported models: mlx-community/UI-TARS-1.5-7B-4bit (default), mlx-community/UI-TARS-1.5-7B-6bit, ByteDance-Seed/UI-TARS-1.5-7B (via an OpenAI-compatible endpoint)
    Description: Uses UI-TARS models with the MLXVLM (default) or OAICOMPAT provider
    Set-of-Marks: Not required

  • AgentLoop.OMNI
    Supported models: claude-3-5-sonnet-20240620, claude-3-7-sonnet-20250219, gpt-4.5-preview, gpt-4o, gpt-4, phi4, phi4-mini, gemma3, ... (any Ollama or OpenAI-compatible model)
    Description: Uses OmniParser for element pixel-detection (Set-of-Marks) and any VLM for UI grounding and reasoning
    Set-of-Marks: OmniParser
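
For example, pairing the OmniParser-based OMNI loop with one of the cloud models from the table only changes the LLM configuration relative to the Run snippet above. This is an illustrative sketch, not an excerpt from the upstream docs; it assumes the ANTHROPIC provider accepts an explicit model name the same way the other examples pass one:

agent = ComputerAgent(
    computer=macos_computer,
    loop=AgentLoop.OMNI,
    # Assumption: naming the model explicitly; omit name to use the provider default.
    model=LLM(provider=LLMProvider.ANTHROPIC, name="claude-3-7-sonnet-20250219"),
)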

AgentResponse

The AgentResponse class represents the structured output returned after each agent turn. It contains the agent's response, reasoning, tool usage, and other metadata. The response format aligns with the new OpenAI Agent SDK specification for better consistency across different agent loops.

async for result in agent.run(task):
  print("Response ID: ", result.get("id"))

  # Print detailed usage information
  usage = result.get("usage")
  if usage:
      print("\nUsage Details:")
      print(f"  Input Tokens: {usage.get('input_tokens')}")
      if "input_tokens_details" in usage:
          print(f"  Input Tokens Details: {usage.get('input_tokens_details')}")
      print(f"  Output Tokens: {usage.get('output_tokens')}")
      if "output_tokens_details" in usage:
          print(f"  Output Tokens Details: {usage.get('output_tokens_details')}")
      print(f"  Total Tokens: {usage.get('total_tokens')}")

  print("Response Text: ", result.get("text"))

  # Print tools information
  tools = result.get("tools")
  if tools:
      print("\nTools:")
      print(tools)

  # Print reasoning and tool call outputs
  outputs = result.get("output", [])
  for output in outputs:
      output_type = output.get("type")
      if output_type == "reasoning":
          print("\nReasoning Output:")
          print(output)
      elif output_type == "computer_call":
          print("\nTool Call Output:")
          print(output)

Note on Settings Persistence:

  • The Gradio UI automatically saves your configuration (Agent Loop, Model Choice, Custom Base URL, Save Trajectory state, Recent Images count) to a file named .gradio_settings.json in the project's root directory when you successfully run a task.
  • This allows your preferences to persist between sessions.
  • API keys entered into the custom provider field are not saved in this file for security reasons. Manage API keys using environment variables (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY) or a .env file (a launcher sketch follows this list).
  • It's recommended to add .gradio_settings.json to your .gitignore file.
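
If you prefer the .env route, a variant of the launcher script above can load the file explicitly. This is only a sketch: it assumes the python-dotenv package is installed (pip install python-dotenv) and that nothing else in your setup already loads the .env file.

from dotenv import load_dotenv  # assumption: python-dotenv is installed
from agent.ui.gradio.app import create_gradio_ui

# Read OPENAI_API_KEY / ANTHROPIC_API_KEY from a local .env file
# instead of exporting them in the shell.
load_dotenv()

app = create_gradio_ui()
app.launch(share=False)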

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cua_agent-0.3.2.tar.gz (116.3 kB)

Uploaded Source

Built Distribution

cua_agent-0.3.2-py3-none-any.whl (151.4 kB)

Uploaded Python 3

File details

Details for the file cua_agent-0.3.2.tar.gz.

File metadata

  • Download URL: cua_agent-0.3.2.tar.gz
  • Upload date:
  • Size: 116.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for cua_agent-0.3.2.tar.gz

  • SHA256: 247fb506fc11bc1dccb01e75ba51944691fe3fb4bcde1f6bc0b4f8403960ac44
  • MD5: b5bfa6ac5d6f464d560b5877c7db646c
  • BLAKE2b-256: 2ec59782edefc93caccadee63e2835c812e8ac90aabe71fa196b5c19a03d180c


File details

Details for the file cua_agent-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: cua_agent-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 151.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for cua_agent-0.3.2-py3-none-any.whl

  • SHA256: fa45c9e112e47981db918c5aba623883736a96d895d6b64a9833881d6ef2ad00
  • MD5: 07106c1b5446b1a66d5a7d99253dcd63
  • BLAKE2b-256: 03a47d706cecfd57a02aa91f8eb402113c83bda4bfd75df53487c0817aa09085

