CUA (Computer Use) Agent for AI-driven computer interaction

cua-agent is a general Computer-Use framework for running multi-app agentic workflows on macOS and Linux sandboxes created with Cua. It supports local (Ollama) and cloud model providers (OpenAI, Anthropic, Groq, DeepSeek, Qwen).

Get started with Agent

Install

pip install "cua-agent[all]"

# or install specific loop providers
pip install "cua-agent[openai]" # OpenAI Cua Loop
pip install "cua-agent[anthropic]" # Anthropic Cua Loop
pip install "cua-agent[omni]" # Cua Loop based on OmniParser (includes Ollama for local models)
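After installing, it can be useful to confirm that the distribution is actually present in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration, and the only detail taken from the commands above is the distribution name `cua-agent`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "cua-agent" is the distribution name used in the pip commands above.
    found = installed_version("cua-agent")
    print(f"cua-agent: {found or 'not installed'}")
```

This only reports whether the base distribution is installed; it does not tell you which extras (`openai`, `anthropic`, `omni`) were selected.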

Run

import asyncio

# Import paths are assumed from the cua-computer and cua-agent packages
from computer import Computer
from agent import ComputerAgent, AgentLoop, LLM, LLMProvider


async def main():
    async with Computer() as macos_computer:
        # Create agent with loop and provider
        agent = ComputerAgent(
            computer=macos_computer,
            loop=AgentLoop.OPENAI,
            model=LLM(provider=LLMProvider.OPENAI),
        )

        tasks = [
            "Look for a repository named trycua/cua on GitHub.",
            "Check the open issues, open the most recent one and read it.",
            "Clone the repository in users/lume/projects if it doesn't exist yet.",
            "Open the repository with an app named Cursor (on the dock, black background and white cube icon).",
            "From Cursor, open Composer if not already open.",
            "Focus on the Composer text area, then write and submit a task to help resolve the GitHub issue.",
        ]

        # Number tasks from 1 so the start and completion messages agree
        for i, task in enumerate(tasks, start=1):
            print(f"\nExecuting task {i}/{len(tasks)}: {task}")
            # agent.run() yields one result per agent turn
            async for result in agent.run(task):
                print(result)

            print(f"\n✅ Task {i}/{len(tasks)} completed: {task}")


asyncio.run(main())
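The run loop above prints each result as it arrives. If you would rather keep the results for later inspection, the same pattern can collect them per task. This is a sketch under stated assumptions: `FakeAgent` is a hypothetical stand-in that only mimics the async-generator shape of `agent.run(task)`, so the snippet runs without a sandbox or API key, and `run_tasks` is an illustrative helper, not part of cua-agent.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List


class FakeAgent:
    """Hypothetical stand-in exposing an async-generator run(task) method."""

    async def run(self, task: str) -> AsyncIterator[Dict[str, Any]]:
        # Yield two made-up turns per task so the loop has something to consume.
        for turn in range(2):
            yield {"id": f"{task}-{turn}", "text": f"turn {turn} of {task}"}


async def run_tasks(agent: Any, tasks: List[str]) -> Dict[str, List[Any]]:
    """Run tasks one after another and keep every yielded result per task."""
    collected: Dict[str, List[Any]] = {}
    for task in tasks:
        # Tasks run sequentially because each one builds on the previous state.
        collected[task] = [result async for result in agent.run(task)]
    return collected


if __name__ == "__main__":
    results = asyncio.run(run_tasks(FakeAgent(), ["first", "second"]))
    for task, turns in results.items():
        print(task, len(turns))
```

Any object with a compatible `run` method could be passed in place of `FakeAgent`. The tasks are deliberately not run concurrently, since each step in a workflow like the one above depends on what the previous step left on screen.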

Refer to these notebooks for step-by-step guides on how to use the Computer-Use Agent (CUA):

Agent Loops

The cua-agent package provides three agent loop variations, based on different CUA model providers and techniques:

| Agent Loop | Supported Models | Description | Set-Of-Marks |
| --- | --- | --- | --- |
| AgentLoop.OPENAI | computer_use_preview | Use OpenAI Operator CUA model | Not Required |
| AgentLoop.ANTHROPIC | claude-3-5-sonnet-20240620, claude-3-7-sonnet-20250219 | Use Anthropic Computer-Use | Not Required |
| AgentLoop.OMNI (experimental) | claude-3-5-sonnet-20240620, claude-3-7-sonnet-20250219, gpt-4.5-preview, gpt-4o, gpt-4 | Use OmniParser for element pixel-detection (SoM) and any VLMs for UI Grounding and Reasoning | OmniParser |
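The loop-to-model pairings listed above can be encoded as a small lookup, which is handy for rejecting an unsupported combination before starting a run. This is only a sketch: `Loop` is a local stand-in enum rather than the package's own `AgentLoop`, and `loops_for` and `needs_set_of_marks` are hypothetical helpers. The model names are copied from the listing above.

```python
from enum import Enum
from typing import Dict, FrozenSet, List


class Loop(Enum):
    """Local stand-in for the loop names; not imported from cua-agent."""

    OPENAI = "openai"
    ANTHROPIC = "anthropic"
    OMNI = "omni"


_CLAUDE = frozenset({"claude-3-5-sonnet-20240620", "claude-3-7-sonnet-20250219"})

# Supported models per loop, as listed in this section.
SUPPORTED_MODELS: Dict[Loop, FrozenSet[str]] = {
    Loop.OPENAI: frozenset({"computer_use_preview"}),
    Loop.ANTHROPIC: _CLAUDE,
    Loop.OMNI: _CLAUDE | {"gpt-4.5-preview", "gpt-4o", "gpt-4"},
}


def loops_for(model: str) -> List[Loop]:
    """Return every loop that lists the given model, in declaration order."""
    return [loop for loop, models in SUPPORTED_MODELS.items() if model in models]


def needs_set_of_marks(loop: Loop) -> bool:
    """Only the OmniParser-based loop relies on Set-Of-Marks."""
    return loop is Loop.OMNI


if __name__ == "__main__":
    for name in ("gpt-4o", "claude-3-7-sonnet-20250219", "unknown-model"):
        print(name, [loop.name for loop in loops_for(name)])
```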

AgentResponse

The AgentResponse class represents the structured output returned after each agent turn. It contains the agent's response, reasoning, tool usage, and other metadata. The response format aligns with the new OpenAI Agent SDK specification for better consistency across different agent loops.

async for result in agent.run(task):
  print("Response ID: ", result.get("id"))

  # Print detailed usage information
  usage = result.get("usage")
  if usage:
      print("\nUsage Details:")
      print(f"  Input Tokens: {usage.get('input_tokens')}")
      if "input_tokens_details" in usage:
          print(f"  Input Tokens Details: {usage.get('input_tokens_details')}")
      print(f"  Output Tokens: {usage.get('output_tokens')}")
      if "output_tokens_details" in usage:
          print(f"  Output Tokens Details: {usage.get('output_tokens_details')}")
      print(f"  Total Tokens: {usage.get('total_tokens')}")

  print("Response Text: ", result.get("text"))

  # Print tools information
  tools = result.get("tools")
  if tools:
      print("\nTools:")
      print(tools)

  # Print reasoning and tool call outputs
  outputs = result.get("output", [])
  for output in outputs:
      output_type = output.get("type")
      if output_type == "reasoning":
          print("\nReasoning Output:")
          print(output)
      elif output_type == "computer_call":
          print("\nTool Call Output:")
          print(output)
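When a task spans many turns, printing every field gets noisy. A small aggregation over the yielded results may be more useful. The sketch below assumes each result behaves like a dictionary with the keys used in the snippet above (`usage`, `output`, and each output's `type`). The `summarize` helper and the sample data are made up for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple


def summarize(results: Iterable[Dict[str, Any]]) -> Tuple[Dict[str, int], Counter]:
    """Sum token usage and count output items by type across agent turns."""
    totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    output_types: Counter = Counter()
    for result in results:
        # Treat a missing or None usage block as empty.
        usage = result.get("usage") or {}
        for key in totals:
            totals[key] += usage.get(key) or 0
        for output in result.get("output") or []:
            output_types[output.get("type")] += 1
    return totals, output_types


if __name__ == "__main__":
    # Made-up results shaped like the fields read in the snippet above.
    sample = [
        {
            "id": "r1",
            "usage": {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
            "output": [{"type": "reasoning"}, {"type": "computer_call"}],
        },
        {"id": "r2", "usage": None, "output": [{"type": "computer_call"}]},
    ]
    totals, types = summarize(sample)
    print(totals)
    print(dict(types))
```

To use it with a live agent, collect the results from `agent.run(task)` into a list first and pass that list to `summarize`.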
