CUA (Computer Use) Agent for AI-driven computer interaction
cua-agent is a general computer-use framework with LiteLLM integration for running agentic workflows on macOS, Windows, and Linux sandboxes. It provides a unified interface for computer-use agents across multiple LLM providers, with a callback system for extensibility.
Features
- Safe Computer-Use/Tool-Use: Using Computer SDK for sandboxed desktops
- Multi-Agent Support: Anthropic Claude, OpenAI computer-use-preview, UI-TARS, Omniparser + any LLM
- Multi-API Support: Take advantage of LiteLLM supporting 100+ LLMs / model APIs, including local models (huggingface-local/, ollama_chat/, mlx/)
- Cross-Platform: Works on Windows, macOS, and Linux with cloud and local computer instances
- Extensible Callbacks: Built-in support for image retention, cache control, PII anonymization, budget limits, and trajectory tracking
Install
pip install "cua-agent[all]"
# or install specific providers
pip install "cua-agent[openai]" # OpenAI computer-use-preview support
pip install "cua-agent[anthropic]" # Anthropic Claude support
pip install "cua-agent[omni]" # Omniparser + any LLM support
pip install "cua-agent[uitars]" # UI-TARS
pip install "cua-agent[uitars-mlx]" # UI-TARS + MLX support
pip install "cua-agent[uitars-hf]" # UI-TARS + Huggingface support
pip install "cua-agent[ui]" # Gradio UI support
Quick Start
import asyncio
import os

from agent import ComputerAgent
from computer import Computer


async def main():
    # Set up computer instance
    async with Computer(
        os_type="linux",
        provider_type="cloud",
        name=os.getenv("CUA_CONTAINER_NAME"),
        api_key=os.getenv("CUA_API_KEY")
    ) as computer:
        # Create agent
        agent = ComputerAgent(
            model="anthropic/claude-3-5-sonnet-20241022",
            tools=[computer],
            only_n_most_recent_images=3,
            trajectory_dir="trajectories",
            max_trajectory_budget=5.0  # $5 budget limit
        )

        # Run agent
        messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]
        async for result in agent.run(messages):
            for item in result["output"]:
                if item["type"] == "message":
                    print(item["content"][0]["text"])


if __name__ == "__main__":
    asyncio.run(main())
Supported Models
Anthropic Claude (Computer Use API)
model="anthropic/claude-3-5-sonnet-20241022"
model="anthropic/claude-3-5-sonnet-20240620"
model="anthropic/claude-opus-4-20250514"
model="anthropic/claude-sonnet-4-20250514"
OpenAI Computer Use Preview
model="openai/computer-use-preview"
UI-TARS (Local or Huggingface Inference)
model="huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"
model="ollama_chat/0000/ui-tars-1.5-7b"
Omniparser + Any LLM
model="omniparser+ollama_chat/mistral-small3.2"
model="omniparser+vertex_ai/gemini-pro"
model="omniparser+anthropic/claude-3-5-sonnet-20241022"
model="omniparser+openai/gpt-4o"
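The identifiers above appear to follow a `[omniparser+]provider/model-name` pattern. As a minimal sketch, assuming that pattern holds, a hypothetical helper (`split_model_string` and `ModelSpec` are illustrative names, not part of cua-agent) could split an identifier into its parts:

```python
from typing import NamedTuple, Optional


class ModelSpec(NamedTuple):
    """Parts of a model identifier (hypothetical structure, for illustration only)."""
    grounding: Optional[str]  # e.g. "omniparser", or None when no prefix is present
    provider: str             # e.g. "anthropic", "openai", "ollama_chat"
    name: str                 # everything after the first "/"


def split_model_string(model: str) -> ModelSpec:
    """Split '[grounding+]provider/name' into its components."""
    grounding = None
    if "+" in model:
        grounding, model = model.split("+", 1)
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return ModelSpec(grounding, provider, name)


print(split_model_string("omniparser+openai/gpt-4o"))
print(split_model_string("huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"))
```

Only the first `/` separates provider from model name, so Hugging Face style names that contain their own `/` stay intact.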
Custom Tools
Define custom tools using decorated functions:
from computer.helpers import sandboxed


@sandboxed()
def read_file(location: str) -> str:
    """Read contents of a file

    Parameters
    ----------
    location : str
        Path to the file to read

    Returns
    -------
    str
        Contents of the file or error message
    """
    try:
        with open(location, 'r') as f:
            return f.read()
    except Exception as e:
        return f"Error reading file: {str(e)}"


def calculate(a: int, b: int) -> int:
    """Calculate the sum of two integers"""
    return a + b


# Use with agent
agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer, read_file, calculate]
)
Callbacks System
The agent package provides a callback system for extending functionality:
Built-in Callbacks
import logging  # needed for logging.INFO below

from agent.callbacks import (
    ImageRetentionCallback,
    TrajectorySaverCallback,
    BudgetManagerCallback,
    LoggingCallback
)

agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    callbacks=[
        ImageRetentionCallback(only_n_most_recent_images=3),
        TrajectorySaverCallback(trajectory_dir="trajectories"),
        BudgetManagerCallback(max_budget=10.0, raise_error=True),
        LoggingCallback(level=logging.INFO)
    ]
)
Custom Callbacks
from agent.callbacks.base import AsyncCallbackHandler


class CustomCallback(AsyncCallbackHandler):
    async def on_llm_start(self, messages):
        """Preprocess messages before LLM call"""
        # Add custom preprocessing logic
        return messages

    async def on_llm_end(self, messages):
        """Postprocess messages after LLM call"""
        # Add custom postprocessing logic
        return messages

    async def on_usage(self, usage):
        """Track usage information"""
        print(f"Tokens used: {usage.total_tokens}")
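How the agent invokes these hooks internally is not shown here. As an illustration only, the standard-library sketch below assumes that each callback's `on_llm_start` receives the message list and returns a (possibly modified) list that is passed to the next callback. `RedactEmails`, `KeepLast`, and `run_llm_start_hooks` are hypothetical names, not cua-agent APIs.

```python
import asyncio
import re


class RedactEmails:
    """Hypothetical callback: masks email addresses in string content."""

    async def on_llm_start(self, messages):
        pattern = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
        return [
            {**m, "content": pattern.sub("[email]", m["content"])}
            if isinstance(m.get("content"), str) else m
            for m in messages
        ]


class KeepLast:
    """Hypothetical callback: keeps only the last n messages (n >= 1)."""

    def __init__(self, n):
        self.n = n

    async def on_llm_start(self, messages):
        return messages[-self.n:]


async def run_llm_start_hooks(callbacks, messages):
    """Pass messages through each callback in order (assumed chaining behaviour)."""
    for cb in callbacks:
        messages = await cb.on_llm_start(messages)
    return messages


msgs = [
    {"role": "user", "content": "hello"},
    {"role": "user", "content": "mail me at a.b@example.com"},
]
result = asyncio.run(run_llm_start_hooks([RedactEmails(), KeepLast(1)], msgs))
print(result)
```

Under this assumption the order of the callbacks matters, since each one sees the output of the one before it.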
Budget Management
Control costs with built-in budget management:
# Simple budget limit
agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    max_trajectory_budget=5.0  # $5 limit
)

# Advanced budget configuration
agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    max_trajectory_budget={
        "max_budget": 10.0,
        "raise_error": True,  # Raise error when exceeded
        "reset_after_each_run": False  # Persistent across runs
    }
)
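To make the three options concrete, here is a minimal sketch of how `max_budget`, `raise_error`, and `reset_after_each_run` might interact. It assumes costs are accumulated from a `response_cost` usage field; `BudgetTracker` and `BudgetExceeded` are hypothetical names, and the library's own implementation and exception type may differ.

```python
class BudgetExceeded(Exception):
    """Hypothetical exception; the library's own exception type may differ."""


class BudgetTracker:
    """Accumulates per-response cost and checks it against a limit."""

    def __init__(self, max_budget, raise_error=False, reset_after_each_run=True):
        self.max_budget = max_budget
        self.raise_error = raise_error
        self.reset_after_each_run = reset_after_each_run
        self.spent = 0.0

    def start_run(self):
        """Call at the start of each run; clears spending if configured to."""
        if self.reset_after_each_run:
            self.spent = 0.0

    def add_usage(self, usage):
        """Add one response's cost; return True while within budget."""
        self.spent += usage.get("response_cost", 0.0)
        if self.spent > self.max_budget:
            if self.raise_error:
                raise BudgetExceeded(
                    f"spent ${self.spent:.2f} of ${self.max_budget:.2f}"
                )
            return False
        return True


# Spending persists across runs, so the second run crosses the limit.
tracker = BudgetTracker(max_budget=0.05, raise_error=True, reset_after_each_run=False)
tracker.start_run()
tracker.add_usage({"response_cost": 0.03})
tracker.start_run()
try:
    tracker.add_usage({"response_cost": 0.03})
except BudgetExceeded as exc:
    print(f"Stopped: {exc}")
```

With `raise_error=False`, the same tracker would return `False` instead of raising, leaving the caller to decide how to stop.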
Trajectory Management
Save and replay agent conversations:
agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    trajectory_dir="trajectories",  # Auto-save trajectories
    tools=[computer]
)

# Trajectories are saved with:
# - Complete conversation history
# - Usage statistics and costs
# - Timestamps and metadata
# - Screenshots and computer actions
Configuration Options
ComputerAgent Parameters
- model: Model identifier (required)
- tools: List of computer objects and decorated functions
- callbacks: List of callback handlers for extensibility
- only_n_most_recent_images: Limit recent images to prevent context overflow
- verbosity: Logging level (logging.INFO, logging.DEBUG, etc.)
- trajectory_dir: Directory to save conversation trajectories
- max_retries: Maximum API call retries (default: 3)
- screenshot_delay: Delay between actions and screenshots (default: 0.5s)
- use_prompt_caching: Enable prompt caching for supported models
- max_trajectory_budget: Budget limit configuration
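As a rough illustration of what `only_n_most_recent_images` is meant to achieve, the sketch below trims a history so that only the newest screenshot outputs remain. It assumes history items use the `computer_call_output` shape with an `image_url` field and that older screenshots are dropped entirely; the library may handle retention differently. `keep_recent_images` is a hypothetical helper.

```python
def keep_recent_images(items, n):
    """Return items with all but the n most recent screenshot outputs removed."""

    def is_image(item):
        output = item.get("output")
        return (
            item.get("type") == "computer_call_output"
            and isinstance(output, dict)
            and "image_url" in output
        )

    image_positions = [i for i, item in enumerate(items) if is_image(item)]
    # Everything except the last n screenshot positions is dropped.
    drop = set(image_positions[:-n]) if n > 0 else set(image_positions)
    return [item for i, item in enumerate(items) if i not in drop]


def screenshot(call_id):
    """Build a placeholder screenshot output item for the demo."""
    return {
        "type": "computer_call_output",
        "call_id": call_id,
        "output": {"image_url": "data:image/png;base64,..."},
    }


history = [
    screenshot("call_1"),
    {"type": "message", "role": "assistant",
     "content": [{"type": "output_text", "text": "Clicked."}]},
    screenshot("call_2"),
    screenshot("call_3"),
]
trimmed = keep_recent_images(history, 2)
print([item["type"] for item in trimmed])
```

Non-image items such as assistant messages are left untouched, so only the image payloads that dominate context size are affected.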
Environment Variables
# Computer instance (cloud)
export CUA_CONTAINER_NAME="your-container-name"
export CUA_API_KEY="your-cua-api-key"
# LLM API keys
export ANTHROPIC_API_KEY="your-anthropic-key"
export OPENAI_API_KEY="your-openai-key"
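A missing variable otherwise tends to surface later as a less obvious connection or authentication error, so it can help to check up front. The sketch below uses only the standard library; `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, and which LLM key you need depends on the provider you use.

```python
import os

# Variables used by the cloud Computer setup in the Quick Start.
REQUIRED_VARS = ["CUA_CONTAINER_NAME", "CUA_API_KEY"]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```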
Advanced Usage
Streaming Responses
async for result in agent.run(messages, stream=True):
    # Process streaming chunks
    for item in result["output"]:
        if item["type"] == "message":
            print(item["content"][0]["text"], end="", flush=True)
        elif item["type"] == "computer_call":
            action = item["action"]
            print(f"\n[Action: {action['type']}]")
Interactive Chat Loop
# Run this inside an async function, since the loop uses `async for`
history = []
while True:
    user_input = input("> ")
    if user_input.lower() in ['quit', 'exit']:
        break
    history.append({"role": "user", "content": user_input})

    async for result in agent.run(history):
        history += result["output"]

        # Display assistant responses
        for item in result["output"]:
            if item["type"] == "message":
                print(item["content"][0]["text"])
Error Handling
try:
    async for result in agent.run(messages):
        # Process results
        pass
except BudgetExceededException:
    print("Budget limit exceeded")
except Exception as e:
    print(f"Agent error: {e}")
API Reference
ComputerAgent.run()
async def run(
    self,
    messages: Messages,
    stream: bool = False,
    **kwargs
) -> AsyncGenerator[Dict[str, Any], None]:
    """
    Run the agent with the given messages.

    Args:
        messages: List of message dictionaries
        stream: Whether to stream the response
        **kwargs: Additional arguments

    Returns:
        AsyncGenerator that yields response chunks
    """
Message Format
messages = [
    {
        "role": "user",
        "content": "Take a screenshot and describe what you see"
    },
    {
        "role": "assistant",
        "content": "I'll take a screenshot for you."
    }
]
Response Format
{
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "I can see..."}]
        },
        {
            "type": "computer_call",
            "action": {"type": "screenshot"},
            "call_id": "call_123"
        },
        {
            "type": "computer_call_output",
            "call_id": "call_123",
            "output": {"image_url": "data:image/png;base64,..."}
        }
    ],
    "usage": {
        "prompt_tokens": 150,
        "completion_tokens": 75,
        "total_tokens": 225,
        "response_cost": 0.01,
    }
}
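Given the response shape above, a small helper can pull out the assistant text and the action types from a single result. This is a sketch that relies only on the fields shown in this section; `summarize_output` is a hypothetical name, not a cua-agent function.

```python
def summarize_output(result):
    """Collect assistant text and computer action types from one result dict."""
    texts, actions = [], []
    for item in result.get("output", []):
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    texts.append(part["text"])
        elif item.get("type") == "computer_call":
            actions.append(item["action"]["type"])
    return texts, actions


example = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "I can see..."}],
        },
        {
            "type": "computer_call",
            "action": {"type": "screenshot"},
            "call_id": "call_123",
        },
        {
            "type": "computer_call_output",
            "call_id": "call_123",
            "output": {"image_url": "data:image/png;base64,..."},
        },
    ],
    "usage": {"prompt_tokens": 150, "completion_tokens": 75, "total_tokens": 225},
}

texts, actions = summarize_output(example)
print(texts, actions)
```

Iterating over every content part, rather than indexing `content[0]`, avoids an `IndexError` if a message arrives with an empty content list.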
License
MIT License - see LICENSE file for details.