
Open Computer Use Agent — framework for desktop and browser automation


opendesk

Open Computer Use Agent — gives any AI agent eyes and hands on your desktop.

opendesk runs as an MCP server. Install it, register it with your agent tool, and it adds screenshot, accessibility-based UI control, mouse, keyboard, clipboard, and OCR to every conversation — on macOS, Linux, and Windows.


Quickstart

pip install 'opendesk[core,mcp]'

Claude Code — one command:

claude mcp add opendesk -- opendesk-mcp

Claude Desktop — add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "opendesk": { "command": "opendesk-mcp" }
  }
}

Cursor / Continue — same pattern, point command at opendesk-mcp.

That's it. Your agent can now say "take a screenshot", "click the Save button", or "type Hello World into TextEdit" and it will work.


What it adds to your agent

  • ui: clicks buttons, types text, and reads values by element name, with no pixel coordinates. Uses the platform's native accessibility tree (AppleScript / AT-SPI2 / UI Automation).
  • screenshot: captures the screen. With marks=True, overlays numbered boxes on every interactive element so the agent can say "click mark 3".
  • mouse: pixel-level mouse control with automatic Retina/HiDPI scaling. Last resort when ui has nothing to click.
  • keyboard: types text (full Unicode), presses keys, sends hotkeys.
  • app: opens, closes, and focuses applications.
  • clipboard: reads and writes the system clipboard.
  • ocr: extracts text from any screen region without sending the image to the LLM.

The agent follows a natural priority: ui first (no coordinates needed) → screenshot(marks=True) to see numbered elements → mouse as last resort for unlabelled canvas areas.
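The mark listings that screenshot(marks=True) returns are plain text of the shape shown in the Quick start output ([1] Button "OK" ...), so an agent-side harness can map mark numbers back to elements with a few lines of parsing. A minimal sketch; the exact line format and the parse_marks helper are illustrative assumptions, not part of the opendesk API:

```python
import re

# One mark per line, e.g. '[3] Button "Cancel"'; format assumed from the
# README's example output, not a guaranteed wire format.
MARK_RE = re.compile(r'\[(\d+)\]\s+(\w+)\s+"([^"]*)"')

def parse_marks(listing: str) -> dict[int, tuple[str, str]]:
    """Map mark number -> (role, title)."""
    return {int(n): (role, title) for n, role, title in MARK_RE.findall(listing)}

listing = '[1] Button "OK"\n[2] TextField "Search"\n[3] Button "Cancel"'
print(parse_marks(listing)[3])  # → ('Button', 'Cancel')
```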


How the MCP integration works

Claude Code / Claude Desktop / Cursor / Continue
          |
          | MCP stdio
          v
     opendesk-mcp
          |
          +-- screenshot, ui, mouse, keyboard, app, clipboard, ocr

opendesk starts as a child process, speaks the MCP protocol over stdin/stdout, and the LLM client handles all tool-calling automatically. You never write tool-calling code.
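That stdio traffic is JSON-RPC 2.0. As a rough illustration of what a client sends when it invokes a tool (the real messages are produced by the MCP client library; the screenshot/marks argument values here are made up, not a wire capture):

```python
import json

# Shape of an MCP tools/call request, framed as JSON-RPC 2.0 over stdio.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "screenshot", "arguments": {"marks": True}},
}
print(json.dumps(request))
```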


Why opendesk?

  • MCP-first — works out of the box with any MCP client, zero glue code.
  • Accessibility tree first — the ui tool interacts with apps the same way a screen reader does, without pixel coordinates or Retina scaling headaches.
  • Framework-agnostic — also ships Anthropic SDK, OpenAI, and LangChain adapters.
  • Sandboxed — per-session audit log, app allow-list, screen region constraints.
  • Extensible — one class to add a custom tool; it appears in all integrations automatically.
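The "one class" claim can be pictured with a stand-in sketch. Everything below (the Tool base class, the nested Params dataclass, the dict registry) is hypothetical scaffolding that shows the shape of the pattern, not opendesk's real API:

```python
import asyncio
from dataclasses import dataclass

# Hypothetical stand-ins: the real base class and registry live in opendesk,
# and their exact names/signatures are assumptions here.
class Tool:
    name: str = ""
    async def execute(self, ctx, params):
        raise NotImplementedError

class BeepTool(Tool):
    name = "beep"

    @dataclass
    class Params:
        times: int = 1

    async def execute(self, ctx, params) -> str:
        return "beep " * params.times

registry = {BeepTool.name: BeepTool()}  # a registry is just name -> tool here
print(asyncio.run(registry["beep"].execute(None, BeepTool.Params(times=2))))
```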

Installation

# Minimal (just the framework, no hardware deps)
pip install opendesk

# Core computer use: screen capture + mouse/keyboard
pip install 'opendesk[core]'

# With MCP server support
pip install 'opendesk[core,mcp]'

# Everything
pip install 'opendesk[all]'

System dependencies

  • macOS: Screen Recording permission (System Settings → Privacy → Screen Recording); Accessibility permission for mouse/keyboard
  • Linux: xclip for clipboard; xdotool or pyatspi for keyboard/UI
  • Windows: no extra system dependencies (uses Win32 APIs)

Quick start (Python API)

import asyncio
from opendesk import create_registry, allow_all_context

async def main():
    registry = create_registry()
    ctx = allow_all_context()

    # Take a screenshot with Set-of-Marks overlay
    screenshot = registry.get("screenshot")
    result = await screenshot.execute(ctx, screenshot.Params(marks=True))
    print(result.output)   # lists all interactive elements as [1] Button "OK" ...
    # result.attachments[0].content  -> PNG bytes

    # Click a button by name — no pixel coordinates needed
    ui = registry.get("ui")
    await ui.execute(ctx, ui.Params(action="click", app="Safari", title="Go"))

    # Type text
    kb = registry.get("keyboard")
    await kb.execute(ctx, kb.Params(action="type", text="hello world"))

asyncio.run(main())

Integrations

MCP server (Claude Desktop, Continue, Cursor, ...)

Run the MCP server over stdio:

opendesk-mcp

Add to Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "opendesk": {
      "command": "opendesk-mcp"
    }
  }
}

Or create a server in Python:

import asyncio

from mcp.server.stdio import stdio_server

from opendesk.integrations.mcp import create_mcp_server
from opendesk.registry import create_registry

async def main():
    server = create_mcp_server(create_registry())
    async with stdio_server() as (read, write):
        await server.run(read, write, server.create_initialization_options())

asyncio.run(main())

Claude Code / Anthropic SDK

import anthropic
from opendesk.integrations.claude_code import ClaudeCodeAdapter
from opendesk.registry import create_registry

client = anthropic.Anthropic()
adapter = ClaudeCodeAdapter(create_registry())

messages = [{"role": "user", "content": "Open Safari and take a screenshot"}]

# Full agentic loop (handles tool use automatically).
# run_loop is async, so call it from inside an async function / running event loop.
final_text = await adapter.run_loop(
    client,
    model="claude-opus-4-6",
    messages=messages,
    system="You are a computer use agent. Use the ui tool first, mouse as last resort.",
)
print(final_text)

Manual control:

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    tools=adapter.tool_definitions(),
    messages=messages,
)

# Dispatch all tool_use blocks in parallel
tool_results = await adapter.handle_response(response)
messages.append({"role": "assistant", "content": response.content})
messages.append({"role": "user", "content": tool_results})

OpenAI function calling

Works with OpenAI, Groq, Together AI, Ollama, LiteLLM, and any OpenAI-compatible provider:

from openai import OpenAI
from opendesk.integrations.openai_compat import OpenAIAdapter
from opendesk.registry import create_registry

client = OpenAI()
adapter = OpenAIAdapter(create_registry())

messages = [{"role": "user", "content": "Take a screenshot"}]
final_text = await adapter.run_loop(client, model="gpt-4o", messages=messages)  # await inside an async function

LangChain / LangGraph

from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

from opendesk.integrations.langchain_compat import as_langchain_tools
from opendesk.registry import create_registry

tools = as_langchain_tools(create_registry())
agent = create_react_agent(ChatOpenAI(model="gpt-4o"), tools)
result = agent.invoke({"messages": [("user", "Take a screenshot")]})

Tools

  • ui: click, type, and read UI elements by name via the accessibility tree. Use this first.
  • screenshot: capture the screen with Set-of-Marks overlay, cursor dot, zoom, and change detection.
  • mouse: click, scroll, and drag, with automatic Retina/HiDPI coordinate scaling.
  • keyboard: type (full Unicode), press keys, send hotkeys, hold keys.
  • app: open, close, focus, or list applications.
  • clipboard: read or write the system clipboard.
  • ocr: extract text from any screen region (pytesseract / Vision / WinRT).

Tool priority

When the agent needs to interact with a UI element:

  1. ui tool — click by element title/role, no coordinates needed. Most reliable.
  2. screenshot with marks=True — if ui doesn't find the element, get a SoM overlay showing numbered bounding boxes.
  3. mouse with image_width/image_height — last resort for unlabelled canvas areas. Always provide image dimensions for correct Retina scaling.
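The escalation above can be sketched as a fallback chain. The three callables below are stand-ins for the real tool invocations; nothing here is the opendesk API, it only shows the control flow:

```python
import asyncio

async def click(title, try_ui, try_marks, click_xy):
    """Escalate: accessibility tree -> SoM marks -> raw mouse coordinates."""
    if await try_ui(title):            # 1. ui tool: click by element title
        return "ui"
    mark = await try_marks(title)      # 2. screenshot(marks=True): numbered boxes
    if mark is not None:
        return f"mark {mark}"
    await click_xy(title)              # 3. mouse: last resort
    return "mouse"

# Stub demo: the ui tool misses, the SoM overlay finds mark 3.
async def no_ui(_): return False
async def find_mark(_): return 3
async def never(_): raise AssertionError("should not reach the mouse")

print(asyncio.run(click("Save", no_ui, find_mark, never)))  # → mark 3
```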

Architecture

opendesk/
├── tools/          # Tool definitions (base.py + one file per tool)
├── computer/       # Low-level helpers: capture, marks (SoM), OCR, sandbox
├── integrations/   # MCP, Claude Code, OpenAI, LangChain adapters
└── registry.py     # ToolRegistry + create_registry()

See docs/architecture.md for a deep dive.


Permission model

Every tool action goes through a ToolContext.check_permission() call before execution.

from opendesk.tools.base import allow_all_context, interactive_context

# Headless / autonomous — approve everything automatically
ctx = allow_all_context()

# Interactive — prompt on stdout before each action
ctx = interactive_context()

# Custom handler: integrate with your own UI or policy engine
from opendesk.tools.base import ToolContext, PermissionDeniedError  # assuming PermissionDeniedError is exported here too

async def my_handler(tool: str, argument: str, description: str) -> None:
    if "production" in description.lower():
        raise PermissionDeniedError("Refusing to act on production.")

ctx = ToolContext(session_id="my-session", permission_handler=my_handler)

Platform support

  • Screenshot: mss + Pillow on all three platforms
  • Mouse control: pyautogui on all three platforms
  • Keyboard (Unicode): pbcopy + cmd+v (macOS); xclip/xsel + ctrl+v (Linux); pyperclip + ctrl+v (Windows)
  • AX tree (ui tool): AppleScript (macOS); AT-SPI2 / xdotool (Linux); pywinauto (Windows)
  • SoM marks: AppleScript (macOS); pyatspi (Linux); pywinauto (Windows)
  • OCR: pytesseract on all platforms, plus Vision (macOS) and WinRT (Windows)
  • App open/close: open -a / AppleScript (macOS); xdg-open / pkill (Linux); start / taskkill (Windows)
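The Unicode-typing row relies on the same trick on every platform: put the text on the clipboard, then send the paste hotkey. A small dispatch sketch keyed on sys.platform-style values; the mapping mirrors the table, and paste_strategy itself is illustrative, not an opendesk function:

```python
def paste_strategy(platform: str) -> tuple[str, str]:
    """(clipboard backend, paste hotkey) per platform, mirroring the table."""
    return {
        "darwin": ("pbcopy", "cmd+v"),     # macOS CLI
        "linux": ("xclip", "ctrl+v"),      # or xsel
        "win32": ("pyperclip", "ctrl+v"),  # pyperclip is a Python library, not a CLI
    }[platform]

print(paste_strategy("darwin"))  # → ('pbcopy', 'cmd+v')
```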

License

MIT

Download files

Download the file for your platform.

Source Distribution

opendesk-0.1.0.tar.gz (55.1 kB)


Built Distribution


opendesk-0.1.0-py3-none-any.whl (63.8 kB)


File details

Details for the file opendesk-0.1.0.tar.gz.

File metadata

  • Download URL: opendesk-0.1.0.tar.gz
  • Upload date:
  • Size: 55.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.16

File hashes

Hashes for opendesk-0.1.0.tar.gz
  • SHA256: 4ffd52c427e0a677626d145537ebb4493042ccfde5c48586d3393c75014d6cca
  • MD5: fb13a3a1fee005040b235fc00efd1f48
  • BLAKE2b-256: bb32565d4fd9a80f42770ac8bcc1e6d534a85886f6b8921104442adc0a34d0ae


File details

Details for the file opendesk-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: opendesk-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 63.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.16

File hashes

Hashes for opendesk-0.1.0-py3-none-any.whl
  • SHA256: 351adc4e2ec4aefb1de7972b3ecaaae69abf0ab90295320199be73c1df8fce68
  • MD5: 6aae7c351d10597c2ae1f3545d1ed3b7
  • BLAKE2b-256: 0496c474d19bf0faf5c75ac41d4f97ab472e06d280a8219417d5957abea01243

