Open Computer Use Agent — framework for desktop and browser automation

opendesk

Give any AI agent eyes and hands on your desktop.

opendesk connects to Claude Code, Claude Desktop, Cursor, and Continue via MCP (Model Context Protocol), adding screenshot, click, type, scroll, clipboard, OCR, and task recording tools to every conversation. It works on macOS, Linux, and Windows.


Install in 2 steps

pip install 'opendesk[core,mcp]'
opendesk install

That's it. Start a Claude Code conversation and say:

  • "Take a screenshot"
  • "Click the Save button"
  • "Fill in this form"
  • "Start recording this task"


What your agent can do

Tool        What it does
screenshot  See the screen, with numbered boxes on every clickable element
ui          Click buttons and type text by name, no pixel coordinates needed
mouse       Pixel-level mouse control for anything ui can't reach
keyboard    Type text, press keys, send hotkeys
app         Open, close, and focus applications
clipboard   Read and write the system clipboard
ocr         Extract text from any part of the screen
learn       Record a task once, replay it anytime

Record and replay tasks

Teach your agent to repeat any workflow:

  1. Say "start recording task fill-expense-form"
  2. Do the task yourself
  3. Say "stop recording" — the agent summarizes it into a reusable procedure
  4. Next time, say "replay fill-expense-form" — the agent follows the steps automatically

Procedures are saved in .opendesk/learned/ in your project directory.
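
To see which procedures have been recorded so far, you can inspect that directory yourself. The sketch below is a minimal example using only the standard library. The helper name list_learned is hypothetical, and it assumes each procedure appears as one entry directly inside .opendesk/learned/. The on-disk format is not described here, so the code only lists names and never parses file contents.

```python
from pathlib import Path


def list_learned(project_dir="."):
    """Return the names of entries under .opendesk/learned/, sorted.

    Assumption: each recorded procedure is one file or folder directly
    inside that directory. Contents are not read or interpreted.
    """
    learned_dir = Path(project_dir) / ".opendesk" / "learned"
    if not learned_dir.is_dir():
        return []  # nothing has been recorded in this project yet
    return sorted(entry.name for entry in learned_dir.iterdir())
```

Calling list_learned() from the project root returns an empty list until the first recording has been saved.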


System permissions (macOS)

Go to System Settings → Privacy & Security and enable:

  • Screen Recording — for screenshots
  • Accessibility — for mouse and keyboard control

Other integrations

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS location; on Windows the file is %APPDATA%\Claude\claude_desktop_config.json):

{
  "mcpServers": {
    "opendesk": { "command": "opendesk-mcp" }
  }
}

Restart Claude Desktop.
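
If the config file already lists other MCP servers, pasting the snippet over it would drop them. The sketch below shows one way to merge the opendesk entry into an existing file with the standard library. The helper name add_opendesk_server is hypothetical, and the config path is passed in by the caller rather than assumed.

```python
import json
from pathlib import Path


def add_opendesk_server(config_path):
    """Add the opendesk entry to an MCP config file, keeping other servers."""
    path = Path(config_path).expanduser()
    # Start from the existing config if the file has content, else from {}.
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # Same entry as the JSON snippet above.
    config.setdefault("mcpServers", {})["opendesk"] = {"command": "opendesk-mcp"}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

On macOS this could be called as add_opendesk_server("~/Library/Application Support/Claude/claude_desktop_config.json"). Back up the file first if it contains settings you care about.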

Cursor / Continue

In your MCP config, set the server's command to opendesk-mcp.


Use in Python

import asyncio
from opendesk import create_registry, allow_all_context

async def main():
    # Build the tool registry and the context the tools run under.
    registry = create_registry()
    ctx = allow_all_context()

    # Take a screenshot with marks on the interactive elements.
    screenshot = registry.get("screenshot")
    result = await screenshot.execute(ctx, screenshot.Params(marks=True))
    print(result.output)  # lists all interactive elements

    # Click the "Go" button in Safari by name, without pixel coordinates.
    ui = registry.get("ui")
    await ui.execute(ctx, ui.Params(action="click", app="Safari", title="Go"))

asyncio.run(main())
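
Both calls above follow the same pattern: look up a tool, build its Params, and execute it. A thin wrapper can capture that pattern. This is only a sketch: run_tool is a hypothetical helper, not part of opendesk, and it relies solely on the get / Params / execute calls shown in the example.

```python
async def run_tool(registry, ctx, name, **params):
    """Look up a tool by name, build its Params, and execute it."""
    tool = registry.get(name)
    return await tool.execute(ctx, tool.Params(**params))


# With opendesk installed, usage would mirror the example above:
#
#     import asyncio
#     from opendesk import create_registry, allow_all_context
#
#     registry, ctx = create_registry(), allow_all_context()
#     result = asyncio.run(run_tool(registry, ctx, "screenshot", marks=True))
#     print(result.output)
```

Because the registry and context are passed in, the helper can be exercised with stand-in objects in tests without touching the real screen.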

Works with Anthropic SDK, OpenAI, and LangChain — see docs/integrations.md.


Installation options

pip install opendesk                    # framework only
pip install 'opendesk[core]'            # + screen capture, mouse, keyboard
pip install 'opendesk[core,mcp]'        # + MCP server (recommended)
pip install 'opendesk[core,mcp,learn]'  # + task recording
pip install 'opendesk[all]'             # everything
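
After installing, a quick check can confirm that the package is visible to the current interpreter and that the opendesk-mcp command is on PATH. The sketch below uses only the standard library. The helper names are hypothetical, and it assumes the mcp extra is what provides the opendesk-mcp executable.

```python
import shutil
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def on_path(command):
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(command) is not None


print("opendesk version:", installed_version("opendesk"))
print("opendesk-mcp on PATH:", on_path("opendesk-mcp"))
```

If the version prints as None, the install probably went into a different Python environment than the one running the check.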

License

MIT

Download files

Source distribution: opendesk-0.1.1.tar.gz (50.9 kB)

Built distribution: opendesk-0.1.1-py3-none-any.whl (61.6 kB, Python 3)

File details

Details for the file opendesk-0.1.1.tar.gz.

File metadata

  • Download URL: opendesk-0.1.1.tar.gz
  • Size: 50.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.16

File hashes

Hashes for opendesk-0.1.1.tar.gz

Algorithm    Hash digest
SHA256       83540ea2391995d0f7edcb75c292329fb24bea459540bbb23a85e2ccf690b0c0
MD5          c7cf79090e86c5913ad5f47e3c08a875
BLAKE2b-256  235c38696a7b992913e1ae087de9f99ed521da00e3e02d3c277fa740bc8793dd


File details

Details for the file opendesk-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: opendesk-0.1.1-py3-none-any.whl
  • Size: 61.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.16

File hashes

Hashes for opendesk-0.1.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       d635ba13a66af5de36e7ab8ea9382c16e9a20701536f75f00e30477c16741ec4
MD5          3466db3a64428c6382d8f1be0734ef0c
BLAKE2b-256  6f9c38c14ec4188eb711305711cdd061fb0e97b5ab1acd3dca81cceb8f1888be
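
The published digests can be used to check a downloaded file before installing it. The sketch below computes a SHA256 digest with the standard library and compares it against the values listed above. The helper names are hypothetical, and the local file path is whatever you saved the download as.

```python
import hashlib

# SHA256 digests copied from the hash tables above.
EXPECTED_SHA256 = {
    "opendesk-0.1.1.tar.gz":
        "83540ea2391995d0f7edcb75c292329fb24bea459540bbb23a85e2ccf690b0c0",
    "opendesk-0.1.1-py3-none-any.whl":
        "d635ba13a66af5de36e7ab8ea9382c16e9a20701536f75f00e30477c16741ec4",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, filename):
    """Compare a local file's digest with the published one for filename."""
    return sha256_of(path) == EXPECTED_SHA256[filename]
```

For example, matches_published_hash("opendesk-0.1.1.tar.gz", "opendesk-0.1.1.tar.gz") should return True for an intact download in the current directory. pip can perform the same check itself when a requirements file pins hashes and is installed with --require-hashes.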
