Open Computer Use Agent — framework for desktop and browser automation

opendesk

Give any AI agent eyes and hands on your desktop.

opendesk connects to Claude Code, Claude Desktop, Cursor, and Continue via MCP — adding screenshot, click, type, scroll, clipboard, OCR, and task recording to every conversation. Works on macOS, Linux, and Windows.

Install in 2 steps

pip install 'opendesk[core,mcp]'
opendesk install

That's it. Start a Claude Code conversation and say:

  • "Take a screenshot"
  • "Click the Save button"
  • "Fill in this form"
  • "Start recording this task"

Requires Python 3.10+


What your agent can do

Tool        What it does
screenshot  See the screen, with numbered boxes on every clickable element
ui          Click buttons and type text by element name, no coordinates needed
mouse       Pixel-level mouse control for anything ui can't reach
keyboard    Type text, press keys, send hotkeys
app         Open, close, and focus applications
clipboard   Read and write the system clipboard
ocr         Extract text from any part of the screen
learn       Record a task once, replay it anytime
schedule    Run any task or learned procedure on a timer

Automation — record, replay, and schedule

Record a task once, replay it forever, or run it on a schedule.

1. Record

"Start recording task expense-form"

Perform the workflow yourself. The agent captures every click, keystroke, and screenshot.

2. Replay

"Stop recording" → "Replay expense-form"

The agent re-executes the task using the current screen state — no hardcoded coordinates or paths.

3. Schedule

  • "Schedule expense-form to run every friday at 5pm"
  • "Schedule a task called hourly-check to take a screenshot every hour"

Then start the background runner:

opendesk scheduler start

Timing formats: every 30m, every 2h, every day at 09:00, every friday at 17:00, or raw cron.
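
To make the timing formats concrete, here is a minimal sketch of how such phrases could map onto standard 5-field cron expressions. It is not opendesk's own parser, which may accept more forms or behave differently; the function name to_cron and the weekday table are hypothetical and exist only for illustration.

```python
import re

# Hypothetical helper, not part of opendesk: maps the timing phrases listed
# above onto standard 5-field cron expressions (minute hour dom month dow).
_DAYS = {"sunday": 0, "monday": 1, "tuesday": 2, "wednesday": 3,
         "thursday": 4, "friday": 5, "saturday": 6}


def to_cron(spec: str) -> str:
    s = spec.strip().lower()

    # "every 30m": a minute step. Cron steps restart at the top of each hour,
    # so the spacing is only even when the step divides 60.
    m = re.fullmatch(r"every (\d+)m", s)
    if m:
        n = int(m.group(1))
        if not 1 <= n <= 59:
            raise ValueError(f"minute interval out of range: {spec!r}")
        return f"*/{n} * * * *"

    # "every 2h": an hour step, firing at minute 0.
    m = re.fullmatch(r"every (\d+)h", s)
    if m:
        n = int(m.group(1))
        if not 1 <= n <= 23:
            raise ValueError(f"hour interval out of range: {spec!r}")
        return f"0 */{n} * * *"

    # "every day at 09:00" or "every friday at 17:00".
    m = re.fullmatch(r"every ([a-z]+) at (\d{1,2}):(\d{2})", s)
    if m:
        name, hour, minute = m.group(1), int(m.group(2)), int(m.group(3))
        if hour > 23 or minute > 59:
            raise ValueError(f"invalid time of day: {spec!r}")
        if name == "day":
            dow = "*"
        elif name in _DAYS:
            dow = str(_DAYS[name])
        else:
            raise ValueError(f"unknown day name: {spec!r}")
        return f"{minute} {hour} * * {dow}"

    # Anything with exactly five whitespace-separated fields is treated as
    # raw cron and passed through unchanged (apart from whitespace cleanup).
    fields = spec.split()
    if len(fields) == 5:
        return " ".join(fields)

    raise ValueError(f"unrecognized timing format: {spec!r}")
```

Under these assumptions, to_cron("every friday at 17:00") yields "0 17 * * 5", using the common cron convention that Sunday is day 0.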

See docs/automation.md for the full guide.


System permissions

macOS

  • System Settings → Privacy & Security → Screen Recording — enable for your terminal app
  • System Settings → Privacy & Security → Accessibility — enable for mouse and keyboard control

Linux

# Clipboard support
sudo apt install xclip

# UI automation
sudo apt install xdotool python3-atspi
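
A quick way to confirm that the command-line tools above are installed is to look for them on PATH. The sketch below uses only the Python standard library; the names REQUIRED_TOOLS and missing_tools are hypothetical and not part of opendesk, and the check covers only the xclip and xdotool executables, not the AT-SPI Python bindings.

```python
import shutil

# Command-line tools named in the apt commands above, with the feature
# each one supports. This mapping is illustrative, not an opendesk API.
REQUIRED_TOOLS = {"xclip": "clipboard support", "xdotool": "UI automation"}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    for name in missing_tools():
        print(f"missing: {name} ({REQUIRED_TOOLS[name]})")
```

The which parameter exists only so the lookup can be swapped out in tests; in normal use the default shutil.which is enough.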

Windows

No extra permissions needed — opendesk uses Win32 APIs by default.


Other integrations

Claude Desktop

Add to your Claude Desktop config:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "opendesk": { "command": "opendesk-mcp" }
  }
}

Restart Claude Desktop.
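
If the config file already lists other MCP servers, the opendesk entry needs to be merged in rather than pasted over them. The following is a minimal sketch of that merge using only the standard library; add_opendesk_server is a hypothetical helper, not something opendesk ships, and it assumes the file is either missing, empty, or valid JSON.

```python
import json
from pathlib import Path


def add_opendesk_server(config_path: Path) -> dict:
    """Add the opendesk entry to an MCP config file, keeping other settings."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Create "mcpServers" if absent, then set (or overwrite) the opendesk entry.
    config.setdefault("mcpServers", {})["opendesk"] = {"command": "opendesk-mcp"}

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example (macOS path from the list above):
# add_opendesk_server(
#     Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"
# )
```

Claude Desktop still needs a restart after the file changes.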

Cursor / Continue

In your MCP config file, add a server entry whose command is opendesk-mcp.


Use in Python

import asyncio
from opendesk import create_registry, allow_all_context

async def main():
    registry = create_registry()
    ctx = allow_all_context()

    # Take a screenshot with numbered elements
    screenshot = registry.get("screenshot")
    result = await screenshot.execute(ctx, screenshot.Params(marks=True))
    print(result.output)

    # Click a button by name — no coordinates needed
    ui = registry.get("ui")
    await ui.execute(ctx, ui.Params(action="click", app="Notepad", title="File"))

    # Type text
    kb = registry.get("keyboard")
    await kb.execute(ctx, kb.Params(action="type", text="Hello from opendesk"))

asyncio.run(main())

Works with the Anthropic SDK, OpenAI, and LangChain; see docs/integrations.md.


Installation options

pip install opendesk                         # framework only
pip install 'opendesk[core]'                 # + screen capture, mouse, keyboard
pip install 'opendesk[core,mcp]'             # + MCP server (recommended)
pip install 'opendesk[core,mcp,learn]'       # + task recording and replay
pip install 'opendesk[core,mcp,learn,schedule]'  # + scheduled tasks
pip install 'opendesk[all]'                  # everything

Platform support

Feature            macOS               Linux              Windows
Screenshot
Mouse control
Keyboard
UI element access  AppleScript         AT-SPI2 / xdotool  UI Automation
Clipboard          pbcopy/pbpaste      xclip / xsel       pyperclip
OCR                Vision / tesseract  tesseract          WinRT / tesseract
App open/close     open -a             xdg-open           start
Task recording
Scheduled tasks

Citation

If you use opendesk in your research or project, please cite it:

@software{opendesk,
  author  = {Abraham, Abhijith Neil},
  title   = {opendesk: Open Desktop Automation Framework},
  year    = {2025},
  url     = {https://github.com/abhijithneilabraham/opendesk},
  version = {0.1.2},
  license = {MIT}
}

A CITATION.cff file is also included for tools like GitHub's "Cite this repository" button.


License

MIT
