Open Computer Use Agent — framework for desktop and browser automation
opendesk
Give any AI agent eyes and hands on your desktop.
opendesk connects to Claude Code, Claude Desktop, Cursor, and Continue via MCP — adding screenshot, click, type, scroll, clipboard, OCR, and task recording to every conversation. Works on macOS, Linux, and Windows.
Install in 2 steps
pip install 'opendesk[core,mcp]'
opendesk install
That's it. Start a Claude Code conversation and say:
- "Take a screenshot"
- "Click the Save button"
- "Fill in this form"
- "Start recording this task"
Requires Python 3.10+
What your agent can do
| Tool | What it does |
|---|---|
| `screenshot` | See the screen, with numbered boxes on every clickable element |
| `ui` | Click buttons and type text by element name, no coordinates needed |
| `mouse` | Pixel-level mouse control for anything `ui` can't reach |
| `keyboard` | Type text, press keys, send hotkeys |
| `app` | Open, close, and focus applications |
| `clipboard` | Read and write the system clipboard |
| `ocr` | Extract text from any part of the screen |
| `learn` | Record a task once, replay it anytime |
| `schedule` | Run any task or learned procedure on a timer |
Automation — record, replay, and schedule
Record a task once, replay it forever, or run it on a schedule.
1. Record
"Start recording task expense-form"
Perform the workflow yourself. The agent captures every click, keystroke, and screenshot.
2. Replay
"Stop recording" → "Replay expense-form"
The agent re-executes the task using the current screen state — no hardcoded coordinates or paths.
3. Schedule
"Schedule expense-form to run every friday at 5pm"
"Schedule a task called hourly-check to take a screenshot every hour"
Then start the background runner:
opendesk scheduler start
Timing formats: every 30m, every 2h, every day at 09:00, every friday at 17:00, or raw cron.
See docs/automation.md for the full guide.
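To make the timing formats above concrete, here is a minimal sketch of how such strings could map onto 5-field cron expressions. It is not opendesk's own parser: `to_cron` is a hypothetical helper, and the real scheduler may accept more forms (for example `5pm`) or handle them differently.

```python
import re

# Hypothetical helper for illustration only; not part of the opendesk API.
# Cron numbers weekdays 0-6 starting from Sunday.
_DAYS = {
    "sunday": 0, "monday": 1, "tuesday": 2, "wednesday": 3,
    "thursday": 4, "friday": 5, "saturday": 6,
}
_AT_RE = re.compile(r"every (day|" + "|".join(_DAYS) + r") at (\d{1,2}):(\d{2})")


def to_cron(spec: str) -> str:
    """Translate a timing string such as 'every 2h' into a cron expression."""
    s = spec.strip().lower()

    m = re.fullmatch(r"every (\d+)m", s)
    if m:
        n = int(m.group(1))
        if not 1 <= n <= 59:
            raise ValueError(f"minute interval out of range: {spec!r}")
        return f"*/{n} * * * *"

    m = re.fullmatch(r"every (\d+)h", s)
    if m:
        n = int(m.group(1))
        if not 1 <= n <= 23:
            raise ValueError(f"hour interval out of range: {spec!r}")
        return f"0 */{n} * * *"

    m = _AT_RE.fullmatch(s)
    if m:
        day, hour, minute = m.group(1), int(m.group(2)), int(m.group(3))
        if hour > 23 or minute > 59:
            raise ValueError(f"invalid time of day: {spec!r}")
        day_of_week = "*" if day == "day" else str(_DAYS[day])
        return f"{minute} {hour} * * {day_of_week}"

    # Anything else is assumed to be raw cron already; only the field count is checked.
    if len(s.split()) == 5:
        return spec.strip()
    raise ValueError(f"unrecognized timing format: {spec!r}")


for example in ("every 30m", "every 2h", "every day at 09:00", "every friday at 17:00"):
    print(example, "->", to_cron(example))
```

Cron step values restart at the top of each hour or day, so an interval that does not divide 60 minutes or 24 hours evenly (such as `every 45m`) would not fire at a constant spacing under this mapping.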
System permissions
macOS
- System Settings → Privacy & Security → Screen Recording — enable for your terminal app
- System Settings → Privacy & Security → Accessibility — enable for mouse and keyboard control
Linux
# Clipboard support
sudo apt install xclip
# UI automation
sudo apt install xdotool python3-atspi
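To confirm that the command-line tools above are on `PATH` before starting opendesk, a small check like the following may help. It is a sketch and not part of opendesk: `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, and the `python3-atspi` bindings are not covered because they are not a command-line program.

```python
import shutil

# Command-line tools named in the Linux setup above, mapped to the feature that needs them.
REQUIRED_TOOLS = {
    "xclip": "clipboard support",
    "xdotool": "UI automation",
}


def missing_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Return the subset of `required` whose executables are not found on PATH."""
    return {name: purpose for name, purpose in required.items() if which(name) is None}


if __name__ == "__main__":
    for name, purpose in missing_tools().items():
        print(f"missing {name} ({purpose}): sudo apt install {name}")
```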
Windows
No extra permissions needed — opendesk uses Win32 APIs by default.
Other integrations
Claude Desktop
Add to your Claude Desktop config:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
{
"mcpServers": {
"opendesk": { "command": "opendesk-mcp" }
}
}
Restart Claude Desktop.
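If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over them. A minimal sketch of that merge, using only the standard library, follows. `add_opendesk_server` is a hypothetical helper, not something opendesk ships, and it assumes the file is plain JSON.

```python
import json
from pathlib import Path


def add_opendesk_server(config_path: Path) -> dict:
    """Add the opendesk entry to a Claude Desktop config, keeping other settings intact."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Create the "mcpServers" section if it is missing, then add or overwrite "opendesk".
    servers = config.setdefault("mcpServers", {})
    servers["opendesk"] = {"command": "opendesk-mcp"}

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

On Linux, for example, this could be called as `add_opendesk_server(Path.home() / ".config/Claude/claude_desktop_config.json")`. Backing up the file first is a sensible precaution.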
Cursor / Continue
In your MCP config file, set the server's command to opendesk-mcp.
Use in Python
import asyncio

from opendesk import create_registry, allow_all_context


async def main():
    registry = create_registry()
    ctx = allow_all_context()

    # Take a screenshot with numbered elements
    screenshot = registry.get("screenshot")
    result = await screenshot.execute(ctx, screenshot.Params(marks=True))
    print(result.output)

    # Click a button by name; no coordinates needed
    ui = registry.get("ui")
    await ui.execute(ctx, ui.Params(action="click", app="Notepad", title="File"))

    # Type text
    kb = registry.get("keyboard")
    await kb.execute(ctx, kb.Params(action="type", text="Hello from opendesk"))


asyncio.run(main())
Works with Anthropic SDK, OpenAI, and LangChain — see docs/integrations.md.
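When a script chains several tool calls, the get/Params/execute pattern from the example above can be wrapped in a small loop. The sketch below assumes only what that example shows (`registry.get(name)`, `tool.Params(...)`, and an awaitable `tool.execute(ctx, params)`). `run_steps` is a hypothetical helper, not part of opendesk.

```python
import asyncio


async def run_steps(registry, ctx, steps):
    """Run (tool_name, params) pairs in order and collect each tool's result."""
    results = []
    for tool_name, params in steps:
        tool = registry.get(tool_name)
        # Build the tool's parameter object from plain keyword arguments.
        results.append(await tool.execute(ctx, tool.Params(**params)))
    return results
```

With a registry and context created as in the example above, the click and type calls could then be written as `await run_steps(registry, ctx, [("ui", {"action": "click", "app": "Notepad", "title": "File"}), ("keyboard", {"action": "type", "text": "Hello from opendesk"})])`.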
Installation options
pip install opendesk # framework only
pip install 'opendesk[core]' # + screen capture, mouse, keyboard
pip install 'opendesk[core,mcp]' # + MCP server (recommended)
pip install 'opendesk[core,mcp,learn]' # + task recording and replay
pip install 'opendesk[core,mcp,learn,schedule]' # + scheduled tasks
pip install 'opendesk[all]' # everything
Platform support
| Feature | macOS | Linux | Windows |
|---|---|---|---|
| Screenshot | ✓ | ✓ | ✓ |
| Mouse control | ✓ | ✓ | ✓ |
| Keyboard | ✓ | ✓ | ✓ |
| UI element access | AppleScript | AT-SPI2 / xdotool | UI Automation |
| Clipboard | pbcopy/pbpaste | xclip / xsel | pyperclip |
| OCR | Vision / tesseract | tesseract | WinRT / tesseract |
| App open/close | open -a | xdg-open | start |
| Task recording | ✓ | ✓ | ✓ |
| Scheduled tasks | ✓ | ✓ | ✓ |
Citation
If you use opendesk in your research or project, please cite it:
@software{opendesk,
author = {Abraham, Abhijith Neil},
title = {opendesk: Open Desktop Automation Framework},
year = {2025},
url = {https://github.com/abhijithneilabraham/opendesk},
version = {0.1.2},
license = {MIT}
}
A CITATION.cff file is also included for tools like GitHub's "Cite this repository" button.
License
MIT