Full computer control for AI agents — see, click, type, scroll via MCP
Project description
puppet-ai
Full computer control for AI agents — see, click, type, scroll via MCP.
Give any AI agent eyes and hands. puppet-ai captures the screen, reads text via native OCR, detects UI elements, and controls mouse + keyboard. Works with any app on macOS — browsers, desktop apps, games, terminals.
Why puppet-ai?
- Fast — native macOS OCR in ~0.5s, 7x faster with caching
- Universal — works with ANY app, not just browsers
- Agent-agnostic — MCP standard, plug into any AI agent
- Secure — auto-masks API keys, passwords, credit cards, emails in OCR output
- Complete — 27 tools: vision + actions + system
- Private — all processing on-device, no data leaves your Mac
Quick Start
1. Install
pip install puppet-agent
2. Enable Accessibility
System Settings → Privacy & Security → Accessibility → enable your terminal/IDE app.
3. Connect to your AI agent
puppet-ai is an MCP server. Connect it to any MCP-compatible agent below, then ask the agent to interact with your computer.
Integrations
Claude Code
Add to ~/.claude/settings.json:
{
"mcpServers": {
"puppet-ai": {
"command": "puppet-ai",
"args": ["serve"]
}
}
}
OpenAI Codex CLI
Add to ~/.codex/config.toml:
[mcp_servers.puppet-ai]
command = "puppet-ai"
args = ["serve"]
Or via CLI:
codex mcp add puppet-ai -- puppet-ai serve
Google Gemini CLI
Add to ~/.gemini/settings.json:
{
"mcpServers": {
"puppet-ai": {
"command": "puppet-ai",
"args": ["serve"]
}
}
}
Verify: launch gemini and run /mcp to check connection.
Google Antigravity
Via MCP settings in Antigravity, or add to your project's .antigravity/settings.json:
{
"mcpServers": {
"puppet-ai": {
"command": "puppet-ai",
"args": ["serve"]
}
}
}
Cursor
Add to ~/.cursor/mcp.json:
{
"mcpServers": {
"puppet-ai": {
"command": "puppet-ai",
"args": ["serve"]
}
}
}
Or: Cursor Settings → Tools & MCP → Add Server.
Windsurf
Add to ~/.codeium/windsurf/mcp_config.json:
{
"mcpServers": {
"puppet-ai": {
"command": "puppet-ai",
"args": ["serve"]
}
}
}
Cline (VS Code)
In VS Code, open Cline settings → MCP Servers → Add:
{
"puppet-ai": {
"command": "puppet-ai",
"args": ["serve"]
}
}
Zed
Add to Zed settings (~/.config/zed/settings.json):
{
"context_servers": {
"puppet-ai": {
"command": {
"path": "puppet-ai",
"args": ["serve"]
}
}
}
}
OpenClaw
Add to your agent's MCP config:
mcp_servers:
puppet-ai:
command: puppet-ai
args: [serve]
Any MCP Client
puppet-ai speaks MCP over stdio. Spawn it as a subprocess:
import subprocess
proc = subprocess.Popen(
["puppet-ai", "serve"],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
)
# Communicate via MCP JSON-RPC over stdin/stdout
Works with any agent that supports the Model Context Protocol.
Detailed setup guides: integrations/
Tools
Vision (see the screen)
| Tool | Description |
|---|---|
vision_list_windows |
List all open windows (app, title, size) |
vision_read_window(app) |
Read text via OCR with bounding boxes for clicking |
vision_screenshot(app) |
Capture screenshot as base64 JPEG |
vision_get_state |
Full screen state: all windows + active window text |
vision_ui_elements(app) |
Get UI elements via Accessibility API (buttons, links, checkboxes) |
Actions (control the computer)
| Tool | Description |
|---|---|
action_click(x, y) |
Click at coordinates |
action_click_text(text, app) |
Find text on screen and click it — no coordinates needed |
action_click_and_wait(text, app) |
Click text, wait for screen to stabilize, return new state |
action_type_safe(text) |
Type text via clipboard paste (works with any keyboard layout) |
action_open_url(url) |
Open URL in browser (http/https only) |
action_scroll(amount, app) |
Scroll up/down in an app |
action_hotkey(keys) |
Keyboard shortcut (e.g. ["cmd", "c"]) |
action_press(key) |
Press a key (enter, tab, escape, etc.) |
action_drag(...) |
Drag and drop |
action_activate_window(app) |
Bring app to front |
action_clipboard_copy(text) |
Copy to clipboard |
action_clipboard_paste() |
Paste from clipboard |
System
| Tool | Description |
|---|---|
system_check_permissions |
Check accessibility access |
system_get_screen_size |
Screen dimensions |
system_get_mouse_position |
Current cursor position |
system_unmask(reason) |
Temporarily disable PII masking |
system_mask() |
Re-enable PII masking |
How It Works
AI Agent (Claude, Codex, Gemini, Cursor, Windsurf, ...)
↕ MCP protocol (stdio)
puppet-ai server
├── Vision: Apple Vision OCR + CGWindowList capture
├── Accessibility: AXUIElement tree (buttons, links, fields)
├── Actions: pyautogui (mouse, keyboard, scroll)
└── Security: PII regex filter (API keys, cards, passwords, emails)
The loop:
- Look —
vision_read_window("Safari")→ text + coordinates - Decide — agent plans next action
- Act —
action_click_text("Sign In")→ clicks center of text - Verify —
vision_read_windowagain → confirm it worked - Repeat
Features
Native macOS OCR
Uses Apple Vision Framework — no external API, no GPU needed, works offline. Supports Russian and English.
Per-Window Capture
Captures specific windows via CGWindowList without switching apps or stealing focus.
Smart Coordinates
OCR returns absolute screen coordinates with Retina scaling handled automatically.
OCR Cache
Repeated reads of unchanged windows are 7x faster. Cache auto-invalidates after any action.
PII Protection
Sensitive data is automatically masked in OCR output:
- API keys:
sk-1***ef - Credit cards:
4111***1111 - Emails:
user***com - Passwords in forms
- Crypto keys
Accessibility API
Detect interactive UI elements — buttons, checkboxes, links, text fields — with exact clickable coordinates.
Built-in Agent Instructions
The MCP server includes a system prompt that teaches agents how to use all 27 tools, macOS keyboard shortcuts, and the look-decide-act-verify loop.
Security
- All data stays on your Mac — no telemetry, no analytics, no external calls
- PII auto-masking — API keys, credit cards, emails, passwords masked before reaching the agent
- URL validation — only
http://andhttps://allowed,file://blocked - Input sanitization — app names validated to prevent injection
- Browser allowlist — only known browsers accepted (Chrome, Safari, Firefox, Arc, etc.)
- Failsafe — pyautogui failsafe enabled by default (move mouse to corner to abort)
Examples
import asyncio
from puppet_ai.core.capture import ScreenCapture
from puppet_ai.core.actions import DesktopActions
from puppet_ai.server.mcp import VisionPipeContext, create_all_tools
async def main():
ctx = VisionPipeContext(
capture=ScreenCapture(),
actions=DesktopActions(failsafe=True),
)
tools = create_all_tools(ctx)
# See what's on screen
windows = await tools["vision_list_windows"]()
for w in windows:
print(f"{w['app']:20s} — {w['title'][:50]}")
# Read a window
page = await tools["vision_read_window"](app="Safari")
print(page["text"][:500])
# Click text on screen
await tools["action_click_text"](text="Sign In", app="Safari")
# Open a URL
await tools["action_open_url"](url="https://example.com", browser="Safari")
asyncio.run(main())
More examples in examples/.
Configuration
# puppet-ai.yaml
ocr:
languages: ["en", "ru"]
mode: accurate # or "fast"
pii:
enabled: true
categories: [api_keys, credit_cards, crypto_keys, emails, passwords]
capture:
max_width: 800
format: jpeg
quality: 75
Presets:
puppet-ai serve --preset fast # speed over accuracy
puppet-ai serve --preset balanced # default
puppet-ai serve --preset quality # max accuracy
Requirements
- macOS 13+ (Ventura or later)
- Python 3.11+
- Accessibility permissions enabled
Author
Daniel Starkov
- Twitter: @retardTransoff
- LinkedIn: Daniel Starkov
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file puppet_agent-0.1.1.tar.gz.
File metadata
- Download URL: puppet_agent-0.1.1.tar.gz
- Upload date:
- Size: 41.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
561c5698480649f0a86fe05542e308b94af374834b3e3e73aa679a6c496e58af
|
|
| MD5 |
b499c0c01be95f840caca4b39eb6d842
|
|
| BLAKE2b-256 |
2f565ffbcd239537e24783ff98e5faecc0008f8e09ac4ce74fa7ee6c2a6c0019
|
File details
Details for the file puppet_agent-0.1.1-py3-none-any.whl.
File metadata
- Download URL: puppet_agent-0.1.1-py3-none-any.whl
- Upload date:
- Size: 40.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7ff9f22896652878b0e288fe0a3366805e5e2b8eba1b3d076c1dfd86d61a5243
|
|
| MD5 |
b4cdd5c07534bc70fc326fad2da5b242
|
|
| BLAKE2b-256 |
040cc9b74cf9bf925f22b2e852aa29c792253ee56d6768dcb8b90b8d2f14409f
|