Skip to main content

Full computer control for AI agents — see, click, type, scroll via MCP

Project description

puppet-ai

Full computer control for AI agents — see, click, type, scroll via MCP.

Give any AI agent eyes and hands. puppet-ai captures the screen, reads text via native OCR, detects UI elements, and controls mouse + keyboard. Works with any app on macOS — browsers, desktop apps, games, terminals.

Why puppet-ai?

  • Fast — native macOS OCR in ~0.5s, 7x faster with caching
  • Universal — works with ANY app, not just browsers
  • Agent-agnostic — MCP standard, plug into any AI agent
  • Secure — auto-masks API keys, passwords, credit cards, emails in OCR output
  • Complete — 27 tools: vision + actions + system
  • Private — all processing on-device, no data leaves your Mac

Quick Start

1. Install

pip install puppet-ai

2. Enable Accessibility

System Settings → Privacy & Security → Accessibility → enable your terminal/IDE app.

3. Connect to your AI agent

puppet-ai is an MCP server. Connect it to any MCP-compatible agent below, then ask the agent to interact with your computer.


Integrations

Claude Code

Add to ~/.claude/settings.json:

{
  "mcpServers": {
    "puppet-ai": {
      "command": "puppet-ai",
      "args": ["serve"]
    }
  }
}

OpenAI Codex CLI

Add to ~/.codex/config.toml:

[mcp_servers.puppet-ai]
command = "puppet-ai"
args = ["serve"]

Or via CLI:

codex mcp add puppet-ai -- puppet-ai serve

Google Gemini CLI

Add to ~/.gemini/settings.json:

{
  "mcpServers": {
    "puppet-ai": {
      "command": "puppet-ai",
      "args": ["serve"]
    }
  }
}

Verify: launch gemini and run /mcp to check connection.

Google Antigravity

Via MCP settings in Antigravity, or add to your project's .antigravity/settings.json:

{
  "mcpServers": {
    "puppet-ai": {
      "command": "puppet-ai",
      "args": ["serve"]
    }
  }
}

Cursor

Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "puppet-ai": {
      "command": "puppet-ai",
      "args": ["serve"]
    }
  }
}

Or: Cursor Settings → Tools & MCP → Add Server.

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "puppet-ai": {
      "command": "puppet-ai",
      "args": ["serve"]
    }
  }
}

Cline (VS Code)

In VS Code, open Cline settings → MCP Servers → Add:

{
  "puppet-ai": {
    "command": "puppet-ai",
    "args": ["serve"]
  }
}

Zed

Add to Zed settings (~/.config/zed/settings.json):

{
  "context_servers": {
    "puppet-ai": {
      "command": {
        "path": "puppet-ai",
        "args": ["serve"]
      }
    }
  }
}

OpenClaw

Add to your agent's MCP config:

mcp_servers:
  puppet-ai:
    command: puppet-ai
    args: [serve]

Any MCP Client

puppet-ai speaks MCP over stdio. Spawn it as a subprocess:

import subprocess
proc = subprocess.Popen(
    ["puppet-ai", "serve"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
# Communicate via MCP JSON-RPC over stdin/stdout

Works with any agent that supports the Model Context Protocol.

Detailed setup guides: integrations/


Tools

Vision (see the screen)

Tool Description
vision_list_windows List all open windows (app, title, size)
vision_read_window(app) Read text via OCR with bounding boxes for clicking
vision_screenshot(app) Capture screenshot as base64 JPEG
vision_get_state Full screen state: all windows + active window text
vision_ui_elements(app) Get UI elements via Accessibility API (buttons, links, checkboxes)

Actions (control the computer)

Tool Description
action_click(x, y) Click at coordinates
action_click_text(text, app) Find text on screen and click it — no coordinates needed
action_click_and_wait(text, app) Click text, wait for screen to stabilize, return new state
action_type_safe(text) Type text via clipboard paste (works with any keyboard layout)
action_open_url(url) Open URL in browser (http/https only)
action_scroll(amount, app) Scroll up/down in an app
action_hotkey(keys) Keyboard shortcut (e.g. ["cmd", "c"])
action_press(key) Press a key (enter, tab, escape, etc.)
action_drag(...) Drag and drop
action_activate_window(app) Bring app to front
action_clipboard_copy(text) Copy to clipboard
action_clipboard_paste() Paste from clipboard

System

Tool Description
system_check_permissions Check accessibility access
system_get_screen_size Screen dimensions
system_get_mouse_position Current cursor position
system_unmask(reason) Temporarily disable PII masking
system_mask() Re-enable PII masking

How It Works

AI Agent (Claude, Codex, Gemini, Cursor, Windsurf, ...)
    ↕ MCP protocol (stdio)
puppet-ai server
    ├── Vision: Apple Vision OCR + CGWindowList capture
    ├── Accessibility: AXUIElement tree (buttons, links, fields)
    ├── Actions: pyautogui (mouse, keyboard, scroll)
    └── Security: PII regex filter (API keys, cards, passwords, emails)

The loop:

  1. Lookvision_read_window("Safari") → text + coordinates
  2. Decide — agent plans next action
  3. Actaction_click_text("Sign In") → clicks center of text
  4. Verifyvision_read_window again → confirm it worked
  5. Repeat

Features

Native macOS OCR

Uses Apple Vision Framework — no external API, no GPU needed, works offline. Supports Russian and English.

Per-Window Capture

Captures specific windows via CGWindowList without switching apps or stealing focus.

Smart Coordinates

OCR returns absolute screen coordinates with Retina scaling handled automatically.

OCR Cache

Repeated reads of unchanged windows are 7x faster. Cache auto-invalidates after any action.

PII Protection

Sensitive data is automatically masked in OCR output:

  • API keys: sk-1***ef
  • Credit cards: 4111***1111
  • Emails: user***com
  • Passwords in forms
  • Crypto keys

Accessibility API

Detect interactive UI elements — buttons, checkboxes, links, text fields — with exact clickable coordinates.

Built-in Agent Instructions

The MCP server includes a system prompt that teaches agents how to use all 27 tools, macOS keyboard shortcuts, and the look-decide-act-verify loop.

Security

  • All data stays on your Mac — no telemetry, no analytics, no external calls
  • PII auto-masking — API keys, credit cards, emails, passwords masked before reaching the agent
  • URL validation — only http:// and https:// allowed, file:// blocked
  • Input sanitization — app names validated to prevent injection
  • Browser allowlist — only known browsers accepted (Chrome, Safari, Firefox, Arc, etc.)
  • Failsafe — pyautogui failsafe enabled by default (move mouse to corner to abort)

Examples

import asyncio
from puppet_ai.core.capture import ScreenCapture
from puppet_ai.core.actions import DesktopActions
from puppet_ai.server.mcp import VisionPipeContext, create_all_tools

async def main():
    ctx = VisionPipeContext(
        capture=ScreenCapture(),
        actions=DesktopActions(failsafe=True),
    )
    tools = create_all_tools(ctx)

    # See what's on screen
    windows = await tools["vision_list_windows"]()
    for w in windows:
        print(f"{w['app']:20s}{w['title'][:50]}")

    # Read a window
    page = await tools["vision_read_window"](app="Safari")
    print(page["text"][:500])

    # Click text on screen
    await tools["action_click_text"](text="Sign In", app="Safari")

    # Open a URL
    await tools["action_open_url"](url="https://example.com", browser="Safari")

asyncio.run(main())

More examples in examples/.

Configuration

# puppet-ai.yaml
ocr:
  languages: ["en", "ru"]
  mode: accurate  # or "fast"

pii:
  enabled: true
  categories: [api_keys, credit_cards, crypto_keys, emails, passwords]

capture:
  max_width: 800
  format: jpeg
  quality: 75

Presets:

puppet-ai serve --preset fast      # speed over accuracy
puppet-ai serve --preset balanced  # default
puppet-ai serve --preset quality   # max accuracy

Requirements

  • macOS 13+ (Ventura or later)
  • Python 3.11+
  • Accessibility permissions enabled

Author

Daniel Starkov

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

puppet_agent-0.1.0.tar.gz (41.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

puppet_agent-0.1.0-py3-none-any.whl (40.8 kB view details)

Uploaded Python 3

File details

Details for the file puppet_agent-0.1.0.tar.gz.

File metadata

  • Download URL: puppet_agent-0.1.0.tar.gz
  • Upload date:
  • Size: 41.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for puppet_agent-0.1.0.tar.gz
Algorithm Hash digest
SHA256 97cca8ce6a2af4938b47b7ae9a0e45cbcbfda4461db9cdd8d23eaeaeb04628c1
MD5 8e43cfcffe3b06b2244176d0aaf29cfe
BLAKE2b-256 904abd4b1d36fc8c0b9524dff11c6d9368a87728bb64b40d1d5b477c5cb59594

See more details on using hashes here.

File details

Details for the file puppet_agent-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: puppet_agent-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 40.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for puppet_agent-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8955fd06f50307d9ae89f15b679d81fd5a39bfb96deab2df1ab9e7d6137ce187
MD5 43f4475c91effdfd7b43f6a335850e7c
BLAKE2b-256 0a57da24c6c81a4b349e7833bd243269d25ca1e3b3f056e32924eb9547112b2c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page