
puppet-ai

Full computer control for AI agents — see, click, type, scroll via MCP.

Give any AI agent eyes and hands. puppet-ai captures the screen, reads text via native OCR, detects UI elements, and controls mouse + keyboard. Works with any app on macOS — browsers, desktop apps, games, terminals.

Why puppet-ai?

  • Fast — native macOS OCR in ~0.5s, 7x faster with caching
  • Universal — works with ANY app, not just browsers
  • Agent-agnostic — MCP standard, plug into any AI agent
  • Secure — auto-masks API keys, passwords, credit cards, emails in OCR output
  • Complete — 27 tools: vision + actions + system
  • Private — all processing on-device, no data leaves your Mac

Quick Start

1. Install

pip install puppet-agent

2. Enable Accessibility

System Settings → Privacy & Security → Accessibility → enable your terminal/IDE app.

3. Connect to your AI agent

puppet-ai is an MCP server. Connect it to any MCP-compatible agent below, then ask the agent to interact with your computer.


Integrations

Claude Code

Add to ~/.claude/settings.json:

{
  "mcpServers": {
    "puppet-ai": {
      "command": "puppet-ai",
      "args": ["serve"]
    }
  }
}

OpenAI Codex CLI

Add to ~/.codex/config.toml:

[mcp_servers.puppet-ai]
command = "puppet-ai"
args = ["serve"]

Or via CLI:

codex mcp add puppet-ai -- puppet-ai serve

Google Gemini CLI

Add to ~/.gemini/settings.json:

{
  "mcpServers": {
    "puppet-ai": {
      "command": "puppet-ai",
      "args": ["serve"]
    }
  }
}

Verify: launch gemini and run /mcp to check connection.

Google Antigravity

Via MCP settings in Antigravity, or add to your project's .antigravity/settings.json:

{
  "mcpServers": {
    "puppet-ai": {
      "command": "puppet-ai",
      "args": ["serve"]
    }
  }
}

Cursor

Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "puppet-ai": {
      "command": "puppet-ai",
      "args": ["serve"]
    }
  }
}

Or: Cursor Settings → Tools & MCP → Add Server.

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "puppet-ai": {
      "command": "puppet-ai",
      "args": ["serve"]
    }
  }
}

Cline (VS Code)

In VS Code, open Cline settings → MCP Servers → Add:

{
  "puppet-ai": {
    "command": "puppet-ai",
    "args": ["serve"]
  }
}

Zed

Add to Zed settings (~/.config/zed/settings.json):

{
  "context_servers": {
    "puppet-ai": {
      "command": {
        "path": "puppet-ai",
        "args": ["serve"]
      }
    }
  }
}

OpenClaw

Add to your agent's MCP config:

mcp_servers:
  puppet-ai:
    command: puppet-ai
    args: [serve]

Any MCP Client

puppet-ai speaks MCP over stdio. Spawn it as a subprocess:

import subprocess
proc = subprocess.Popen(
    ["puppet-ai", "serve"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
# Communicate via MCP JSON-RPC over stdin/stdout

Works with any agent that supports the Model Context Protocol.
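The first exchange over that pipe is the standard MCP initialize handshake, sent as newline-delimited JSON-RPC. A minimal sketch of building those messages follows; the protocolVersion string and clientInfo values here are illustrative assumptions, not values pinned by puppet-ai:

```python
import json

def jsonrpc_request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 request as one newline-terminated line,
    which is how MCP frames messages over stdio."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return (json.dumps(msg) + "\n").encode()

# The first message every MCP client sends is `initialize`:
init = jsonrpc_request(1, "initialize", {
    "protocolVersion": "2024-11-05",   # assumption: a recent MCP revision
    "capabilities": {},
    "clientInfo": {"name": "my-agent", "version": "0.1"},
})

# To talk to puppet-ai, write `init` to proc.stdin, read the response line
# from proc.stdout, send the `notifications/initialized` notification, then
# call `tools/list` to discover the available tools.
```

After the handshake, the agent typically drives everything through `tools/list` and `tools/call`.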

Detailed setup guides: integrations/


Tools

Vision (see the screen)

  • vision_list_windows — List all open windows (app, title, size)
  • vision_read_window(app) — Read text via OCR with bounding boxes for clicking
  • vision_screenshot(app) — Capture screenshot as base64 JPEG
  • vision_get_state — Full screen state: all windows + active window text
  • vision_ui_elements(app) — Get UI elements via Accessibility API (buttons, links, checkboxes)

Actions (control the computer)

  • action_click(x, y) — Click at coordinates
  • action_click_text(text, app) — Find text on screen and click it, no coordinates needed
  • action_click_and_wait(text, app) — Click text, wait for the screen to stabilize, return the new state
  • action_type_safe(text) — Type text via clipboard paste (works with any keyboard layout)
  • action_open_url(url) — Open URL in browser (http/https only)
  • action_scroll(amount, app) — Scroll up/down in an app
  • action_hotkey(keys) — Keyboard shortcut (e.g. ["cmd", "c"])
  • action_press(key) — Press a key (enter, tab, escape, etc.)
  • action_drag(...) — Drag and drop
  • action_activate_window(app) — Bring app to front
  • action_clipboard_copy(text) — Copy to clipboard
  • action_clipboard_paste() — Paste from clipboard

System

  • system_check_permissions — Check accessibility access
  • system_get_screen_size — Screen dimensions
  • system_get_mouse_position — Current cursor position
  • system_unmask(reason) — Temporarily disable PII masking
  • system_mask() — Re-enable PII masking

How It Works

AI Agent (Claude, Codex, Gemini, Cursor, Windsurf, ...)
    ↕ MCP protocol (stdio)
puppet-ai server
    ├── Vision: Apple Vision OCR + CGWindowList capture
    ├── Accessibility: AXUIElement tree (buttons, links, fields)
    ├── Actions: pyautogui (mouse, keyboard, scroll)
    └── Security: PII regex filter (API keys, cards, passwords, emails)

The loop:

  1. Look — vision_read_window("Safari") → text + coordinates
  2. Decide — agent plans next action
  3. Act — action_click_text("Sign In") → clicks center of text
  4. Verify — vision_read_window again → confirm it worked
  5. Repeat
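From the agent's side, the loop can be sketched in a few lines. The `tools` mapping matches the shape used in the Examples section; `plan_next_action` and `done` are stand-ins for the agent's own policy, not part of puppet-ai:

```python
async def act_until_done(tools, plan_next_action, done, app="Safari"):
    """Look -> decide -> act -> verify, repeated until `done` says stop.

    `tools` maps tool names to async callables; `plan_next_action` and
    `done` are hypothetical agent-side helpers.
    """
    for _ in range(20):  # hard cap so a confused agent can't loop forever
        state = await tools["vision_read_window"](app=app)      # 1. Look
        if done(state):                                          # 4. Verify
            return state
        target = plan_next_action(state)                         # 2. Decide
        await tools["action_click_text"](text=target, app=app)   # 3. Act
        # the next iteration's read doubles as verification; 5. Repeat
    raise RuntimeError("gave up after 20 iterations")
```

The hard iteration cap is a deliberate design choice: a verify step that never succeeds should fail loudly rather than click forever.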

Features

Native macOS OCR

Uses Apple Vision Framework — no external API, no GPU needed, works offline. Supports Russian and English.

Per-Window Capture

Captures specific windows via CGWindowList without switching apps or stealing focus.

Smart Coordinates

OCR returns absolute screen coordinates with Retina scaling handled automatically.
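For intuition, here is roughly what that conversion involves, assuming Vision-style normalized boxes (origin at the bottom-left) and a window frame in screen points. This is an illustrative sketch, not puppet-ai's actual code; note that because the box is normalized, the Retina pixel scale cancels out of the arithmetic:

```python
def bbox_center_to_screen(norm_bbox, window_frame):
    """Convert a normalized bbox (x, y, w, h, origin bottom-left) into an
    absolute screen point (origin top-left) suitable for clicking.

    `window_frame` is (left, top, width, height) in screen points.
    """
    x, y, w, h = norm_bbox
    left, top, width, height = window_frame
    # center of the box in normalized units, then flip the y axis
    cx = x + w / 2
    cy = 1.0 - (y + h / 2)
    # map into the window, then offset by the window's screen position
    return (left + cx * width, top + cy * height)
```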

OCR Cache

Repeated reads of unchanged windows are 7x faster. Cache auto-invalidates after any action.
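The semantics are easy to pin down in a sketch; the keying and eviction details below are assumptions for illustration, not the actual implementation:

```python
class OCRCache:
    """Memoize OCR results per window; any action clears everything,
    since a click or keystroke can change any window's contents."""

    def __init__(self):
        self._store = {}

    def get(self, window_id):
        """Return the cached OCR result, or None if the window was
        never read or the cache was invalidated."""
        return self._store.get(window_id)

    def put(self, window_id, ocr_result):
        self._store[window_id] = ocr_result

    def invalidate_all(self):
        """Called after every action_* tool runs."""
        self._store.clear()
```

Invalidating the whole cache after any action is the conservative choice: it trades a few redundant OCR passes for never acting on stale text.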

PII Protection

Sensitive data is automatically masked in OCR output:

  • API keys: sk-1***ef
  • Credit cards: 4111***1111
  • Emails: user***com
  • Passwords in forms
  • Crypto keys
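A regex filter along these lines produces the masked shapes shown above. The exact patterns puppet-ai ships are not published here, so treat every pattern below as an assumption:

```python
import re

def _mask_key(m):
    t = m.group(0)
    return t[:4] + "***" + t[-2:]          # sk-1***ef

def _mask_card(m):
    t = re.sub(r"[ -]", "", m.group(0))    # strip separators first
    return t[:4] + "***" + t[-4:]          # 4111***1111

def _mask_email(m):
    t = m.group(0)
    return t.split("@")[0] + "***" + t[-3:]  # user***com

def mask_pii(text):
    """Mask API keys, card numbers, and emails before text reaches the agent.
    Patterns are illustrative; a real filter would cover far more shapes."""
    text = re.sub(r"\bsk-[A-Za-z0-9]{8,}\b", _mask_key, text)
    text = re.sub(r"\b\d(?:[ -]?\d){12,15}\b", _mask_card, text)
    text = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", _mask_email, text)
    return text
```

Running the filter before OCR text leaves the server means the agent never sees the raw secret at all.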

Accessibility API

Detect interactive UI elements — buttons, checkboxes, links, text fields — with exact clickable coordinates.

Built-in Agent Instructions

The MCP server includes a system prompt that teaches agents how to use all 27 tools, macOS keyboard shortcuts, and the look-decide-act-verify loop.

Security

  • All data stays on your Mac — no telemetry, no analytics, no external calls
  • PII auto-masking — API keys, credit cards, emails, passwords masked before reaching the agent
  • URL validation — only http:// and https:// allowed, file:// blocked
  • Input sanitization — app names validated to prevent injection
  • Browser allowlist — only known browsers accepted (Chrome, Safari, Firefox, Arc, etc.)
  • Failsafe — pyautogui failsafe enabled by default (move mouse to corner to abort)
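The URL rule is simple enough to state as code. A sketch of the http/https allowlist follows; the real validator may check more than this:

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}

def is_safe_url(url: str) -> bool:
    """Reject file://, javascript:, data: and anything else that could
    read local files or execute code when 'opened' in a browser."""
    parsed = urlparse(url)
    return parsed.scheme in ALLOWED_SCHEMES and bool(parsed.netloc)
```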

Examples

import asyncio
from puppet_ai.core.capture import ScreenCapture
from puppet_ai.core.actions import DesktopActions
from puppet_ai.server.mcp import VisionPipeContext, create_all_tools

async def main():
    ctx = VisionPipeContext(
        capture=ScreenCapture(),
        actions=DesktopActions(failsafe=True),
    )
    tools = create_all_tools(ctx)

    # See what's on screen
    windows = await tools["vision_list_windows"]()
    for w in windows:
        print(f"{w['app']:20s}{w['title'][:50]}")

    # Read a window
    page = await tools["vision_read_window"](app="Safari")
    print(page["text"][:500])

    # Click text on screen
    await tools["action_click_text"](text="Sign In", app="Safari")

    # Open a URL
    await tools["action_open_url"](url="https://example.com", browser="Safari")

asyncio.run(main())

More examples in examples/.

Configuration

# puppet-ai.yaml
ocr:
  languages: ["en", "ru"]
  mode: accurate  # or "fast"

pii:
  enabled: true
  categories: [api_keys, credit_cards, crypto_keys, emails, passwords]

capture:
  max_width: 800
  format: jpeg
  quality: 75

Presets:

puppet-ai serve --preset fast      # speed over accuracy
puppet-ai serve --preset balanced  # default
puppet-ai serve --preset quality   # max accuracy

Known Limitations

  • Safari — type_text and Cmd+V don't work reliably in web input fields, because Safari restricts programmatic input and Accessibility API access for web content.
  • Electron apps — some Electron apps (e.g. Dolphin Anty) block clipboard paste in input fields.
  • Safari Accessibility — returns 0 UI elements for web page content. Native macOS apps and Chrome work fine.

For full browser automation, Chrome is recommended over Safari.

Requirements

  • macOS 13+ (Ventura or later)
  • Python 3.11+
  • Accessibility permissions enabled

Author

Daniel Starkov

License

MIT
