Cross-platform unified accessibility API for AI agents

Touchpoint

Give your AI agent eyes and hands on any desktop.


Touchpoint is a cross-platform Python library for reading and interacting with desktop UI through native accessibility APIs. One import, one API — works on Linux, macOS, and Windows, with built-in support for Chromium and Electron apps via CDP.

Instead of scraping pixels, you read the real accessibility tree: structured names, roles, states, and positions for every element on screen. Build AI agents, write UI tests, or automate workflows — all with the same API.

import touchpoint as tp

elements = tp.find("Send", role=tp.Role.BUTTON, app="Slack")
tp.click(elements[0])

Why Touchpoint?

| | Screenshot / vision agents | Browser-only tools | Touchpoint |
|---|---|---|---|
| Native desktop apps | ⚠️ pixel-based, slow | | ✅ structured access |
| Web & Electron apps | ⚠️ pixel-based, slow | | ✅ via CDP |
| Structured element data | | | ✅ names, roles, states, positions |
| Cross-platform | | | ✅ Linux, macOS, Windows |

Install

Requires Python 3.10+.

pip install touchpoint

Everything is included: your platform's native backend, CDP support for browsers and Electron apps, the MCP server, and screenshot capabilities. Platform-specific dependencies are installed automatically via pip environment markers.

Platform requirements

| Platform | Backend | Requirement |
|---|---|---|
| Linux | AT-SPI2 | Install xdotool for keyboard/mouse input |
| Windows | UI Automation | None (uses built-in COM APIs) |
| macOS | Accessibility (AX) | Grant permission: System Settings → Privacy & Security → Accessibility |
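
A small preflight check can surface the Linux requirement before the first input call fails. The sketch below is not part of Touchpoint: `missing_requirements` is a hypothetical helper that only looks for the `xdotool` binary on `PATH`. It cannot tell whether macOS Accessibility permission has been granted, so on macOS it just returns a reminder.

```python
import shutil
import sys


def missing_requirements(platform=None, which=shutil.which):
    """Return human-readable notes about platform requirements to check.

    Hypothetical helper, not part of Touchpoint. `platform` defaults to
    sys.platform; `which` is injectable so the check can be tested.
    """
    platform = sys.platform if platform is None else platform
    notes = []
    if platform.startswith("linux"):
        # xdotool provides keyboard/mouse input on X11.
        if which("xdotool") is None:
            notes.append("xdotool not found on PATH (needed for keyboard/mouse input)")
    elif platform == "darwin":
        # Permission state is not inspected here; this is only a reminder.
        notes.append("confirm Accessibility permission in System Settings")
    return notes


for note in missing_requirements():
    print("touchpoint preflight:", note)
```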

Quick Start

import touchpoint as tp

# Discover
apps = tp.apps()                            # ["Firefox", "Slack", "Terminal", ...]
windows = tp.windows()                      # Window objects with title, position, size
all_els = tp.elements(app="Firefox", named_only=True)

# Find
results = tp.find("Search", role=tp.Role.TEXT_FIELD, app="Firefox")

# Act
tp.set_value(results[0], "touchpoint python", replace=True)
tp.press_key("enter")
tp.hotkey("ctrl", "s")                      # keyboard shortcuts

# Wait for UI changes
tp.wait_for("results", app="Firefox", timeout=10)

# Screenshot
img = tp.screenshot()                       # full desktop → PIL.Image
img = tp.screenshot(app="Firefox")           # cropped to app window
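
Since `tp.screenshot()` returns a `PIL.Image`, passing it to a vision-capable model usually means encoding it first. A minimal sketch, assuming only the standard `Image.save(fp, format=...)` method from Pillow; `image_to_base64_png` is a hypothetical helper, not a Touchpoint function.

```python
import base64
import io


def image_to_base64_png(img):
    """Encode an image as a base64 PNG string.

    `img` is expected to behave like PIL.Image.Image, i.e. to provide
    save(fp, format=...). Hypothetical helper, not part of Touchpoint.
    """
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("ascii")


# Usage (needs a desktop session and Touchpoint installed):
#   import touchpoint as tp
#   payload = image_to_base64_png(tp.screenshot(app="Firefox"))
```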

Element IDs

Every element has a unique ID like atspi:1234:1:2.0 or cdp:9222:TID:4. Action functions accept either an Element object or a bare ID string — useful for storing references across steps:

results = tp.find("Send", max_results=1)
element_id = results[0].id                  # "atspi:1234:1:5.2"

# later...
tp.click(element_id)                        # works with just the string
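
When IDs are stored across steps, it can be handy to know which backend an ID came from. The sketch below assumes only what the examples above show, namely that the text before the first colon names the backend (`atspi`, `cdp`). The remaining fields are treated as opaque, and `backend_of` is a hypothetical helper.

```python
def backend_of(element_id):
    """Return the backend prefix of an element ID, e.g. "atspi" or "cdp".

    Assumption: the part before the first ":" names the backend. The rest
    of the ID is not interpreted.
    """
    prefix, sep, rest = element_id.partition(":")
    if not sep or not prefix or not rest:
        raise ValueError(f"not a backend-prefixed element ID: {element_id!r}")
    return prefix


print(backend_of("atspi:1234:1:5.2"))
print(backend_of("cdp:9222:TID:4"))
```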

Output formats

Control how results are returned:

tp.elements(app="Slack", format="flat")     # one compact line per element (best for LLMs)
tp.elements(app="Slack", format="tree")     # indented parent/child hierarchy
tp.elements(app="Slack", format="json")     # full JSON with all fields
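
Because the flat format is one line per element, it is easy to cap how much of it goes into a prompt. A minimal sketch, assuming the flat output is a newline-separated string; `trim_flat` is a hypothetical helper and the character budget is an arbitrary stand-in for a token budget.

```python
def trim_flat(flat_text, max_chars):
    """Keep the leading whole lines of `flat_text` that fit in `max_chars`."""
    kept, used = [], 0
    for line in flat_text.splitlines():
        cost = len(line) + (1 if kept else 0)  # +1 for the joining newline
        if used + cost > max_chars:
            break
        kept.append(line)
        used += cost
    return "\n".join(kept)


# Usage (needs Touchpoint and a running app):
#   import touchpoint as tp
#   prompt_context = trim_flat(tp.elements(app="Slack", format="flat"), 4000)
```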

MCP Server

Touchpoint ships an MCP server with 19 tools, ready for any MCP-compatible client.

Tools

| Category | Tools |
|---|---|
| Discovery | apps, windows, find, elements, get_element |
| Screenshot | screenshot (returns image data the LLM can see) |
| Actions | click (left/right/double), set_value, set_numeric_value, focus, action |
| Keyboard | type_text, press_key (single key or combo) |
| Mouse | mouse_move, scroll |
| Window | activate_window |
| Waiting | wait_for, wait_for_app, wait_for_window |

The MCP server includes built-in instructions that teach LLM agents how to work effectively — the orient → locate → act → verify loop, how to use find(), and how to recover from errors.

         ┌──────────┐
    ┌───▶│  ORIENT  │  screenshot · apps · windows
    │    └────┬─────┘
    │         ▼
    │    ┌──────────┐
    │    │  LOCATE  │  find · elements · get_element
    │    └────┬─────┘
    │         ▼
    │    ┌──────────┐
    │    │   ACT    │  click · set_value · type_text · press_key
    │    └────┬─────┘
    │         ▼
    │    ┌──────────┐
    │    │  VERIFY  │───▶ Done ✅
    │    └────┬─────┘
    │         │ not yet
    └─────────┘
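
The same loop can be written directly against the Python API. The sketch below is one way to structure it, not the logic the MCP server ships: `run_step` and its three callables are hypothetical names, and the retry count and delay are arbitrary.

```python
import time


def run_step(locate, act, verify, attempts=3, delay=0.5, sleep=time.sleep):
    """Run one locate -> act -> verify cycle, retrying up to `attempts` times.

    locate() returns a target or None, act(target) performs the action, and
    verify() returns True once the UI has reached the expected state.
    Returns the number of attempts it took.
    """
    for attempt in range(1, attempts + 1):
        target = locate()
        if target is not None:
            act(target)
            if verify():
                return attempt
        if attempt < attempts:
            sleep(delay)  # let the UI settle before looking again
    raise RuntimeError(f"step did not verify after {attempts} attempts")


# Usage (needs Touchpoint; "Message sent" is a made-up label to verify against):
#   import touchpoint as tp
#   run_step(
#       locate=lambda: next(iter(tp.find("Send", role=tp.Role.BUTTON, app="Slack")), None),
#       act=tp.click,
#       verify=lambda: bool(tp.find("Message sent", app="Slack")),
#   )
```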

Client setup

Claude Desktop

Config file location:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "touchpoint": {
      "command": "touchpoint-mcp",
      "env": {
        "TOUCHPOINT_CDP_DISCOVER": "true"
      }
    }
  }
}

If Touchpoint is installed in a virtualenv, set "command" to the full path of the executable, for example "/path/to/venv/bin/touchpoint-mcp".

VS Code / GitHub Copilot

Add to .vscode/mcp.json in your workspace:

{
  "servers": {
    "touchpoint": {
      "command": "touchpoint-mcp",
      "env": {
        "TOUCHPOINT_CDP_DISCOVER": "true"
      }
    }
  }
}

Cursor

Create or edit ~/.cursor/mcp.json:

{
  "mcpServers": {
    "touchpoint": {
      "command": "touchpoint-mcp",
      "env": {
        "TOUCHPOINT_CDP_DISCOVER": "true"
      }
    }
  }
}

Windsurf

Edit ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "touchpoint": {
      "command": "touchpoint-mcp",
      "env": {
        "TOUCHPOINT_CDP_DISCOVER": "true"
      }
    }
  }
}

Claude Code (CLI)

claude mcp add-json touchpoint --scope user '{
  "command": "touchpoint-mcp",
  "env": {
    "TOUCHPOINT_CDP_DISCOVER": "true"
  }
}'
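
The file-based configs above differ only in their top-level key ("servers" for VS Code, "mcpServers" for the others), so they can be generated from one template. A minimal sketch: the client labels and `build_config` are hypothetical names chosen for this example, and writing the result to the right file is left to the caller.

```python
import json

# Top-level key per client, taken from the config snippets above.
# The client labels on the left are made up for this example.
TOP_LEVEL_KEY = {
    "claude-desktop": "mcpServers",
    "vscode": "servers",
    "cursor": "mcpServers",
    "windsurf": "mcpServers",
}


def build_config(client, command="touchpoint-mcp", cdp_discover=True):
    """Return the JSON text for one client's MCP config file."""
    entry = {"command": command}
    if cdp_discover:
        entry["env"] = {"TOUCHPOINT_CDP_DISCOVER": "true"}
    return json.dumps({TOP_LEVEL_KEY[client]: {"touchpoint": entry}}, indent=2)


print(build_config("vscode", command="/path/to/venv/bin/touchpoint-mcp"))
```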

Environment variables

| Variable | Example | Description |
|---|---|---|
| TOUCHPOINT_CDP_DISCOVER | true | Auto-discover CDP ports from running processes |
| TOUCHPOINT_CDP_PORTS | {"Chrome": 9222} | Explicit app-to-port mapping (JSON) |
| TOUCHPOINT_CDP_APP | Google Chrome | Single app name (pair with _PORT) |
| TOUCHPOINT_CDP_PORT | 9222 | Single port (pair with _APP) |
| TOUCHPOINT_CDP_REFRESH_INTERVAL | 5.0 | Seconds between CDP port scans |
| TOUCHPOINT_SCALE_FACTOR | 1.25 | Display scale override (Wayland, non-standard DPI) |
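
To see how these variables line up with the `tp.configure()` keywords listed under Configuration, here is an illustrative parser. It is not Touchpoint's own parsing code. The accepted truthy strings, the precedence of `TOUCHPOINT_CDP_PORTS` over the `_APP`/`_PORT` pair, and the name `configure_kwargs_from_env` are all assumptions.

```python
import json
import os


def configure_kwargs_from_env(env=None):
    """Translate TOUCHPOINT_* variables into keyword arguments (illustrative)."""
    env = os.environ if env is None else env
    kwargs = {}
    if "TOUCHPOINT_CDP_DISCOVER" in env:
        # Assumption: these strings count as "on".
        value = env["TOUCHPOINT_CDP_DISCOVER"].strip().lower()
        kwargs["cdp_discover"] = value in {"1", "true", "yes"}
    if "TOUCHPOINT_CDP_PORTS" in env:
        mapping = json.loads(env["TOUCHPOINT_CDP_PORTS"])
        kwargs["cdp_ports"] = {str(app): int(port) for app, port in mapping.items()}
    elif "TOUCHPOINT_CDP_APP" in env and "TOUCHPOINT_CDP_PORT" in env:
        # Assumption: the single app/port pair is shorthand for a one-entry mapping.
        kwargs["cdp_ports"] = {env["TOUCHPOINT_CDP_APP"]: int(env["TOUCHPOINT_CDP_PORT"])}
    if "TOUCHPOINT_CDP_REFRESH_INTERVAL" in env:
        kwargs["cdp_refresh_interval"] = float(env["TOUCHPOINT_CDP_REFRESH_INTERVAL"])
    if "TOUCHPOINT_SCALE_FACTOR" in env:
        kwargs["scale_factor"] = float(env["TOUCHPOINT_SCALE_FACTOR"])
    return kwargs


print(configure_kwargs_from_env({"TOUCHPOINT_CDP_PORTS": '{"Chrome": 9222}'}))
```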

Browser & Electron Apps (CDP)

Native accessibility APIs return limited data for Electron and Chromium apps (Slack, Discord, VS Code, etc.). Touchpoint's CDP backend connects via Chrome DevTools Protocol to get the full web content.

Setup

  1. Launch the app with a debug port:
# Linux
google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/tp-chrome

# macOS
open -na "Google Chrome" --args --remote-debugging-port=9222 --user-data-dir=/tmp/tp-chrome

# Windows
start chrome --remote-debugging-port=9222 --user-data-dir=%TEMP%\tp-chrome
  2. Configure Touchpoint:
import touchpoint as tp

tp.configure(cdp_discover=True)             # auto-discover from running processes
# or
tp.configure(cdp_ports={"Google Chrome": 9222})  # explicit mapping
  3. Control what you get with the source parameter:
tp.elements(app="Google Chrome", source="full")     # native chrome + web content (default)
tp.elements(app="Google Chrome", source="ax")       # web content only (CDP accessibility tree)
tp.elements(app="Google Chrome", source="native")   # native UI only (toolbar, tabs, menus)
tp.elements(app="Google Chrome", source="dom")      # DOM walker (catches what AX misses)

CDP results are merged with native backend results — you get the toolbar and window controls from AT-SPI2/UIA/AX, combined with the full web page content from CDP, in a single elements() call.
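
If CDP elements do not show up, a quick check is whether the debug port from step 1 is actually listening. Chromium-based apps serve a small JSON document at `/json/version` on that port. The sketch below queries it with the standard library only; `cdp_version` is a hypothetical helper, not a Touchpoint function.

```python
import http.client
import json
import urllib.error
import urllib.request


def cdp_version(port=9222, host="127.0.0.1", timeout=2.0):
    """Return the parsed /json/version document, or None if nothing answers."""
    url = f"http://{host}:{port}/json/version"
    # Bypass any configured HTTP proxy; the target is a local port.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None
    return data if isinstance(data, dict) else None


info = cdp_version(9222)
print("debug port reachable" if info else "nothing listening on 9222")
```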


API Reference

Discovery

| Function | Description |
|---|---|
| tp.apps() | List application names in the accessibility tree |
| tp.windows() | All windows with id, title, app, position, size, active state |
| tp.elements(app, role, states, ...) | UI elements, with filtering, tree mode, and formatting |
| tp.element_at(x, y) | Deepest element at screen coordinates |
| tp.get_element(id) | Fresh snapshot of a single element by ID |

Search & Wait

| Function | Description |
|---|---|
| tp.find(query, app, role, ...) | Search by name with 4-stage matching: exact → contains → word → fuzzy |
| tp.wait_for(query, ...) | Poll until elements appear (or disappear with gone=True) |
| tp.wait_for_app(app, ...) | Poll until an app appears or disappears |
| tp.wait_for_window(title, ...) | Poll until a window appears or disappears |
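
To make the staged matching concrete, here is a rough illustration of an exact → contains → word → fuzzy cascade. It is not Touchpoint's matcher. The case folding, the word rule (every query word appears in the name), and the use of `difflib.SequenceMatcher` for the fuzzy score are all assumptions, and `match_stage` is a hypothetical name. The 0.6 default mirrors the `fuzzy_threshold` shown under Configuration.

```python
from difflib import SequenceMatcher


def match_stage(query, name, fuzzy_threshold=0.6):
    """Return the first stage at which `query` matches `name`, or None."""
    q, n = query.casefold().strip(), name.casefold().strip()
    if not q or not n:
        return None
    if q == n:
        return "exact"
    if q in n:
        return "contains"
    # Word stage: every query word appears as a whole word, in any order.
    name_words = set(n.split())
    if all(word in name_words for word in q.split()):
        return "word"
    # Fuzzy stage: similarity ratio in [0.0, 1.0].
    if SequenceMatcher(None, q, n).ratio() >= fuzzy_threshold:
        return "fuzzy"
    return None


for query in ("Send", "send", "message send", "Sned", "Cancel"):
    print(query, "->", match_stage(query, "Send message"))
```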

Actions

| Function | Description |
|---|---|
| tp.click(element) | Click via accessibility action, with coordinate fallback |
| tp.double_click(element) | Double-click |
| tp.right_click(element) | Right-click / context menu |
| tp.set_value(element, text) | Set text content (replace=True to clear first) |
| tp.set_numeric_value(element, n) | Set slider or spinbox value |
| tp.focus(element) | Move keyboard focus |
| tp.action(element, name) | Execute a raw accessibility action by name |
| tp.activate_window(window) | Bring a window to the foreground |

Input

| Function | Description |
|---|---|
| tp.type_text(text) | Type into the currently focused element |
| tp.press_key(key) | Press and release a key ("enter", "tab", "escape") |
| tp.hotkey(*keys) | Key combination (tp.hotkey("ctrl", "s")) |
| tp.click_at(x, y) | Click at screen coordinates |
| tp.double_click_at(x, y) | Double-click at coordinates |
| tp.right_click_at(x, y) | Right-click at coordinates |
| tp.mouse_move(x, y) | Move the cursor |
| tp.scroll(direction, amount) | Scroll at current cursor position |

Screenshot & Config

| Function | Description |
|---|---|
| tp.screenshot(app, element, ...) | Full desktop or cropped to app/window/element/monitor |
| tp.monitor_count() | Number of connected monitors |
| tp.configure(...) | Set runtime options (see Configuration) |

All action functions accept an Element object or a string ID. All discovery and search functions accept format="flat" or format="json" to return pre-formatted strings instead of objects; format="tree" is available for tp.elements() only.


Architecture

┌───────────────────────────────────────────────────────┐
│               import touchpoint as tp                 │
│  tp.find() · tp.click() · tp.screenshot() · ...       │
│                    (Public API)                       │
├─────────────────────────┬─────────────────────────────┤
│     Backend (ABC)       │    InputProvider (ABC)      │
├─────────────────────────┼─────────────────────────────┤
│  AT-SPI2     (Linux)    │  Xdotool       (X11)        │
│  UIA         (Windows)  │  SendInput     (Win32)      │
│  AX          (macOS)    │  CGEvent       (macOS)      │
│  CDP         (browsers) │  CDP dispatch  (Chrome)     │
├─────────────────────────┴─────────────────────────────┤
│  Utilities: formatter · matcher · screenshot · scale  │
└───────────────────────────────────────────────────────┘

Two-layer design:

  • Backend reads the accessibility tree and runs structured actions (click, set_value, focus). Element-aware and reliable.
  • InputProvider simulates raw keyboard and mouse input. Coordinate-based and element-blind. Used as an automatic fallback when a native accessibility action isn't available.

CDP runs alongside the platform backend. Their results are merged: native window chrome (toolbar, tabs, menus) from AT-SPI2/UIA/AX, plus full web content from CDP, unified under one API.
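
The two-layer split with automatic fallback can be pictured roughly as follows. Only the names `Backend` and `InputProvider` come from the diagram above. The method names, signatures, and the `click_with_fallback` function are invented for illustration and will not match the real classes described in ARCHITECTURE.md.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Element-aware layer (method names here are hypothetical)."""

    @abstractmethod
    def click(self, element_id: str) -> bool:
        """Try a native accessibility click; return False if unavailable."""

    @abstractmethod
    def center(self, element_id: str) -> tuple[int, int]:
        """Return the element's on-screen center point."""


class InputProvider(ABC):
    """Coordinate-based layer (method names here are hypothetical)."""

    @abstractmethod
    def click_at(self, x: int, y: int) -> None:
        """Simulate a mouse click at screen coordinates."""


def click_with_fallback(backend, provider, element_id, fallback_input=True):
    """Prefer the native action; fall back to a simulated click if allowed."""
    if backend.click(element_id):
        return "native"
    if not fallback_input:
        raise RuntimeError(f"no native click action for {element_id}")
    x, y = backend.center(element_id)
    provider.click_at(x, y)
    return "fallback"
```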

For detailed internals, see ARCHITECTURE.md.


Configuration

tp.configure(
    fuzzy_threshold=0.6,          # minimum match score for find() (0.0–1.0)
    fallback_input=True,          # use InputProvider when native actions fail
    type_chunk_size=40,           # split long text into chunks for typing (0 = disable)
    max_elements=5000,            # max elements per query
    max_depth=10,                 # default tree depth limit
    scale_factor=None,            # display scale override (None = auto-detect)
    cdp_ports={"Chrome": 9222},   # explicit CDP port mapping
    cdp_discover=True,            # auto-discover CDP ports from running processes
    cdp_refresh_interval=5.0,     # seconds between CDP target scans
)
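
As an illustration of what `type_chunk_size` controls, the sketch below splits text into fixed-size slices, with 0 meaning no splitting. This assumes plain character slicing. Touchpoint's actual chunking may differ, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text, size=40):
    """Split `text` into pieces of at most `size` characters (0 = no split)."""
    if not text:
        return []
    if size <= 0:
        return [text]
    return [text[i:i + size] for i in range(0, len(text), size)]


for piece in chunk_text("touchpoint python " * 5, size=40):
    print(repr(piece))
```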

Development

git clone https://github.com/Touchpoint-Labs/touchpoint.git
cd touchpoint
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest

Status

Alpha — the API is functional, tested, and usable, but may change before 1.0.

| Platform | Backend | Input | CDP | Tests |
|---|---|---|---|---|
| Linux (X11) | ✅ AT-SPI2 | ✅ xdotool | | |
| Windows | ✅ UIA | ✅ SendInput | | |
| macOS | ✅ AX | ✅ CGEvent | | |

Known limitations

  • Wayland input — The Linux InputProvider uses xdotool, which requires X11. On pure Wayland (no XWayland), keyboard/mouse simulation is unavailable. The accessibility tree and native actions still work.

  • Synchronous CDP — CDP calls block on WebSocket responses. JavaScript dialogs (alert, confirm, prompt) are auto-dismissed to prevent deadlocks. An async rewrite is planned.

  • No browser navigation API — Touchpoint doesn't have built-in URL navigation. Agents can navigate by interacting with UI elements directly: find the address bar, type a URL, press Enter.
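
The address-bar workaround from the last item can be wrapped in a small helper. This is a sketch using only calls shown elsewhere in this README (`find`, `focus`, `set_value`, `press_key`). The default label "Address and search bar" is an assumption based on Chrome's English UI and will differ in other browsers and locales, and `navigate` is a hypothetical name.

```python
def navigate(tp, url, app, address_bar="Address and search bar"):
    """Open `url` by driving the browser UI.

    `tp` is the imported touchpoint module (passed in so the helper can be
    exercised with a stand-in). `address_bar` is the accessible name of the
    URL field, which is an assumption and varies by browser and locale.
    """
    matches = tp.find(address_bar, role=tp.Role.TEXT_FIELD, app=app, max_results=1)
    if not matches:
        raise LookupError(f"no address bar named {address_bar!r} found in {app!r}")
    field = matches[0]
    tp.focus(field)                       # make sure Enter lands in the URL field
    tp.set_value(field, url, replace=True)
    tp.press_key("enter")
    return field


# Usage (needs Touchpoint and a running browser):
#   import touchpoint as tp
#   navigate(tp, "https://example.com", app="Google Chrome")
```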


License

MIT
