Cross-platform unified accessibility API for AI agents
Touchpoint
Give your AI agent eyes and hands on any desktop.
Touchpoint is a cross-platform Python library for reading and interacting with desktop UI through native accessibility APIs. One import, one API — works on Linux, macOS, and Windows, with built-in support for Chromium and Electron apps via CDP.
Instead of scraping pixels, you read the real accessibility tree: structured names, roles, states, and positions for every element on screen. Build AI agents, write UI tests, or automate workflows — all with the same API.
```python
import touchpoint as tp

elements = tp.find("Send", role=tp.Role.BUTTON, app="Slack")
tp.click(elements[0])
```
Why Touchpoint?
| | Screenshot / vision agents | Browser-only tools | Touchpoint |
|---|---|---|---|
| Native desktop apps | ⚠️ pixel-based, slow | ❌ | ✅ structured access |
| Web & Electron apps | ⚠️ pixel-based, slow | ✅ | ✅ via CDP |
| Structured element data | ❌ | ✅ | ✅ names, roles, states, positions |
| Cross-platform | ✅ | ✅ | ✅ Linux, macOS, Windows |
Table of Contents
- Install
- Quick Start
- MCP Server
- Browser & Electron Apps (CDP)
- API Reference
- Architecture
- Configuration
- Development
- Status
- License
Install
Requires Python 3.10+.
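```shell
pip install touchpoint-py
```

The PyPI distribution is named touchpoint-py (see the file listing at the end of this page); the import name is touchpoint.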
Everything is included: your platform's native backend, CDP support for browsers and Electron apps, the MCP server, and screenshot capabilities. Platform-specific dependencies are installed automatically via pip environment markers.
Platform requirements
| Platform | Backend | Requirement |
|---|---|---|
| Linux | AT-SPI2 | Install xdotool for keyboard/mouse input |
| Windows | UI Automation | None — uses built-in COM APIs |
| macOS | Accessibility (AX) | Grant permission: System Settings → Privacy & Security → Accessibility |
Quick Start
```python
import touchpoint as tp

# Discover
apps = tp.apps()          # ["Firefox", "Slack", "Terminal", ...]
windows = tp.windows()    # Window objects with title, position, size
all_els = tp.elements(app="Firefox", named_only=True)

# Find
results = tp.find("Search", role=tp.Role.TEXT_FIELD, app="Firefox")

# Act
tp.set_value(results[0], "touchpoint python", replace=True)
tp.press_key("enter")
tp.hotkey("ctrl", "s")    # keyboard shortcuts

# Wait for UI changes
tp.wait_for("results", app="Firefox", timeout=10)

# Screenshot
img = tp.screenshot()                 # full desktop → PIL.Image
img = tp.screenshot(app="Firefox")    # cropped to app window
```
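tp.wait_for() is described as polling until matching elements appear. The sketch below shows that poll-until-timeout pattern in isolation. poll_until, its check callable, and the 0.25 s default interval are hypothetical names and values, not Touchpoint's implementation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    check: Callable[[], Optional[T]],
    timeout: float = 10.0,
    interval: float = 0.25,
) -> Optional[T]:
    """Call check() until it returns a truthy value or the timeout elapses.

    Returns the first truthy result, or None on timeout. An empty list
    counts as "not found yet", which suits search functions.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

With Touchpoint, check could be something like `lambda: tp.find("results", app="Firefox")`; a "wait until gone" variant simply inverts the condition.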
Element IDs
Every element has a unique ID like atspi:1234:1:2.0 or cdp:9222:TID:4. Action functions accept either an Element object or a bare ID string — useful for storing references across steps:
```python
results = tp.find("Send", max_results=1)
element_id = results[0].id   # "atspi:1234:1:5.2"
# later...
tp.click(element_id)         # works with just the string
```
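Because actions accept either form, a caller needs only one normalization step. Here is a minimal sketch of that step. It uses a stand-in Element class with an id attribute. It also assumes, based only on the example IDs above, that the text before the first colon names the backend; the remaining fields are treated as opaque.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Element:
    """Hypothetical stand-in for a Touchpoint Element; only `id` matters here."""
    id: str
    name: str = ""


def resolve_id(target: Union[Element, str]) -> str:
    """Accept either an Element or a bare ID string and return the ID."""
    return target if isinstance(target, str) else target.id


def backend_of(element_id: str) -> str:
    """Return the prefix before the first colon, e.g. 'atspi' or 'cdp' (assumed meaning)."""
    prefix, sep, _rest = element_id.partition(":")
    if not sep:
        raise ValueError(f"not a Touchpoint-style ID: {element_id!r}")
    return prefix
```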
Output formats
Control how results are returned:
```python
tp.elements(app="Slack", format="flat")   # one compact line per element (best for LLMs)
tp.elements(app="Slack", format="tree")   # indented parent/child hierarchy
tp.elements(app="Slack", format="json")   # full JSON with all fields
```
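To make the difference between the flat and tree formats concrete, here is a toy renderer over a hypothetical Node type. The exact line layout Touchpoint emits is not documented here, so the `[role] name (id)` pattern is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical element record: role, name, id, and child elements."""
    role: str
    name: str
    id: str
    children: List["Node"] = field(default_factory=list)


def _line(node: Node) -> str:
    return f"[{node.role}] {node.name} ({node.id})"


def format_flat(nodes: List[Node]) -> str:
    """One compact line per element, depth-first, no indentation."""
    lines: List[str] = []

    def walk(node: Node) -> None:
        lines.append(_line(node))
        for child in node.children:
            walk(child)

    for node in nodes:
        walk(node)
    return "\n".join(lines)


def format_tree(nodes: List[Node], indent: str = "  ") -> str:
    """Same elements, indented to show the parent/child hierarchy."""
    lines: List[str] = []

    def walk(node: Node, depth: int) -> None:
        lines.append(indent * depth + _line(node))
        for child in node.children:
            walk(child, depth + 1)

    for node in nodes:
        walk(node, 0)
    return "\n".join(lines)
```

The flat form spends no tokens on indentation, which is one plausible reason it is recommended for LLMs.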
MCP Server
Touchpoint ships an MCP server with 19 tools, ready for any MCP-compatible client.
Tools
| Category | Tools |
|---|---|
| Discovery | apps, windows, find, elements, get_element |
| Screenshot | screenshot (returns image data the LLM can see) |
| Actions | click (left/right/double), set_value, set_numeric_value, focus, action |
| Keyboard | type_text, press_key (single key or combo) |
| Mouse | mouse_move, scroll |
| Window | activate_window |
| Waiting | wait_for, wait_for_app, wait_for_window |
The MCP server includes built-in instructions that teach LLM agents how to work effectively — the orient → locate → act → verify loop, how to use find(), and how to recover from errors.
```
     ┌──────────┐
┌───▶│  ORIENT  │  screenshot · apps · windows
│    └────┬─────┘
│         ▼
│    ┌──────────┐
│    │  LOCATE  │  find · elements · get_element
│    └────┬─────┘
│         ▼
│    ┌──────────┐
│    │   ACT    │  click · set_value · type_text · press_key
│    └────┬─────┘
│         ▼
│    ┌──────────┐
│    │  VERIFY  │───▶ Done ✅
│    └────┬─────┘
│         │ not yet
└─────────┘
```
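The MCP server teaches this loop to the LLM through its instructions rather than running it as code. The sketch below covers only the control flow. The callables are hypothetical stand-ins for the tools at each stage, and max_rounds is an assumed safety limit.

```python
from typing import Any, Callable


def run_agent_loop(
    orient: Callable[[], Any],
    locate: Callable[[Any], Any],
    act: Callable[[Any], None],
    verify: Callable[[], bool],
    max_rounds: int = 5,
) -> bool:
    """Repeat orient -> locate -> act -> verify until verify() succeeds.

    Returns True on success, False once max_rounds is exhausted.
    """
    for _ in range(max_rounds):
        context = orient()        # e.g. screenshot, apps, windows
        target = locate(context)  # e.g. find, elements, get_element
        if target is not None:
            act(target)           # e.g. click, set_value, type_text
        if verify():              # done, or loop back and re-orient
            return True
    return False
```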
Client setup
Claude Desktop
Config file location:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
```json
{
  "mcpServers": {
    "touchpoint": {
      "command": "touchpoint-mcp",
      "env": {
        "TOUCHPOINT_CDP_DISCOVER": "true"
      }
    }
  }
}
```
If Touchpoint is installed in a virtualenv, set "command" to the full path of the executable, e.g. "/path/to/venv/bin/touchpoint-mcp".
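Editing these files by hand can accidentally drop other servers that are already configured. A small helper can merge the touchpoint entry into the existing file instead. add_touchpoint_server is a hypothetical name. It assumes the "mcpServers" layout that Claude Desktop, Cursor, and Windsurf use above; VS Code uses a top-level "servers" key instead.

```python
import json
from pathlib import Path


def add_touchpoint_server(config_path: Path, command: str = "touchpoint-mcp") -> dict:
    """Insert or update the "touchpoint" entry under "mcpServers".

    Other servers already in the file are left untouched. The file (and
    its parent directory) is created if it does not exist.
    """
    config: dict = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["touchpoint"] = {
        "command": command,  # use a full venv path here if needed
        "env": {"TOUCHPOINT_CDP_DISCOVER": "true"},
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```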
VS Code / GitHub Copilot
Add to .vscode/mcp.json in your workspace:
```json
{
  "servers": {
    "touchpoint": {
      "command": "touchpoint-mcp",
      "env": {
        "TOUCHPOINT_CDP_DISCOVER": "true"
      }
    }
  }
}
```
Cursor
Create or edit ~/.cursor/mcp.json:
```json
{
  "mcpServers": {
    "touchpoint": {
      "command": "touchpoint-mcp",
      "env": {
        "TOUCHPOINT_CDP_DISCOVER": "true"
      }
    }
  }
}
```
Windsurf
Edit ~/.codeium/windsurf/mcp_config.json:
```json
{
  "mcpServers": {
    "touchpoint": {
      "command": "touchpoint-mcp",
      "env": {
        "TOUCHPOINT_CDP_DISCOVER": "true"
      }
    }
  }
}
```
Claude Code (CLI)
```shell
claude mcp add-json touchpoint --scope user '{
  "command": "touchpoint-mcp",
  "env": {
    "TOUCHPOINT_CDP_DISCOVER": "true"
  }
}'
```
Environment variables
| Variable | Example | Description |
|---|---|---|
| `TOUCHPOINT_CDP_DISCOVER` | `true` | Auto-discover CDP ports from running processes |
| `TOUCHPOINT_CDP_PORTS` | `{"Chrome": 9222}` | Explicit app-to-port mapping (JSON) |
| `TOUCHPOINT_CDP_APP` | `Google Chrome` | Single app name (pair with `TOUCHPOINT_CDP_PORT`) |
| `TOUCHPOINT_CDP_PORT` | `9222` | Single port (pair with `TOUCHPOINT_CDP_APP`) |
| `TOUCHPOINT_CDP_REFRESH_INTERVAL` | `5.0` | Seconds between CDP port scans |
| `TOUCHPOINT_SCALE_FACTOR` | `1.25` | Display scale override (Wayland, non-standard DPI) |
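The sketch below shows one way these variables could map onto the tp.configure() keywords listed under Configuration. Two parts are assumptions, not documented behavior: the accepted truthy spellings, and merging TOUCHPOINT_CDP_PORTS with the _APP/_PORT pair instead of letting one override the other. Touchpoint's actual precedence is not documented here.

```python
import json
import os
from typing import Any, Dict, Mapping, Optional


def _truthy(value: str) -> bool:
    # Assumed spellings; the table only shows "true".
    return value.strip().lower() in {"1", "true", "yes", "on"}


def cdp_settings_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Collect the TOUCHPOINT_* variables into tp.configure()-style keywords."""
    env = os.environ if env is None else env
    settings: Dict[str, Any] = {}

    if "TOUCHPOINT_CDP_DISCOVER" in env:
        settings["cdp_discover"] = _truthy(env["TOUCHPOINT_CDP_DISCOVER"])

    ports: Dict[str, int] = {}
    if "TOUCHPOINT_CDP_PORTS" in env:
        for app_name, port_value in json.loads(env["TOUCHPOINT_CDP_PORTS"]).items():
            ports[app_name] = int(port_value)
    # _APP and _PORT only make sense as a pair.
    app, port = env.get("TOUCHPOINT_CDP_APP"), env.get("TOUCHPOINT_CDP_PORT")
    if app and port:
        ports[app] = int(port)
    if ports:
        settings["cdp_ports"] = ports

    if "TOUCHPOINT_CDP_REFRESH_INTERVAL" in env:
        settings["cdp_refresh_interval"] = float(env["TOUCHPOINT_CDP_REFRESH_INTERVAL"])
    if "TOUCHPOINT_SCALE_FACTOR" in env:
        settings["scale_factor"] = float(env["TOUCHPOINT_SCALE_FACTOR"])
    return settings
```

Under these assumptions, `tp.configure(**cdp_settings_from_env())` would then apply the same settings from a script.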
Browser & Electron Apps (CDP)
Native accessibility APIs return limited data for Electron and Chromium apps (Slack, Discord, VS Code, etc.). Touchpoint's CDP backend connects via Chrome DevTools Protocol to get the full web content.
Setup
- Launch the app with a remote debugging port:

```shell
# Linux
google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/tp-chrome
# macOS
open -na "Google Chrome" --args --remote-debugging-port=9222 --user-data-dir=/tmp/tp-chrome
# Windows
start chrome --remote-debugging-port=9222 --user-data-dir=%TEMP%\tp-chrome
```

- Configure Touchpoint:

```python
import touchpoint as tp

tp.configure(cdp_discover=True)                   # auto-discover from running processes
# or
tp.configure(cdp_ports={"Google Chrome": 9222})   # explicit mapping
```

- Control what you get with the source parameter:

```python
tp.elements(app="Google Chrome", source="full")    # native chrome + web content (default)
tp.elements(app="Google Chrome", source="ax")      # web content only (CDP accessibility tree)
tp.elements(app="Google Chrome", source="native")  # native UI only (toolbar, tabs, menus)
tp.elements(app="Google Chrome", source="dom")     # DOM walker (catches what AX misses)
```
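Before configuring Touchpoint, it can help to confirm that the debug port is actually listening. When remote debugging is enabled, Chromium-based browsers serve a JSON description at http://HOST:PORT/json/version. The helper cdp_version is hypothetical and uses only the standard library. It bypasses any HTTP proxy, because the debug port is local.

```python
import json
import urllib.request
from typing import Optional


def cdp_version(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> Optional[dict]:
    """Return the browser's /json/version payload, or None if nothing answers."""
    url = f"http://{host}:{port}/json/version"
    # An empty ProxyHandler ignores http_proxy settings for this local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; bad JSON is a ValueError.
        return None
```

For example, `info = cdp_version(9222)` returns None when the browser was started without the flag. Otherwise the payload's "Browser" field names the running browser.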
CDP results are merged with native backend results — you get the toolbar and window controls from AT-SPI2/UIA/AX, combined with the full web page content from CDP, in a single elements() call.
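One way to read the four source values is as a choice of which element lists to combine. This sketch models that choice with plain string lists. select_elements is hypothetical and does not reflect how Touchpoint collects, deduplicates, or orders elements internally.

```python
from typing import Dict, List, Sequence

VALID_SOURCES = ("full", "ax", "native", "dom")


def select_elements(
    source: str,
    native: Sequence[str],
    ax: Sequence[str],
    dom: Sequence[str],
) -> List[str]:
    """Pick which element lists to return for a given `source` value."""
    if source not in VALID_SOURCES:
        raise ValueError(f"source must be one of {VALID_SOURCES}, got {source!r}")
    plan: Dict[str, List[Sequence[str]]] = {
        "full": [native, ax],  # OS window chrome + CDP web content
        "ax": [ax],            # CDP accessibility tree only
        "native": [native],    # toolbar, tabs, menus only
        "dom": [dom],          # DOM walker output
    }
    return [item for group in plan[source] for item in group]
```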
API Reference
Discovery
| Function | Description |
|---|---|
| `tp.apps()` | List application names in the accessibility tree |
| `tp.windows()` | All windows with id, title, app, position, size, and active state |
| `tp.elements(app, role, states, ...)` | UI elements, with filtering, tree mode, and formatting |
| `tp.element_at(x, y)` | Deepest element at screen coordinates |
| `tp.get_element(id)` | Fresh snapshot of a single element by ID |
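tp.element_at(x, y) returns the deepest element at a point. A common way to implement that kind of lookup is a recursive hit-test over element bounds. The Box type and the rule that later siblings win when elements overlap are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Box:
    """Hypothetical element with screen bounds (x, y, width, height)."""
    name: str
    x: int
    y: int
    width: int
    height: int
    children: List["Box"] = field(default_factory=list)

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height


def deepest_at(root: Box, px: int, py: int) -> Optional[Box]:
    """Return the deepest descendant whose bounds contain (px, py), or None."""
    if not root.contains(px, py):
        return None
    # Check later children first, assuming they are drawn on top.
    for child in reversed(root.children):
        hit = deepest_at(child, px, py)
        if hit is not None:
            return hit
    return root
```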
Search & Wait
| Function | Description |
|---|---|
| `tp.find(query, app, role, ...)` | Search by name with 4-stage matching: exact → contains → word → fuzzy |
| `tp.wait_for(query, ...)` | Poll until matching elements appear (or disappear, with `gone=True`) |
| `tp.wait_for_app(app, ...)` | Poll until an app appears or disappears |
| `tp.wait_for_window(title, ...)` | Poll until a window appears or disappears |
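The 4-stage order can be sketched as a ranking function. Only the stage order (exact, contains, word, fuzzy) comes from the table. The case folding, the whitespace word split, and the use of difflib.SequenceMatcher with a 0.6 cutoff (mirroring the fuzzy_threshold default under Configuration) are assumptions, and this is not Touchpoint's matcher.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def rank_matches(
    query: str, names: Sequence[str], fuzzy_threshold: float = 0.6
) -> List[Tuple[int, str]]:
    """Rank names by stage: 0 exact, 1 contains, 2 shared word, 3 fuzzy.

    Comparisons are case-insensitive. Names that match no stage are dropped.
    """
    q = query.casefold()
    q_words = set(q.split())
    ranked: List[Tuple[int, float, str]] = []
    for name in names:
        n = name.casefold()
        if n == q:
            ranked.append((0, 1.0, name))
        elif q in n:
            ranked.append((1, 1.0, name))
        elif q_words & set(n.split()):
            ranked.append((2, 1.0, name))
        else:
            score = SequenceMatcher(None, q, n).ratio()
            if score >= fuzzy_threshold:
                ranked.append((3, score, name))
    # Earlier stages first; within the fuzzy stage, higher scores first.
    ranked.sort(key=lambda item: (item[0], -item[1]))
    return [(stage, name) for stage, _score, name in ranked]
```

Ordering by stage first means a fuzzy hit never outranks an exact or substring hit, however high its score.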
Actions
| Function | Description |
|---|---|
| `tp.click(element)` | Click via accessibility action, with coordinate fallback |
| `tp.double_click(element)` | Double-click |
| `tp.right_click(element)` | Right-click / open context menu |
| `tp.set_value(element, text)` | Set text content (`replace=True` clears it first) |
| `tp.set_numeric_value(element, n)` | Set a slider or spinbox value |
| `tp.focus(element)` | Move keyboard focus |
| `tp.action(element, name)` | Execute a raw accessibility action by name |
| `tp.activate_window(window)` | Bring a window to the foreground |
Input
| Function | Description |
|---|---|
| `tp.type_text(text)` | Type into the currently focused element |
| `tp.press_key(key)` | Press and release a key (`"enter"`, `"tab"`, `"escape"`) |
| `tp.hotkey(*keys)` | Key combination (`tp.hotkey("ctrl", "s")`) |
| `tp.click_at(x, y)` | Click at screen coordinates |
| `tp.double_click_at(x, y)` | Double-click at screen coordinates |
| `tp.right_click_at(x, y)` | Right-click at screen coordinates |
| `tp.mouse_move(x, y)` | Move the cursor |
| `tp.scroll(direction, amount)` | Scroll at the current cursor position |
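Key combinations such as `tp.hotkey("ctrl", "s")` are conventionally sent as key-downs in order, followed by key-ups in reverse order. The sketch below shows that ordering with injected key_down/key_up callables. These callables are hypothetical, because the real InputProvider methods are not documented here.

```python
from typing import Callable, List, Sequence


def hotkey(
    keys: Sequence[str],
    key_down: Callable[[str], None],
    key_up: Callable[[str], None],
) -> None:
    """Press keys in order, then release them in reverse order."""
    pressed: List[str] = []
    try:
        for key in keys:
            key_down(key)
            pressed.append(key)
    finally:
        # Release whatever went down, even if a later key_down raised,
        # so modifiers like ctrl are never left stuck.
        for key in reversed(pressed):
            key_up(key)
```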
Screenshot & Config
| Function | Description |
|---|---|
| `tp.screenshot(app, element, ...)` | Full desktop, or cropped to an app, window, element, or monitor |
| `tp.monitor_count()` | Number of connected monitors |
| `tp.configure(...)` | Set runtime options (see Configuration) |
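Cropping a screenshot to an element typically means converting the element's bounds into a pixel box, and that conversion is where a display scale factor matters. The sketch assumes that element bounds are in logical units and that the captured image is in physical pixels. crop_box is hypothetical. Its (left, top, right, bottom) return value matches the box order that PIL's Image.crop() expects.

```python
from typing import Tuple


def crop_box(
    x: float, y: float, width: float, height: float,
    scale: float, screen_w: int, screen_h: int,
) -> Tuple[int, int, int, int]:
    """Convert logical element bounds to a clamped pixel box (left, top, right, bottom)."""
    left = max(0, round(x * scale))
    top = max(0, round(y * scale))
    right = min(screen_w, round((x + width) * scale))
    bottom = min(screen_h, round((y + height) * scale))
    if right <= left or bottom <= top:
        raise ValueError("element lies outside the captured image")
    return left, top, right, bottom
```

At a scale of 1.25, an element at (100, 40) sized 200×40 maps to the pixel box (125, 50, 375, 100).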
All action functions accept an Element object or a string ID. All discovery/search functions support format="flat", format="json", or format="tree" (elements only) to return pre-formatted strings instead of objects.
Architecture
```
┌───────────────────────────────────────────────────────┐
│                import touchpoint as tp                │
│    tp.find() · tp.click() · tp.screenshot() · ...     │
│                     (Public API)                      │
├─────────────────────────┬─────────────────────────────┤
│      Backend (ABC)      │     InputProvider (ABC)     │
├─────────────────────────┼─────────────────────────────┤
│  AT-SPI2 (Linux)        │  Xdotool (X11)              │
│  UIA (Windows)          │  SendInput (Win32)          │
│  AX (macOS)             │  CGEvent (macOS)            │
│  CDP (browsers)         │  CDP dispatch (Chrome)      │
├─────────────────────────┴─────────────────────────────┤
│  Utilities: formatter · matcher · screenshot · scale  │
└───────────────────────────────────────────────────────┘
```
Two-layer design:
- Backend reads the accessibility tree and runs structured actions (click, set_value, focus). Element-aware and reliable.
- InputProvider simulates raw keyboard and mouse input. Coordinate-based and element-blind. Used as an automatic fallback when a native accessibility action isn't available.
CDP runs alongside the platform backend. Their results are merged: native window chrome (toolbar, tabs, menus) from AT-SPI2/UIA/AX, plus full web content from CDP, unified under one API.
For detailed internals, see ARCHITECTURE.md.
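The two-layer design and its fallback can be sketched with abc. The method names (click, bounds, click_at) and the click-at-center fallback are assumptions for illustration. The fallback_input flag mirrors the option of the same name under Configuration.

```python
from abc import ABC, abstractmethod
from typing import Tuple


class Backend(ABC):
    """Element-aware layer: structured actions on accessibility elements."""

    @abstractmethod
    def click(self, element_id: str) -> bool:
        """Return True if a native accessibility action performed the click."""

    @abstractmethod
    def bounds(self, element_id: str) -> Tuple[int, int, int, int]:
        """Return (x, y, width, height) in screen coordinates."""


class InputProvider(ABC):
    """Element-blind layer: raw mouse and keyboard input at coordinates."""

    @abstractmethod
    def click_at(self, x: int, y: int) -> None: ...


def click(backend: Backend, inp: InputProvider, element_id: str,
          fallback_input: bool = True) -> str:
    """Try the native action first; otherwise click the element's center."""
    if backend.click(element_id):
        return "native"
    if not fallback_input:
        raise RuntimeError(f"no native click action for {element_id}")
    x, y, w, h = backend.bounds(element_id)
    inp.click_at(x + w // 2, y + h // 2)
    return "fallback"
```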
Configuration
```python
tp.configure(
    fuzzy_threshold=0.6,          # minimum match score for find() (0.0–1.0)
    fallback_input=True,          # use InputProvider when native actions fail
    type_chunk_size=40,           # split long text into chunks for typing (0 = disable)
    max_elements=5000,            # max elements per query
    max_depth=10,                 # default tree depth limit
    scale_factor=None,            # display scale override (None = auto-detect)
    cdp_ports={"Chrome": 9222},   # explicit CDP port mapping
    cdp_discover=True,            # auto-discover CDP ports from running processes
    cdp_refresh_interval=5.0,     # seconds between CDP target scans
)
```
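type_chunk_size is described only as splitting long text into chunks, with 0 disabling the split. The sketch below covers that splitting rule alone. chunk_text is hypothetical and says nothing about how or when Touchpoint sends each chunk.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 40) -> List[str]:
    """Split text into pieces of at most chunk_size characters; 0 disables splitting."""
    if chunk_size < 0:
        raise ValueError("chunk_size must be >= 0")
    if not text:
        return []
    if chunk_size == 0 or len(text) <= chunk_size:
        return [text]
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```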
Development
```shell
git clone https://github.com/Touchpoint-Labs/touchpoint.git
cd touchpoint
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest
```
Status
Alpha — the API is functional, tested, and usable, but may change before 1.0.
| Platform | Backend | Input | CDP | Tests |
|---|---|---|---|---|
| Linux (X11) | ✅ AT-SPI2 | ✅ xdotool | ✅ | ✅ |
| Windows | ✅ UIA | ✅ SendInput | ✅ | ✅ |
| macOS | ✅ AX | ✅ CGEvent | ✅ | ✅ |
Known limitations
- Wayland input: the Linux InputProvider uses xdotool, which requires X11. On pure Wayland (no XWayland), keyboard/mouse simulation is unavailable. The accessibility tree and native actions still work.
- Synchronous CDP: CDP calls block on WebSocket responses. JavaScript dialogs (alert, confirm, prompt) are auto-dismissed to prevent deadlocks. An async rewrite is planned.
- No browser navigation API: Touchpoint has no built-in URL navigation. Agents can navigate by interacting with UI elements directly: find the address bar, type a URL, press Enter.
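Whether the xdotool fallback can work depends on the session type. The heuristic sketch below uses the standard DISPLAY, WAYLAND_DISPLAY, and XDG_SESSION_TYPE environment variables. The function names and categories are hypothetical, and Touchpoint's own detection may differ.

```python
import os
from typing import Mapping, Optional


def session_kind(env: Optional[Mapping[str, str]] = None) -> str:
    """Classify the graphical session from environment variables."""
    env = os.environ if env is None else env
    session = env.get("XDG_SESSION_TYPE", "").lower()
    has_x = bool(env.get("DISPLAY"))
    has_wayland = bool(env.get("WAYLAND_DISPLAY")) or session == "wayland"
    if has_wayland and has_x:
        return "wayland+xwayland"
    if has_wayland:
        return "wayland-only"
    if has_x:
        return "x11"
    return "no-display"


def xdotool_expected(env: Optional[Mapping[str, str]] = None) -> bool:
    """xdotool needs an X server; per the limitation above, pure Wayland has none."""
    return session_kind(env) in {"x11", "wayland+xwayland"}
```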
License
File details

| File | Size | Tags | Uploaded via | Trusted Publishing |
|---|---|---|---|---|
| touchpoint_py-0.1.0.tar.gz | 188.6 kB | Source | twine/6.2.0 CPython/3.13.12 | No |
| touchpoint_py-0.1.0-py3-none-any.whl | 167.6 kB | Python 3 | twine/6.2.0 CPython/3.13.12 | No |

File hashes

| File | Algorithm | Hash digest |
|---|---|---|
| touchpoint_py-0.1.0.tar.gz | SHA256 | 680630b902fbcf0a0ed7cc0f84f313c87b7a7bb2b2586214afecac6e6487f48f |
| touchpoint_py-0.1.0.tar.gz | MD5 | c28325311e4f2afd07107f7a15730e0e |
| touchpoint_py-0.1.0.tar.gz | BLAKE2b-256 | 324fbbe2cd8aa8098bce0184782ea99eb4cac8f70695210be541333acd4351fd |
| touchpoint_py-0.1.0-py3-none-any.whl | SHA256 | dc43b94e864289d019eada20c8174a813dcd8bbf9f506b551757803c20dcfe8e |
| touchpoint_py-0.1.0-py3-none-any.whl | MD5 | 0e9223ae530fae764f428ee6911782da |
| touchpoint_py-0.1.0-py3-none-any.whl | BLAKE2b-256 | 80836be023f2ef3d3a94f9bc156573817ea04e4385afb9a3c2620ab0d0d2ca81 |