tine
Drive a GNOME Wayland desktop from AI agents. CLI-first, no portals, no consent dialogs.
Tine is a command-line bridge between an AI coding agent (Claude Code, Codex, etc.) and a running Linux desktop. It reads the screen, walks the accessibility tree, and injects keyboard and mouse events at the kernel level — no Wayland portal dialogs, no per-action consent prompts, no X11 fallback hacks.
$ tine describe
[screenshot + AT-SPI2 tree summary]
$ tine click ref_17 # click by accessibility-tree ref
$ tine click B3 # click by labeled grid cell
$ tine type "hello, world" # kernel-level key injection
$ tine key ctrl+t # modifier combos
$ tine focus Firefox # raise + focus a window
Status: alpha. Tested on GNOME 49 Wayland / Arch Linux. API may change before 1.0.
Why
Anthropic's computer-use feature works on Windows and macOS. If you use Linux — and especially if you use Wayland, which is more locked-down than X11 and breaks most of the existing Linux automation stack — you're mostly out of luck. Tine is an attempt at a usable Wayland alternative.
It reads the screen three ways:
- AT-SPI2 accessibility tree. When an app exposes its widgets to the Linux a11y stack (GTK apps, Qt apps, most native GNOME stuff), tine walks the tree and gives the agent structured data: roles, names, bounding boxes, actions. The agent can say "click ref_17" and tine clicks the center of the button with that ref.
- Labeled coordinate grid. When AT-SPI2 is sparse or missing (Chrome, Electron, most web content, games), tine overlays a labeled grid on the screenshot and the agent says "click B3". Not fancy, but it works.
- OCR text refs. Run OCR on the screenshot (RapidOCR, local, CPU) and get refs like ref_t3 tied to detected text regions. The agent can click by on-screen text: tine click ref_t3 clicks the center of the OCR region with that ref. Optional and lazy-loaded; install with pip install "tine-cli[ocr]".
And it injects input at the kernel level through /dev/uinput, so no Wayland portal ever prompts for consent and no headless agent loop gets interrupted by a dialog.
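For a sense of what kernel-level injection involves, here is a minimal sketch using python-evdev's UInput class. It is not tine's code: combo_events, press_combo, and the tiny KEY_NAMES table are hypothetical, and it assumes python-evdev is installed and /dev/uinput is writable (Quickstart step 3). A combo such as ctrl+t becomes key-down events for each key in order, then key-up events in reverse order, each followed by a SYN_REPORT so the kernel delivers it.

```python
"""Sketch: inject a key combo via /dev/uinput using python-evdev (hypothetical helpers)."""
import time

# Hypothetical token -> evdev key-name table; a real tool needs the full keymap.
KEY_NAMES = {
    "ctrl": "KEY_LEFTCTRL",
    "shift": "KEY_LEFTSHIFT",
    "alt": "KEY_LEFTALT",
    "super": "KEY_LEFTMETA",
    "return": "KEY_ENTER",
    "c": "KEY_C",
    "l": "KEY_L",
    "t": "KEY_T",
}


def combo_events(combo: str) -> list[tuple[str, int]]:
    """Expand 'ctrl+t' into (key_name, value) pairs, where 1 = press and 0 = release."""
    names = [KEY_NAMES[token.strip().lower()] for token in combo.split("+")]
    presses = [(name, 1) for name in names]             # modifiers first, main key last
    releases = [(name, 0) for name in reversed(names)]  # release in the opposite order
    return presses + releases


def press_combo(combo: str) -> None:
    """Send the combo through a short-lived virtual keyboard (needs uinput access)."""
    from evdev import UInput, ecodes  # lazy import: combo_events works without evdev

    events = combo_events(combo)
    codes = sorted({ecodes.ecodes[name] for name, _ in events})
    with UInput({ecodes.EV_KEY: codes}, name="tine-sketch-keyboard") as ui:
        time.sleep(0.2)  # give libinput / the compositor time to notice the new device
        for name, value in events:
            ui.write(ecodes.EV_KEY, ecodes.ecodes[name], value)
            ui.syn()  # SYN_REPORT marks the end of this input frame
```

Creating a fresh virtual device on every call is slow; keeping one device open across commands avoids paying that setup cost each time.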
Quickstart
Tine targets GNOME Shell on Wayland, Linux only. Other environments are out of scope for v1.
1. System prerequisites
Tine uses PyGObject (gi.repository.Atspi) to read the accessibility tree. PyGObject is best installed via your distro's package manager — installing it from pip requires building pycairo and PyGObject from source and pulling in libcairo2-dev, libgirepository-dev, pkg-config, cmake, and a C compiler. Don't do that. Install these instead:
# Arch / Manjaro
sudo pacman -S python python-gobject at-spi2-core python-pip python-virtualenv
# Fedora
sudo dnf install python3 python3-gobject at-spi2-core python3-pip python3-virtualenv
# Debian / Ubuntu
sudo apt install python3 python3-gi gir1.2-atspi-2.0 at-spi2-core python3-pip python3-venv
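Before creating the venv, it can be worth confirming the distro packages are importable. This is a minimal check, not part of tine: it asks PyGObject for the Atspi 2.0 typelib and returns False instead of raising if either piece is missing.

```python
"""Sketch: check that PyGObject and the AT-SPI2 typelib are importable."""


def atspi_available() -> bool:
    """Return True if gi.repository.Atspi 2.0 can be loaded, False otherwise."""
    try:
        import gi

        gi.require_version("Atspi", "2.0")  # raises ValueError if the typelib is missing
        from gi.repository import Atspi  # noqa: F401  (the import itself is the test)
    except (ImportError, ValueError):
        return False
    return True


if __name__ == "__main__":
    print("Atspi OK" if atspi_available() else "Atspi missing: install the packages above")
```

Run it with the system python3 now, and again with ~/.local/share/tine-venv/bin/python after step 2; both should report OK if --system-site-packages is doing its job.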
2. Install tine
Create a venv that inherits system site-packages (so the distro's python3-gi is importable) and install tine-cli into it:
python3 -m venv --system-site-packages ~/.local/share/tine-venv
~/.local/share/tine-venv/bin/pip install tine-cli
# with OCR text refs (adds RapidOCR + onnxruntime, ~200 MB):
# ~/.local/share/tine-venv/bin/pip install "tine-cli[ocr]"
mkdir -p ~/.local/bin
ln -sf ~/.local/share/tine-venv/bin/tine ~/.local/bin/tine
# ensure ~/.local/bin is on your PATH
The PyPI package is tine-cli (the name tine was already taken); the installed command is tine.
Why the --system-site-packages venv: modern Debian, Ubuntu, and Fedora set PEP 668 "externally-managed-environment" on the system Python, so pip install tine-cli fails there. A venv is the correct answer, and --system-site-packages makes it inherit the system python3-gi so tine doesn't have to rebuild PyGObject from source.
If your distribution doesn't enforce PEP 668, pip install --user tine-cli also works, as long as the distro's PyGObject package is installed.
3. Give your user uinput access (one-time)
Tine injects input via /dev/uinput. Add yourself to the input group and install a udev rule that gives that group read/write access to /dev/uinput, so you don't need root:
sudo usermod -aG input $USER
echo 'KERNEL=="uinput", GROUP="input", MODE="0660", OPTIONS+="static_node=uinput"' \
| sudo tee /etc/udev/rules.d/99-uinput.rules
sudo udevadm control --reload-rules && sudo udevadm trigger
# log out and back in for group changes to take effect
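After logging back in, a quick diagnostic like this one can confirm the setup took effect. It's a hypothetical helper, not something tine ships, and it assumes the group is called input as in the rule above.

```python
"""Sketch: diagnose whether this session can write to /dev/uinput."""
import grp
import os


def uinput_status(path: str = "/dev/uinput") -> str:
    """Return 'ok' or a short hint about what is likely wrong."""
    if not os.path.exists(path):
        return "missing: the uinput module may not be loaded (try 'sudo modprobe uinput')"
    if os.access(path, os.W_OK):
        return "ok"
    try:
        input_gid = grp.getgrnam("input").gr_gid
    except KeyError:  # this system has no 'input' group
        return "no access: there is no 'input' group on this system"
    if input_gid not in os.getgroups() and input_gid != os.getegid():
        return "no access: this session is not in 'input' yet (log out and back in)"
    return "no access: check the udev rule's MODE/GROUP and re-run 'udevadm trigger'"


if __name__ == "__main__":
    print(uinput_status())
```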
4. Install the GNOME Shell extension
The extension/ directory in this repo contains a small GNOME Shell extension that exposes screenshots and window enumeration over D-Bus. Tine calls into it instead of going through the Screencast portal.
cp -r extension/ocr-screenshot@local ~/.local/share/gnome-shell/extensions/
# log out / back in (Wayland can't hot-reload extensions), then:
gnome-extensions enable ocr-screenshot@local
5. Try it
tine describe # full desktop context (screenshot + a11y tree)
tine windows # list open windows
tine screenshot --grid # overlay a labeled grid (A1..Z26)
tine click B4 # click center of grid cell B4
tine focus Firefox
tine type "https://news.ycombinator.com"
tine key Return
If these commands run cleanly, you're done: hand the CLI to an agent and let it drive.
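Grid labels are easy to resolve by hand. A minimal sketch, assuming a fixed 26x26 grid (taken from the A1..Z26 range above), spreadsheet-style labels where the letter picks the column and the number picks the row, and a screenshot that maps 1:1 to screen pixels (no fractional scaling). tine's real grid geometry may differ, and grid_cell_center is a hypothetical name.

```python
"""Sketch: resolve a grid label like 'B4' to the pixel center of that cell."""
import re

_LABEL = re.compile(r"^([A-Z])(\d{1,2})$")


def grid_cell_center(label: str, screen_w: int, screen_h: int,
                     cols: int = 26, rows: int = 26) -> tuple[int, int]:
    """Map 'B4' to (x, y), assuming letters index columns and numbers index rows."""
    match = _LABEL.match(label.strip().upper())
    if not match:
        raise ValueError(f"not a grid label: {label!r}")
    col = ord(match.group(1)) - ord("A")  # 'A' -> 0
    row = int(match.group(2)) - 1         # '1' -> 0
    if not (0 <= col < cols and 0 <= row < rows):
        raise ValueError(f"{label!r} is outside a {cols}x{rows} grid")
    cell_w = screen_w / cols
    cell_h = screen_h / rows
    # Center of the cell, truncated to whole pixels.
    return int((col + 0.5) * cell_w), int((row + 0.5) * cell_h)
```

On a 1920x1080 screen each cell is roughly 74 by 42 pixels, so a cell center can still miss a small widget, which is where the sub-grid refinement of tine target comes in.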
The commands
| Command | What it does |
|---|---|
| tine tree | Walk the AT-SPI2 accessibility tree. Assigns short refs (ref_1, ref_2, ...) and caches them. |
| tine tree --app Firefox | Scope the walk to a single app. |
| tine screenshot | Full-screen capture via the GNOME Shell extension. |
| tine screenshot --annotate | Overlay Set-of-Mark boxes on known a11y elements. |
| tine screenshot --grid | Overlay a labeled coordinate grid for sparse-tree apps. |
| tine screenshot --ocr | Run OCR, add ref_tN entries for detected text regions. |
| tine describe --ocr | Same as describe, plus OCR text refs (needs the [ocr] extra). |
| tine click ref_3 | Click the center of a cached a11y ref's bounding box. |
| tine click ref_t3 | Click the center of an OCR text ref. |
| tine click B3 | Click the center of grid cell B3. |
| tine click 450,320 | Click raw pixel coordinates. |
| tine target ref_3 | 3x3 sub-grid crosshair for refinement when a click misses. |
| tine activate ref_3 | Invoke the AT-SPI2 action directly: no mouse, no coordinates. |
| tine type "text" | Type via EV_KEY events. |
| tine key ctrl+c | Press a key combination. |
| tine windows | Enumerate windows: title, position, size, focus state. |
| tine focus Firefox | Raise and focus a window by title match. |
| tine inputd start | Start the persistent input daemon (8x faster per command). |
| tine describe | Screenshot + tree in one call; the standard "what's on screen?" query. |
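The four target forms that tine click accepts are syntactically distinct, so picking a resolver comes down to a few regular expressions. This is a hypothetical sketch (Target, parse_target, and the kind names are illustrative, not tine's internals); it only classifies the argument and leaves the actual lookup to the ref cache, the OCR refs, or the grid.

```python
"""Sketch: decide which resolver handles a `tine click` argument."""
import re
from typing import NamedTuple, Union


class Target(NamedTuple):
    kind: str                            # "a11y", "ocr", "grid", or "coords"
    value: Union[str, tuple[int, int]]   # ref/label string, or (x, y) pixels


def parse_target(arg: str) -> Target:
    """Classify 'ref_3', 'ref_t3', 'B3', or '450,320'."""
    text = arg.strip()
    if re.fullmatch(r"ref_\d+", text):
        return Target("a11y", text)          # bbox comes from the a11y ref cache
    if re.fullmatch(r"ref_t\d+", text):
        return Target("ocr", text)           # bbox comes from the OCR text refs
    if re.fullmatch(r"[A-Za-z]\d{1,2}", text):
        return Target("grid", text.upper())  # cell center comes from the grid
    coords = re.fullmatch(r"(\d+)\s*,\s*(\d+)", text)
    if coords:
        return Target("coords", (int(coords.group(1)), int(coords.group(2))))
    raise ValueError(f"unrecognized click target: {arg!r}")
```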
Architecture
┌────────────────────────────────────────────────────────┐
│ Claude Code / Codex / other agent session │
└────────────────────────┬───────────────────────────────┘
│ shell commands
▼
┌────────────────────────────────────────────────────────┐
│ tine CLI │
│ ┌───────────────┐ ┌────────────────┐ ┌──────────┐ │
│ │ ref cache │ │ grid resolver │ │ inputd │ │
│ │ (ref_N→bbox) │ │ (B3→pixels) │ │ (8x fast)│ │
│ └───────┬───────┘ └────────┬───────┘ └─────┬────┘ │
└──────────┼───────────────────┼────────────────┼────────┘
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────────┐ ┌─────────┐
│ AT-SPI2 │ │ GNOME Shell │ │ /dev/ │
│ D-Bus │ │ extension │ │ uinput │
│ (read) │ │ (screenshot, │ │ (kernel │
│ │ │ windows) │ │ input) │
└──────────┘ └──────────────┘ └─────────┘
- AT-SPI2 gives structured UI data: roles, names, bounding boxes, states, actions. Read-only, standard accessibility API, no special permissions.
- GNOME Shell extension exposes a D-Bus interface for screenshots and window management without hitting the Screencast portal.
- python-evdev / uinput injects mouse (EV_ABS) and keyboard (EV_KEY) events at the kernel level. No compositor cooperation, no consent prompts.
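The ref cache in the diagram is what lets a later tine click ref_3 run as a separate process without re-walking the tree. Here is a minimal sketch of the idea under stated assumptions: a depth-first walk over a generic Node type, refs handed out only to nodes that have a bounding box, and a JSON file as the cache. Node, assign_refs, the JSON layout, and the file location are all hypothetical; against real AT-SPI2 the walk would use Atspi.Accessible's get_child_count() and get_child_at_index(), with the box coming from its Component interface.

```python
"""Sketch: give nodes ref_N ids in a depth-first walk and cache their boxes as JSON."""
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for an accessibility node (e.g. an Atspi.Accessible)."""
    role: str
    name: str
    bbox: Optional[tuple[int, int, int, int]]  # (x, y, width, height) in screen pixels
    children: list["Node"] = field(default_factory=list)


def assign_refs(root: Node) -> dict[str, dict]:
    """Number every node that has a bounding box, in depth-first document order."""
    refs: dict[str, dict] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.bbox is not None:
            refs[f"ref_{len(refs) + 1}"] = {
                "role": node.role,
                "name": node.name,
                "bbox": list(node.bbox),
            }
        stack.extend(reversed(node.children))  # reversed so the first child is popped next
    return refs


def save_refs(refs: dict[str, dict], path: Path) -> None:
    """Persist the refs so a later, separate `click` process can look them up."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(refs, indent=2))


def ref_center(path: Path, ref: str) -> tuple[int, int]:
    """Load the cache and return the center point of `ref`'s bounding box."""
    x, y, w, h = json.loads(path.read_text())[ref]["bbox"]
    return x + w // 2, y + h // 2
```

In this sketch refs are positional, so they are only valid for the tree snapshot they came from; re-running the walk after the UI changes can renumber them.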
Example: log into Reddit from a Claude Code session
tine focus Firefox
tine key ctrl+l
tine type "https://old.reddit.com/login"
tine key Return
tine describe # agent reads the login page
# → agent sees ref_12 "username field", ref_13 "password field", ref_14 "log in button"
tine click ref_12
tine type "my_username"
tine click ref_13
tine type "$REDDIT_PASSWORD"
tine activate ref_14 # bypass the click — invoke the a11y action directly
Each step is a single shell command. The agent reads describe, decides, runs one command, checks describe again. No portals, no consent dialogs, no coordinate-by-screenshot guesswork.
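The same act-then-observe loop can be scripted outside an agent harness. A minimal sketch, assuming tine is on PATH; run_tine and act_and_observe are hypothetical helpers, and describe's output is treated as opaque text because its format isn't specified here.

```python
"""Sketch: one act-then-observe step of the loop above, driven from Python."""
import subprocess
from collections.abc import Callable, Sequence

# A runner takes tine's arguments (without the 'tine' itself) and returns stdout.
Runner = Callable[[Sequence[str]], str]


def run_tine(args: Sequence[str]) -> str:
    """Run one tine command and return its stdout (raises if tine exits non-zero)."""
    result = subprocess.run(["tine", *args], capture_output=True, text=True, check=True)
    return result.stdout


def act_and_observe(action: Sequence[str], runner: Runner = run_tine) -> str:
    """Run a single action, then re-read the screen so the next decision sees fresh state."""
    runner(action)
    return runner(["describe"])
```

For example, act_and_observe(["activate", "ref_14"]) submits the login form and hands back fresh describe output for the next decision. The injectable runner keeps the loop testable without a display.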
How tine compares
| Tool | Input method | Wayland? | Structured reads | Portal dialogs |
|---|---|---|---|---|
| tine | uinput (kernel) | ✅ | AT-SPI2 + grid | ❌ none |
| xdotool | X11 | ❌ | X11 props | n/a |
| ydotool | uinput (kernel) | ✅ | ❌ | ❌ |
| pyautogui | X11 / mouse events | partial | ❌ | n/a |
| Playwright-desktop | browser only | n/a | browser DOM | n/a |
| Anthropic computer use | screenshot + coordinates | ✅ | vision only | portal consent |
Of the tools above, tine is the only one that combines structured reads with portal-free input on Wayland.
Known limitations (v0.1)
- GNOME Shell / Wayland only. Other compositors (KDE, Hyprland, Sway) should work for the input side, but the screenshot/focus path depends on the bundled GNOME Shell extension. PRs welcome.
- Linux only. No macOS or Windows plans.
- Requires uinput access. One-time udev setup. No way around this without portals.
- AT-SPI2 is sparse in some apps. Chrome, some Electron apps, most games. Use tine screenshot --grid as the fallback.
Contributing
Tests:
pip install -e ".[dev]"
pytest
Most tests run without a display — the input and screenshot layers are mocked. The AT-SPI2 walker tests use fixtures from research/fixtures/.
Issues and PRs welcome.
License
Apache License 2.0. See LICENSE.
Download files
File details
Details for the file tine_cli-0.1.0.tar.gz.
File metadata
- Download URL: tine_cli-0.1.0.tar.gz
- Upload date:
- Size: 61.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7720930611c7bd7cf23847932101192bfb97546a1af3640113b80211da3f332c |
| MD5 | 1c7b7db37aebfbb77525afc857817928 |
| BLAKE2b-256 | 9840bfbbc0515200d9dec31249bc130cdde585eb8e8cdc314e14740aac7734cf |
File details
Details for the file tine_cli-0.1.0-py3-none-any.whl.
File metadata
- Download URL: tine_cli-0.1.0-py3-none-any.whl
- Upload date:
- Size: 46.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1237d9ffaf9b0033285dde0e7fcffe31d0af59642c1a72634bdf57188c046310 |
| MD5 | b72364f8250864edeeb8b44726f50215 |
| BLAKE2b-256 | 285c9390903a5c7a44cec0741e9680146c0bec2efd90377dff6f1fd6e631a4a7 |