PyVisionAuto: Cross-platform desktop automation toolkit with visual image matching, mouse/keyboard control, and screen recording


PyVisionAuto

PyVisionAuto is an end-to-end desktop automation toolkit. It is centered on visual image matching and also includes screen recording, mouse automation, and keyboard automation capabilities.

Scope

Linux (X11 session) and Windows
Real physical display required

Install

pip install pyvisionauto

System dependencies

Linux

python3-tk — Required for border overlay highlight
xdotool — Preferred for window activation
wmctrl — Fallback for window activation
ffmpeg — Required for screen recording; install via sudo apt install ffmpeg
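
The Linux requirements above can be checked up front. The following is a minimal standard-library sketch; `check_linux_dependencies` and the layout of the returned dictionary are illustrative names, not part of PyVisionAuto.

```python
import shutil


def tkinter_available():
    """Return True if the tkinter module (python3-tk on Debian/Ubuntu) imports."""
    try:
        import tkinter  # noqa: F401
    except ImportError:
        return False
    return True


def check_linux_dependencies(which=shutil.which):
    """Report which Linux system dependencies are available.

    `which` is injectable so the check can be exercised without the tools installed.
    """
    # xdotool is preferred for window activation; wmctrl is the fallback.
    if which("xdotool"):
        activation_tool = "xdotool"
    elif which("wmctrl"):
        activation_tool = "wmctrl"
    else:
        activation_tool = None

    return {
        "window_activation": activation_tool,
        "ffmpeg": which("ffmpeg") is not None,
        "tkinter": tkinter_available(),
    }


if __name__ == "__main__":
    for name, status in check_linux_dependencies().items():
        print(f"{name}: {status}")
```

A `window_activation` value of `None` means neither tool was found on PATH.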

Windows

tkinter — Bundled with most Python installations
ffmpeg — Required for screen recording; download from ffmpeg.org, extract archive, and add the bin folder to system PATH

Verify ffmpeg installation

# Check if ffmpeg is installed and accessible
ffmpeg -version

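Note: Screen recording (via the Recorder API) requires ffmpeg. On Linux it captures through x11grab; on Windows it captures through gdigrab. These are ffmpeg input devices rather than codecs, and both are included in most ffmpeg builds for their respective platforms.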
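
To make the platform difference concrete, here is a hedged sketch of how an ffmpeg capture command could be assembled per platform. It is not taken from PyVisionAuto's source; `build_capture_command`, the 30 fps default, and the `:0.0` display fallback are assumptions chosen for illustration.

```python
import os
import sys


def build_capture_command(output_path, platform=None, fps=30, display=None):
    """Return an ffmpeg argument list that records the whole screen."""
    platform = platform or sys.platform
    if platform.startswith("linux"):
        # x11grab reads from an X11 display such as ":0.0".
        source = ["-f", "x11grab", "-framerate", str(fps),
                  "-i", display or os.environ.get("DISPLAY", ":0.0")]
    elif platform == "win32":
        # gdigrab captures the full Windows desktop when the input is "desktop".
        source = ["-f", "gdigrab", "-framerate", str(fps), "-i", "desktop"]
    else:
        raise ValueError(f"unsupported platform: {platform}")
    # -y overwrites an existing output file without prompting.
    return ["ffmpeg", "-y", *source, str(output_path)]


if __name__ == "__main__":
    print(" ".join(build_capture_command("demo.mp4", platform="win32")))
```

Running the printed command by hand is a quick way to confirm that the capture device works before involving the Recorder API.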

Quick start

Basic usage: Find and click

from pyvisionauto import Screen

screen = Screen()
# Wait for image to appear on screen, highlight it, then click
screen.wait("login_button.png", timeout=10).highlight().click()
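
The `timeout` argument implies a poll-until-found loop. The sketch below shows that general pattern with the standard library only; it is not PyVisionAuto's implementation, and `wait_until`, `interval`, and the `find` callable are hypothetical names.

```python
import time


def wait_until(find, timeout=10.0, interval=0.5):
    """Call `find()` repeatedly until it returns a truthy match or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        match = find()
        if match:
            return match
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no match within {timeout} seconds")
        # Pause between attempts so the loop does not spin at full CPU.
        time.sleep(interval)
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline stable if the system clock is adjusted mid-wait.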

Advanced example: Record automation with screen capture

This example demonstrates screen recording combined with visual automation:

from pyvisionauto import Screen, Recorder
from pathlib import Path

screen = Screen()
recorder = Recorder()

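# Start recording before the automation begins
recorder.start_recording(output_path=Path("automation_demo.mp4"))
try:
    # Focus the target window so keyboard input reaches it
    screen.activate_window("Calculator")
    # Compute 1 + 5 and confirm the displayed result
    screen.wait("button_1.png", timeout=10).highlight().click()
    screen.click("button_plus.png", timeout=5)
    screen.type_text("5")
    screen.wait("button_equals.png", timeout=5).highlight().click()
    screen.wait("result_6.png", timeout=3).highlight()
finally:
    # Always stop the recorder, even if a step above fails
    recorder.stop_recording()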
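
The try/finally pattern above can be wrapped in a context manager so that every recording is stopped the same way. This is a sketch, not a PyVisionAuto feature: `recording` is a hypothetical helper that relies only on the `start_recording(output_path=...)` and `stop_recording()` calls shown in the example.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def recording(recorder, output_path):
    """Start recording on entry and always stop on exit, even if the body raises."""
    recorder.start_recording(output_path=Path(output_path))
    try:
        yield recorder
    finally:
        recorder.stop_recording()


# Intended usage (requires pyvisionauto and a real display):
#
#     with recording(Recorder(), "automation_demo.mp4"):
#         screen.activate_window("Calculator")
#         screen.wait("button_1.png", timeout=10).click()
```

As in the original example, the recorder is started outside the `try`, so a failure to start does not trigger a stop call.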

Activate a window before matching

screen.activate_window("Calculator")
screen.click("button.png")

Runtime screenshot

Highlighted match region during runtime:

[Image: PyVisionAuto runtime screenshot with highlighted region]

Platform differences

Feature	Linux	Windows
Screen capture & template matching	Supported	Supported
Mouse / keyboard automation	Supported	Supported
Highlight overlay	Supported	Supported
Window activation	xdotool / wmctrl	pyautogui (pygetwindow)
Screen recording	ffmpeg + x11grab	ffmpeg + gdigrab

Screen recording requires ffmpeg to be installed and available on the system PATH. Linux captures with x11grab; Windows captures with gdigrab.

Window focus on Linux (X11)

pyautogui uses XTest synthetic events to move the mouse and click. On X11, synthetic pointer events do not trigger focus changes — the window manager only reassigns focus in response to real hardware events. This means:

click() moves the cursor to the correct coordinates and clicks, but the keyboard focus stays wherever it was before.
Any subsequent keyboard action (press(), type_text(), hotkeys) is delivered to whichever window currently has focus — which may not be the window you just clicked.

Rule of thumb: always call activate_window() before any keyboard action, targeting the exact window that should receive it.

Use xdotool to find the precise window name while the application is running:

# List every window ID together with its title
xdotool search --name "" 2>/dev/null | while read -r id; do
    printf "ID=%-12s %s\n" "$id" "$(xdotool getwindowname "$id" 2>/dev/null)"
done

Pick the shortest substring that uniquely identifies the target window and use it in activate_window().
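
Picking that substring by eye gets tedious when many windows are open. The sketch below automates the search over a list of titles, for example the names printed by the loop above. `shortest_unique_substring` and `min_length` are hypothetical, and the comparison is case-sensitive, which may differ from how `activate_window()` matches names.

```python
def shortest_unique_substring(target, other_titles, min_length=3):
    """Return the shortest substring of `target` found in none of `other_titles`.

    Candidates with leading or trailing whitespace are skipped. Returns None
    when no substring of `target` is unique, for example when another window
    has the same title.
    """
    for length in range(max(1, min_length), len(target) + 1):
        for start in range(len(target) - length + 1):
            candidate = target[start:start + length]
            if candidate != candidate.strip():
                continue
            if all(candidate not in other for other in other_titles):
                return candidate
    return None


if __name__ == "__main__":
    titles = ["My App 2026", "Open File"]
    print(shortest_unique_substring("Open Project", titles))
```

The result is the shortest unique match, not necessarily the most readable one, so a longer and more descriptive substring is often a better choice in a script that others will maintain.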

Main window vs. dialogs

When a modal dialog is open, activate the dialog directly — do not activate the main window and rely on the WM to forward focus:

from pyvisionauto import Screen

screen = Screen()

# --- Clicking a dialog image ---
# Activate the dialog BEFORE sending keyboard input to it.
# Without this, ESC/Enter goes to whichever window had focus before.
screen.wait("open_project_dialog.png", timeout=30).highlight().click()
screen.activate_window("Open Project")  # activate the dialog, not the main window
screen.input.press("esc")               # now ESC is reliably delivered to the dialog

# --- Clicking main-window controls ---
screen.activate_window("My App 2026")   # activate the main window
screen.wait("toolbar_button.png", timeout=10).click()

Why not just activate the main window? On GNOME/Mutter, activating the main window does propagate focus to a modal child dialog — but this is WM-specific behaviour. Activating the dialog directly is explicit, portable, and not dependent on WM modal-focus rules.

Notes

Wayland-only and headless environments are not currently supported.
On Windows with high-DPI scaling, coordinate accuracy may be affected.

Release history

0.1.6 (May 9, 2026)
0.1.5 (May 8, 2026)
0.1.4 (May 7, 2026), this version
0.1.3 (May 7, 2026)
0.1.2 (May 7, 2026)
0.1.1 (May 7, 2026)
0.1.0 (May 6, 2026)

Download files

Source Distribution

pyvisionauto-0.1.4.tar.gz (28.4 kB)

Uploaded May 7, 2026 Source

Built Distribution

pyvisionauto-0.1.4-py3-none-any.whl (27.7 kB)

Uploaded May 7, 2026 Python 3

File details

Details for the file pyvisionauto-0.1.4.tar.gz.

File metadata

Download URL: pyvisionauto-0.1.4.tar.gz
Upload date: May 7, 2026
Size: 28.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for pyvisionauto-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`88aaacdd5ec25bcd0702aa98974527f52a43a96660dff237a2ace199f48cd33b`
MD5	`9365ec80e72bfddcb4b76c36f9a5d14d`
BLAKE2b-256	`2aa3bcd322158bc0684357e8ad9c0a33ce0f659131bb6492e2546b4acb4d0215`

File details

Details for the file pyvisionauto-0.1.4-py3-none-any.whl.

File metadata

Download URL: pyvisionauto-0.1.4-py3-none-any.whl
Upload date: May 7, 2026
Size: 27.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for pyvisionauto-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a591192f2f6c9e6f361775bed11369c5ff1292113af2e1c9c1a4cc8d611c7fbe`
MD5	`f5b15a641e003daa23c270f292aa2526`
BLAKE2b-256	`7da43649bf5c2de873736975b84536e06243e3e6f9d8b5d61256ebff77de704d`
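
The digests listed for each file can be checked locally after downloading. A minimal standard-library sketch follows; `file_digest_matches` is a hypothetical helper, not a PyPI or pip API, and the "blake2b-256" label is handled here by asking `hashlib.blake2b` for a 32-byte digest.

```python
import hashlib


def file_digest_matches(path, expected_hex, algorithm="sha256"):
    """Return True if the file at `path` hashes to `expected_hex`."""
    if algorithm == "blake2b-256":
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    # Read in chunks so large files are not loaded into memory at once.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected_hex.strip().lower()


# Intended usage with the wheel's SHA256 digest from the table:
#
#     file_digest_matches(
#         "pyvisionauto-0.1.4-py3-none-any.whl",
#         "a591192f2f6c9e6f361775bed11369c5ff1292113af2e1c9c1a4cc8d611c7fbe",
#     )
```

SHA256 is the digest to prefer for verification; MD5 is listed for completeness but is not collision-resistant.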
