PyVisionAuto: Cross-platform desktop automation toolkit with visual image matching, mouse/keyboard control, and screen recording

PyVisionAuto

PyVisionAuto is an end-to-end desktop automation toolkit. It is centered on visual image matching and also includes screen recording, mouse automation, and keyboard automation capabilities.

Scope

  • Linux (X11 session) and Windows
  • Real physical display required

Install

pip install pyvisionauto

System dependencies

Linux

  • python3-tk — Required for border overlay highlight
  • xdotool — Preferred for window activation
  • wmctrl — Fallback for window activation
  • ffmpeg — Required for screen recording; install via sudo apt install ffmpeg
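The preference order listed above (xdotool first, wmctrl as fallback) can be sketched as a small helper. This is only an illustration, not PyVisionAuto's internal implementation: `activation_command` and its `which` parameter are hypothetical names introduced here. The two command lines use the tools' standard options (`xdotool search --name ... windowactivate` and `wmctrl -a`).

```python
import shutil
from typing import Callable, List, Optional


def activation_command(
    title: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Return a command that activates a window by title, or None.

    Hypothetical helper: prefers xdotool and falls back to wmctrl,
    mirroring the order in the dependency list.
    """
    if which("xdotool"):
        # xdotool matches the window name as a pattern, then activates the match
        return ["xdotool", "search", "--name", title, "windowactivate"]
    if which("wmctrl"):
        # wmctrl -a raises and focuses a window whose title contains the string
        return ["wmctrl", "-a", title]
    return None


command = activation_command("Calculator")
print(command if command else "neither xdotool nor wmctrl found on PATH")
```

The returned list can be passed to `subprocess.run` as is. Injecting `which` keeps the helper testable on machines where neither tool is installed.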

Windows

  • tkinter — Bundled with most Python installations
  • ffmpeg — Required for screen recording; download from ffmpeg.org, extract the archive, and add the bin folder to the system PATH

Verify ffmpeg installation

# Check if ffmpeg is installed and accessible
ffmpeg -version

Note: Screen recording (via the Recorder API) requires ffmpeg. On Linux it captures through the x11grab input device; on Windows it uses gdigrab. Both are ffmpeg input devices rather than codecs, and both are included in typical ffmpeg builds for their platform.
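The same check can be made from Python before starting a recording. The following is a minimal sketch using only the standard library; `ffmpeg_version` is a hypothetical helper name and is not part of PyVisionAuto.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version(executable: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<executable> -version`, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None  # not on PATH
    try:
        result = subprocess.run(
            [path, "-version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None  # found, but could not be executed successfully
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


version = ffmpeg_version()
print(version or "ffmpeg not found on PATH; screen recording will not work")
```

Failing early with a clear message is usually easier to diagnose than an error raised in the middle of an automation run.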

Quick start

Basic usage: Find and click

from pyvisionauto import Screen

screen = Screen()
# Wait for image to appear on screen, highlight it, then click
screen.wait("login_button.png", timeout=10).highlight().click()

Advanced example: Record automation with screen capture

This example demonstrates screen recording combined with visual automation:

from pyvisionauto import Screen, Recorder
from pathlib import Path

# Initialize screen automation and recorder
screen = Screen()
recorder = Recorder()

# Ensure ffmpeg is installed for recording:
# - Linux: sudo apt install ffmpeg
# - Windows: Download from ffmpeg.org and add to PATH

# Start recording the entire screen to a video file
# The recording is saved as automation_demo.mp4 in the current working directory
video_path = Path("automation_demo.mp4")
recorder.start_recording(output_path=video_path)

try:
    # Activate target application window
    screen.activate_window("Calculator")
    
    # Wait for "1" button to appear (max 10 seconds), highlight it, then click it
    screen.wait("button_1.png", timeout=10).highlight().click()
    
    # Click "+" button, allowing up to 5 seconds for it to appear
    screen.click("button_plus.png", timeout=5)
    
    # Type the second operand using keyboard automation
    screen.type_text("5")
    
    # Wait for and click "=" button
    screen.wait("button_equals.png", timeout=5).highlight().click()
    
    # Check that the result of 1 + 5 appears (3-second timeout)
    screen.wait("result_6.png", timeout=3).highlight()
    
finally:
    # Stop recording (important: cleanup resources)
    recorder.stop_recording()
    print(f"✓ Automation recorded to {video_path}")

Activate a window before matching

screen.activate_window("Calculator")
screen.click("button.png")

Runtime screenshot

Highlighted match region during runtime:

[Figure: PyVisionAuto runtime screenshot with highlighted region]

Platform differences

Feature | Linux | Windows
Screen capture & template matching | Supported | Supported
Mouse / keyboard automation | Supported | Supported
Highlight overlay | Supported | Supported
Window activation | xdotool / wmctrl | pyautogui (pygetwindow)
Screen recording | ffmpeg + x11grab | ffmpeg + gdigrab

Screen recording requires ffmpeg to be installed and available on the system PATH. Linux uses x11grab; Windows uses gdigrab.
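To make the platform split concrete, here is a hedged sketch of how a full-screen capture command could be assembled for each platform. It does not show how the Recorder builds its command, which this page does not document. `build_record_command`, the default frame rate, and the default display `:0.0` are assumptions made for illustration. The ffmpeg options themselves (`-f x11grab`, `-f gdigrab`, `-framerate`, `-i desktop`, `-y`) are standard.

```python
from typing import List


def build_record_command(
    output: str,
    platform: str,
    fps: int = 30,
    display: str = ":0.0",
) -> List[str]:
    """Build an ffmpeg command that records the whole screen (hypothetical helper)."""
    if platform.startswith("linux"):
        # x11grab reads from an X11 display; with no size given it grabs the full screen
        source = ["-f", "x11grab", "-framerate", str(fps), "-i", display]
    elif platform.startswith("win"):
        # gdigrab captures the whole desktop when the input is "desktop"
        source = ["-f", "gdigrab", "-framerate", str(fps), "-i", "desktop"]
    else:
        raise ValueError(f"no screen grab device chosen for platform {platform!r}")
    # -y overwrites an existing output file without prompting
    return ["ffmpeg", "-y", *source, output]


print(" ".join(build_record_command("automation_demo.mp4", "linux")))
print(" ".join(build_record_command("automation_demo.mp4", "win32")))
```

Passing the platform as a parameter, rather than reading `sys.platform` inside the function, lets both branches be checked on any machine.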

Notes

  • Wayland-only and headless environments are not currently supported.
  • On Windows with high-DPI scaling, coordinate accuracy may be affected.
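A script can guard against the unsupported environments listed above before it touches the screen. The sketch below is a heuristic based on the conventional `DISPLAY` and `XDG_SESSION_TYPE` environment variables. `unsupported_reason` is a hypothetical helper, and PyVisionAuto may detect its environment differently.

```python
import os
import sys
from typing import Mapping, Optional


def unsupported_reason(env: Mapping[str, str], platform: str) -> Optional[str]:
    """Return why the environment looks unsupported, or None if it looks fine."""
    if platform.startswith("win"):
        return None
    if not platform.startswith("linux"):
        return f"platform {platform!r} is outside the documented scope"
    if not env.get("DISPLAY"):
        # No X display: headless machine or a Wayland-only session
        return "DISPLAY is not set (headless or Wayland-only session)"
    if env.get("XDG_SESSION_TYPE", "").lower() == "wayland":
        # DISPLAY may come from XWayland, but an X11 session is what is documented
        return "XDG_SESSION_TYPE is 'wayland'; an X11 session is expected"
    return None


reason = unsupported_reason(os.environ, sys.platform)
print("environment looks supported" if reason is None else f"unsupported: {reason}")
```

The check is deliberately conservative: a Wayland session with XWayland is reported as unsupported, because only X11 sessions are listed in the scope.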
