Skip to main content

Cross-platform desktop automation API for capturing windows and performing GUI actions

Project description

desktop-api

Cross-platform Python library for desktop automation: capture screens/windows and control mouse & keyboard. Works on macOS, Windows, and Linux.

Built on pyautogui, pygetwindow, and mss.


Table of Contents


Installation

# Clone and install
git clone https://github.com/your-repo/desktop-api.git
cd desktop-api

python -m venv .venv
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate   # Windows

pip install -e .

Platform-Specific Requirements

Platform Additional Setup
macOS Grant Screen Recording and Input Monitoring permissions when prompted. Install PyObjC: pip install pyobjc-core pyobjc-framework-Quartz pyobjc-framework-Cocoa
Windows No extra dependencies. Run as Administrator if automating elevated apps.
Linux X11 only (Wayland not supported). Install: sudo apt install scrot python3-xlib

Quick Start

from desktop_api import DesktopController, WindowNotFoundError

# Create a controller
controller = DesktopController()

# Find and capture a window
try:
    window = controller.find_window("Safari")
    screenshot = controller.capture_window(window)
    screenshot.save("safari.png")
except WindowNotFoundError:
    print("Safari is not open!")

Examples

Creating a Controller

The DesktopController is your main entry point for all automation tasks.

from desktop_api import DesktopController

# Basic usage
controller = DesktopController()

# With safety options
controller = DesktopController(
    fail_safe=True,   # Move mouse to top-left corner to abort (default: True)
    pause=0.1         # Wait 0.1s between actions (default: 0.0)
)

Tip: Use pause=0.1 to give GUIs time to respond between actions.


Finding Windows

List All Open Windows

from desktop_api import DesktopController

controller = DesktopController()

# Get all visible windows
windows = controller.list_windows()

for win in windows:
    print(f"{win.title}")
    print(f"  Position: ({win.left}, {win.top})")
    print(f"  Size: {win.width}x{win.height}")
    print(f"  Active: {win.is_active}")
    print()

Find a Specific Window

from desktop_api import DesktopController, WindowNotFoundError

controller = DesktopController()

# Find by partial title match (case-insensitive)
try:
    window = controller.find_window("Chrome")
    print(f"Found: {window.title}")
except WindowNotFoundError:
    print("Window not found!")

# Find with exact title match
window = controller.find_window("Google Chrome", exact=True)

# Find without auto-activating the window
window = controller.find_window("Chrome", activate=False)

# Case-sensitive search
window = controller.find_window("Chrome", case_sensitive=True)

Activate and Refresh Windows

# Bring window to foreground
controller.activate_window(window)

# Refresh window geometry (after resize/move)
window = controller.refresh_window(window)
print(f"New position: ({window.left}, {window.top})")

Taking Screenshots

Capture Entire Screen

from desktop_api import DesktopController

controller = DesktopController()

# Capture primary monitor (full virtual screen)
screenshot = controller.capture_screen()
screenshot.save("fullscreen.png")

# Capture specific monitor (0 = all, 1 = first, 2 = second, etc.)
monitor1 = controller.capture_screen(monitor=1)
monitor1.save("monitor1.png")

Capture a Window

from desktop_api import DesktopController, WindowNotFoundError

controller = DesktopController()

try:
    window = controller.find_window("Notes")
    
    # Basic window capture
    screenshot = controller.capture_window(window)
    screenshot.save("notes.png")
    
    # Capture with padding around the window
    screenshot_padded = controller.capture_window(window, padding=20)
    screenshot_padded.save("notes_with_border.png")
    
    # Activate window before capturing (ensures it's visible)
    screenshot = controller.capture_window(window, activate=True)
    
except WindowNotFoundError:
    print("Notes app is not open!")

Capture a Region

from desktop_api import DesktopController

controller = DesktopController()

# Capture specific area: (left, top, width, height)
region = controller.capture_region((100, 100, 800, 600))
region.save("region.png")

# Using dict format
region = controller.capture_region({
    "left": 100,
    "top": 100,
    "width": 800,
    "height": 600
})

Mouse Operations

All mouse methods accept relative_to parameter for window-relative coordinates.

Click

from desktop_api import DesktopController

controller = DesktopController()
window = controller.find_window("Notes")

# Click at absolute screen position
controller.click(500, 300)

# Click relative to window (100px right, 80px down from top-left)
controller.click(100, 80, relative_to=window)

# Right click
controller.click(100, 80, button="right", relative_to=window)

# Triple click (select paragraph)
controller.click(100, 80, clicks=3, relative_to=window)

# Double click
controller.double_click(100, 80, relative_to=window)

Move Mouse

# Move instantly
controller.move_mouse(500, 300)

# Move with animation
controller.move_mouse(500, 300, duration=0.5)

# Move relative to window
controller.move_mouse(100, 100, relative_to=window)

Drag

# Drag from point A to point B
controller.drag(
    start_x=100, start_y=100,
    end_x=300, end_y=200,
    duration=0.3,
    relative_to=window
)

# Draw a line in a drawing app
controller.drag(50, 50, 200, 200, relative_to=window)

Low-Level Mouse Control

# Manual press and release (for custom drag operations)
controller.mouse_down(100, 100, relative_to=window)
controller.move_mouse(200, 200, relative_to=window, duration=0.5)
controller.mouse_up(200, 200, relative_to=window)

Scroll

# Scroll down (negative = down, positive = up)
controller.scroll(-3)

# Scroll up
controller.scroll(5)

# Scroll at specific position
controller.scroll(-3, x=100, y=100, relative_to=window)

Keyboard Operations

Type Text

from desktop_api import DesktopController

controller = DesktopController()

# Type text instantly
controller.type_text("Hello, World!")

# Type with delay between characters
controller.type_text("Slow typing...", interval=0.05)

# Type with newline
controller.type_text("Line 1\nLine 2\n")

Send Hotkeys

# Save file (Cmd+S on Mac, Ctrl+S on Windows/Linux)
controller.send_hotkey("command", "s")  # macOS
controller.send_hotkey("ctrl", "s")     # Windows/Linux

# Select all
controller.send_hotkey("command", "a")

# Copy
controller.send_hotkey("command", "c")

# Paste
controller.send_hotkey("command", "v")

# Undo
controller.send_hotkey("command", "z")

# Multiple modifiers
controller.send_hotkey("command", "shift", "s")  # Save As

API Reference

DesktopController Methods

Method Description
list_windows() List all visible windows
find_window(query, exact=False, case_sensitive=False, activate=True) Find window by title
activate_window(target) Bring window to foreground
refresh_window(target) Get updated window geometry
capture_screen(monitor=0) Screenshot of monitor
capture_window(target, activate=False, padding=0) Screenshot of window
capture_region(region) Screenshot of area
click(x, y, button="left", clicks=1, relative_to=None) Mouse click
double_click(x, y, relative_to=None) Double click
move_mouse(x, y, duration=0.0, relative_to=None) Move cursor
drag(start_x, start_y, end_x, end_y, duration=0.2, relative_to=None) Drag operation
mouse_down(x, y, button="left", relative_to=None) Press mouse button
mouse_up(x, y, button="left", relative_to=None) Release mouse button
scroll(clicks, x=None, y=None, relative_to=None) Scroll wheel
type_text(text, interval=0.0) Type string
send_hotkey(*keys, interval=0.0) Send key combination

WindowHandle Properties

Property Description
title Window title
left, top Position (top-left corner)
width, height Dimensions
right, bottom Computed edges
is_active Whether window is focused
handle Native window handle
pid Process ID

Example Scripts

examples/demo.py

Basic window capture and click demo:

python examples/demo.py --window "Safari" --output screenshot.png

examples/clicker.py

Hotkey-based auto-clicker (requires pip install pynput):

# Hold Shift to click at 15 CPS
python examples/clicker.py --cps 15 --hotkey shift

# Toggle mode with Space key
python examples/clicker.py --cps 10 --toggle-hotkey space

examples/dummy_agent_loop.py

Template for AI agent automation loops:

python examples/dummy_agent_loop.py --window "Notes" --iterations 5

Complete Example: Automate Notes App

"""Full example: Open Notes, type text, save file."""
from desktop_api import DesktopController, WindowNotFoundError
import time

controller = DesktopController(fail_safe=True, pause=0.1)

try:
    # Find the Notes window
    notes = controller.find_window("Notes")
    print(f"Found: {notes.title} at ({notes.left}, {notes.top})")
    
    # Take a "before" screenshot
    before = controller.capture_window(notes)
    before.save("before.png")
    
    # Click to focus the text area (adjust coordinates for your app)
    controller.click(200, 200, relative_to=notes)
    time.sleep(0.2)
    
    # Type some text
    controller.type_text("Hello from desktop-api!\n")
    controller.type_text("This is automated input.\n")
    
    # Save with Cmd+S (or Ctrl+S on Windows/Linux)
    controller.send_hotkey("command", "s")
    time.sleep(0.5)
    
    # Take an "after" screenshot
    after = controller.capture_window(notes)
    after.save("after.png")
    
    print("Done! Check before.png and after.png")
    
except WindowNotFoundError:
    print("Please open the Notes app first!")

Safety Tips

  1. Keep fail_safe=True – Moving mouse to top-left corner aborts automation
  2. Use pause parameter – Gives GUIs time to respond between actions
  3. Test with non-destructive actions first – Verify coordinates before automating
  4. Add confirmation logic – For destructive operations, add explicit checks
  5. Use activate=False – When you only need window metadata, not focus

Troubleshooting

Issue Solution
WindowNotFoundError Check if window is open and title matches
Permission denied (macOS) Grant Screen Recording & Input Monitoring in System Preferences
Wayland not working (Linux) Use X11 or XWayland session
Clicks going to wrong position Use refresh_window() to update geometry after window moves
Typing not working Ensure target window has keyboard focus

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

desktop_api-0.1.0.tar.gz (17.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

desktop_api-0.1.0-py3-none-any.whl (15.0 kB view details)

Uploaded Python 3

File details

Details for the file desktop_api-0.1.0.tar.gz.

File metadata

  • Download URL: desktop_api-0.1.0.tar.gz
  • Upload date:
  • Size: 17.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for desktop_api-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f36b75dfead50c065068b830480652ca1df2df482980d962bf1640cf43cac938
MD5 ce54322f6909aaa5a547e550e450e912
BLAKE2b-256 f440af0996164545c8529a3b516243566f3ff90b4d60716f87a3bcb33b3b5d79

See more details on using hashes here.

File details

Details for the file desktop_api-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: desktop_api-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 15.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for desktop_api-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a5b2d8fb7a6c7aadce3519dcba8724ec13c10bea450ff134db049782a9f60df7
MD5 7d36180ee033b506fd11773d730a083d
BLAKE2b-256 bbfc9ef99e102380c2572260752c0ab5c1e2f4630a0f0fb2802dc96354399509

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page