AI agent skill for desktop automation using PyAutoGUI

Desktop Control Skill

🤖 AI Agent Skill for desktop automation using PyAutoGUI.

Control mouse, keyboard, and screen programmatically through a simple command-line interface (CLI).

⚡ Quick Install

For AI Agents

Add this skill to your AI coding agent with a single command:

npx skills add patrickporto/desktop-agent

For Users

Install the CLI with pipx (recommended):

pipx install desktop-agent

Or run without installing using uvx:

uvx desktop-agent --help

Or using pip:

pip install desktop-agent

🎯 For AI Agents

This project is packaged as an AI Agent Skill. To use it:

  1. Read the skill documentation: SKILL.md
  2. Install the CLI: Use pip install desktop-agent or pipx install desktop-agent
  3. Invoke commands: Use desktop-agent <category> <command>
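Agents that call the CLI from Python instead of a shell can map the desktop-agent <category> <command> pattern onto an argument list for subprocess.run. This is a minimal sketch, assuming the desktop-agent executable is on PATH and that commands print their results to standard output. build_argv and run_desktop_agent are hypothetical helpers, not part of the package:

```python
from __future__ import annotations

import shutil
import subprocess


def build_argv(category: str, command: str, *args: object) -> list[str]:
    """Build the argument list for `desktop-agent <category> <command> [args...]`.

    Every extra argument is converted to str, so coordinates and durations
    can be passed as numbers.
    """
    return ["desktop-agent", category, command, *(str(a) for a in args)]


def run_desktop_agent(category: str, command: str, *args: object) -> str:
    """Run one command and return its stdout (assumed to carry the result)."""
    if shutil.which("desktop-agent") is None:
        raise FileNotFoundError("desktop-agent not found on PATH; install it with pipx or pip")
    completed = subprocess.run(
        build_argv(category, command, *args),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    # Same as: desktop-agent mouse move 100 200 --duration 1.0
    print(build_argv("mouse", "move", 100, 200, "--duration", 1.0))
```

Passing a list of arguments rather than one shell string avoids quoting problems with text such as "Hello World".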

Quick Reference for Agents:


📦 Installation

Using pipx (Recommended)

pipx install desktop-agent

Using pip

pip install desktop-agent

Run without installing (using uvx)

uvx desktop-agent

Usage

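The CLI is organized into five command categories (mouse, keyboard, screen, message, and app), each invoked as desktop-agent <category> <command>: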

🖱️ Mouse (mouse)

# Move mouse to coordinates
desktop-agent mouse move 100 200

# Move with duration (animation)
desktop-agent mouse move 100 200 --duration 1.0

# Click at the current position
desktop-agent mouse click

# Click at specific coordinates
desktop-agent mouse click 500 500

# Right click
desktop-agent mouse right-click

# Double click
desktop-agent mouse double-click 300 400

# Drag to coordinates
desktop-agent mouse drag 200 300

# Scroll (positive = up, negative = down)
desktop-agent mouse scroll 5
desktop-agent mouse scroll -3

# Get current mouse position
desktop-agent mouse position
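Negative scroll amounts start with a dash, so a script that builds scroll commands should make sure the value is not read as an option. The sketch below uses hypothetical helpers (scroll_argv, click_argv) and adds the -- end-of-options separator that Click-based CLIs, including Typer apps, recognize; whether desktop-agent actually needs it for a bare -3 is an assumption worth checking on your install. click_argv enforces that coordinates come in pairs, matching the two documented forms of mouse click:

```python
from __future__ import annotations

from typing import Optional


def scroll_argv(clicks: int, direction: str) -> list[str]:
    """Build `desktop-agent mouse scroll` args (positive = up, negative = down)."""
    if clicks < 0:
        raise ValueError("clicks must be non-negative; use direction for up/down")
    if direction == "up":
        amount = clicks
    elif direction == "down":
        amount = -clicks
    else:
        raise ValueError(f"direction must be 'up' or 'down', got {direction!r}")
    # "--" ends option parsing, so "-3" is read as a positional value,
    # not as an unknown short option.
    return ["desktop-agent", "mouse", "scroll", "--", str(amount)]


def click_argv(x: Optional[int] = None, y: Optional[int] = None) -> list[str]:
    """Build a click at the current position, or at (x, y) when both are given."""
    if (x is None) != (y is None):
        raise ValueError("pass both x and y, or neither")
    argv = ["desktop-agent", "mouse", "click"]
    if x is not None:
        argv += [str(x), str(y)]
    return argv
```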

⌨️ Keyboard (keyboard)

# Write text
desktop-agent keyboard write "Hello World"

# Write with an interval between keystrokes
desktop-agent keyboard write "Slow typing" --interval 0.1

# Press a key
desktop-agent keyboard press enter

# Press a key multiple times
desktop-agent keyboard press a --presses 5

# Execute a keyboard shortcut (comma-separated keys)
desktop-agent keyboard hotkey "ctrl,c"
desktop-agent keyboard hotkey "ctrl,shift,esc"

# Hold down and release a key
desktop-agent keyboard keydown shift
desktop-agent keyboard keyup shift
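The hotkey command takes a single comma-separated string of key names. When a key combination comes from free-form text (for example a plan that says "Ctrl + Shift + Esc"), a small normalizer can turn it into that form. A sketch, assuming the CLI expects lowercase PyAutoGUI key names such as ctrl, shift and esc; normalize_hotkey and hotkey_argv are hypothetical helpers:

```python
from __future__ import annotations

import re


def normalize_hotkey(combo: str) -> str:
    """Turn 'Ctrl + Shift + Esc' or 'ctrl, c' into 'ctrl,shift,esc' / 'ctrl,c'.

    '+' and ',' are both accepted as separators; surrounding whitespace is
    trimmed and each key is lowercased to match PyAutoGUI key names.
    """
    keys = [key.strip().lower() for key in re.split(r"[+,]", combo)]
    if any(key == "" for key in keys):
        raise ValueError(f"malformed key combination: {combo!r}")
    return ",".join(keys)


def hotkey_argv(combo: str) -> list[str]:
    """Build `desktop-agent keyboard hotkey "<keys>"` args."""
    return ["desktop-agent", "keyboard", "hotkey", normalize_hotkey(combo)]
```

Because + and , are treated as separators, this sketch cannot express those two keys themselves.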

🖼️ Screen (screen)

# Capture a screenshot
desktop-agent screen screenshot my_screen.png

# Screenshot of a specific region (x,y,width,height)
desktop-agent screen screenshot region.png --region "100,100,500,400"

# Locate an image on screen
desktop-agent screen locate image.png

# Locate the center of an image (minimum match confidence 0.8)
desktop-agent screen locate-center button.png --confidence 0.8

# Get the color of a pixel
desktop-agent screen pixel 100 200

# Get screen size
desktop-agent screen size

# Check if coordinates are on screen
desktop-agent screen on-screen 5000 5000

# Find the coordinates of text on screen using OCR (partial match)
desktop-agent screen locate-text-coordinates "OK"

# Find text in a specific image
desktop-agent screen locate-text-coordinates "Confirm" --image screenshot.png

# Case-sensitive search
desktop-agent screen locate-text-coordinates "Login" --case-sensitive

# Read all text from the screen
desktop-agent screen read-all-text

# Read all text from an image
desktop-agent screen read-all-text --image capture.png

# Specify languages for OCR (default: pt,en)
desktop-agent screen locate-text-coordinates "Button" --lang "en"
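The OCR commands match text partially, and the existence of --case-sensitive suggests matching is case-insensitive by default. The sketch below illustrates those semantics on results in the format returned by EasyOCR's Reader.readtext (a list of corner points, text, confidence entries). It is an illustration, not the package's implementation, and it uses hard-coded sample results so no OCR model is needed; in practice the results would come from something like easyocr.Reader(["pt", "en"]).readtext("capture.png"). find_text_center is a hypothetical helper:

```python
from __future__ import annotations

from typing import Optional, Sequence, Tuple

Point = Sequence[float]
OcrResult = Tuple[Sequence[Point], str, float]  # (bbox corners, text, confidence)


def find_text_center(
    results: Sequence[OcrResult],
    query: str,
    case_sensitive: bool = False,
) -> Optional[Tuple[int, int]]:
    """Return the center of the first OCR box whose text contains `query`."""
    needle = query if case_sensitive else query.lower()
    for bbox, text, _confidence in results:
        haystack = text if case_sensitive else text.lower()
        if needle in haystack:  # partial (substring) match
            xs = [p[0] for p in bbox]
            ys = [p[1] for p in bbox]
            # Center of the axis-aligned box enclosing the four corners.
            return (round((min(xs) + max(xs)) / 2), round((min(ys) + max(ys)) / 2))
    return None


sample = [
    ([[10, 10], [90, 10], [90, 40], [10, 40]], "Cancel", 0.98),
    ([[120, 10], [200, 10], [200, 40], [120, 40]], "OK, continue", 0.95),
]
print(find_text_center(sample, "ok"))                       # (160, 25)
print(find_text_center(sample, "ok", case_sensitive=True))  # None
```

The returned point is the kind of x y pair that mouse click expects.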

💬 Messages (message)

# Show an alert
desktop-agent message alert "Hello!"

# Confirmation dialog
desktop-agent message confirm "Are you sure?"

# Input prompt
desktop-agent message prompt "Enter your name:"

# Password prompt
desktop-agent message password "Enter your password:"

📱 Applications (app)

# Open an application (cross-platform)
desktop-agent app open notepad
desktop-agent app open "Google Chrome"

# Open with arguments
desktop-agent app open chrome --arg "https://google.com"

# Focus a window by its title
desktop-agent app focus "Untitled - Notepad"

# List all visible windows
desktop-agent app list

Automation Examples

Open Notepad and type text

desktop-agent app open notepad
desktop-agent app focus notepad
desktop-agent keyboard write "Hello from Desktop Skill!"
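Run back to back, app focus may fire before the Notepad window exists. A scripted version of the same three steps, with a pause after launching, could look like the sketch below. The 2-second delay is an assumed value, and open_notepad_and_write and its runner parameter are hypothetical, with the runner acting as a hook so the sequence can be tested without touching the desktop:

```python
from __future__ import annotations

import subprocess
import time
from typing import Callable, List

Runner = Callable[[List[str]], None]


def run(argv: list[str]) -> None:
    """Default runner: execute one CLI call and fail loudly on a non-zero exit."""
    subprocess.run(argv, check=True)


def open_notepad_and_write(text: str, runner: Runner = run, launch_delay: float = 2.0) -> None:
    """Open Notepad, wait for it to appear, focus it, then type `text`."""
    runner(["desktop-agent", "app", "open", "notepad"])
    time.sleep(launch_delay)  # assumption: about 2 s is enough for the window to appear
    runner(["desktop-agent", "app", "focus", "notepad"])
    runner(["desktop-agent", "keyboard", "write", text])
```

Calling open_notepad_and_write("Hello from Desktop Skill!") runs the sequence against the real CLI.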

Capture a screenshot and read a pixel color

desktop-agent screen screenshot full_screen.png
desktop-agent screen pixel 500 500
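Before reading a pixel or capturing a region, it helps to check the coordinates against the screen size, which is presumably what screen on-screen does for a single point. This is a sketch with hypothetical helpers, assuming the screen size is already known (for example from desktop-agent screen size, whose output format is not documented here). on_screen mirrors the 0-indexed bounds check performed by PyAutoGUI's onScreen(), and region_arg formats the x,y,width,height value that --region expects:

```python
from __future__ import annotations


def on_screen(x: int, y: int, width: int, height: int) -> bool:
    """True when (x, y) lies inside a width x height screen; pixels are 0-indexed."""
    return 0 <= x < width and 0 <= y < height


def region_arg(x: int, y: int, w: int, h: int, screen: tuple[int, int]) -> str:
    """Format the --region value 'x,y,width,height', rejecting boxes that leave the screen."""
    sw, sh = screen
    if w <= 0 or h <= 0:
        raise ValueError("width and height must be positive")
    # Both the top-left and the bottom-right pixel must be on screen.
    if not (on_screen(x, y, sw, sh) and on_screen(x + w - 1, y + h - 1, sw, sh)):
        raise ValueError(f"region {(x, y, w, h)} extends beyond a {sw}x{sh} screen")
    return f"{x},{y},{w},{h}"


def screenshot_argv(path: str, x: int, y: int, w: int, h: int, screen: tuple[int, int]) -> list[str]:
    """Build `desktop-agent screen screenshot <path> --region "x,y,width,height"` args."""
    return ["desktop-agent", "screen", "screenshot", path, "--region", region_arg(x, y, w, h, screen)]
```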

Available Commands

Run desktop-agent --help to see all commands:

desktop-agent --help
desktop-agent mouse --help
desktop-agent keyboard --help
desktop-agent screen --help
desktop-agent message --help
desktop-agent app --help

Project Structure

desktop-skill/
├── desktop_agent/       # Main package
│   ├── __init__.py
│   ├── commands/        # Command modules
│   │   ├── __init__.py
│   │   ├── mouse.py    # Mouse commands
│   │   ├── keyboard.py # Keyboard commands
│   │   ├── screen.py   # Screen/screenshot/OCR commands
│   │   └── message.py  # Message boxes
├── pyproject.toml      # Project configuration
└── README.md           # This documentation

Technologies

  • PyAutoGUI: GUI automation
  • EasyOCR: Optical character recognition
  • Typer: Modern CLI framework

Download files

Source Distribution

desktop_agent-1.1.0.tar.gz (87.5 kB)

Uploaded Source

Built Distribution

desktop_agent-1.1.0-py3-none-any.whl (11.2 kB)

Uploaded Python 3

File details

Details for the file desktop_agent-1.1.0.tar.gz.

File metadata

  • Download URL: desktop_agent-1.1.0.tar.gz
  • Upload date:
  • Size: 87.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for desktop_agent-1.1.0.tar.gz
Algorithm Hash digest
SHA256 f167b7faa6aaa10c8e91f74321b6599d999b9031a817b387710ddefa54772dbb
MD5 f4a7d12939dd1d4c8762d0915c3c08d3
BLAKE2b-256 457953e738d5e6bd32a2c7cce068b79a2f0fe3f9c0bef2b662971d98e8c7f5df

Provenance

The following attestation bundles were made for desktop_agent-1.1.0.tar.gz:

Publisher: release.yml on patrickporto/desktop-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file desktop_agent-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: desktop_agent-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 11.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for desktop_agent-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5a0b213cd2ce5c9b8f7d51aa0b82f37171c00b6fa1284d56bfd099aa29b47719
MD5 bb7e4f7877e5be4b322a3e086b94224a
BLAKE2b-256 a3441cab0b46988fb57f8c13504e7972299f8a7e19aa515c9a255844d204219e

Provenance

The following attestation bundles were made for desktop_agent-1.1.0-py3-none-any.whl:

Publisher: release.yml on patrickporto/desktop-agent
