Desktop Control Skill
🤖 AI Agent Skill for desktop automation using PyAutoGUI.
Control the mouse, keyboard, and screen programmatically through a simple command-line interface.
⚡ Quick Install
For AI Agents
Add this skill to your AI coding agent with a single command:
npx skills add patrickporto/desktop-agent
For Users
Install the CLI with pipx (recommended):
pipx install desktop-agent
Or run without installing using uvx:
uvx desktop-agent --help
Or using pip:
pip install desktop-agent
🎯 For AI Agents
This project is packaged as an AI Agent Skill. To use it:
- Read the skill documentation: SKILL.md
- Install the CLI: Use pip install desktop-agent or pipx install desktop-agent
- Invoke commands: Use desktop-agent <category> <command>
Quick Reference for Agents:
- All commands documented in SKILL.md
- Practical examples in examples/automation_examples.md
- Help system: desktop-agent --help
- All commands return JSON output by default
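Because commands return JSON by default, an agent can call the CLI from a script and decode the result. The following is a minimal sketch, assuming desktop-agent is on PATH and writes a single JSON document to stdout. The helper names (build_command, parse_json_output, run_agent) are hypothetical, and the shape of the JSON payload is not specified here, so the sketch returns whatever was decoded.

```python
import json
import subprocess
from typing import Any


def build_command(category: str, command: str, *args: str) -> list[str]:
    """Return the argv list for `desktop-agent <category> <command> [args...]`."""
    return ["desktop-agent", category, command, *args]


def parse_json_output(stdout: str) -> Any:
    """Decode stdout as JSON; raise ValueError with context if it is not JSON."""
    try:
        return json.loads(stdout)
    except json.JSONDecodeError as exc:
        raise ValueError(f"expected JSON from desktop-agent, got: {stdout!r}") from exc


def run_agent(category: str, command: str, *args: str) -> Any:
    """Run one command and return its decoded output.

    Assumption: the CLI is installed and prints JSON to stdout.
    check=True raises CalledProcessError on a non-zero exit code.
    """
    result = subprocess.run(
        build_command(category, command, *args),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_json_output(result.stdout)
```

Calling run_agent("mouse", "position") would then return the decoded output of desktop-agent mouse position.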
📦 Installation
Using pipx (Recommended)
pipx install desktop-agent
Using pip
pip install desktop-agent
Run without installing (using uvx)
uvx desktop-agent
Usage
The CLI is organized into command categories:
🖱️ Mouse (mouse)
# Move mouse to coordinates
desktop-agent mouse move 100 200
# Move with duration (animation)
desktop-agent mouse move 100 200 --duration 1.0
# Click at current position
desktop-agent mouse click
# Click at specific coordinates
desktop-agent mouse click 500 500
# Right click
desktop-agent mouse right-click
# Double click
desktop-agent mouse double-click 300 400
# Drag to coordinates
desktop-agent mouse drag 200 300
# Scroll (positive = up, negative = down)
desktop-agent mouse scroll 5
desktop-agent mouse scroll -3
# Get current mouse position
desktop-agent mouse position
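The mouse commands above take optional coordinates and, for move, an optional --duration. A small sketch of building those argument lists from Python follows. The function names are hypothetical, and passing coordinates to right-click is an assumption, since the examples above only show coordinates with click and double-click.

```python
from typing import Optional


def mouse_move(x: int, y: int, duration: Optional[float] = None) -> list[str]:
    """Build the argv for `mouse move X Y [--duration SECONDS]`."""
    argv = ["desktop-agent", "mouse", "move", str(x), str(y)]
    if duration is not None:
        argv += ["--duration", str(duration)]
    return argv


def mouse_click(
    x: Optional[int] = None,
    y: Optional[int] = None,
    kind: str = "click",
) -> list[str]:
    """Build the argv for click, right-click, or double-click.

    With no coordinates the command acts at the current pointer position.
    Assumption: every click variant accepts the same optional X Y pair.
    """
    if kind not in ("click", "right-click", "double-click"):
        raise ValueError(f"unknown click kind: {kind}")
    if (x is None) != (y is None):
        raise ValueError("pass both x and y, or neither")
    argv = ["desktop-agent", "mouse", kind]
    if x is not None:
        argv += [str(x), str(y)]
    return argv


def mouse_scroll(clicks: int) -> list[str]:
    """Build the argv for `mouse scroll N` (positive = up, negative = down)."""
    return ["desktop-agent", "mouse", "scroll", str(clicks)]
```

Each function only returns an argument list, so it can be passed to subprocess.run or logged before execution.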
⌨️ Keyboard (keyboard)
# Write text
desktop-agent keyboard write "Hello World"
# Write with interval between keys
desktop-agent keyboard write "Slow typing" --interval 0.1
# Press a key
desktop-agent keyboard press enter
# Press multiple times
desktop-agent keyboard press a --presses 5
# Execute keyboard shortcut
desktop-agent keyboard hotkey "ctrl,c"
desktop-agent keyboard hotkey "ctrl,shift,esc"
# Hold/release key
desktop-agent keyboard keydown shift
desktop-agent keyboard keyup shift
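The hotkey command takes its keys as one comma-separated string, and keydown/keyup are meant to be used as a pair. The sketch below shows one way to build both from Python. The helper names are hypothetical, and the key names are passed through unchanged because the set of valid names is not listed here.

```python
def hotkey_command(*keys: str) -> list[str]:
    """Build the argv for `keyboard hotkey "k1,k2,..."`."""
    cleaned = [key.strip() for key in keys]
    if not cleaned or any(not key or "," in key for key in cleaned):
        raise ValueError("each key must be a non-empty name without commas")
    return ["desktop-agent", "keyboard", "hotkey", ",".join(cleaned)]


def hold_key(key: str, inner: list[list[str]]) -> list[list[str]]:
    """Wrap commands between keydown and keyup for the same key.

    Returning the full sequence keeps the release step next to the press,
    so a caller is less likely to leave a modifier held down.
    """
    return [
        ["desktop-agent", "keyboard", "keydown", key],
        *inner,
        ["desktop-agent", "keyboard", "keyup", key],
    ]
```

For example, hotkey_command("ctrl", "c") yields the same arguments as desktop-agent keyboard hotkey "ctrl,c".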
🖼️ Screen (screen)
# Capture screenshot (full screen)
desktop-agent screen screenshot my_screen.png
# Take screenshot of active window
desktop-agent screen screenshot active_window.png --active
# Take screenshot of specific window
desktop-agent screen screenshot notepad.png --window "Notepad"
# Screenshot of specific region (x,y,width,height)
desktop-agent screen screenshot region.png --region "100,100,500,400"
# Locate image within active window
desktop-agent screen locate button.png --active
# Locate center of image on screen
desktop-agent screen locate-center button.png --confidence 0.8
# Find text coordinates within active window
desktop-agent screen locate-text-coordinates "OK" --active
# Find text in specific image
desktop-agent screen locate-text-coordinates "Confirm" --image screenshot.png
# Case-sensitive search
desktop-agent screen locate-text-coordinates "Login" --case-sensitive
# Read all text from screen
desktop-agent screen read-all-text
# Read text from image
desktop-agent screen read-all-text --image capture.png
# Specify languages for OCR (default: pt,en)
desktop-agent screen locate-text-coordinates "Button" --lang "en"
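The --region option expects "x,y,width,height", which is easy to confuse with two corner points. The sketch below converts corner coordinates into that form and builds a screenshot command. The function names are hypothetical, and rejecting non-positive sizes is this sketch's own validation, not documented CLI behavior.

```python
from typing import Optional


def region_arg(x: int, y: int, width: int, height: int) -> str:
    """Format a region as the "x,y,width,height" string used by --region."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return f"{x},{y},{width},{height}"


def region_from_corners(left: int, top: int, right: int, bottom: int) -> str:
    """Convert top-left and bottom-right corners to "x,y,width,height"."""
    return region_arg(left, top, right - left, bottom - top)


def screenshot_command(path: str, region: Optional[str] = None) -> list[str]:
    """Build the argv for `screen screenshot PATH [--region REGION]`."""
    argv = ["desktop-agent", "screen", "screenshot", path]
    if region is not None:
        argv += ["--region", region]
    return argv
```

A rectangle from (100, 100) to (600, 500) becomes "100,100,500,400", the region used in the example above.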
💬 Messages (message)
# Show alert
desktop-agent message alert "Hello!"
# Confirmation
desktop-agent message confirm "Are you sure?"
# Input prompt
desktop-agent message prompt "Enter your name:"
# Password
desktop-agent message password "Enter your password:"
📱 Applications (app)
# Open an application (cross-platform)
desktop-agent app open notepad
desktop-agent app open "Google Chrome"
# Open with arguments
desktop-agent app open chrome --arg "https://google.com"
# Focus on a window by title
desktop-agent app focus "Untitled - Notepad"
# List all visible windows
desktop-agent app list
Automation Examples
Open Notepad and write
desktop-agent app open notepad
desktop-agent app focus notepad
desktop-agent keyboard write "Hello from Desktop Skill!"
Capture screenshot and analyze
desktop-agent screen screenshot full_screen.png
desktop-agent screen pixel 500 500
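Multi-step sequences like "Open Notepad and write" can also be driven from a script that stops at the first failing step. This is a minimal sketch, assuming desktop-agent is on PATH and signals failure with a non-zero exit code. The names NOTEPAD_STEPS and run_steps are hypothetical.

```python
import subprocess
from typing import Callable, Iterable

# The "Open Notepad and write" example, one argv list per step.
NOTEPAD_STEPS = [
    ["desktop-agent", "app", "open", "notepad"],
    ["desktop-agent", "app", "focus", "notepad"],
    ["desktop-agent", "keyboard", "write", "Hello from Desktop Skill!"],
]


def run_steps(
    steps: Iterable[list[str]],
    runner: Callable[..., object] = subprocess.run,
) -> int:
    """Run each step in order and return how many completed.

    With the default runner, check=True raises CalledProcessError on the
    first non-zero exit code, so later steps never act on a bad state.
    The runner parameter lets callers substitute a dry-run function.
    """
    completed = 0
    for argv in steps:
        runner(argv, check=True)
        completed += 1
    return completed
```

Passing a runner that only prints its arguments gives a dry run of the sequence without touching the desktop.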
Available Commands
Run desktop-agent --help to see all commands:
desktop-agent --help
desktop-agent mouse --help
desktop-agent keyboard --help
desktop-agent screen --help
desktop-agent message --help
Project Structure
desktop-skill/
├── desktop_agent/ # Main package
│ ├── __init__.py
│ ├── commands/ # Command modules
│ │ ├── __init__.py
│ │ ├── mouse.py # Mouse commands
│ │ ├── keyboard.py # Keyboard commands
│ │ ├── screen.py # Screen/screenshot/OCR commands
│ │ └── message.py # Message boxes
├── pyproject.toml # Project configuration
└── README.md # This documentation
Technologies
- PyAutoGUI: GUI automation
- EasyOCR: Optical character recognition
- Typer: Modern CLI framework
File details
Details for the file desktop_agent-1.2.1.tar.gz.
File metadata
- Download URL: desktop_agent-1.2.1.tar.gz
- Upload date:
- Size: 157.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cc5c0bb247ffda4d6e58aab787dd2a73b0a424be772a83199f42965f64ea2101 |
| MD5 | 4f692b160d3728d36fa5c21269fadba6 |
| BLAKE2b-256 | 03e74b751518cfc5dd6d0d45158d02c583e0d3c6277945a283ce0e673290bc16 |
Provenance
The following attestation bundles were made for desktop_agent-1.2.1.tar.gz:
Publisher: release.yml on patrickporto/desktop-agent
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: desktop_agent-1.2.1.tar.gz
- Subject digest: cc5c0bb247ffda4d6e58aab787dd2a73b0a424be772a83199f42965f64ea2101
- Sigstore transparency entry: 884799423
- Sigstore integration time:
- Permalink: patrickporto/desktop-agent@11f602f37a6ceae097d726ed00e9f0bcdcb79708
- Branch / Tag: refs/heads/main
- Owner: https://github.com/patrickporto
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@11f602f37a6ceae097d726ed00e9f0bcdcb79708
- Trigger Event: push
File details
Details for the file desktop_agent-1.2.1-py3-none-any.whl.
File metadata
- Download URL: desktop_agent-1.2.1-py3-none-any.whl
- Upload date:
- Size: 12.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be0f88319d722361e4ace5d3ef579890e5490de078f3c5a65196e65878ab2902 |
| MD5 | c46be9305a699980eba2d61ee2ec235d |
| BLAKE2b-256 | de6ac4be4a25caa4fe49b14b2c11bdea24dfe1ae2921b73ef5a25e46e7ad2822 |
Provenance
The following attestation bundles were made for desktop_agent-1.2.1-py3-none-any.whl:
Publisher: release.yml on patrickporto/desktop-agent
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: desktop_agent-1.2.1-py3-none-any.whl
- Subject digest: be0f88319d722361e4ace5d3ef579890e5490de078f3c5a65196e65878ab2902
- Sigstore transparency entry: 884799465
- Sigstore integration time:
- Permalink: patrickporto/desktop-agent@11f602f37a6ceae097d726ed00e9f0bcdcb79708
- Branch / Tag: refs/heads/main
- Owner: https://github.com/patrickporto
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@11f602f37a6ceae097d726ed00e9f0bcdcb79708
- Trigger Event: push