Skipper
A desktop interaction tool that allows AI agents like Claude Code or OpenAI Codex to control web browsers and interact with desktop applications through natural language commands.
Overview
Skipper enables AI agents to:
- View and navigate the web using your own browser
- Navigate to URLs
- Execute mouse clicks, keyboard input, and scrolling actions
- [SOON] Interact with any desktop application through natural language prompts
The Vision
Skipper is a command-line tool that gives AI agents the ability to interact with your browser and desktop. Unlike all-in-one computer-use tools, Skipper follows the Unix philosophy of "do one thing and do it well". Specifically, it is designed to be the "hands" of the AI agent rather than the "brain".
The architecture is designed to become privacy-preserving in the future. If the tool ends up being useful, the design makes it possible to run everything locally. The only sensitive information that would then go to the cloud is the text sent to the LLM agent, which could be redacted or modified as necessary for privacy.
Installation
Prerequisites
- Python 3.9 or higher
- Chrome/Chromium browser with remote debugging enabled
- Gemini API key (for AI-powered interactions)
- Either:
  - A computer capable of running OmniParser, or
  - An API key for Skipper to run this stage remotely
Install Skipper
pip install skipper-tool
Setup Chrome Remote Debugging
- Start Chrome with remote debugging enabled:
# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
# Linux
google-chrome --remote-debugging-port=9222
# Windows
"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
- Keep Chrome running in the background while using skipper_tool.
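Before running Skipper, it can help to confirm that the debugging port is actually reachable. The sketch below is not part of Skipper: it queries the `/json/version` HTTP endpoint that Chrome serves on the remote debugging port. The helper names `version_endpoint` and `check_cdp` are illustrative assumptions.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def version_endpoint(cdp_url: str) -> str:
    """Build the URL of Chrome's /json/version endpoint from a CDP base URL."""
    return cdp_url.rstrip("/") + "/json/version"


def check_cdp(cdp_url: str = "http://localhost:9222", timeout: float = 2.0) -> Optional[dict]:
    """Return Chrome's version info as a dict, or None if the port is unreachable."""
    try:
        with urllib.request.urlopen(version_endpoint(cdp_url), timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply all count as "not reachable".
        return None


if __name__ == "__main__":
    info = check_cdp()
    if info is None:
        print("Chrome is not reachable on the debugging port")
    else:
        print("Connected to", info.get("Browser", "unknown browser"))
```

If the check fails, restart Chrome with the `--remote-debugging-port=9222` flag shown above.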
Initial Configuration
# Create a configuration file
skipper init --config
# This will prompt for your Gemini API key and create ~/.skipperrc
Usage
Skipper provides three main commands for AI agents:
1. View Window State
skipper view
Returns the current state of the active browser window, including:
- Page title and URL
- AI-generated description of the page content
- Screenshot analysis
2. Navigate to URL
skipper navigate --url "https://example.com"
Navigates the browser to the specified URL and returns the new page state.
3. Execute Commands
skipper command --command_type <type> --prompt "<description>"
Execute actions on the current page:
- Click:
  skipper command --command_type click --prompt "Click the login button"
- Type:
  skipper command --command_type type --prompt "Enter username: john.doe<Enter>"
- Scroll:
  skipper command --command_type scroll --prompt "Scroll down"
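An agent harness written in Python might call these commands through `subprocess`. The following is a minimal sketch under that assumption: the helper names (`navigate_args`, `command_args`, `run_skipper`) are hypothetical and not part of Skipper, and only the subcommands and flags shown above are used.

```python
import subprocess
from typing import List

# Command types documented above.
VALID_COMMAND_TYPES = ("click", "type", "scroll")


def navigate_args(url: str) -> List[str]:
    """Argument list for `skipper navigate --url <url>`."""
    return ["skipper", "navigate", "--url", url]


def command_args(command_type: str, prompt: str) -> List[str]:
    """Argument list for `skipper command`, rejecting undocumented types."""
    if command_type not in VALID_COMMAND_TYPES:
        raise ValueError(f"unsupported command type: {command_type!r}")
    return ["skipper", "command", "--command_type", command_type, "--prompt", prompt]


def run_skipper(args: List[str]) -> str:
    """Run a Skipper command and return its standard output as text."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list rather than a single shell string means prompts containing quotes or angle brackets need no extra escaping. For example, `run_skipper(["skipper", "view"])` would return whatever `skipper view` prints.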
Keystroke Instructions
Use Playwright-style keystroke commands in brackets:
- <Enter> - Press Enter key
- <Tab> - Press Tab key
- <ControlOrMeta+A> - Select all (Ctrl+A or Cmd+A)
- <Delete> - Press Delete key
- <Escape> - Press Escape key
Examples:
- Hello<Enter> - Type "Hello" then press Enter
- <ControlOrMeta+A><Delete> - Select all text and delete it
- username<Tab>password<Enter> - Type username, tab to next field, type password, press Enter
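To make the bracket syntax concrete, here is one way such a string could be split into literal text and key presses. This is an illustrative sketch only, not Skipper's actual parser: the function name `split_keystrokes` and the `("text", ...)` / `("key", ...)` tuple format are assumptions.

```python
import re
from typing import List, Tuple

# Matches one bracketed key name such as <Enter> or <ControlOrMeta+A>.
KEY_TOKEN = re.compile(r"<([^<>]+)>")


def split_keystrokes(text: str) -> List[Tuple[str, str]]:
    """Split input into ("text", literal) and ("key", name) parts, in order."""
    parts: List[Tuple[str, str]] = []
    pos = 0
    for match in KEY_TOKEN.finditer(text):
        if match.start() > pos:
            # Literal characters that appear before this key token.
            parts.append(("text", text[pos:match.start()]))
        parts.append(("key", match.group(1)))
        pos = match.end()
    if pos < len(text):
        # Trailing literal text after the last key token.
        parts.append(("text", text[pos:]))
    return parts


print(split_keystrokes("username<Tab>password<Enter>"))
```

With a split like this, text parts would be typed character by character and key parts would be sent as single key presses.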
Integration with Claude Code
Skipper is designed to work seamlessly with Claude Code and other AI agents. Here's how to integrate it:
Agent Configuration
Add Skipper to your Claude Code agent configuration:
tools:
  - name: skipper
    description: Desktop interaction tool for browser and application control
    commands:
      - name: view
        description: View current window state
        usage: skipper view
      - name: navigate
        description: Navigate to URL
        usage: skipper navigate --url <url>
      - name: command
        description: Execute desktop action
        usage: skipper command --command_type <click|type|scroll> --prompt "<description>"
Example Agent Workflow
# Agent starts by viewing the current state
$ skipper view
Page title: Google
Page URL: https://www.google.com
Screenshot analysis: Google search homepage with search bar and navigation options
# Agent navigates to a specific site
$ skipper navigate --url "https://accounts.venmo.com"
Page title: Venmo - Log in
Page URL: https://accounts.venmo.com
Screenshot analysis: Venmo login page with username/email and password fields
# Agent clicks on the username field
$ skipper command --command_type click --prompt "Click the username or email field"
Page title: Venmo - Log in
Page URL: https://accounts.venmo.com
Screenshot analysis: Username field is now focused and highlighted
# Agent types credentials
$ skipper command --command_type type --prompt "Enter username: john.doe<Enter>"
Page title: Venmo - Log in
Page URL: https://accounts.venmo.com
Screenshot analysis: Username entered, cursor moved to password field
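An agent harness may want the page state as structured data rather than raw text. The sketch below assumes the output consists of `Label: value` lines exactly as in the workflow above; the real output may differ, and `parse_state` is a hypothetical helper, not part of Skipper.

```python
from typing import Dict


def parse_state(output: str) -> Dict[str, str]:
    """Turn "Label: value" lines into a dict, ignoring lines without a label."""
    state: Dict[str, str] = {}
    for line in output.splitlines():
        # Split on the first ": " only, so the "://" inside URLs is left intact.
        key, sep, value = line.partition(": ")
        if sep:
            state[key.strip()] = value.strip()
    return state


sample = (
    "Page title: Venmo - Log in\n"
    "Page URL: https://accounts.venmo.com\n"
    "Screenshot analysis: Venmo login page with username/email and password fields\n"
)
state = parse_state(sample)
print(state["Page URL"])
```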
Advanced Features
Debug Mode
Enable debug logging to save screenshots and detailed logs:
# Set debug folder in ~/.skipperrc
[debug]
enabled = true
folder = "/path/to/debug/folder"
# Or use environment variable
export SKIPPER_DEBUG_FOLDER="/path/to/debug/folder"
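The text above gives two ways to set the debug folder but does not say which one wins when both are present. The sketch below shows one plausible resolution order, assuming the environment variable takes precedence over the `[debug]` section. That precedence and the helper name `resolve_debug_folder` are assumptions, not documented Skipper behavior.

```python
import os
from typing import Mapping, Optional


def resolve_debug_folder(config: Mapping, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick the debug folder: environment variable first (assumed), then config."""
    env_value = env.get("SKIPPER_DEBUG_FOLDER")
    if env_value:
        return env_value
    debug = config.get("debug", {})
    if debug.get("enabled"):
        return debug.get("folder")
    # Debugging is off when neither source provides a folder.
    return None


config = {"debug": {"enabled": True, "folder": "/path/to/debug/folder"}}
print(resolve_debug_folder(config, env={}))
```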
Local AI Models
For enhanced privacy, you can use local AI models:
# Install local dependencies
pip install -e ".[local]"
# Configure local model paths in ~/.skipperrc
[models]
yolo_model_path = "/path/to/local/model.pt"
Custom Configuration
Edit ~/.skipperrc to customize:
[models]
screenshot_model = "gemini-2.5-flash"
ui_element_model = "gemini-2.5-pro"
[browser]
cdp_url = "http://localhost:9222"
context_index = 0
page_index = 0
[ui_interaction]
click_delay_seconds = 1.0
scroll_distance = 600
mouse_scale_factor = 0.5
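The settings shown here look like TOML, although the source does not name the file format. Assuming `~/.skipperrc` is valid TOML, a script could read it as sketched below. `tomllib` ships with Python 3.11 and later; on older versions the third-party `tomli` package offers the same API. The helper names `parse_config` and `load_config` are illustrative.

```python
from pathlib import Path

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API


def parse_config(text: str) -> dict:
    """Parse configuration text, assuming it is TOML."""
    return tomllib.loads(text)


def load_config(path: Path = Path.home() / ".skipperrc") -> dict:
    """Read and parse the configuration file at the given path."""
    return parse_config(path.read_text(encoding="utf-8"))


# Sample values copied from the settings listed above.
SAMPLE = """
[browser]
cdp_url = "http://localhost:9222"
page_index = 0

[ui_interaction]
click_delay_seconds = 1.0
"""

config = parse_config(SAMPLE)
print(config["browser"]["cdp_url"])
```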
Troubleshooting
Common Issues
- Chrome not responding: Ensure Chrome is running with --remote-debugging-port=9222
- API key errors: Set the GEMINI_API_KEY environment variable or add the key to ~/.skipperrc
- Permission errors: Check that Skipper has access to the browser and debug port
Debug Information
# Enable verbose logging
export SKIPPER_DEBUG_FOLDER="/tmp/skipper-debug"
skipper view
# Check logs in the debug folder
ls /tmp/skipper-debug/
Security Considerations
- Skipper requires access to your browser and can execute actions on your behalf
- API keys are stored locally in ~/.skipperrc
- Debug mode saves screenshots locally; ensure the debug folder is secure
- Only use with trusted AI agents
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
License
[Add your license information here]
Support
- Issues: GitHub Issues
- Documentation: GitHub Wiki