Skipper

A desktop interaction tool that allows AI agents like Claude Code or OpenAI Codex to control web browsers and interact with desktop applications through natural language commands.

Overview

Skipper enables AI agents to:

View and navigate the web using your own browser
Navigate to URLs
Execute mouse clicks, keyboard input, and scrolling actions
[SOON] Interact with any desktop application through natural language prompts

The Vision

Skipper is a command-line tool that lets AI agents interact with your browser and desktop. Unlike all-in-one computer-use tools, Skipper follows the Unix philosophy of "do one thing and do it well": it is designed to be the "hands" of the AI agent rather than the "brain".

The architecture is designed with future privacy preservation in mind: if the tool proves useful, it should be possible to run it entirely locally. The only sensitive information sent to the cloud would then be the text passed to the LLM agent, which could be redacted or modified as needed for privacy.

Installation

Prerequisites

Python 3.9 or higher
Chrome/Chromium browser with remote debugging enabled
Gemini API key (for AI-powered interactions)
One of the following for the OmniParser stage:
- A computer capable of running OmniParser locally
- An API key that lets Skipper run this stage remotely

Install Skipper

pip install skipper-tool

Setup Chrome Remote Debugging

Start Chrome with remote debugging enabled:

# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

# Linux
google-chrome --remote-debugging-port=9222

# Windows
"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222

Keep Chrome running in the background while using Skipper.
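
Before running Skipper, it can help to confirm that the debugging port is actually open. The sketch below queries Chrome's `/json/version` DevTools HTTP endpoint on the port used above. The function names `cdp_version_url` and `check_cdp` are hypothetical helpers for illustration and are not part of Skipper.

```python
import json
import urllib.request
from typing import Optional


def cdp_version_url(base_url: str) -> str:
    """Build the URL of Chrome's DevTools version endpoint."""
    return base_url.rstrip("/") + "/json/version"


def check_cdp(base_url: str = "http://localhost:9222",
              timeout: float = 2.0) -> Optional[dict]:
    """Return Chrome's version info, or None if the debug port is unreachable."""
    try:
        with urllib.request.urlopen(cdp_version_url(base_url), timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts;
        # ValueError covers a response that is not valid JSON.
        return None


if __name__ == "__main__":
    info = check_cdp()
    if info is None:
        print("Chrome is not reachable on port 9222")
    else:
        print("Chrome is reachable:", info)
```

If this prints that Chrome is not reachable, restart Chrome with the `--remote-debugging-port=9222` flag shown above.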

Initial Configuration

# Create a configuration file
skipper init --config

# This will prompt for your Gemini API key and create ~/.skipperrc

Usage

Skipper provides three main commands for AI agents:

1. View Window State

skipper view

Returns the current state of the active browser window, including:

Page title and URL
AI-generated description of the page content
Screenshot analysis

2. Navigate to URL

skipper navigate --url "https://example.com"

Navigates the browser to the specified URL and returns the new page state.

3. Execute Commands

skipper command --command_type <type> --prompt "<description>"

Execute actions on the current page:

Click: skipper command --command_type click --prompt "Click the login button"
Type: skipper command --command_type type --prompt "Enter username: john.doe<Enter>"
Scroll: skipper command --command_type scroll --prompt "Scroll down"
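
For agents or scripts written in Python, the three commands can be wrapped with `subprocess`. This is a minimal sketch that uses only the subcommands and flags listed above. The helper names `build_skipper_argv` and `run_skipper` are hypothetical, and the sketch assumes that Skipper writes its result to standard output and exits non-zero on failure.

```python
import subprocess
from typing import List, Optional

# Command types documented above.
COMMAND_TYPES = ("click", "type", "scroll")


def build_skipper_argv(action: str, url: Optional[str] = None,
                       command_type: Optional[str] = None,
                       prompt: Optional[str] = None) -> List[str]:
    """Build the argument list for one of the three Skipper commands."""
    if action == "view":
        return ["skipper", "view"]
    if action == "navigate":
        if not url:
            raise ValueError("navigate requires a url")
        return ["skipper", "navigate", "--url", url]
    if action == "command":
        if command_type not in COMMAND_TYPES:
            raise ValueError(f"command_type must be one of {COMMAND_TYPES}")
        if not prompt:
            raise ValueError("command requires a prompt")
        return ["skipper", "command",
                "--command_type", command_type, "--prompt", prompt]
    raise ValueError(f"unknown action: {action}")


def run_skipper(*args, **kwargs) -> str:
    """Run Skipper and return its stdout (assumed to hold the page state).

    subprocess raises CalledProcessError if the process exits non-zero.
    """
    argv = build_skipper_argv(*args, **kwargs)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list rather than a single shell string means prompts containing quotes or angle brackets need no extra escaping.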

Keystroke Instructions

Use Playwright-style key names in angle brackets:

<Enter> - Press Enter key
<Tab> - Press Tab key
<ControlOrMeta+A> - Select all (Ctrl+A or Cmd+A)
<Delete> - Press Delete key
<Escape> - Press Escape key

Examples:

Hello<Enter> - Type "Hello" then press Enter
<ControlOrMeta+A><Delete> - Select all text and delete it
username<Tab>password<Enter> - Type username, tab to next field, type password, press Enter
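
To make the notation concrete, the sketch below splits such a string into literal text and key presses. It only illustrates the syntax described above and is not Skipper's own parser; the name `split_keystrokes` is hypothetical.

```python
import re
from typing import List, Tuple

# A key token is anything between "<" and ">" with no nested brackets.
KEY_PATTERN = re.compile(r"<([^<>]+)>")


def split_keystrokes(text: str) -> List[Tuple[str, str]]:
    """Split a string into ("text", ...) and ("key", ...) steps, in order."""
    steps = []
    pos = 0
    for match in KEY_PATTERN.finditer(text):
        if match.start() > pos:
            # Literal text that precedes this key token.
            steps.append(("text", text[pos:match.start()]))
        steps.append(("key", match.group(1)))
        pos = match.end()
    if pos < len(text):
        # Trailing literal text after the last key token.
        steps.append(("text", text[pos:]))
    return steps


print(split_keystrokes("username<Tab>password<Enter>"))
# [('text', 'username'), ('key', 'Tab'), ('text', 'password'), ('key', 'Enter')]
```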

Integration with Claude Code

Skipper is designed to work seamlessly with Claude Code and other AI agents. Here's how to integrate it:

Agent Configuration

Add Skipper to your Claude Code agent configuration:

tools:
  - name: skipper
    description: Desktop interaction tool for browser and application control
    commands:
      - name: view
        description: View current window state
        usage: skipper view
      - name: navigate
        description: Navigate to URL
        usage: skipper navigate --url <url>
      - name: command
        description: Execute desktop action
        usage: skipper command --command_type <click|type|scroll> --prompt "<description>"

Example Agent Workflow

# Agent starts by viewing the current state
$ skipper view
Page title: Google
Page URL: https://www.google.com
Screenshot analysis: Google search homepage with search bar and navigation options

# Agent navigates to a specific site
$ skipper navigate --url "https://accounts.venmo.com"
Page title: Venmo - Log in
Page URL: https://accounts.venmo.com
Screenshot analysis: Venmo login page with username/email and password fields

# Agent clicks on the username field
$ skipper command --command_type click --prompt "Click the username or email field"
Page title: Venmo - Log in
Page URL: https://accounts.venmo.com
Screenshot analysis: Username field is now focused and highlighted

# Agent types credentials
$ skipper command --command_type type --prompt "Enter username: john.doe<Enter>"
Page title: Venmo - Log in
Page URL: https://accounts.venmo.com
Screenshot analysis: Username entered, cursor moved to password field
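
An agent harness may want the page state as structured data instead of raw text. The sketch below parses output shaped like the transcript above into a dictionary. It assumes that real Skipper output uses the same three labels, one per line, which the example suggests but does not guarantee; `parse_page_state` is a hypothetical helper.

```python
from typing import Dict

# Field labels as they appear in the example transcript.
FIELDS = {
    "Page title": "title",
    "Page URL": "url",
    "Screenshot analysis": "analysis",
}


def parse_page_state(output: str) -> Dict[str, str]:
    """Extract the labelled fields from Skipper-style output text."""
    state = {}
    for line in output.splitlines():
        # Split on the first ": " only, so URLs and descriptions stay intact.
        label, sep, value = line.partition(": ")
        if sep and label in FIELDS:
            state[FIELDS[label]] = value.strip()
    return state


sample = """Page title: Venmo - Log in
Page URL: https://accounts.venmo.com
Screenshot analysis: Venmo login page with username/email and password fields"""
print(parse_page_state(sample)["url"])  # https://accounts.venmo.com
```

Lines with unrecognized labels are ignored, so extra log output would not break the parse.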

Advanced Features

Debug Mode

Enable debug logging to save screenshots and detailed logs:

# Set debug folder in ~/.skipperrc
[debug]
enabled = true
folder = "/path/to/debug/folder"

# Or use environment variable
export SKIPPER_DEBUG_FOLDER="/path/to/debug/folder"

Local AI Models

For enhanced privacy, you can use local AI models:

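# Install local dependencies (run from a checkout of the Skipper source tree)
# The quotes stop shells such as zsh from treating [local] as a glob pattern
pip install -e ".[local]"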

# Configure local model paths in ~/.skipperrc
[models]
yolo_model_path = "/path/to/local/model.pt"

Custom Configuration

Edit ~/.skipperrc to customize:

[models]
screenshot_model = "gemini-2.5-flash"
ui_element_model = "gemini-2.5-pro"

[browser]
cdp_url = "http://localhost:9222"
context_index = 0
page_index = 0

[ui_interaction]
click_delay_seconds = 1.0
scroll_distance = 600
mouse_scale_factor = 0.5

Troubleshooting

Common Issues

Chrome not responding: Ensure Chrome is running with --remote-debugging-port=9222.
API key errors: Set the GEMINI_API_KEY environment variable or add the key to ~/.skipperrc.
Permission errors: Check that Skipper has access to the browser and the debug port.

Debug Information

# Enable debug logging by pointing Skipper at a debug folder
export SKIPPER_DEBUG_FOLDER="/tmp/skipper-debug"
skipper view

# Check logs in the debug folder
ls /tmp/skipper-debug/
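
When the debug folder fills up, listing only the most recent files is often more useful than a plain `ls`. The sketch below does that with `pathlib`. The helper `newest_files` is hypothetical, and the names and types of the files Skipper writes are not specified here, so the code makes no assumptions about them.

```python
import os
from pathlib import Path
from typing import List


def newest_files(folder: str, limit: int = 5) -> List[Path]:
    """Return up to `limit` files in `folder`, most recently modified first."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    # Same environment variable and default path as in the commands above.
    folder = os.environ.get("SKIPPER_DEBUG_FOLDER", "/tmp/skipper-debug")
    if Path(folder).is_dir():
        for path in newest_files(folder):
            print(path.name)
    else:
        print(f"No debug folder at {folder}")
```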

Security Considerations

Skipper requires access to your browser and can execute actions on your behalf
API keys are stored locally in ~/.skipperrc
Debug mode saves screenshots locally, so make sure the debug folder is not readable by others
Only use with trusted AI agents
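
Because the API key is stored in ~/.skipperrc, one reasonable precaution on macOS and Linux is to check that the file is not readable by other users. The sketch below is a suggestion, not something Skipper is documented to do itself; `is_private` is a hypothetical helper and the check relies on POSIX permission bits.

```python
import os
import stat


def is_private(path: str) -> bool:
    """True if no group/other permission bits are set on the file (POSIX)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0


if __name__ == "__main__":
    rc = os.path.expanduser("~/.skipperrc")
    if os.path.exists(rc) and not is_private(rc):
        print(f"{rc} is accessible to other users; consider: chmod 600 {rc}")
```

The same check can be applied to files in the debug folder, since saved screenshots may show sensitive page content.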

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

License

[Add your license information here]

Support

Issues: GitHub Issues
Documentation: GitHub Wiki
