Computer Use SDK for Python — build screen-controlling AI agents in a few lines of code

screenagent

Control any macOS app with Claude — Python SDK + CLI.

Browser Use and Skyvern only work inside the browser. screenagent uses the macOS Accessibility API and CGEvent for native input, so it also works with System Settings, Finder, Notes, Calculator, and any other native app.

Install

pip install screenagent-ai

This installs both the Python SDK (from screenagent import ...) and the screenagent CLI command.

Requires macOS and Python 3.11+.

Setup

1. Anthropic API Key

Get your key at console.anthropic.com and set it:

export ANTHROPIC_API_KEY="sk-ant-..."

2. Accessibility Permission

macOS requires you to grant accessibility access to your terminal app:

System Settings → Privacy & Security → Accessibility → add your terminal (Terminal.app, iTerm2, VS Code, etc.)

Without this, screenagent cannot read UI elements or send keyboard/mouse events.

Usage 1: CLI

Run directly from the terminal. Works with Claude Code out of the box.

# Control native apps (finds and launches via Spotlight automatically)
screenagent run "Open Calculator and compute 42 * 17"
screenagent run "Open System Settings and switch to Dark Mode"

# Browser automation
screenagent run "Open Chrome, go to youtube.com, search for ycombinator"
screenagent run --app "Google Chrome" "Go to google.com and search for AI news"

# Individual actions (no API key needed)
screenagent screenshot --file screen.png
screenagent ax-tree "Google Chrome"
screenagent click 640 400
screenagent type "hello world"
screenagent key return --modifiers command
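If you want to drive the CLI from a script rather than a shell, the commands above can be wrapped with subprocess. This is a minimal sketch: it only uses the `run` subcommand and the `--app` flag shown above. The helper names `build_run_command` and `run_task` are hypothetical, and treating the exit code as a success signal is an assumption, since exit codes are not documented here.

```python
import shlex
import subprocess


def build_run_command(task, app=None):
    """Build the argv list for `screenagent run`, optionally targeting one app."""
    cmd = ["screenagent", "run"]
    if app is not None:
        # --app is the flag shown in the browser automation example
        cmd += ["--app", app]
    cmd.append(task)
    return cmd


def run_task(task, app=None):
    """Run the CLI and return its exit code (meaning of the code is an assumption)."""
    cmd = build_run_command(task, app)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

Passing the task as a single list element avoids shell quoting problems when the instruction contains spaces or punctuation.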

Usage 2: Python SDK

3-line agent

from screenagent import Agent

agent = Agent()
result = agent.run("Open System Settings and switch to Dark Mode")
print(result.summary)
print(result.success)
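In a script you will usually want to act on the result instead of only printing it. The sketch below maps the result onto a process exit code. It relies only on `Agent`, `run`, `summary`, and `success` as shown above. The helper `run_and_report` is hypothetical, and it assumes `result.success` is truthy on success.

```python
import sys


def run_and_report(agent, task):
    """Run one task, print its summary, and return 0 on success or 1 on failure."""
    result = agent.run(task)
    print(result.summary)
    # Assumption: result.success is a bool (or at least truthy on success).
    return 0 if result.success else 1


def main():
    # Imported lazily so run_and_report can be exercised without macOS.
    from screenagent import Agent

    sys.exit(run_and_report(Agent(), "Open Calculator and compute 42 * 17"))
```

Call `main()` from your entry point. Because the agent is passed in as a parameter, `run_and_report` can be tested with a stand-in object on machines without accessibility permissions.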

Component functions (no API key needed)

from screenagent import screenshot, click, type_text, key_press, get_ui_tree

png_bytes = screenshot()
click(640, 400)
type_text("hello world")
key_press("return")

tree = get_ui_tree("Google Chrome")
print(tree.to_text())
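The bytes returned by `screenshot()` can be written to disk for inspection. A minimal sketch, assuming (as the variable name `png_bytes` suggests) that the function returns raw PNG data; `save_png` is a hypothetical helper and not part of the SDK.

```python
from pathlib import Path

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_png(png_bytes, path):
    """Write screenshot bytes to disk after checking the PNG signature."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("data does not start with the PNG signature")
    target = Path(path)
    target.write_bytes(png_bytes)
    return target
```

Usage would look like `save_png(screenshot(), "screen.png")`, which mirrors `screenagent screenshot --file screen.png` from the CLI section.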

Configuration

Environment Variable   Default             Description
ANTHROPIC_API_KEY      (none)              Claude API key (required for agent)
AGENT_MODEL            claude-sonnet-4-6   Model to use
AGENT_MAX_STEPS        20                  Maximum agent loop iterations
AGENT_COMPUTER_USE     true                Use Claude computer-use tool
CDP_PORT               9222                Chrome DevTools Protocol port

Settings can also be loaded from a .env file.
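To see which values would apply in your own scripts, the variables listed above can be read with their documented defaults. This is an illustrative sketch, not screenagent's internal loader: `AgentSettings` and `load_settings` are hypothetical names, and the set of strings accepted as "true" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentSettings:
    api_key: Optional[str]
    model: str
    max_steps: int
    computer_use: bool
    cdp_port: int


def load_settings(env=None):
    """Read the documented variables from a mapping, falling back to defaults."""
    env = os.environ if env is None else env
    # Assumption: these spellings count as true; anything else is false.
    truthy = {"1", "true", "yes", "on"}
    return AgentSettings(
        api_key=env.get("ANTHROPIC_API_KEY"),
        model=env.get("AGENT_MODEL", "claude-sonnet-4-6"),
        max_steps=int(env.get("AGENT_MAX_STEPS", "20")),
        computer_use=env.get("AGENT_COMPUTER_USE", "true").strip().lower() in truthy,
        cdp_port=int(env.get("CDP_PORT", "9222")),
    )
```

Calling `load_settings()` with no argument reads `os.environ`; passing a plain dict makes it easy to check overrides without touching the real environment.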

License

MIT
