Computer Use SDK for Python — build screen-controlling AI agents in a few lines of code

screenagent

Control any macOS app with Claude — Python SDK + CLI.

Browser Use and Skyvern only work inside the browser. screenagent uses the macOS Accessibility API and CGEvent for native input, so it works with System Settings, Finder, Notes, Calculator, and any other app.

Install

pip install screenagent-ai

This installs both the Python SDK (from screenagent import ...) and the screenagent CLI command.

Requires macOS and Python 3.11+.

Setup

1. Anthropic API Key

Get your key at console.anthropic.com and set it:

export ANTHROPIC_API_KEY="sk-ant-..."
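
If you start the agent from your own script, it can help to fail early when the key is missing. The following is a minimal sketch that only reads the environment; `require_api_key` is a hypothetical helper name, not part of screenagent.

```python
import os


def require_api_key(env=os.environ):
    """Return the Anthropic API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration; screenagent may perform its own check.
    """
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it before running the agent"
        )
    return key
```

Call `require_api_key()` once at startup, before constructing the agent.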

2. Accessibility Permission

macOS requires you to grant accessibility access to your terminal app:

System Settings → Privacy & Security → Accessibility → add your terminal (Terminal.app, iTerm2, VS Code, etc.)

Without this, screenagent cannot read UI elements or send keyboard/mouse events.
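
To check the permission from Python before running anything, one option is to ask macOS directly. This sketch assumes the system `AXIsProcessTrusted` function can be loaded through `ctypes` from the ApplicationServices framework; `accessibility_trusted` is a hypothetical helper, not a screenagent function, and it returns `None` whenever the check cannot be performed.

```python
import ctypes
import sys

# Assumed location of the system framework that exports AXIsProcessTrusted.
_FRAMEWORK = (
    "/System/Library/Frameworks/ApplicationServices.framework/ApplicationServices"
)


def accessibility_trusted():
    """Return True or False on macOS, or None if the check is unavailable."""
    if sys.platform != "darwin":
        return None
    try:
        lib = ctypes.cdll.LoadLibrary(_FRAMEWORK)
        lib.AXIsProcessTrusted.restype = ctypes.c_bool
        lib.AXIsProcessTrusted.argtypes = []
        return bool(lib.AXIsProcessTrusted())
    except (OSError, AttributeError):
        # Framework or symbol could not be loaded; report "unknown".
        return None
```

A result of `False` suggests the terminal running Python has not yet been added under Accessibility.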

Usage 1: CLI

Run directly from the terminal. Works with Claude Code out of the box.

# Control native apps (finds and launches via Spotlight automatically)
screenagent run "Open Calculator and compute 42 * 17"
screenagent run "Open System Settings and switch to Dark Mode"

# Browser automation
screenagent run "Open Chrome, go to youtube.com, search for ycombinator"
screenagent run --app "Google Chrome" "Go to google.com and search for AI news"

# Individual actions (no API key needed)
screenagent screenshot --file screen.png
screenagent ax-tree "Google Chrome"
screenagent click 640 400
screenagent type "hello world"
screenagent key return --modifiers command
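
The `run` command can also be driven from a script. The sketch below builds the same argument list as the examples above, using only the `run` subcommand and the `--app` flag shown there. `build_run_command` and `run_task` are hypothetical helper names, and the meaning of the CLI's exit code is not documented here, so it is returned unchanged.

```python
import shutil
import subprocess


def build_run_command(task, app=None):
    """Build the argv list for `screenagent run`, optionally targeting one app."""
    cmd = ["screenagent", "run"]
    if app:
        cmd += ["--app", app]
    cmd.append(task)
    return cmd


def run_task(task, app=None):
    """Run the CLI and return its exit code (semantics not documented here)."""
    if shutil.which("screenagent") is None:
        raise FileNotFoundError("screenagent CLI not found on PATH")
    return subprocess.run(build_run_command(task, app), check=False).returncode
```

Passing the task as a single list element avoids shell quoting problems with spaces or quotes in the instruction.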

Usage 2: Python SDK

3-line agent

from screenagent import Agent

agent = Agent()
result = agent.run("Open System Settings and switch to Dark Mode")
print(result.summary)
print(result.success)
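
In a script you will usually want to turn the result into a process exit code. This sketch relies only on the `run` method and the `summary` and `success` attributes shown above; `run_and_report` is a hypothetical helper name, and it accepts any object with a compatible `run` method.

```python
def run_and_report(agent, task):
    """Run one task, print its summary, and return a shell-style exit code."""
    result = agent.run(task)
    print(result.summary)
    return 0 if result.success else 1


# Example usage (requires screenagent and an API key):
#   import sys
#   from screenagent import Agent
#   sys.exit(run_and_report(Agent(), "Open Calculator and compute 42 * 17"))
```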

Component functions (no API key needed)

from screenagent import screenshot, click, type_text, key_press, get_ui_tree

png_bytes = screenshot()
click(640, 400)
type_text("hello world")
key_press("return")

tree = get_ui_tree("Google Chrome")
print(tree.to_text())
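
To keep a screenshot, write the returned bytes to disk. This sketch assumes, based on the variable name above, that `screenshot()` returns PNG-encoded bytes; `is_png` and `save_png` are hypothetical helpers that check the standard PNG signature before writing.

```python
from pathlib import Path

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(data):
    """Return True if the bytes begin with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


def save_png(data, path):
    """Write PNG bytes to path and return it as a Path; reject non-PNG data."""
    if not is_png(data):
        raise ValueError("data does not start with the PNG signature")
    target = Path(path)
    target.write_bytes(data)
    return target


# Example usage (requires screenagent on macOS):
#   from screenagent import screenshot
#   save_png(screenshot(), "screen.png")
```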

Configuration

Environment Variable   Default             Description
ANTHROPIC_API_KEY      (none)              Claude API key (required for agent)
AGENT_MODEL            claude-sonnet-4-6   Model to use
AGENT_MAX_STEPS        20                  Maximum agent loop iterations
AGENT_COMPUTER_USE     true                Use Claude computer-use tool
CDP_PORT               9222                Chrome DevTools Protocol port

These variables can also be set in a .env file.
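
To see how these variables and their defaults fit together, here is a minimal sketch that reads them into a typed object. It is not screenagent's own loader: `AgentSettings`, `load_settings`, and the set of accepted boolean spellings are assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSettings:
    """Hypothetical container mirroring the documented defaults."""
    model: str = "claude-sonnet-4-6"
    max_steps: int = 20
    computer_use: bool = True
    cdp_port: int = 9222


def _as_bool(value):
    # Assumed spellings for "true"; the library may accept a different set.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=os.environ):
    """Read the documented variables, falling back to the documented defaults."""
    return AgentSettings(
        model=env.get("AGENT_MODEL", "claude-sonnet-4-6"),
        max_steps=int(env.get("AGENT_MAX_STEPS", "20")),
        computer_use=_as_bool(env.get("AGENT_COMPUTER_USE", "true")),
        cdp_port=int(env.get("CDP_PORT", "9222")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.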

License

MIT
