🍎 MacOS-Use

Computer use agent for macOS

MacOS-Use is an AI agent that controls macOS at the GUI layer. It reads the screen via the macOS Accessibility API and uses any LLM to decide what to click, type, scroll, or run — no computer vision model required.

Give it a task in plain English. It handles the rest.

What It Can Do

  • Open, switch between, and resize application windows
  • Click, type, scroll, drag, and use keyboard shortcuts
  • Run shell commands and AppleScript via osascript
  • Scrape web pages via the browser accessibility tree
  • Read and write files on the filesystem
  • Manage macOS virtual desktops (Spaces) via Mission Control
  • Remember information across steps with persistent memory
  • Speak and listen via STT/TTS (voice input and output)

🛠️ Installation

Prerequisites: Python 3.12+, macOS 12 (Monterey) or later

pip install macos-use

Or with uv:

uv add macos-use
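
After installing, a quick import check confirms the environment is set up (a minimal sketch; it assumes nothing beyond the package itself):

import macos_use  # should succeed on Python 3.12+ under macOS 12 or later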

⚙️ Quick Start

Pick any supported LLM provider and run a task:

Anthropic (Claude)

from macos_use.providers.anthropic import ChatAnthropic
from macos_use import Agent, Browser

llm = ChatAnthropic(model="claude-sonnet-4-5")
agent = Agent(llm=llm, browser=Browser.SAFARI)
agent.invoke(task="Open Notes and write a short poem about macOS")

OpenAI

from macos_use.providers.openai import ChatOpenAI
from macos_use import Agent, Browser

llm = ChatOpenAI(model="gpt-4o")
agent = Agent(llm=llm, browser=Browser.CHROME)
agent.invoke(task="Search for the weather in New York on Google")

Google Gemini

from macos_use.providers.google import ChatGoogle
from macos_use import Agent, Browser

llm = ChatGoogle(model="gemini-2.5-flash")
agent = Agent(llm=llm, browser=Browser.SAFARI)
agent.invoke(task=input("Enter a task: "))

Ollama (Local)

from macos_use.providers.ollama import ChatOllama
from macos_use import Agent

llm = ChatOllama(model="qwen3-vl:4b")
agent = Agent(llm=llm, use_vision=False)
agent.invoke(task=input("Enter a task: "))

Async Usage

import asyncio
from macos_use.providers.anthropic import ChatAnthropic
from macos_use import Agent

async def main():
    llm = ChatAnthropic(model="claude-sonnet-4-5")
    agent = Agent(llm=llm)
    result = await agent.ainvoke(task="Take a screenshot and describe the desktop")
    print(result.content)

asyncio.run(main())
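
Because ainvoke is a regular coroutine, it composes with the rest of asyncio. A sketch that runs two tasks back to back in one event loop (running agents in parallel is not advisable, since they would share the same mouse, keyboard, and screen):

import asyncio
from macos_use.providers.anthropic import ChatAnthropic
from macos_use import Agent

async def main():
    llm = ChatAnthropic(model="claude-sonnet-4-5")
    agent = Agent(llm=llm)
    # sequential on purpose: both tasks drive the same GUI
    for task in [
        "Open Notes and create a note titled 'Ideas'",
        "Take a screenshot and describe the desktop",
    ]:
        result = await agent.ainvoke(task=task)
        print(result.content)

asyncio.run(main())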

🤖 CLI

Run the interactive agent directly from your terminal:

macos-use

🔌 Supported LLM Providers

Provider        Import
Anthropic       from macos_use.providers.anthropic import ChatAnthropic
OpenAI          from macos_use.providers.openai import ChatOpenAI
Google          from macos_use.providers.google import ChatGoogle
Groq            from macos_use.providers.groq import ChatGroq
Ollama          from macos_use.providers.ollama import ChatOllama
Mistral         from macos_use.providers.mistral import ChatMistral
Cerebras        from macos_use.providers.cerebras import ChatCerebras
DeepSeek        from macos_use.providers.deepseek import ChatDeepSeek
Azure OpenAI    from macos_use.providers.azure_openai import ChatAzureOpenAI
OpenRouter      from macos_use.providers.open_router import ChatOpenRouter
LiteLLM         from macos_use.providers.litellm import ChatLiteLLM
NVIDIA          from macos_use.providers.nvidia import ChatNvidia
vLLM            from macos_use.providers.vllm import ChatVLLM

🧰 Agent Configuration

Agent(
    llm=llm,                        # LLM instance (required)
    mode="normal",                  # "normal" (full context) or "flash" (lightweight, faster)
    browser=Browser.SAFARI,         # Browser.SAFARI | Browser.CHROME | Browser.FIREFOX | Browser.EDGE
    use_vision=False,               # Send screenshots to the LLM
    use_annotation=False,           # Annotate UI elements on screenshots
    use_accessibility=True,         # Use the macOS accessibility tree
    auto_minimize=False,            # Minimize active window before the agent starts
    max_steps=25,                   # Max number of steps before giving up
    max_consecutive_failures=3,     # Abort after N consecutive tool failures
    instructions=[],                # Extra system instructions
    log_to_console=True,            # Print steps to the console
    log_to_file=False,              # Write steps to a log file
    event_subscriber=None,          # Custom event listener (see Events section)
    experimental=False,             # Enable experimental tools (memory, multi-select, multi-edit)
)

Tip: Use claude-haiku-4-5, claude-sonnet-4-5, or claude-opus-4-5 for best results.
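
As a concrete example, a lightweight setup sketch using only the options documented above:

from macos_use.providers.anthropic import ChatAnthropic
from macos_use import Agent, Browser

llm = ChatAnthropic(model="claude-haiku-4-5")
agent = Agent(
    llm=llm,
    mode="flash",               # lighter context, faster steps
    browser=Browser.CHROME,
    use_vision=True,            # attach screenshots for the LLM
    max_steps=15,
    instructions=["Prefer keyboard shortcuts over clicking menus."],
)
agent.invoke(task="Search for the weather in New York on Google")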

🛠️ Tools

The agent has access to these tools automatically — no configuration needed.

Core Tools:

Tool            Description
click_tool      Left, right, or middle click, or hover, at coordinates
type_tool       Type text into any input field
scroll_tool     Scroll vertically or horizontally
move_tool       Move the mouse or drag-and-drop
shortcut_tool   Press keyboard shortcuts (e.g. cmd+c, cmd+tab)
app_tool        Launch, switch, or resize application windows
shell_tool      Run bash commands or AppleScript (osascript)
scrape_tool     Extract text content from web pages
desktop_tool    Create, remove, or switch macOS virtual desktops (Spaces)
wait_tool       Pause execution for N seconds
done_tool       Return the final answer to the user

Experimental Tools (enable with experimental=True):

Tool               Description
memory_tool        Persist information across steps in markdown files
multi_select_tool  Cmd+click multiple elements at once
multi_edit_tool    Fill multiple form fields in one action
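
These tools are opt-in. A minimal sketch that enables them (the agent decides when to call them, just as with the core tools):

agent = Agent(llm=llm, experimental=True)
agent.invoke(task="Remember that my preferred editor is VS Code, then open it")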

📡 Events

Observe every step the agent takes with the event system:

from macos_use import Agent, AgentEvent, EventType, BaseEventSubscriber

class MySubscriber(BaseEventSubscriber):
    def invoke(self, event: AgentEvent):
        if event.type == EventType.TOOL_CALL:
            print(f"Tool: {event.data['tool_name']}")
        elif event.type == EventType.DONE:
            print(f"Done: {event.data['content']}")

agent = Agent(llm=llm, event_subscriber=MySubscriber())

Or use a plain callable:

def on_event(event: AgentEvent):
    print(f"{event.type.value}: {event.data}")

agent = Agent(llm=llm, event_subscriber=on_event)

Event types: THOUGHT · TOOL_CALL · TOOL_RESULT · DONE · ERROR
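
Because any callable works as a subscriber, capturing a full trace for later inspection is a one-liner (sketch):

events = []
agent = Agent(llm=llm, event_subscriber=events.append)
agent.invoke(task="Open Notes")
print(f"Recorded {len(events)} events")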

🎙️ Voice (STT / TTS)

MacOS-Use supports voice input and spoken output via multiple providers.

STT (Speech-to-Text): OpenAI Whisper · Google · Groq · ElevenLabs · Deepgram

TTS (Text-to-Speech): OpenAI · Google · Groq · ElevenLabs · Deepgram

from macos_use.providers.openai import ChatOpenAI, STTOpenAI, TTSOpenAI
from macos_use.speech import STT, TTS
from macos_use import Agent

llm = ChatOpenAI(model="gpt-4o")
stt = STT(provider=STTOpenAI())
tts = TTS(provider=TTSOpenAI())

task = stt.invoke()              # Record and transcribe voice input
agent = Agent(llm=llm)
result = agent.invoke(task=task)
tts.invoke(result.content)       # Speak the response aloud
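
The same three calls compose into a simple hands-free loop (a sketch with no exit condition; stop it with Ctrl+C):

while True:
    task = stt.invoke()               # listen for the next spoken task
    result = agent.invoke(task=task)
    tts.invoke(result.content)        # read the answer back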

🖥️ Virtual Desktops (Spaces)

The agent can manage macOS Spaces natively via Mission Control:

agent.invoke(task="Create a new Space and switch to it")
agent.invoke(task="Switch to Space 2")
agent.invoke(task="Remove the current Space")

Or use desktop_tool directly with actions: create, remove, switch.

Note: Switching by number requires the keyboard shortcut to be enabled in
System Settings → Keyboard → Shortcuts → Mission Control.

⚠️ Security

This agent can:

  • Operate your computer on your behalf
  • Modify files and system settings
  • Make irreversible changes to your system

⚠️ STRONGLY RECOMMENDED: Deploy in a Virtual Machine or dedicated test machine

The project provides NO sandbox or isolation layer. For your safety:

  • ✅ Use a Virtual Machine (UTM, Parallels, VMware Fusion)
  • ✅ Use a dedicated test Mac
  • ✅ Close sensitive applications before running

📖 Read the full Security Policy before deployment.

📡 Telemetry

MacOS-Use includes lightweight, privacy-friendly telemetry to help improve reliability and understand real-world usage.

Disable it at any time:

ANONYMIZED_TELEMETRY=false

Or in code:

import os
os.environ["ANONYMIZED_TELEMETRY"] = "false"
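
If you disable telemetry from code, set the variable before the first macos_use import, in case the flag is read at import time (an assumption, but the safe ordering):

import os
os.environ["ANONYMIZED_TELEMETRY"] = "false"  # assumption: read at import, so set it first

from macos_use import Agent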

🪪 License

MIT — see LICENSE.

🙏 Acknowledgements

  • PyObjC — macOS Accessibility API bindings

🤝 Contributing

Contributions are welcome! See CONTRIBUTING for the development workflow.

Made with ❤️ by Jeomon George


Citation

@software{george2025macosuse,
  author    = {George, Jeomon},
  title     = {MacOS-Use: Enable AI to control macOS},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/CursorTouch/MacOS-Use}
}
