An AI agent that interacts with Windows at the GUI level.

🪟 Windows-Use

Windows-Use is an AI agent that controls Windows at the GUI layer. It reads the screen via the Windows UI Automation API and uses any LLM to decide what to click, type, scroll, or run — no computer vision model required.

Give it a task in plain English. It handles the rest.

What It Can Do

  • Open, switch between, and resize application windows
  • Click, type, scroll, drag, and use keyboard shortcuts
  • Run PowerShell commands and read their output
  • Scrape web pages via the browser accessibility tree
  • Read and write files on the filesystem
  • Manage Windows virtual desktops (create, rename, switch)
  • Remember information across steps with persistent memory
  • Speak and listen via STT/TTS (voice input and output)

🛠️ Installation

Prerequisites: Python 3.10+, Windows 7/8/10/11

pip install windows-use

Or with uv:

uv add windows-use
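
The prerequisites above can be checked from Python before installing. This is a minimal sketch using only the standard library; `meets_prerequisites` is a hypothetical helper name and is not part of Windows-Use. It checks the Python version and that the platform is Windows, not the specific Windows release.

```python
import sys


def meets_prerequisites(version_info=sys.version_info, platform=sys.platform) -> bool:
    """Return True when running Python 3.10+ on Windows."""
    # sys.platform is "win32" on every Windows build, 32-bit or 64-bit.
    return tuple(version_info[:2]) >= (3, 10) and platform == "win32"


if __name__ == "__main__":
    if meets_prerequisites():
        print("OK to install windows-use")
    else:
        print("windows-use needs Python 3.10+ on Windows")
```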

⚙️ Quick Start

Pick any supported LLM provider and run a task:

Anthropic (Claude)

from windows_use.providers.anthropic import ChatAnthropic
from windows_use import Agent, Browser

llm = ChatAnthropic(model="claude-sonnet-4-5")
agent = Agent(llm=llm, browser=Browser.EDGE)
agent.invoke(task="Open Notepad and write a short poem about Windows")

OpenAI

from windows_use.providers.openai import ChatOpenAI
from windows_use import Agent, Browser

llm = ChatOpenAI(model="gpt-4o")
agent = Agent(llm=llm, browser=Browser.CHROME)
agent.invoke(task="Search for the weather in New York on Google")

Google Gemini

from windows_use.providers.google import ChatGoogle
from windows_use import Agent, Browser

llm = ChatGoogle(model="gemini-2.5-flash")
agent = Agent(llm=llm, browser=Browser.EDGE)
agent.invoke(task=input("Enter a task: "))

Ollama (Local)

from windows_use.providers.ollama import ChatOllama
from windows_use import Agent

llm = ChatOllama(model="qwen3-vl:235b-cloud")
agent = Agent(llm=llm, use_vision=False)
agent.invoke(task=input("Enter a task: "))

Async Usage

import asyncio
from windows_use.providers.anthropic import ChatAnthropic
from windows_use import Agent

async def main():
    llm = ChatAnthropic(model="claude-sonnet-4-5")
    agent = Agent(llm=llm)
    result = await agent.ainvoke(task="Take a screenshot and describe the desktop")
    print(result.content)

asyncio.run(main())

🤖 CLI

Run the interactive agent directly from your terminal:

windows-use

Options:

--model, -m      LLM model to use
--provider, -p   LLM provider
--max-steps      Max steps per task (default: 200)
--debug, -d      Enable debug logging
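
To launch the CLI from a script, the options above can be assembled into an argument list. The sketch below is an illustration only: `build_command` is a hypothetical helper, and the provider value `"anthropic"` is an assumption, since the accepted provider names are not listed here.

```python
import shlex


def build_command(model=None, provider=None, max_steps=None, debug=False):
    """Build an argv list for the `windows-use` CLI from the documented options."""
    argv = ["windows-use"]
    if provider is not None:
        argv += ["--provider", provider]
    if model is not None:
        argv += ["--model", model]
    if max_steps is not None:
        argv += ["--max-steps", str(max_steps)]
    if debug:
        argv.append("--debug")
    return argv


# "anthropic" is an assumed provider name, used for illustration.
argv = build_command(provider="anthropic", model="claude-sonnet-4-5", max_steps=50, debug=True)
print(shlex.join(argv))
```

On a machine with the package installed, the list can be passed to `subprocess.run(argv)`.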

In-session commands:

Command   Description
\llm      Switch provider or model
\key      Change API key
\speech   Configure STT/TTS
\voice    Record voice input
\clear    Clear the screen
\quit     Exit

🔌 Supported LLM Providers

Provider       Import
Anthropic      from windows_use.providers.anthropic import ChatAnthropic
OpenAI         from windows_use.providers.openai import ChatOpenAI
Google         from windows_use.providers.google import ChatGoogle
Groq           from windows_use.providers.groq import ChatGroq
Ollama         from windows_use.providers.ollama import ChatOllama
Mistral        from windows_use.providers.mistral import ChatMistral
Cerebras       from windows_use.providers.cerebras import ChatCerebras
DeepSeek       from windows_use.providers.deepseek import ChatDeepSeek
Azure OpenAI   from windows_use.providers.azure_openai import ChatAzureOpenAI
Open Router    from windows_use.providers.open_router import ChatOpenRouter
LiteLLM        from windows_use.providers.litellm import ChatLiteLLM
NVIDIA         from windows_use.providers.nvidia import ChatNvidia
vLLM           from windows_use.providers.vllm import ChatVLLM
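
When the provider is chosen at runtime (for example from a config file), the imports above can be resolved lazily. The following is a sketch, not part of the library: the module paths and class names are copied from the table, while the short keys and the `load_chat_class` helper are hypothetical.

```python
import importlib

# (module path, class name) pairs copied from the provider table.
# The dictionary keys are illustrative short names, not library identifiers.
PROVIDERS = {
    "anthropic": ("windows_use.providers.anthropic", "ChatAnthropic"),
    "openai": ("windows_use.providers.openai", "ChatOpenAI"),
    "google": ("windows_use.providers.google", "ChatGoogle"),
    "groq": ("windows_use.providers.groq", "ChatGroq"),
    "ollama": ("windows_use.providers.ollama", "ChatOllama"),
    "mistral": ("windows_use.providers.mistral", "ChatMistral"),
    "cerebras": ("windows_use.providers.cerebras", "ChatCerebras"),
    "deepseek": ("windows_use.providers.deepseek", "ChatDeepSeek"),
    "azure_openai": ("windows_use.providers.azure_openai", "ChatAzureOpenAI"),
    "open_router": ("windows_use.providers.open_router", "ChatOpenRouter"),
    "litellm": ("windows_use.providers.litellm", "ChatLiteLLM"),
    "nvidia": ("windows_use.providers.nvidia", "ChatNvidia"),
    "vllm": ("windows_use.providers.vllm", "ChatVLLM"),
}


def load_chat_class(provider: str):
    """Import and return the chat class for a provider key, on demand."""
    try:
        module_path, class_name = PROVIDERS[provider.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown provider {provider!r}; choose from {sorted(PROVIDERS)}"
        ) from None
    # The import happens only here, so unused providers are never loaded.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Usage on Windows with the package installed:
#   ChatAnthropic = load_chat_class("anthropic")
#   llm = ChatAnthropic(model="claude-sonnet-4-5")
```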

🧰 Agent Configuration

Agent(
    llm=llm,                        # LLM instance (required)
    mode="normal",                  # "normal" (full context) or "flash" (lightweight, faster)
    browser=Browser.EDGE,           # Browser.EDGE | Browser.CHROME | Browser.FIREFOX
    use_vision=False,               # Send screenshots to the LLM
    use_annotation=False,           # Annotate UI elements on screenshots
    use_accessibility=True,         # Use the Windows accessibility tree
    auto_minimize=False,            # Minimize active window before the agent starts
    max_steps=25,                   # Max number of steps before giving up
    max_consecutive_failures=3,     # Abort after N consecutive tool failures
    instructions=[],                # Extra system instructions
    secrets={},                     # Key-value secrets passed to the agent context
    log_to_console=True,            # Print steps to the console
    log_to_file=False,              # Write steps to a log file
    event_subscriber=None,          # Custom event listener (see Events section)
    experimental=False,             # Enable experimental tools (file, memory, multi-select, multi-edit)
)

Tip: Use claude-haiku-4-*, claude-sonnet-4-*, or claude-opus-4-* for best results.
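
If the configuration comes from user input, it can help to validate a few options before constructing the agent. The sketch below is one possible approach: `AgentSettings` is a hypothetical class, and only the field names, defaults, and the two `mode` values are taken from the listing above. The range checks are assumptions, not documented library behavior.

```python
from dataclasses import asdict, dataclass


@dataclass
class AgentSettings:
    """Hypothetical holder for a subset of the Agent(...) keyword arguments."""

    mode: str = "normal"
    use_vision: bool = False
    max_steps: int = 25
    max_consecutive_failures: int = 3
    experimental: bool = False

    def __post_init__(self):
        # "normal" and "flash" are the two modes named in the listing.
        if self.mode not in ("normal", "flash"):
            raise ValueError(f"mode must be 'normal' or 'flash', got {self.mode!r}")
        # Assumed sanity checks: both limits should be at least 1.
        if self.max_steps < 1:
            raise ValueError("max_steps must be at least 1")
        if self.max_consecutive_failures < 1:
            raise ValueError("max_consecutive_failures must be at least 1")

    def to_kwargs(self) -> dict:
        """Return the settings as keyword arguments."""
        return asdict(self)


settings = AgentSettings(mode="flash", max_steps=10)
print(settings.to_kwargs())
# On Windows with the package installed:
#   agent = Agent(llm=llm, **settings.to_kwargs())
```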

🛠️ Tools

The agent has access to these tools automatically — no configuration needed.

Core Tools:

Tool            Description
click_tool      Left, right, middle click or hover at coordinates
type_tool       Type text into any input field
scroll_tool     Scroll vertically or horizontally
move_tool       Move mouse or drag-and-drop
shortcut_tool   Press keyboard shortcuts (e.g. ctrl+c, alt+tab)
app_tool        Launch, switch, or resize application windows
shell_tool      Run PowerShell commands
scrape_tool     Extract text content from web pages
desktop_tool    Create, rename, switch Windows virtual desktops
wait_tool       Pause execution for N seconds
done_tool       Return the final answer to the user

Experimental Tools (enable with experimental=True):

Tool                Description
file_tool           Read, write, list, move, copy, delete files
memory_tool         Persist information across steps in markdown files
multi_select_tool   Ctrl+click multiple elements at once
multi_edit_tool     Fill multiple form fields in one action

📡 Events

Observe every step the agent takes with the event system:

from windows_use import Agent, AgentEvent, EventType, BaseEventSubscriber

class MySubscriber(BaseEventSubscriber):
    def invoke(self, event: AgentEvent):
        if event.type == EventType.TOOL_CALL:
            print(f"Tool: {event.data['tool_name']}")
        elif event.type == EventType.DONE:
            print(f"Done: {event.data['answer']}")

agent = Agent(llm=llm, event_subscriber=MySubscriber())

Or use a plain callable:

def on_event(event: AgentEvent):
    print(f"{event.type.value}: {event.data}")

agent = Agent(llm=llm, event_subscriber=on_event)

Event types: THOUGHT · TOOL_CALL · TOOL_RESULT · DONE · ERROR
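
A callable subscriber can also hold state, for example to summarize a run afterwards. The sketch below assumes that `event.type` is an Enum member whose `name` matches the event types listed above, and that `TOOL_CALL` events carry a `tool_name` key as in the subscriber example. `EventCounter` and `StandInEventType` are hypothetical names, and the stand-in exists only so the example runs without the library.

```python
from collections import Counter
from enum import Enum
from types import SimpleNamespace


class EventCounter:
    """Plain callable that tallies events by type and records tool names in order."""

    def __init__(self):
        self.counts = Counter()
        self.tools = []

    def __call__(self, event):
        name = event.type.name  # assumes the event type is an Enum member
        self.counts[name] += 1
        if name == "TOOL_CALL":
            self.tools.append(event.data["tool_name"])


# Stand-in for illustration only; the real EventType comes from windows_use.
class StandInEventType(Enum):
    TOOL_CALL = 1
    DONE = 2


counter = EventCounter()
counter(SimpleNamespace(type=StandInEventType.TOOL_CALL, data={"tool_name": "click_tool"}))
counter(SimpleNamespace(type=StandInEventType.TOOL_CALL, data={"tool_name": "type_tool"}))
counter(SimpleNamespace(type=StandInEventType.DONE, data={"answer": "ok"}))
print(dict(counter.counts), counter.tools)
```

With the library installed, the same object would be passed as `Agent(llm=llm, event_subscriber=counter)`.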

🎙️ Voice (STT / TTS)

Windows-Use supports voice input and spoken output via multiple providers.

STT (Speech-to-Text): OpenAI Whisper · Google · Groq · ElevenLabs · Deepgram

TTS (Text-to-Speech): OpenAI · Google · Groq · ElevenLabs · Deepgram

from windows_use import Agent
from windows_use.providers.openai import ChatOpenAI, STTOpenAI, TTSOpenAI
from windows_use.speech import STT, TTS

llm = ChatOpenAI(model="gpt-4o")
stt = STT(provider=STTOpenAI())
tts = TTS(provider=TTSOpenAI())

task = stt.invoke()              # Record and transcribe voice input
agent = Agent(llm=llm)
result = agent.invoke(task=task)
tts.invoke(result.content)       # Speak the response aloud

🖥️ Virtual Desktops

The agent can manage Windows virtual desktops natively:

from windows_use.vdm.core import create_desktop, switch_desktop, remove_desktop

create_desktop("Work")
switch_desktop("Work")
remove_desktop("Work")

Supported on Windows 10 (build 17763+) and all Windows 11 versions.
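
The build requirement above can be checked before calling the virtual desktop functions. This is a minimal sketch using the standard library's `sys.getwindowsversion()`; the helper names are hypothetical. A single threshold is enough because Windows 11 reports build numbers of 22000 and above.

```python
import sys

MIN_BUILD = 17763  # minimum Windows 10 build stated above


def supports_virtual_desktops(build: int) -> bool:
    """Return True when the given Windows build number meets the minimum."""
    return build >= MIN_BUILD


def current_build():
    """Return the running Windows build number, or None on other platforms."""
    if sys.platform != "win32":
        return None
    return sys.getwindowsversion().build


build = current_build()
if build is None:
    print("Not running on Windows")
elif supports_virtual_desktops(build):
    print(f"Build {build}: virtual desktops supported")
else:
    print(f"Build {build}: virtual desktops need build {MIN_BUILD} or later")
```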

⚠️ Security

This agent can:

  • Operate your computer on your behalf
  • Modify files and system settings
  • Make irreversible changes to your system

⚠️ STRONGLY RECOMMENDED: Deploy in a Virtual Machine or Windows Sandbox

The project provides NO sandbox or isolation layer. For your safety:

  • ✅ Use a Virtual Machine (VirtualBox, VMware, Hyper-V)
  • ✅ Use Windows Sandbox (Windows 10/11 Pro/Enterprise)
  • ✅ Use a dedicated test machine

📖 Read the full Security Policy before deployment.

📡 Telemetry

Windows-Use includes lightweight, privacy-friendly telemetry to help improve reliability and understand real-world usage.

Disable it at any time by setting an environment variable:

ANONYMIZED_TELEMETRY=false

Or in code:

import os
os.environ["ANONYMIZED_TELEMETRY"] = "false"
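
When the agent or CLI is started as a child process, the variable can be set for that process alone without changing the parent environment. A small sketch under that assumption; `env_without_telemetry` is a hypothetical helper name.

```python
import os


def env_without_telemetry(base=None) -> dict:
    """Return a copy of the environment with ANONYMIZED_TELEMETRY set to "false"."""
    env = dict(os.environ if base is None else base)
    env["ANONYMIZED_TELEMETRY"] = "false"
    return env


env = env_without_telemetry()
print(env["ANONYMIZED_TELEMETRY"])
# On a machine with the CLI installed:
#   subprocess.run(["windows-use"], env=env)
```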

🪪 License

MIT — see LICENSE.

🙏 Acknowledgements

🤝 Contributing

Contributions are welcome! See CONTRIBUTING for the development workflow.

Made with ❤️ by Jeomon George


Citation

@software{
  author       = {George, Jeomon},
  title        = {Windows-Use: Enable AI to control Windows OS},
  year         = {2025},
  publisher    = {GitHub},
  url          = {https://github.com/CursorTouch/Windows-Use}
}
