Stateless black-box computer-use MCP server powered by a vision worker.

computer-agent-mcp

A black-box desktop automation MCP server: give it a task, and it handles the screenshots, coordinates, and clicks internally, then returns the result.

Chinese documentation

How It Works

Unlike typical computer-use tools that expose raw screenshots to the host agent, computer-agent-mcp runs the entire vision loop server-side:

  1. Captures the current screen
  2. Sends the screenshot + task context to an internal vision model
  3. Receives observations, action plans, and coordinate mappings
  4. Executes actions locally with visible mouse trajectories
  5. Repeats until the task is done — then returns a structured result

The host agent never sees a screenshot. It just sends a task and gets back a result.
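
The five steps above can be sketched as a small loop. This is only an illustration of the control flow, not the server's actual code: `Plan`, `run_task`, the callables `capture`, `plan`, and `execute`, the `steps` key, and the choice to report `failed` when the step budget runs out are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Plan:
    """Hypothetical shape of what the vision model returns for one screenshot."""
    done: bool                                          # True once the task looks finished
    summary: str = ""                                   # model's description of the outcome
    actions: List[dict] = field(default_factory=list)   # e.g. {"type": "click", "x": 10, "y": 20}


def run_task(task: str,
             capture: Callable[[], bytes],
             plan: Callable[[str, bytes], Plan],
             execute: Callable[[dict], None],
             max_steps: int = 30) -> dict:
    """Run capture -> plan -> execute until the model reports done or the budget ends."""
    for step in range(1, max_steps + 1):
        screenshot = capture()                  # step 1: capture the current screen
        decision = plan(task, screenshot)       # steps 2-3: query the vision model
        if decision.done:                       # step 5: stop and return a result
            return {"status": "completed", "summary": decision.summary, "steps": step}
        for action in decision.actions:         # step 4: execute actions locally
            execute(action)
    # Assumption: an exhausted step budget is reported as a failure.
    return {"status": "failed", "summary": "step budget exhausted", "steps": max_steps}


if __name__ == "__main__":
    frames = iter([b"frame-1", b"frame-2"])
    performed = []

    def fake_plan(task: str, shot: bytes) -> Plan:
        # Stand-in for the vision model: click once, then declare the task done.
        if shot == b"frame-1":
            return Plan(done=False, actions=[{"type": "click", "x": 10, "y": 20}])
        return Plan(done=True, summary="clicked the button")

    print(run_task("click the button", lambda: next(frames), fake_plan, performed.append))
```

Passing the capture, planning, and execution steps in as callables keeps the loop testable without a real screen or model, which is why the demo can run with plain stubs.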

Features

  • Task-level API — one call to complete a desktop task, no multi-turn screenshot protocol
  • Server-side vision loop — screenshots, coordinate mapping, and action execution all handled internally
  • Human override detection — stops immediately when a real user touches the keyboard or mouse
  • Step-by-step debug recording — full event timeline, screenshots, and model request/response logs
  • Works with any OpenAI-compatible vision model — bring your own endpoint and model

Quick Start

Prerequisites

  • Windows
  • Python >= 3.11
  • An OpenAI-compatible API key

Install & Run

The quickest way to start:

uvx computer-agent-mcp --api-key sk-... --base-url https://api.openai.com/v1 --model gpt-5.4

Or install via pip:

pip install computer-agent-mcp
computer-agent-mcp --api-key sk-... --base-url https://api.openai.com/v1 --model gpt-5.4

MCP Host Configuration

Add to your MCP client config (e.g. Claude Desktop or Cursor):

{
  "mcpServers": {
    "computer-agent": {
      "command": "uvx",
      "args": [
        "computer-agent-mcp",
        "--base-url",
        "https://api.openai.com/v1",
        "--model",
        "gpt-5.4"
      ],
      "env": {
        "COMPUTER_AGENT_OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
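
If the client config already lists other servers, the entry can be merged in programmatically instead of edited by hand. The sketch below is one possible way to do that; `with_computer_agent` is a hypothetical helper, and where the config file lives depends on the client, so no path is assumed here.

```python
import json


def with_computer_agent(config: dict,
                        api_key: str,
                        base_url: str = "https://api.openai.com/v1",
                        model: str = "gpt-5.4") -> dict:
    """Return a copy of an MCP client config with the computer-agent entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))   # keep any servers already configured
    servers["computer-agent"] = {
        "command": "uvx",
        "args": ["computer-agent-mcp", "--base-url", base_url, "--model", model],
        "env": {"COMPUTER_AGENT_OPENAI_API_KEY": api_key},
    }
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    existing = {"mcpServers": {"other-server": {"command": "other"}}}
    print(json.dumps(with_computer_agent(existing, api_key="sk-..."), indent=2))
```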

Tools

computer_use_task

Run a stateless black-box desktop task.

| Parameter | Default | Description |
| --- | --- | --- |
| task | (required) | Natural language description of what to do |
| display_id | "primary" | Target display |
| max_steps | 30 | Maximum vision-action loop iterations |

Returns a structured result with the fields status (completed / blocked / failed), summary, result, memory, and trace.
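
A host can branch on the status field of that result. The following is a minimal sketch that assumes the result arrives as a dict with a string summary; the field types are not specified here, and `describe_result` is a hypothetical helper.

```python
def describe_result(result: dict) -> str:
    """Turn a task result into a one-line message based on its status."""
    status = result.get("status")
    summary = result.get("summary", "")   # assumption: summary is a plain string
    if status == "completed":
        return f"completed: {summary}"
    if status == "blocked":
        return f"blocked: {summary}"
    if status == "failed":
        return f"failed: {summary}"
    # Only the three documented statuses are expected; anything else is surfaced loudly.
    raise ValueError(f"unexpected status: {status!r}")


if __name__ == "__main__":
    print(describe_result({"status": "completed", "summary": "Opened Notepad"}))
```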

computer_list_displays

List available displays. Useful for multi-monitor setups.
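
MCP clients normally build tool calls for you, but it can help to see what goes over the wire. The sketch below constructs the JSON-RPC `tools/call` messages for both tools using only the standard library; `tool_call` is a hypothetical helper, the task text is a made-up example, and the initialization handshake that precedes tool calls is omitted.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)   # JSON-RPC requests need unique ids


def tool_call(name: str, arguments: Optional[dict] = None) -> dict:
    """Build an MCP `tools/call` request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


if __name__ == "__main__":
    # List displays first, then run a task on the primary display.
    print(json.dumps(tool_call("computer_list_displays")))
    print(json.dumps(tool_call("computer_use_task", {
        "task": "Open Notepad and type hello",   # example task text
        "display_id": "primary",
        "max_steps": 30,
    })))
```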

Configuration

All CLI parameters can also be set via environment variables:

| CLI Flag | Env Variable | Default | Description |
| --- | --- | --- | --- |
| --api-key | COMPUTER_AGENT_OPENAI_API_KEY | | API key (also reads OPENAI_API_KEY) |
| --base-url | COMPUTER_AGENT_OPENAI_BASE_URL | https://api.openai.com/v1 | API base URL |
| --model | COMPUTER_AGENT_OPENAI_MODEL | gpt-5.4 | Vision model to use |
| --max-steps-default | COMPUTER_AGENT_MAX_STEPS_DEFAULT | 30 | Default max steps per task |
| --max-duration-s-default | COMPUTER_AGENT_MAX_DURATION_S_DEFAULT | 120 | Default max duration (seconds) |
| --debug-dir | COMPUTER_AGENT_DEBUG_DIR | .computer_agent_mcp_debug/ | Debug output directory |
| --log-level | COMPUTER_AGENT_LOG_LEVEL | INFO | Log level |

Enable debug recording with COMPUTER_AGENT_DEBUG=1. See REFERENCE.md for the full configuration reference and detailed runtime semantics.
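
One way to picture how a setting could be resolved from a flag, its environment variables, and a default is sketched below. The precedence shown (CLI flag first, then the listed variables in order, then the default) and the treatment of empty values as unset are assumptions rather than documented behavior; `resolve` is a hypothetical helper, and REFERENCE.md remains the authority.

```python
import os
from typing import Mapping, Optional, Sequence


def resolve(cli_value: Optional[str],
            env_names: Sequence[str],
            default: Optional[str] = None,
            env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Pick a setting: CLI value, then the first non-empty env variable, then the default."""
    env = os.environ if env is None else env
    if cli_value is not None:
        return cli_value
    for name in env_names:
        value = env.get(name)
        if value:                     # assumption: empty strings count as unset
            return value
    return default


if __name__ == "__main__":
    api_key = resolve(None, ["COMPUTER_AGENT_OPENAI_API_KEY", "OPENAI_API_KEY"])
    model = resolve(None, ["COMPUTER_AGENT_OPENAI_MODEL"], default="gpt-5.4")
    print("api key set:", api_key is not None, "| model:", model)
```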

Development

pip install -e ".[dev]"
pytest

Platform Support

Currently Windows only. The server will start on other platforms but desktop tool calls will fail.

Contributing

Contributions are welcome! Please open an issue first to discuss what you'd like to change.

License

MIT
