AI-powered cross-platform desktop automation agent.

⬡ Sentinel Desktop v18.0

AI-powered desktop automation agent — cross-platform, cyberpunk HUD edition.

Give it a goal in plain English. It sees your screen, moves the mouse, types, and interacts with any application — autonomously.

Python 3.10+ · License: MIT

105+ action types · 7,882 tests · 35+ API endpoints · 20+ LLM providers · MCP server · Fleet/daemon mode


🧠 Neuralis Brain — One mind across your whole fleet

Sentinel doesn't just automate — it learns. Every task it runs, every fix it lands, every tricky issue it solves gets distilled into the Neuralis Brain — a shared memory that every agent in your fleet reads from and writes to.

The brain learns from all clients and improves with every task.

  • 🔁 Cross-agent memory — Sentinel Desktop, Claude Code, opencode, and every other tool in the fleet share one brain. A fix Sentinel finds on a server at 2am is knowledge Claude Code can recall at noon.
  • 📈 Gets smarter over time — the more tasks it runs, the more context the brain holds. Hard-won solutions to advanced technical issues (server configs, stubborn drivers, network edge cases) are never solved twice.
  • 🖥️ Built for the field — engineered for IT work on servers and workstations. Sentinel captures the goal → actions → outcome of each engagement and feeds the durable lessons back in.
  • 🔌 Direct bridge — Sentinel speaks to the brain over HTTP (NEURALIS_BRAIN_URL), no extra processes. Seven operations: think, recall, search, context, opinions, fire, stats.
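
As a rough illustration of the bridge described above, the sketch below builds an HTTP request for one of the seven operations. Only NEURALIS_BRAIN_URL and the operation names come from this README; the URL layout (`<base>/<operation>`), the POST method, the JSON body, and the helper name `build_brain_request` are assumptions made for the example and may not match the real bridge.

```python
import json
import os
import urllib.request

# The seven operations named in the list above.
BRAIN_OPERATIONS = ("think", "recall", "search", "context", "opinions", "fire", "stats")


def build_brain_request(operation, payload, base_url=None):
    """Build (but do not send) an HTTP request for one brain operation.

    ASSUMPTION: the path layout, HTTP method and JSON body are hypothetical;
    only NEURALIS_BRAIN_URL and the operation names come from the README.
    """
    if operation not in BRAIN_OPERATIONS:
        raise ValueError(f"unknown brain operation: {operation!r}")
    base = base_url or os.environ.get("NEURALIS_BRAIN_URL")
    if not base:
        raise RuntimeError("NEURALIS_BRAIN_URL is not set")
    return urllib.request.Request(
        url=f"{base.rstrip('/')}/{operation}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request is only constructed here; passing it to `urllib.request.urlopen` would send it. Validating the operation name up front keeps a typo from turning into a confusing 404.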

Status: arriving in v18.0. The bridge is the foundation — automatic recall-at-task-start and a full consolidation loop land in the phases that follow. Track progress in CLAUDE.md.


Features

  • 🤖 Vision-driven agent loop — screenshots → LLM → action → verify → repeat
  • 🖱️ Full desktop control — mouse, keyboard, clipboard, file I/O, multi-monitor screenshots
  • 👁️ OCR-aware: click_text and read_text use Tesseract to locate and read on-screen text
  • 🪟 UIAutomation: click_control / set_text / list_controls drive native Windows controls by accessibility name (the desktop analogue of CSS selectors)
  • 🎯 Animated cursor overlay — glides to each action location, pulses, then fades — just like Sentinel Override's operator cursor
  • 🔌 20+ LLM providers with native tool/function calling — OpenAI (ChatGPT), Anthropic (Claude), Google Gemini, xAI Grok, DeepSeek, OpenRouter, Groq, Mistral, Together, Fireworks, Cerebras, Perplexity, Z.ai (GLM-5 / coding plan), MiniMax, Moonshot (Kimi), Qwen (Alibaba), Cohere, NVIDIA NIM, HuggingFace, GitHub Models, DeepInfra, Azure OpenAI, Ollama (local), LM Studio (local), and any OpenAI-compatible custom endpoint
  • 🌐 Three modes — GUI, headless API, CLI (--dry-run flag to preview without acting)
  • 🔒 Safety stack — approval gate per state-changing action, Esc-x3 panic stop, sensitive-field filter, tenant lockdown, dry-run
  • 🔁 Retry/backoff on transient LLM errors with friendly error messages
  • 📝 Forensic logging — structured per-step audit trail with JSON/CSV export
  • 💾 Checkpoint & Resume — auto-saves state every 5 steps; resume after crash or close
  • ⌨️ Command palette (Ctrl+K) — fuzzy-search commands, themes, settings
  • 🎨 14 themes — Midnight, Dark, Matrix, Tron, Cyberpunk, Neon, Terminal, Blood, Ocean, Light, Sunset, Paper, Forest, Mono
  • 🖥️ Virtual Desktop isolation — agent operates on its own Windows desktop, never interrupts the user
  • 🥷 Stealth input — PostMessage / UIAInvoke for non-interrupting actions (no mouse/keyboard hijack)
  • 📡 WebSocket live feed — every step broadcast to connected clients
  • 🧠 Neuralis Brain integration (v18.0) — shared, fleet-wide memory: Sentinel writes what it learns and recalls what every other agent has learned, so it gets smarter with every task
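
The vision-driven loop, step budget, and periodic checkpoint from the list above can be pictured with a small sketch. This is not the code in core/engine.py: the function name, the callables passed in (`capture`, `decide`, `execute`, `verify`), and the `LoopResult` type are all hypothetical stand-ins. Only the loop order, the default budget of 100, and the five-step checkpoint interval are taken from this README.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoopResult:
    steps: int = 0
    finished: bool = False
    checkpoints: List[int] = field(default_factory=list)


def run_agent_loop(goal, capture, decide, execute, verify,
                   step_budget=100, checkpoint_every=5, save_checkpoint=None):
    """Hypothetical loop: screenshot -> LLM -> action -> verify -> repeat."""
    result = LoopResult()
    history = []
    while result.steps < step_budget:
        screenshot = capture()                      # take a fresh screenshot
        action = decide(goal, screenshot, history)  # ask the LLM for the next action
        if action.get("action") == "finish":
            result.finished = True
            break
        outcome = execute(action)                   # perform the action
        ok = verify(action, outcome)                # check that it had the intended effect
        history.append({"action": action, "ok": ok})
        result.steps += 1
        if result.steps % checkpoint_every == 0:    # periodic checkpoint
            result.checkpoints.append(result.steps)
            if save_checkpoint is not None:
                save_checkpoint(history)
    return result
```

Passing the collaborators in as callables keeps the sketch testable without a screen or an LLM; a real engine would also prune old screenshots from `history` to bound token cost.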

Quick Start

# Install dependencies
pip install -r requirements.txt

# Run GUI mode
python main.py

# Run headless API server
python main.py --api --port 8091

# Run single command
python main.py -c "Open Notepad and type Hello World"

# Dry-run (logs state-changing actions instead of executing them)
python main.py --dry-run -c "Open Notepad and type Hello World"

Or on Windows: double-click install_and_run.bat

Safety hotkeys

Press Esc three times within 1.5 seconds to stop the agent immediately. The hotkey works globally and is independent of pyautogui's move-to-corner failsafe. It relies on the keyboard package, which is an optional dependency but is installed by default.
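
The timing rule can be sketched as a small sliding-window detector. The class name and its interface are hypothetical and are not taken from Sentinel's source; only the three-press count and the 1.5 second window come from the description above.

```python
from collections import deque


class TriplePressDetector:
    """Report when the last N key presses all fall inside a time window."""

    def __init__(self, presses=3, window_seconds=1.5):
        self.presses = presses
        self.window_seconds = window_seconds
        # Only the most recent N timestamps matter.
        self._times = deque(maxlen=presses)

    def on_press(self, timestamp):
        """Record a press; return True if it completes a rapid N-press burst."""
        self._times.append(timestamp)
        if len(self._times) < self.presses:
            return False
        if self._times[-1] - self._times[0] <= self.window_seconds:
            self._times.clear()  # start fresh so one burst fires only once
            return True
        return False
```

In a real hook, the timestamp would typically come from `time.monotonic()` inside whatever callback the keyboard listener invokes for Esc.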

Testing

pip install -e ".[dev]"
pytest tests/ -q          # 7,882 tests
ruff check core/ gui/ api/ tests/   # zero lint errors

Configuration

On first run, Sentinel opens its settings. Configure:

  1. Provider — Choose your LLM provider (OpenAI, Anthropic, etc.)
  2. API Key — Paste your key
  3. Model — Enter model name or auto-detect
  4. Step Budget — Max actions per goal (default: 100)

Config stored at:

  • Windows: %APPDATA%\SentinelDesktop\config.json
  • Linux/Mac: ~/.sentinel-desktop/config.json
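
For scripts that need to locate the file, the two locations above can be resolved with a helper like the following. The function name `config_path` and its parameters are hypothetical; only the two paths come from this README.

```python
import os
import sys
from pathlib import Path


def config_path(platform=sys.platform, env=os.environ, home=None):
    """Return the config.json location listed above for the given platform."""
    if platform.startswith("win"):
        appdata = env.get("APPDATA")
        if not appdata:
            raise RuntimeError("APPDATA is not set")
        # Windows: %APPDATA%\SentinelDesktop\config.json
        return Path(appdata) / "SentinelDesktop" / "config.json"
    # Linux/Mac: ~/.sentinel-desktop/config.json
    base = Path(home) if home is not None else Path.home()
    return base / ".sentinel-desktop" / "config.json"
```

Calling `config_path()` with no arguments uses the current platform and environment; the parameters exist mainly so the lookup can be exercised in tests.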

API Reference

When running in --api mode:

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /goal | Start agent with a goal |
| POST | /command | Execute single action |
| POST | /stop | Stop running agent |
| GET | /screenshot | Capture screen as base64 PNG |
| GET | /status | Agent status |
| GET | /windows | List visible windows |
| GET | /processes | List running processes |
| GET | /system | System info |
| GET | /config | Read config |
| PUT | /config | Update config |
| GET | /log | Forensic run log |
| WS | /ws | Live status feed |

Examples

# Start a goal
curl -X POST http://localhost:8091/goal \
  -H "Content-Type: application/json" \
  -d '{"goal": "Open Chrome and navigate to github.com"}'

# Take a screenshot
curl http://localhost:8091/screenshot

# Execute a direct action
curl -X POST http://localhost:8091/command \
  -H "Content-Type: application/json" \
  -d '{"command": "{\"action\":\"click\",\"x\":500,\"y\":300}"}'
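
The same calls can be made from Python with only the standard library. This is a sketch: the helper names are hypothetical, and treating every response as JSON is an assumption. The endpoints, the port, the nested JSON string in the `command` field, and the Bearer token header (used when SENTINEL_API_TOKEN is set on the server) follow this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8091"


def build_request(method, path, body=None, token=None, base_url=BASE_URL):
    """Build a request; JSON-encode the body and attach the optional Bearer token."""
    headers = {}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


def goal_request(goal, token=None):
    return build_request("POST", "/goal", {"goal": goal}, token)


def command_request(action, token=None):
    # As in the curl example, the action is sent as a JSON *string*
    # nested inside the outer JSON body.
    return build_request("POST", "/command", {"command": json.dumps(action)}, token)


def send(request, timeout=10):
    """Send the request. ASSUMPTION: the server replies with a JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(goal_request("Open Chrome and navigate to github.com"))` mirrors the first curl command, provided the API server is running.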

Supported Actions

The agent can perform these actions:

| Action | Description |
| --- | --- |
| click | Click at screen coordinates |
| click_text | OCR the screen, find visible text, click it (requires Tesseract) |
| click_image | Find and click a template image |
| click_control | Click a native Windows control by accessibility name (requires uiautomation) |
| list_controls | Enumerate accessible controls (buttons, edits, menus) in a window |
| set_text | Deterministically set the value of an editable control |
| read_text | OCR the entire screen and return its text |
| type_text | Type text character by character |
| press_key | Press a single key |
| hotkey | Press key combination |
| scroll | Scroll up or down |
| screenshot | Take a fresh screenshot |
| find_image | Find image on screen |
| wait_for_image | Wait for image to appear |
| wait | Wait N seconds |
| open_app | Start a program |
| focus_window | Bring window to front |
| close_window | Close a window |
| list_windows | List all visible windows |
| read_file | Read a text file |
| write_file | Write a text file |
| list_directory | List directory contents |
| clipboard_read | Read clipboard |
| clipboard_write | Write to clipboard |
| system_info | Get system details |
| list_processes | List running processes |
| kill_process | Kill a process |
| note | Make a note (no side effects) |
| finish | Signal task completion |
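
To show how an action like `{"action": "click", "x": 500, "y": 300}` might be routed, here is a minimal dispatch-table sketch with a dry-run gate. It is not the code in core/action_executor.py. The `dispatch` function, the handler signatures, and in particular the set of actions treated as read-only are assumptions chosen for illustration.

```python
# ASSUMPTION: this split between read-only and state-changing actions is a
# guess for the example; the real agent may classify actions differently.
READ_ONLY_ACTIONS = {
    "screenshot", "read_text", "find_image", "wait_for_image", "wait",
    "list_controls", "list_windows", "list_directory", "list_processes",
    "read_file", "clipboard_read", "system_info", "note", "finish",
}


def dispatch(action, handlers, dry_run=False, log=None):
    """Route an action dict to its handler, skipping state changes in dry-run."""
    name = action.get("action")
    if name not in handlers:
        raise ValueError(f"unsupported action: {name!r}")
    params = {key: value for key, value in action.items() if key != "action"}
    if dry_run and name not in READ_ONLY_ACTIONS:
        # Dry-run: record what would have happened instead of doing it.
        if log is not None:
            log.append({"action": name, "params": params, "executed": False})
        return {"action": name, "executed": False}
    result = handlers[name](**params)
    if log is not None:
        log.append({"action": name, "params": params, "executed": True})
    return {"action": name, "executed": True, "result": result}
```

A table of handlers keyed by action name keeps the executor flat: supporting a new action means registering one more function rather than extending a long if/elif chain.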

Architecture

sentinel-desktop/
├── main.py              # Entry point (GUI / API / CLI modes)
├── config.py            # Settings persistence
├── core/
│   ├── engine.py        # Agent loop (screenshot → LLM → action → verify)
│   ├── action_executor.py  # Dispatches actions to desktop control
│   ├── llm_client.py    # Multi-provider LLM client (20+ providers)
│   ├── provider_registry.py  # Provider catalog
│   ├── desktop.py       # Mouse, keyboard, screen control
│   ├── screenshot.py    # Screen capture + template matching + cache
│   ├── window_manager.py   # Window management
│   ├── process_manager.py  # Process management
│   ├── clipboard.py     # Clipboard read/write
│   ├── file_ops.py      # Safe file operations
│   ├── system_info.py   # System information
│   ├── control/         # Plan → Ground → Execute → Verify control loop
│   ├── perception/      # Multi-modal perception pipeline (accessibility + OCR + vision)
│   ├── platform/        # Cross-platform abstraction (Windows / Linux / macOS)
│   ├── swarm/           # Multi-agent orchestration (bus + registry + specialists)
│   ├── popup_handler.py # Automatic dialog detection and dismissal
│   ├── recovery.py      # Action retry and error recovery
│   ├── scheduler.py     # Cron-based task scheduling
│   ├── auth.py          # RBAC with bcrypt password hashing
│   ├── encryption.py    # Cross-platform encryption
│   └── ...              # 30+ more modules
├── api/
│   └── server.py        # FastAPI headless control server (35+ endpoints)
├── gui/
│   ├── app.py           # Main GUI window (cyberpunk HUD)
│   ├── themes.py        # 14 theme definitions
│   ├── overlay.py       # Action overlay + animated cursor
│   └── tabs/            # Settings, scripts, workflows, history tabs
├── scripts/it_support/  # 19 pre-built IT support script templates
├── tests/               # 7,882 tests, 99% coverage
└── requirements.txt

Safety

  • Approval mode: Every state-changing action requires user confirmation before execution (Approve / Reject dialog in the GUI)
  • Dry-run mode: --dry-run logs every action it would take without actually clicking or typing
  • Esc x3 failsafe: Three rapid Esc presses stop the agent immediately, globally
  • pyautogui corner failsafe: Move mouse to a screen corner to abort
  • Sensitive field protection: Won't type strings that look like passwords or credentials
  • Tenant lockdown: Restrict file access to tenant-scoped paths
  • Step budget: Agent stops after N actions (configurable, default 100)
  • Bounded conversation: Old screenshots are pruned from the LLM context so token cost stays predictable
  • LLM retry/backoff: Transient 429/5xx errors retry with exponential backoff
  • API auth: Set SENTINEL_API_TOKEN to require Authorization: Bearer <token> on every endpoint
  • Forensic log: Every action logged with timestamp, params, and result
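
The retry behaviour for transient 429/5xx errors can be sketched as follows. This is an illustration of exponential backoff in general, not Sentinel's implementation: the exception class, the function names, the attempt count, the delay values, and the jitter are all assumptions.

```python
import random
import time


class TransientLLMError(Exception):
    """Hypothetical error type for a retryable HTTP status from a provider."""

    def __init__(self, status):
        super().__init__(f"transient LLM error: HTTP {status}")
        self.status = status


def is_transient(status):
    """429 (rate limited) and 5xx (server error) are worth retrying."""
    return status == 429 or 500 <= status <= 599


def call_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, jitter=random.random):
    """Retry `call` on TransientLLMError, doubling the delay after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientLLMError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% random jitter so parallel clients do not retry in lockstep.
            sleep(delay + jitter() * 0.1 * delay)
```

`sleep` and `jitter` are injectable so the schedule can be checked in tests without waiting; with the defaults the delays run roughly 1, 2, 4, 8 seconds, capped at 30.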

Companion Projects

  • Neuralis Brain — Shared, fleet-wide memory that every Sentinel-family agent reads from and writes to; the brain learns from all clients and gets better with every task
  • Sentinel Override — Browser automation agent (Chrome extension)
  • Sentinel MCP — Model Context Protocol server

License

MIT
