
auto-simctl

Intelligent Mobile Simulator Control — the missing piece for vibe coding on mobile. An AI agent that controls real/simulated Android and iOS devices: screenshot → UI understanding → reasoning → action → report.

What it does

  • Unified device bridge (MDB): One API over adb (Android) and idb (iOS Simulator). All coordinates clamped to screen bounds before execution.
  • Accessibility-first UI understanding: idb ui describe-all provides a precise logical-point center (cx, cy) for every element. Qwen picks which element; the element table supplies where.
  • Qwen-as-director: Qwen3.5-9B reasons from the accessibility tree + screenshot, decides actions, and bridges language semantics (e.g. Chinese task → English UI labels) without hardcoded translation tables.
  • Fast-paths (no LLM): The orchestrator short-circuits Qwen for deterministic cases — gesture keywords ("swipe right", "back", …), open-keyboard input_text, app-icon visible, foreground matches target app.
  • act — stateful one-shot mode: Does NOT reset to HOME. Executes exactly one atomic action and returns. tap / swipe / scroll / pan are always considered done after one execution. Designed for the MCP vibe-coding loop.
  • run — full autonomous mode: Pre-flight HOME reset, then multi-step ReAct loop until the goal is complete or max steps reached.
  • screen — instant screen snapshot: Returns foreground app, visible elements with tap coordinates, scroll state, and keyboard state. Add -s to also save a screenshot.
  • Keyboard-open fast-path: When act("input_text X") is called and the keyboard is already visible, the text is typed immediately — no Qwen call needed.
  • Compound input: act("type https://… on address textfield") taps the field, waits for the keyboard, then types — all in one call.
  • Pre-flight home reset (run only): Before each task, presses HOME then corrects Today View / side launcher pages by swiping left to page 0.
  • UI-UG fallback: UI-UG-7B-2601 via a background HTTP server handles custom-drawn views where accessibility labels are unavailable (games, canvas, WebView).
  • ReAct loop with navigation stack: Screenshot → acc elements → fast-paths → Qwen → action → execute → update nav stack → repeat until done or max steps.
  • Navigation & scroll awareness: Maintains a NavFrame stack (depth, screen label, scroll offset) so the agent always knows which page it's on and how far it has scrolled.
  • Dialog & keyboard handling: Auto-detects and dismisses system permission dialogs; detects on-screen keyboard and switches to input_text() automatically.
  • MCP server: mcp_server/server.py exposes list_devices, get_screen_state, act, and run_task as FastMCP tools — plug directly into Cursor or Claude Desktop.
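The gesture fast-path can be illustrated with a minimal keyword dispatcher. This is a hypothetical sketch — the names `GESTURES` and `match_gesture` are illustrative, not the project's actual API:

```python
# Hypothetical sketch of the gesture fast-path: map deterministic
# keywords to device actions without calling the LLM.
GESTURES = {
    "swipe right": ("swipe", {"direction": "right"}),
    "swipe left":  ("swipe", {"direction": "left"}),
    "scroll up":   ("scroll", {"direction": "up"}),
    "scroll down": ("scroll", {"direction": "down"}),
    "back":        ("press", {"button": "back"}),
}

def match_gesture(task: str):
    """Return a deterministic (action, args) pair, or None to fall through to Qwen."""
    task = task.strip().lower()
    for keyword, action in GESTURES.items():
        if keyword in task:
            return action
    return None  # no fast-path -> hand the task to the LLM
```

Because the match is a plain substring check, `act("swipe right")` never pays the ~8 s/step LLM cost.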

Quick start

# One-time setup: install adb, idb, Python deps, download models
./setup.sh

# Start both model servers (Qwen on :8080, UI-UG on :8081)
python3 cli.py server start

# ── Full autonomous task (HOME reset, multi-step) ─────────────────────────────
python3 cli.py run "Open Settings"
python3 cli.py run "find any folders" --verbose

# ── One-shot act (no HOME reset, continues from current screen) ───────────────
python3 cli.py act "swipe right"                                # gesture fast-path
python3 cli.py act "tap Watch app"                              # tap
python3 cli.py act "tap address bar"                            # tap a field (keyboard opens)
python3 cli.py act "input_text https://google.com"              # type (keyboard must be open)
python3 cli.py act "press enter"                                # submit — input_text does NOT auto-press Enter
python3 cli.py act "type https://google.com on address bar"     # tap + type in one call (still needs press enter after)
python3 cli.py act "back"                                       # back

# ── Screen snapshot (for the vibe-coding brain) ───────────────────────────────
python3 cli.py screen                  # rich text summary of current screen
python3 cli.py screen --json           # machine-readable JSON
python3 cli.py screen -s shot.png      # save screenshot file

# List connected devices
python3 cli.py devices

# Stop servers
python3 cli.py server stop

Requirements

  • macOS (Apple Silicon) — MLX-based models
  • Python 3.10+
  • Xcode + idb-companion (Homebrew) for iOS
  • android-platform-tools (Homebrew) or Android Studio for Android
  • Models in ~/.cache/huggingface/hub/:
    • qwen3.5-9b-mlx-4bit (reasoning, ~8s/step with thinking)
    • neovateai/UI-UG-7B-2601 (UI grounding fallback)

How the agent thinks

run mode — pre-flight (once per task):
  1. Press HOME if not on SpringBoard
  2. Detect Today View (>12 elements) → swipe left to page 0

act mode — no pre-flight (continues from current screen)

For each step (max 20):
  1. Take screenshot
  2. List all accessibility elements (visible + off-screen)
  3. Detect keyboard open (single-letter Button elements present)
  4. Get scroll boundary info (content above/below/left/right)
  5. Detect & auto-dismiss system dialogs
  6. Fast-paths (no Qwen):
     a. Keyboard open + input task (input_text / type)
        → input_text(X) directly, skip Qwen
     b. Gesture keyword (swipe right, swipe left, back, scroll up, scroll down, …)
        → deterministic swipe/press, skip Qwen  [act only: also skips step loop]
     c. Elements show Tab Bar + No Recents + app label → done
     d. MDB foreground = target app + elements show in-app → done
     e. App icon Button visible in elements → direct tap(cx, cy)
  7. Qwen phase-1: analyze screenshot + elements → decide action
     └─ If action = "ground": accessibility elements passed to Qwen phase-2
        (UI-UG called only if no accessibility labels at all)
  8. Snap out-of-bounds tap to nearest accessible element
  9. Execute action via MDB (coords clamped to screen bounds)
  10. act mode one-shot rules:
      - tap / swipe / scroll / pan → return done immediately (no verify loop)
      - input task + step 1 was tap → sleep 0.8s for keyboard, type, return done
  11. Update navigation stack (NavFrame + ScrollState)
  12. Dead-end check: same action 3× → force HOME  [run mode only]
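The dead-end check in step 12 amounts to tracking recently executed actions and bailing out when the agent repeats itself. A minimal sketch — the deque-based helper below is illustrative, not the project's actual code:

```python
from collections import deque

class DeadEndDetector:
    """Signal a forced HOME reset when the same action runs 3 times in a row."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.recent = deque(maxlen=limit)  # sliding window of executed actions

    def record(self, action: str) -> bool:
        """Record an executed action; return True if the agent is stuck."""
        self.recent.append(action)
        return len(self.recent) == self.limit and len(set(self.recent)) == 1
```

Any differing action in the window resets the verdict, so legitimate repeated scrolls with changing targets are not flagged.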

Key design principles:

  • Accessibility elements carry (cx, cy) in logical points — Qwen decides which element, the element table provides where. Qwen never needs to estimate coordinates from the screenshot for standard iOS apps.
  • act is strictly one-shot: tap / swipe / scroll / pan complete after one execution. No post-action verification — the MCP caller observes via get_screen_state.
  • Keyboard detection drives the input_text fast-path: if a keyboard is visible when input_text X is issued, the text is typed immediately without any Qwen call.
  • input_text does not press Enter/Return automatically. If the action requires submission (URL navigation, search, form submit), follow up with a separate act("press enter").
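Steps 8 and 9 of the loop (snap an out-of-bounds tap, then clamp to screen bounds) reduce to simple coordinate math. A hedged sketch with illustrative function names:

```python
import math

def clamp(x: float, y: float, width: float, height: float) -> tuple[float, float]:
    """Clamp a tap point to the screen bounds before execution."""
    return (min(max(x, 0), width - 1), min(max(y, 0), height - 1))

def snap_to_nearest(x: float, y: float, centers: list[tuple[float, float]]):
    """Snap an estimated tap to the closest accessibility element center (cx, cy)."""
    if not centers:
        return (x, y)  # nothing to snap to; rely on clamping alone
    return min(centers, key=lambda c: math.hypot(c[0] - x, c[1] - y))
```

In this sketch, snapping runs first (choose the intended element), then clamping guards against any remaining out-of-bounds coordinate.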

Project layout

auto-simctl/
├── cli.py                  # Entry point: run / act / screen / devices / server
├── ui_server.py            # UI-UG-7B HTTP server (port 8081)
├── logger.py               # Structured logging helpers
│
├── mdb/                    # Mobile Device Bridge
│   ├── bridge.py           # DeviceBridge unified API + coord clamping
│   ├── screen.py           # ScreenSpec: pixel ↔ logical-pt ↔ norm1000
│   ├── models.py           # DeviceInfo, Action, Screenshot dataclasses
│   └── backends/
│       ├── idb_backend.py  # iOS: screenshot, tap, swipe, input_text,
│       │                   #   list_elements, get_foreground_app,
│       │                   #   get_scroll_info, detect_system_dialog
│       └── adb_backend.py  # Android: same interface via adb
│
├── agents/
│   ├── qwen_agent.py       # Qwen3.5-9B via mlx-openai-server; adaptive thinking
│   ├── ui_agent.py         # UI-UG-7B-2601 client (HTTP → port 8081)
│   └── prompts.py          # SYSTEM_PROMPT + build_user_message
│
├── orchestrator/
│   ├── loop.py             # Pre-flight, fast-paths, ReAct loop, nav stack,
│   │                       #   act one-shot rules, keyboard/input fast-path
│   └── result.py           # TaskResult, StepLog, NavFrame, ScrollState
│
├── mcp_server/
│   └── server.py           # FastMCP server: list_devices, get_screen_state,
│                           #   act (one-shot), run_task (multi-step)
│
├── PLAN.md                 # Full architecture and design decisions
├── setup.sh                # Auto-installer
└── .cursor/skills/
    └── auto-simctl-navigation/SKILL.md   # Navigation patterns & failure modes

See PLAN.md for full architecture, coordinate systems, and design decisions.
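ScreenSpec's three coordinate spaces (device pixels, logical points, and a 0–1000 normalized grid commonly used for VLM grounding) relate by simple scaling. A minimal sketch, assuming `scale` is the device's pixel-to-point ratio; the field and method names are illustrative, not the actual mdb/screen.py API:

```python
from dataclasses import dataclass

@dataclass
class ScreenSpec:
    """Illustrative pixel <-> logical-point <-> norm1000 conversion."""
    px_width: int    # screenshot width in pixels
    px_height: int   # screenshot height in pixels
    scale: float     # pixels per logical point (e.g. 3.0 on recent iPhones)

    def px_to_pt(self, x: float, y: float) -> tuple[float, float]:
        """Device pixels -> logical points."""
        return (x / self.scale, y / self.scale)

    def pt_to_norm1000(self, x: float, y: float) -> tuple[int, int]:
        """Logical points -> 0-1000 normalized grid."""
        pt_w = self.px_width / self.scale
        pt_h = self.px_height / self.scale
        return (round(1000 * x / pt_w), round(1000 * y / pt_h))
```

Keeping every agent-facing coordinate in logical points (and converting only at the VLM and device boundaries) is what lets the clamping in bridge.py use a single bounds check.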

Third-party Models

This project downloads and uses the following models at runtime (not bundled):

Model                      License      Source
qwen3.5-9b-mlx-4bit        Apache 2.0   Qwen / Alibaba Cloud
neovateai/UI-UG-7B-2601    Apache 2.0   neovateai/UI-UG-7B-2601

Models are downloaded separately (via setup.sh) and are not redistributed with this project.

License

MIT — auto-simctl source code only. The downloaded models are governed by their respective Apache 2.0 licenses.


