Mobile testing MCP server: OmniParser UI element detection + direct device control (mobilecli + WDA)

mobile-parser

An MCP server for mobile app testing that combines OmniParser vision-based UI element detection with direct device control.

Unlike accessibility-tree-based tools, OmniParser detects UI elements directly from screenshots — making it work reliably with Flutter, WebView, games, and any app regardless of the UI framework.

Features

  • Vision-based element detection — OmniParser (YOLO + Florence-2 + EasyOCR) finds UI elements from screenshots
  • Cross-platform — iOS Simulator and Android (emulator + real device)
  • Zero-config coordinates: mobile_find_elements returns tap-ready coordinates; pass them directly to mobile_tap()
  • No Appium required — talks directly to WDA (iOS) and adb (Android)
  • Auto-download everything — models, tools, and dependencies fetched on first use

Installation

Claude Code

claude mcp add mobile-parser -- uvx mobile-parser

Claude Desktop / Cursor / Other MCP Clients

Add to your MCP config JSON:

{
  "mcpServers": {
    "mobile-parser": {
      "command": "uvx",
      "args": ["mobile-parser"]
    }
  }
}
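If the client already has a config file with other servers in it, the entry above can be merged in rather than pasted by hand. The following is a minimal sketch, not part of mobile-parser: the helper name `add_mobile_parser` is hypothetical, and the config file location is an assumption that varies by client, so it is passed in as a parameter.

```python
import json
from pathlib import Path


def add_mobile_parser(config_path: Path) -> dict:
    """Merge the mobile-parser entry into an MCP config file, keeping other servers."""
    # Start from the existing config if there is one, otherwise from an empty dict.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same entry as the JSON snippet above.
    servers["mobile-parser"] = {"command": "uvx", "args": ["mobile-parser"]}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Existing entries under `mcpServers` are left untouched; only the `mobile-parser` key is added or overwritten.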

Prerequisites

  • Python 3.10+ (managed by uv automatically)
  • Node.js / npm (for mobilecli — auto-downloaded via npx)
iOS
Android
  • Android SDK (adb in PATH or ANDROID_HOME set)
  • Emulator or device connected via adb

What gets auto-downloaded

Component                        When                               Size
Python packages (torch, etc.)    First uvx mobile-parser run        ~2 GB
mobilecli binary                 First device operation             ~20 MB
OmniParser models                First mobile_find_elements call    ~1.5 GB
Florence-2 processor             First icon captioning              ~500 MB

Usage

1. mobile_find_elements(device="...") → elements with tap coordinates
2. mobile_tap(device="...", x=tap_x, y=tap_y) → tap the element
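In practice the step between the two calls is choosing which element to tap. The sketch below shows one way to do that on the client side. It assumes the result is a list of dicts carrying `tap_x` and `tap_y` plus a text field; the field name `content` and the helper `find_by_label` are assumptions made for illustration, not the server's documented schema.

```python
from typing import Optional


def find_by_label(elements: list[dict], label: str) -> Optional[dict]:
    """Return the first element whose text contains `label` (case-insensitive)."""
    needle = label.casefold()
    for element in elements:
        # "content" is an assumed field name for the detected text or caption.
        if needle in str(element.get("content", "")).casefold():
            return element
    return None


# Hypothetical result, shaped like the output described above.
elements = [
    {"content": "Username", "tap_x": 196, "tap_y": 310},
    {"content": "Login", "tap_x": 196, "tap_y": 520},
]
target = find_by_label(elements, "login")
if target is not None:
    # These are the values that would be passed to mobile_tap(x=..., y=...).
    print(target["tap_x"], target["tap_y"])
```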

mobile_find_elements handles the full pipeline:

  1. Takes a screenshot of the device
  2. Runs OmniParser to detect all UI elements (text + icons)
  3. Converts pixel coordinates to logical screen coordinates

The returned tap_x / tap_y can be passed directly to mobile_tap().
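The conversion in step 3 matters because a screenshot is usually larger than the logical screen (for example, three pixels per point on a high-density display). A minimal sketch of such a conversion follows; it is an illustration under the assumption that the tap point is the center of the detected box, and the function name `to_logical_tap` is hypothetical rather than taken from coordinator.py.

```python
def to_logical_tap(bbox, screenshot_size, screen_size):
    """Map the center of a pixel-space box to logical screen coordinates.

    bbox: (left, top, right, bottom) in screenshot pixels
    screenshot_size: (width, height) of the screenshot in pixels
    screen_size: (width, height) of the screen in logical units
    """
    left, top, right, bottom = bbox
    center_x = (left + right) / 2
    center_y = (top + bottom) / 2
    # Scale each axis by the ratio of logical size to pixel size.
    tap_x = center_x * screen_size[0] / screenshot_size[0]
    tap_y = center_y * screen_size[1] / screenshot_size[1]
    return round(tap_x), round(tap_y)


# A hypothetical 3x-density case: 1179x2556 pixels for a 393x852 logical screen.
print(to_logical_tap((300, 1500, 876, 1620), (1179, 2556), (393, 852)))
```

Scaling each axis separately keeps the sketch correct even if the screenshot was resized before parsing, as long as the size passed in matches the image the box came from.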

Example prompts

  • "Find and tap the Login button"
  • "Scroll down and look for a search bar"
  • "Launch the Settings app and navigate to Wi-Fi"
  • "Take a screenshot and describe what's on screen"

Tools

Screen Analysis (OmniParser)

Tool                      Description
mobile_find_elements      Primary tool — screenshot → OmniParser → tap coordinates
mobile_screenshot         Take a screenshot (resized for LLM, max 1568px)
mobile_save_screenshot    Save screenshot to file
mobile_parse_image        Parse an existing image file

Interaction

Tool                      Description
mobile_tap                Tap at coordinates
mobile_double_tap         Double-tap at coordinates
mobile_long_press         Long press at coordinates
mobile_swipe              Swipe in a direction (up / down / left / right)
mobile_type_text          Type text into the focused element
mobile_press_button       Press a hardware button (home / back / etc.)

Device Management

Tool                      Description
mobile_list_devices       List available devices and simulators
mobile_get_screen_size    Get device screen size
mobile_list_apps          List installed apps
mobile_launch_app         Launch an app by bundle ID
mobile_terminate_app      Terminate a running app
mobile_open_url           Open a URL in the default browser
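The "max 1568px" note on mobile_screenshot implies a downscaling step before the image is handed to the LLM. The sketch below shows the arithmetic under two assumptions that the description does not state: the limit applies to the longer side, and the aspect ratio is preserved. The function name `fit_within` is hypothetical.

```python
def fit_within(width: int, height: int, max_side: int = 1568) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1179, 2556))
```

Because the resized image has different pixel dimensions from the device screen, coordinates read off such a screenshot would need rescaling before tapping; the coordinates returned by mobile_find_elements are already tap-ready.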

Architecture

mobile-parser does not depend on the mobile-mcp server. It controls devices directly through platform-native tools and APIs:

Platform    Device Discovery    Interactions               Screenshots               App Management
iOS         mobilecli (npx)     WebDriverAgent HTTP API    WDA /screenshot           xcrun simctl
Android     mobilecli (npx)     adb shell input            adb exec-out screencap    adb shell am/pm

mobile-parser (MCP Server)
├── server.py          → FastMCP server with 16 tools
├── coordinator.py     → Screenshot → OmniParser → coordinate conversion
├── mobile_client.py   → Device control (iOS: WDA, Android: adb)
├── mobilecli.py       → mobilecli wrapper (npx auto-download)
├── wda.py             → WebDriverAgent HTTP client
└── parser.py          → OmniParser (YOLO + Florence-2 + EasyOCR)
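To make the Android row of the table concrete, the sketch below builds the kind of adb invocations it names. This is an illustration only, not the code in mobile_client.py: the helper names are hypothetical, and the serial `emulator-5554` is just an example value. The adb subcommands themselves (`shell input tap`, `exec-out screencap -p`) are standard adb usage.

```python
import subprocess


def adb_command(serial: str, *args: str) -> list[str]:
    """Build an adb argv targeting one device by serial number."""
    return ["adb", "-s", serial, *args]


def tap_command(serial: str, x: int, y: int) -> list[str]:
    # `input tap <x> <y>` injects a tap at the given screen coordinates.
    return adb_command(serial, "shell", "input", "tap", str(x), str(y))


def screenshot_command(serial: str) -> list[str]:
    # `screencap -p` writes a PNG to stdout; exec-out passes the bytes through unmodified.
    return adb_command(serial, "exec-out", "screencap", "-p")


def take_screenshot(serial: str) -> bytes:
    """Run the screenshot command and return raw PNG bytes (needs a connected device)."""
    result = subprocess.run(screenshot_command(serial), check=True, capture_output=True)
    return result.stdout


print(tap_command("emulator-5554", 196, 520))
```

Building the argument list separately from running it keeps the command construction easy to inspect without a device attached.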

Configuration

Environment Variable      Description                            Default
OMNIPARSER_WEIGHTS_DIR    Model weights directory                ~/.cache/omniparser
OMNIPARSER_DEVICE         Inference device (cuda / mps / cpu)    Auto-detect
MOBILECLI_PATH            mobilecli binary path                  npx auto-download
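The sketch below shows how settings like these are typically resolved, with each variable falling back to its documented default. It is an illustration rather than the server's actual code: `load_settings` and `detect_device` are hypothetical names, and the auto-detect order (cuda, then mps, then cpu) is an assumption.

```python
import os
from pathlib import Path


def detect_device() -> str:
    """Pick an inference device; the cuda > mps > cpu order is an assumption."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"


def load_settings(env=os.environ) -> dict:
    """Resolve the three settings from the table, applying the documented defaults."""
    weights = env.get("OMNIPARSER_WEIGHTS_DIR", "~/.cache/omniparser")
    return {
        "weights_dir": Path(weights).expanduser(),
        "device": env.get("OMNIPARSER_DEVICE") or detect_device(),
        # None stands for "let npx download mobilecli".
        "mobilecli_path": env.get("MOBILECLI_PATH"),
    }
```

Forcing CPU inference, for example, would amount to setting `OMNIPARSER_DEVICE=cpu` in the environment the MCP client uses to start the server.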

License

AGPL-3.0 — due to the ultralytics (YOLOv8) dependency.
