Mobile testing MCP server: OmniParser UI element detection + direct device control (mobilecli + WDA)

mobile-parser

An MCP server for mobile app testing that combines OmniParser vision-based UI element detection with direct device control.

Unlike accessibility-tree-based tools, OmniParser detects UI elements directly from screenshots — making it work reliably with Flutter, WebView, games, and any app regardless of the UI framework.

Features

  • Vision-based element detection — OmniParser (YOLO + Florence-2 + EasyOCR) finds UI elements from screenshots
  • Cross-platform — iOS Simulator and Android (emulator + real device)
  • Zero-config coordinates — mobile_find_elements returns tap-ready coordinates; pass them directly to mobile_tap()
  • No Appium required — talks directly to WDA (iOS) and adb (Android)
  • Auto-download everything — models, tools, and dependencies fetched on first use

Installation

Claude Code

claude mcp add mobile-parser -- uvx mobile-parser

Claude Desktop / Cursor / Other MCP Clients

Add to your MCP config JSON:

{
  "mcpServers": {
    "mobile-parser": {
      "command": "uvx",
      "args": ["mobile-parser"]
    }
  }
}

Prerequisites

  • Python 3.10+ (managed by uv automatically)
  • Node.js / npm (for mobilecli — auto-downloaded via npx)
iOS
Android
  • Android SDK (adb in PATH or ANDROID_HOME set)
  • Emulator or device connected via adb

What gets auto-downloaded

| Component | When | Size |
| --- | --- | --- |
| Python packages (torch, etc.) | First uvx mobile-parser run | ~2 GB |
| mobilecli binary | First device operation | ~20 MB |
| OmniParser models | First mobile_find_elements call | ~1.5 GB |
| Florence-2 processor | First icon captioning | ~500 MB |

Usage

1. mobile_find_elements(device="...") → elements with tap coordinates
2. mobile_tap(device="...", x=tap_x, y=tap_y) → tap the element

mobile_find_elements handles the full pipeline:

  1. Takes a screenshot of the device
  2. Runs OmniParser to detect all UI elements (text + icons)
  3. Converts pixel coordinates to logical screen coordinates

The returned tap_x / tap_y can be passed directly to mobile_tap().
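
Step 3 of the pipeline above can be pictured with a small sketch. This is not the project's coordinator code: the function name, the bounding-box layout, and the sample sizes are assumptions chosen for illustration. It takes the centre of a detected bounding box in screenshot pixels and scales it to logical screen units. On a device whose screenshot pixels already match its logical units, the ratio is simply 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TapPoint:
    tap_x: int
    tap_y: int


def bbox_to_tap_point(bbox_px, screenshot_size, logical_size):
    """Map a pixel-space bounding box to a tap point in logical coordinates.

    bbox_px:         (left, top, right, bottom) in screenshot pixels
    screenshot_size: (width, height) of the screenshot in pixels
    logical_size:    (width, height) of the screen in logical units
    All three layouts are assumptions made for this sketch.
    """
    left, top, right, bottom = bbox_px
    shot_w, shot_h = screenshot_size
    logical_w, logical_h = logical_size

    # Centre of the detected element, still in screenshot pixels.
    center_x = (left + right) / 2
    center_y = (top + bottom) / 2

    # Scale each axis by its logical-to-pixel ratio.
    return TapPoint(
        tap_x=round(center_x * logical_w / shot_w),
        tap_y=round(center_y * logical_h / shot_h),
    )


# Hypothetical 3x display: 1179x2556 px screenshot, 393x852 logical units.
point = bbox_to_tap_point(
    bbox_px=(300, 600, 900, 780),
    screenshot_size=(1179, 2556),
    logical_size=(393, 852),
)
print(point)
```

Tapping the centre of the box rather than a corner leaves some margin if the detected box is slightly off.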

Example prompts

  • "Find and tap the Login button"
  • "Scroll down and look for a search bar"
  • "Launch the Settings app and navigate to Wi-Fi"
  • "Take a screenshot and describe what's on screen"
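
A prompt such as "Find and tap the Login button" comes down to picking one element from the mobile_find_elements result and forwarding its coordinates to mobile_tap. The sketch below assumes a result shape that the source does not specify: a list of dicts with a hypothetical "content" label next to the documented tap_x / tap_y values. The sample data is made up.

```python
def find_by_label(elements, label):
    """Return the first element whose label contains `label`, ignoring case."""
    wanted = label.casefold()
    for element in elements:
        # "content" is a hypothetical key; only tap_x / tap_y are documented.
        if wanted in element.get("content", "").casefold():
            return element
    return None


# Made-up detection result, for illustration only.
elements = [
    {"content": "Email", "tap_x": 196, "tap_y": 310},
    {"content": "Log in", "tap_x": 196, "tap_y": 520},
]

target = find_by_label(elements, "log in")
if target is None:
    raise LookupError("no element matched the requested label")

# Arguments an MCP client would pass to mobile_tap for this element.
tap_args = {"device": "<device-id>", "x": target["tap_x"], "y": target["tap_y"]}
print(tap_args)
```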

Tools

Screen Analysis (OmniParser)

| Tool | Description |
| --- | --- |
| mobile_find_elements | Primary tool — screenshot → OmniParser → tap coordinates |
| mobile_screenshot | Take a screenshot (resized for LLM, max 1568px) |
| mobile_save_screenshot | Save screenshot to file |
| mobile_parse_image | Parse an existing image file |

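
The "max 1568px" resize for mobile_screenshot can be sketched as follows. The source states only the limit, so two things here are assumptions: that the limit applies to the longer side, and that the aspect ratio is preserved. The function name is hypothetical.

```python
def fit_within(width, height, max_side=1568):
    """Return (width, height) scaled so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale

    # Scale both sides by the same factor to keep the aspect ratio.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height


print(fit_within(1179, 2556))  # a tall phone screenshot
print(fit_within(800, 600))    # already within the limit
```

Coordinates read off a resized image would need scaling back before tapping, which is one reason to prefer the tap_x / tap_y values from mobile_find_elements.
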
Interaction

| Tool | Description |
| --- | --- |
| mobile_tap | Tap at coordinates |
| mobile_double_tap | Double-tap at coordinates |
| mobile_long_press | Long press at coordinates |
| mobile_swipe | Swipe in a direction (up / down / left / right) |
| mobile_type_text | Type text into the focused element |
| mobile_press_button | Press a hardware button (home / back / etc.) |

Device Management

| Tool | Description |
| --- | --- |
| mobile_list_devices | List available devices and simulators |
| mobile_get_screen_size | Get device screen size |
| mobile_list_apps | List installed apps |
| mobile_launch_app | Launch an app by bundle ID |
| mobile_terminate_app | Terminate a running app |
| mobile_open_url | Open a URL in the default browser |

Architecture

mobile-parser does not depend on the mobile-mcp server. It controls devices directly through platform-native tools:

| Platform | Device Discovery | Interactions | Screenshots | App Management |
| --- | --- | --- | --- | --- |
| iOS | mobilecli (npx) | WebDriverAgent HTTP API | WDA /screenshot | xcrun simctl |
| Android | mobilecli (npx) | adb shell input | adb exec-out screencap | adb shell am/pm |

mobile-parser (MCP Server)
├── server.py          → FastMCP server with 16 tools
├── coordinator.py     → Screenshot → OmniParser → coordinate conversion
├── mobile_client.py   → Device control (iOS: WDA, Android: adb)
├── mobilecli.py       → mobilecli wrapper (npx auto-download)
├── wda.py             → WebDriverAgent HTTP client
└── parser.py          → OmniParser (YOLO + Florence-2 + EasyOCR)
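
For the Android row of the table above, the underlying adb invocations look roughly like the sketch below. It is not the contents of mobile_client.py: the helper names are hypothetical, and the code only builds argument lists without running them. The adb subcommands themselves (input tap, input swipe, exec-out screencap -p, am force-stop, pm list packages) are standard.

```python
import shlex


def adb_command(serial, *args):
    """Build an adb argv list that targets one device by serial number."""
    return ["adb", "-s", serial, *args]


def tap_command(serial, x, y):
    return adb_command(serial, "shell", "input", "tap", str(x), str(y))


def swipe_command(serial, x1, y1, x2, y2, duration_ms=300):
    # `input swipe` accepts an optional duration in milliseconds.
    return adb_command(
        serial, "shell", "input", "swipe",
        str(x1), str(y1), str(x2), str(y2), str(duration_ms),
    )


def screenshot_command(serial):
    # exec-out streams the PNG bytes to stdout without shell newline mangling.
    return adb_command(serial, "exec-out", "screencap", "-p")


def terminate_command(serial, package):
    return adb_command(serial, "shell", "am", "force-stop", package)


def list_packages_command(serial):
    return adb_command(serial, "shell", "pm", "list", "packages")


# Print the commands instead of running them; "emulator-5554" is a sample serial.
for cmd in (
    tap_command("emulator-5554", 200, 230),
    swipe_command("emulator-5554", 200, 700, 200, 300),
    screenshot_command("emulator-5554"),
    terminate_command("emulator-5554", "com.example.app"),
):
    print(shlex.join(cmd))
```

Each list can be handed to subprocess.run(cmd, check=True, capture_output=True). Passing a list rather than a shell string avoids quoting problems on the host side.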

Configuration

| Environment Variable | Description | Default |
| --- | --- | --- |
| OMNIPARSER_WEIGHTS_DIR | Model weights directory | ~/.cache/omniparser |
| OMNIPARSER_DEVICE | Inference device (cuda / mps / cpu) | Auto-detect |
| MOBILECLI_PATH | mobilecli binary path | npx auto-download |
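
One way to read the defaults above is as a small resolution function. This is a sketch under assumptions, not the server's actual logic: the function name is hypothetical, and the auto-detect order (cuda, then mps, then cpu) is a guess. The availability flags are passed in so the example runs without torch installed.

```python
import os
from pathlib import Path


def resolve_settings(env=None, cuda_available=False, mps_available=False):
    """Resolve the three documented variables, falling back to their defaults."""
    env = os.environ if env is None else env

    weights_dir = Path(
        env.get("OMNIPARSER_WEIGHTS_DIR", "~/.cache/omniparser")
    ).expanduser()

    device = env.get("OMNIPARSER_DEVICE")
    if not device:
        # Assumed preference order; the source only says "Auto-detect".
        device = "cuda" if cuda_available else "mps" if mps_available else "cpu"

    # None means "no explicit binary; fall back to the npx auto-download".
    mobilecli_path = env.get("MOBILECLI_PATH")

    return {
        "weights_dir": weights_dir,
        "device": device,
        "mobilecli_path": mobilecli_path,
    }


# An explicit OMNIPARSER_DEVICE wins over whatever hardware is detected.
settings = resolve_settings({"OMNIPARSER_DEVICE": "cpu"}, cuda_available=True)
print(settings["device"])
```

In a real process the flags would come from calls such as torch.cuda.is_available() and torch.backends.mps.is_available().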

License

AGPL-3.0 — due to the ultralytics (YOLOv8) dependency.
