Mobile testing MCP server: OmniParser UI element detection + direct device control (mobilecli + WDA)

mobile-parser

An MCP server for mobile app testing that combines OmniParser vision-based UI element detection with direct device control.

Unlike tools that rely on the accessibility tree, OmniParser detects UI elements directly from screenshots, so detection does not depend on the app's UI framework. It works with Flutter, WebView, games, and other apps that expose little or no accessibility information.

Features

- Vision-based element detection: OmniParser (YOLO + Florence-2 + EasyOCR) finds UI elements from screenshots
- Cross-platform: iOS Simulator and Android (emulator + real device)
- Zero-config coordinates: find_elements returns tap-ready coordinates; pass them directly to tap()
- No Appium required: talks directly to WDA (iOS) and adb (Android)
- Auto-download everything: models, tools, and dependencies are fetched on first use

Installation

Claude Code

claude mcp add mobile-parser -- uvx mobile-parser

Claude Desktop / Cursor / Other MCP Clients

Add to your MCP config JSON:

{
  "mcpServers": {
    "mobile-parser": {
      "command": "uvx",
      "args": ["mobile-parser"]
    }
  }
}
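
If the client already has other servers configured, the entry has to be merged into the existing file rather than pasted over it. The following is a minimal sketch, assuming the config is ordinary JSON with a top-level mcpServers object. The config file location differs per client and is not shown, and other-server is a made-up placeholder.

```python
import json


def add_mobile_parser(config: dict) -> dict:
    """Return a copy of an MCP client config with the mobile-parser entry added."""
    merged = dict(config)
    # Copy the server map so the caller's dict is left untouched.
    servers = dict(merged.get("mcpServers", {}))
    servers["mobile-parser"] = {"command": "uvx", "args": ["mobile-parser"]}
    merged["mcpServers"] = servers
    return merged


# "other-server" is a hypothetical, pre-existing entry used only for illustration.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
print(json.dumps(add_mobile_parser(existing), indent=2))
```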

Prerequisites

Python 3.10+ (managed by uv automatically)
Node.js / npm (for mobilecli — auto-downloaded via npx)

iOS

Xcode + iOS Simulator
WebDriverAgent installed on the simulator
- See: Setup for iOS Simulator

Android

Android SDK (adb in PATH or ANDROID_HOME set)
Emulator or device connected via adb
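
To check the Android prerequisite before starting the server, something like the sketch below can help. It assumes the standard SDK layout, where adb lives under platform-tools inside ANDROID_HOME. The find_adb helper is illustrative and not part of mobile-parser.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional


def find_adb(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Locate adb on PATH first, then under ANDROID_HOME/platform-tools."""
    on_path = shutil.which("adb", path=env.get("PATH"))
    if on_path:
        return on_path
    sdk_root = env.get("ANDROID_HOME")
    if sdk_root:
        name = "adb.exe" if os.name == "nt" else "adb"
        candidate = Path(sdk_root) / "platform-tools" / name
        if candidate.is_file():
            return str(candidate)
    return None


print(find_adb() or "adb not found: add it to PATH or set ANDROID_HOME")
```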

What gets auto-downloaded

Component	When	Size
Python packages (torch, etc.)	First `uvx mobile-parser` run	~2 GB
mobilecli binary	First device operation	~20 MB
OmniParser models	First `mobile_find_elements` call	~1.5 GB
Florence-2 processor	First icon captioning	~500 MB
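
Added up, the first-run downloads come to roughly 4 GB. A small sketch for checking free disk space beforehand is shown below. The sizes are the approximate figures from the table, 1 GB is treated as 1000 MB, and the 20% safety margin is an arbitrary assumption. The components may land on different filesystems, so the check should be pointed at the relevant one.

```python
import shutil

# Approximate sizes from the table above, in megabytes.
DOWNLOADS_MB = {
    "python_packages": 2000,
    "mobilecli": 20,
    "omniparser_models": 1500,
    "florence2_processor": 500,
}


def total_download_mb() -> int:
    """Sum of all first-run downloads in megabytes."""
    return sum(DOWNLOADS_MB.values())


def has_room(path: str = ".", margin: float = 1.2) -> bool:
    """True if the filesystem holding `path` has the total plus a margin free."""
    free_mb = shutil.disk_usage(path).free / 1_000_000
    return free_mb >= total_download_mb() * margin


print(f"Expected first-run downloads: ~{total_download_mb() / 1000:.1f} GB")
print("Enough free space here:", has_room("."))
```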

Usage

1. mobile_find_elements(device="...") → elements with tap coordinates
2. mobile_tap(device="...", x=tap_x, y=tap_y) → tap the element

mobile_find_elements handles the full pipeline:

1. Takes a screenshot of the device
2. Runs OmniParser to detect all UI elements (text + icons)
3. Converts pixel coordinates to logical screen coordinates
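
The last step can be illustrated with a short sketch. It assumes the detector yields a pixel-space box (x1, y1, x2, y2) and that the tap point is the box center scaled by the ratio of logical screen size to screenshot size. The function name and signature are illustrative, not the actual coordinator.py code.

```python
from typing import Tuple


def bbox_to_tap_point(
    bbox_px: Tuple[float, float, float, float],
    screenshot_size: Tuple[int, int],
    screen_size: Tuple[int, int],
) -> Tuple[int, int]:
    """Map a pixel-space box (x1, y1, x2, y2) to a logical tap point at its center."""
    x1, y1, x2, y2 = bbox_px
    shot_w, shot_h = screenshot_size
    logical_w, logical_h = screen_size
    # Center of the detected element in screenshot pixels.
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Scale each axis by the ratio of logical size to screenshot size.
    return round(cx * logical_w / shot_w), round(cy * logical_h / shot_h)


# A 1179x2556 px screenshot of a 393x852 pt screen (3x scale).
print(bbox_to_tap_point((300, 1200, 600, 1320), (1179, 2556), (393, 852)))  # (150, 420)
```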

The returned tap_x / tap_y can be passed directly to mobile_tap().
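
As a sketch of how a client might go from a result list to a tap, the helper below picks the first element whose text contains a label and builds the arguments for mobile_tap. The element shape is hypothetical: only tap_x and tap_y are named in this README, and the text field is an assumption.

```python
from typing import Optional


def tap_args_for(elements: list[dict], label: str, device: str) -> Optional[dict]:
    """Build mobile_tap arguments for the first element whose text contains `label`."""
    wanted = label.casefold()
    for element in elements:
        if wanted in str(element.get("text", "")).casefold():
            return {"device": device, "x": element["tap_x"], "y": element["tap_y"]}
    return None


# Hypothetical result shape; the "text" key is assumed for illustration.
elements = [
    {"text": "Username", "tap_x": 196, "tap_y": 310},
    {"text": "Login", "tap_x": 196, "tap_y": 520},
]
print(tap_args_for(elements, "login", device="emulator-5554"))
```

Returning None when nothing matches lets the caller decide whether to scroll and search again or report a failure.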

Example prompts

"Find and tap the Login button"
"Scroll down and look for a search bar"
"Launch the Settings app and navigate to Wi-Fi"
"Take a screenshot and describe what's on screen"

Tools

Screen Analysis (OmniParser)

Tool	Description
`mobile_find_elements`	Primary tool — screenshot → OmniParser → tap coordinates
`mobile_screenshot`	Take a screenshot (resized for LLM, max 1568px)
`mobile_save_screenshot`	Save screenshot to file
`mobile_parse_image`	Parse an existing image file
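
The 1568px limit on mobile_screenshot can be pictured with the sketch below. It assumes the cap applies to the longer side and that the aspect ratio is preserved; the actual resizing logic may differ.

```python
def fit_within(width: int, height: int, max_side: int = 1568) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1179, 2556))  # (723, 1568)
```

Because the image is scaled down, positions read off a resized screenshot are not tap coordinates. Use the tap_x / tap_y values from mobile_find_elements instead.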

Interaction

Tool	Description
`mobile_tap`	Tap at coordinates
`mobile_double_tap`	Double-tap at coordinates
`mobile_long_press`	Long press at coordinates
`mobile_swipe`	Swipe in a direction (up / down / left / right)
`mobile_type_text`	Type text into the focused element
`mobile_press_button`	Press a hardware button (home / back / etc.)
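
A direction-based swipe eventually has to become a pair of points. The sketch below shows one way to derive them, assuming "up" means the finger moves toward the top of the screen and that the gesture is centered and covers half the screen. Both are assumptions, not documented behavior of mobile_swipe.

```python
def swipe_points(direction: str, width: int, height: int, travel: float = 0.5):
    """Return ((x1, y1), (x2, y2)) for a centered swipe covering `travel` of the screen."""
    cx, cy = width // 2, height // 2
    dx, dy = int(width * travel / 2), int(height * travel / 2)
    paths = {
        "up": ((cx, cy + dy), (cx, cy - dy)),  # finger moves toward the top
        "down": ((cx, cy - dy), (cx, cy + dy)),
        "left": ((cx + dx, cy), (cx - dx, cy)),
        "right": ((cx - dx, cy), (cx + dx, cy)),
    }
    if direction not in paths:
        raise ValueError(f"unknown direction: {direction!r}")
    return paths[direction]


print(swipe_points("up", 393, 852))  # ((196, 639), (196, 213))
```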

Device Management

Tool	Description
`mobile_list_devices`	List available devices and simulators
`mobile_get_screen_size`	Get device screen size
`mobile_list_apps`	List installed apps
`mobile_launch_app`	Launch an app by bundle ID (iOS) or package name (Android)
`mobile_terminate_app`	Terminate a running app
`mobile_open_url`	Open a URL in the default browser

Architecture

mobile-parser does not depend on the mobile-mcp server. It controls devices directly through platform-native interfaces:

Platform	Device Discovery	Interactions	Screenshots	App Management
iOS	mobilecli (npx)	WebDriverAgent HTTP API	WDA `/screenshot`	`xcrun simctl`
Android	mobilecli (npx)	`adb shell input`	`adb exec-out screencap`	`adb shell am/pm`
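
To make the table concrete, the sketch below builds the kind of command lines it refers to. The adb and xcrun simctl subcommands are standard, but the helper names are illustrative and this is not the actual mobile_client.py code.

```python
def adb_tap(serial: str, x: int, y: int) -> list[str]:
    """argv for a tap on an Android device via `adb shell input`."""
    return ["adb", "-s", serial, "shell", "input", "tap", str(x), str(y)]


def adb_swipe(serial: str, x1: int, y1: int, x2: int, y2: int, duration_ms: int = 300) -> list[str]:
    """argv for a swipe; the trailing argument is the gesture duration in ms."""
    points = [str(v) for v in (x1, y1, x2, y2)]
    return ["adb", "-s", serial, "shell", "input", "swipe", *points, str(duration_ms)]


def adb_screencap(serial: str) -> list[str]:
    """argv that writes a PNG screenshot to stdout."""
    return ["adb", "-s", serial, "exec-out", "screencap", "-p"]


def simctl_launch(udid: str, bundle_id: str) -> list[str]:
    """argv that launches an app on an iOS Simulator."""
    return ["xcrun", "simctl", "launch", udid, bundle_id]


print(" ".join(adb_tap("emulator-5554", 196, 520)))
# To execute one: subprocess.run(adb_screencap(serial), capture_output=True, check=True).stdout
```

Passing argv lists to subprocess instead of a shell string avoids quoting problems with device serials and app identifiers.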

mobile-parser (MCP Server)
├── server.py          → FastMCP server with 16 tools
├── coordinator.py     → Screenshot → OmniParser → coordinate conversion
├── mobile_client.py   → Device control (iOS: WDA, Android: adb)
├── mobilecli.py       → mobilecli wrapper (npx auto-download)
├── wda.py             → WebDriverAgent HTTP client
└── parser.py          → OmniParser (YOLO + Florence-2 + EasyOCR)
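
For a sense of what the WebDriverAgent client involves, here is a minimal sketch of fetching a screenshot over HTTP. It assumes WDA is listening on its usual default port 8100 and returns a WebDriver-style JSON body whose value field holds a base64-encoded PNG. It is not the actual wda.py code.

```python
import base64
import json
import urllib.request


def decode_screenshot(response_body: bytes) -> bytes:
    """Extract PNG bytes from a WebDriver-style screenshot response."""
    payload = json.loads(response_body)
    return base64.b64decode(payload["value"])


def fetch_screenshot(base_url: str = "http://localhost:8100", timeout: float = 10.0) -> bytes:
    """GET /screenshot from a running WebDriverAgent and return the PNG bytes."""
    with urllib.request.urlopen(f"{base_url}/screenshot", timeout=timeout) as resp:
        return decode_screenshot(resp.read())


# Offline demonstration with a fake response body.
fake_body = json.dumps({"value": base64.b64encode(b"\x89PNG").decode()}).encode()
print(decode_screenshot(fake_body))
```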

Configuration

Environment Variable	Description	Default
`OMNIPARSER_WEIGHTS_DIR`	Model weights directory	`~/.cache/omniparser`
`OMNIPARSER_DEVICE`	Inference device (`cuda` / `mps` / `cpu`)	Auto-detect
`MOBILECLI_PATH`	mobilecli binary path	npx auto-download
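
The sketch below shows how these variables might be resolved, with the defaults from the table. The auto-detect order (cuda, then mps, then cpu) is an assumption, and load_settings / detect_device are illustrative names rather than the server's real functions.

```python
import os
from pathlib import Path
from typing import Mapping


def detect_device() -> str:
    """Pick cuda, then mps, then cpu; fall back to cpu if torch is unavailable."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve settings from environment variables, applying the documented defaults."""
    weights = env.get("OMNIPARSER_WEIGHTS_DIR") or "~/.cache/omniparser"
    return {
        "weights_dir": Path(weights).expanduser(),
        "device": env.get("OMNIPARSER_DEVICE") or detect_device(),
        # None means: let mobilecli be fetched through npx.
        "mobilecli_path": env.get("MOBILECLI_PATH") or None,
    }


print(load_settings({"OMNIPARSER_DEVICE": "cpu"}))
```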

License

AGPL-3.0 — due to the ultralytics (YOLOv8) dependency.

Release history

Version	Released	Notes
0.3.0	Apr 7, 2026	This version
0.2.0	Mar 20, 2026	
0.1.2	Mar 19, 2026	
0.1.1	Mar 19, 2026	Yanked: license changed to AGPL-3.0, use 0.1.2+
0.1.0	Mar 19, 2026	Yanked: license changed to AGPL-3.0, use 0.1.2+
