Mobile testing MCP server: OmniParser UI element detection + direct device control (mobilecli + WDA)
mobile-parser
An MCP server for mobile app testing that combines OmniParser vision-based UI element detection with direct device control.
Unlike accessibility-tree-based tools, OmniParser detects UI elements directly from screenshots — making it work reliably with Flutter, WebView, games, and any app regardless of the UI framework.
Features
- Vision-based element detection: OmniParser (YOLO + Florence-2 + EasyOCR) finds UI elements from screenshots
- Cross-platform: iOS Simulator and Android (emulator + real device)
- Zero-config coordinates: find_elements returns tap-ready coordinates; pass them directly to tap()
- No Appium required: talks directly to WDA (iOS) and adb (Android)
- Auto-download everything: models, tools, and dependencies fetched on first use
Installation
Claude Code
claude mcp add mobile-parser -- uvx mobile-parser
Claude Desktop / Cursor / Other MCP Clients
Add to your MCP config JSON:
{
  "mcpServers": {
    "mobile-parser": {
      "command": "uvx",
      "args": ["mobile-parser"]
    }
  }
}
Prerequisites
- Python 3.10+ (managed by uv automatically)
- Node.js / npm (for mobilecli — auto-downloaded via npx)
iOS
- Xcode + iOS Simulator
- WebDriverAgent installed on the simulator
Android
- Android SDK (adb in PATH or ANDROID_HOME set)
- Emulator or device connected via adb
What gets auto-downloaded
| Component | When | Size |
|---|---|---|
| Python packages (torch, etc.) | First uvx mobile-parser run | ~2 GB |
| mobilecli binary | First device operation | ~20 MB |
| OmniParser models | First mobile_find_elements call | ~1.5 GB |
| Florence-2 processor | First icon captioning | ~500 MB |
Usage
1. mobile_find_elements(device="...") → elements with tap coordinates
2. mobile_tap(device="...", x=tap_x, y=tap_y) → tap the element
mobile_find_elements handles the full pipeline:
- Takes a screenshot of the device
- Runs OmniParser to detect all UI elements (text + icons)
- Converts pixel coordinates to logical screen coordinates
The returned tap_x / tap_y can be passed directly to mobile_tap().
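The pixel-to-logical conversion step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes each detected element carries a pixel bounding box in (x1, y1, x2, y2) form, and the names Size and bbox_center_to_logical are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    width: float
    height: float


def bbox_center_to_logical(
    bbox_px: tuple[float, float, float, float],
    screenshot: Size,
    screen: Size,
) -> tuple[int, int]:
    """Map the center of a pixel bounding box to logical screen coordinates."""
    x1, y1, x2, y2 = bbox_px
    # Center of the box in screenshot pixels.
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2
    # Logical points per screenshot pixel (for example 1/3 on a 3x display).
    scale_x = screen.width / screenshot.width
    scale_y = screen.height / screenshot.height
    return round(cx * scale_x), round(cy * scale_y)


# Assumed example: a 1179x2556 px screenshot of a 393x852 pt screen.
tap_x, tap_y = bbox_center_to_logical(
    (300, 600, 600, 900), Size(1179, 2556), Size(393, 852)
)
```

Because the scale is derived from the two sizes rather than hard-coded, the same function would also work if detection ran on a downscaled screenshot.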
Example prompts
- "Find and tap the Login button"
- "Scroll down and look for a search bar"
- "Launch the Settings app and navigate to Wi-Fi"
- "Take a screenshot and describe what's on screen"
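A prompt such as "Find and tap the Login button" comes down to choosing one entry from the mobile_find_elements result and forwarding its tap_x / tap_y to mobile_tap. The sketch below illustrates that selection step. The list-of-dicts shape and the text key are assumptions made for illustration (only tap_x and tap_y are named in this README), and pick_element is a hypothetical helper, not part of mobile-parser.

```python
from typing import Optional


def pick_element(elements: list[dict], query: str) -> Optional[dict]:
    """Return the first element whose label contains `query`, ignoring case."""
    needle = query.casefold()
    for element in elements:
        # "text" is an assumed key; adapt it to the real result schema.
        if needle in str(element.get("text", "")).casefold():
            return element
    return None


# Hypothetical result from mobile_find_elements.
elements = [
    {"text": "Sign up", "tap_x": 196, "tap_y": 540},
    {"text": "Login", "tap_x": 196, "tap_y": 620},
]

target = pick_element(elements, "login")
if target is not None:
    # These values would be passed to mobile_tap(device="...", x=..., y=...).
    tap_args = {"x": target["tap_x"], "y": target["tap_y"]}
```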
Tools
Screen Analysis (OmniParser)
| Tool | Description |
|---|---|
| mobile_find_elements | Primary tool: screenshot → OmniParser → tap coordinates |
| mobile_screenshot | Take a screenshot (resized for LLM, max 1568px) |
| mobile_save_screenshot | Save screenshot to file |
| mobile_parse_image | Parse an existing image file |
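The "max 1568px" resize used by mobile_screenshot can be pictured with a small aspect-ratio calculation. This is a sketch only: it assumes the limit applies to the longest side of the image, and fit_within is a hypothetical name rather than a function exported by mobile-parser.

```python
def fit_within(width: int, height: int, max_side: int = 1568) -> tuple[int, int]:
    """Scale (width, height) down so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        # Already small enough; never upscale.
        return width, height
    scale = max_side / longest
    # Keep the aspect ratio and guard against a zero-sized dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))


resized = fit_within(1179, 2556)   # tall phone screenshot, gets downscaled
unchanged = fit_within(800, 600)   # already within the limit
```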
Interaction
| Tool | Description |
|---|---|
| mobile_tap | Tap at coordinates |
| mobile_double_tap | Double-tap at coordinates |
| mobile_long_press | Long press at coordinates |
| mobile_swipe | Swipe in a direction (up / down / left / right) |
| mobile_type_text | Type text into the focused element |
| mobile_press_button | Press a hardware button (home / back / etc.) |
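A direction-based swipe eventually has to become a pair of screen coordinates. The sketch below shows one plausible mapping; it is an assumption about how such a tool could work, not a description of mobile_swipe itself. Here "up" means the finger moves toward the top of the screen, the gesture is centered, and fraction (a hypothetical parameter) controls how much of the screen it covers.

```python
def swipe_endpoints(
    direction: str, width: int, height: int, fraction: float = 0.5
) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return (start, end) points for a centered swipe in the given direction."""
    cx, cy = width // 2, height // 2
    # Half of the travelled distance along each axis.
    dx = int(width * fraction / 2)
    dy = int(height * fraction / 2)
    moves = {
        "up": ((cx, cy + dy), (cx, cy - dy)),
        "down": ((cx, cy - dy), (cx, cy + dy)),
        "left": ((cx + dx, cy), (cx - dx, cy)),
        "right": ((cx - dx, cy), (cx + dx, cy)),
    }
    if direction not in moves:
        raise ValueError(f"unknown direction: {direction!r}")
    return moves[direction]


start, end = swipe_endpoints("up", 400, 800)
```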
Device Management
| Tool | Description |
|---|---|
| mobile_list_devices | List available devices and simulators |
| mobile_get_screen_size | Get device screen size |
| mobile_list_apps | List installed apps |
| mobile_launch_app | Launch an app by bundle ID |
| mobile_terminate_app | Terminate a running app |
| mobile_open_url | Open a URL in the default browser |
Architecture
mobile-parser does not depend on the mobile-mcp server. It controls devices directly through platform-native tools:
| Platform | Device Discovery | Interactions | Screenshots | App Management |
|---|---|---|---|---|
| iOS | mobilecli (npx) | WebDriverAgent HTTP API | WDA /screenshot | xcrun simctl |
| Android | mobilecli (npx) | adb shell input | adb exec-out screencap | adb shell am/pm |
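To make the Android row concrete, the following sketch builds the adb command lines that correspond to the table. The helper names are hypothetical and mobile-parser's own mobile_client.py may assemble these commands differently; the adb subcommands themselves (input tap, input swipe, exec-out screencap -p, pm list packages, am force-stop) are standard.

```python
from typing import Optional


def adb_prefix(serial: Optional[str] = None) -> list[str]:
    """Base adb invocation, optionally pinned to one device with -s."""
    return ["adb", "-s", serial] if serial else ["adb"]


def tap_cmd(x: int, y: int, serial: Optional[str] = None) -> list[str]:
    return adb_prefix(serial) + ["shell", "input", "tap", str(x), str(y)]


def swipe_cmd(x1: int, y1: int, x2: int, y2: int,
              duration_ms: int = 300, serial: Optional[str] = None) -> list[str]:
    coords = [str(v) for v in (x1, y1, x2, y2, duration_ms)]
    return adb_prefix(serial) + ["shell", "input", "swipe", *coords]


def screenshot_cmd(serial: Optional[str] = None) -> list[str]:
    # exec-out streams the raw PNG to stdout without shell newline mangling.
    return adb_prefix(serial) + ["exec-out", "screencap", "-p"]


def list_packages_cmd(serial: Optional[str] = None) -> list[str]:
    return adb_prefix(serial) + ["shell", "pm", "list", "packages"]


def force_stop_cmd(package: str, serial: Optional[str] = None) -> list[str]:
    return adb_prefix(serial) + ["shell", "am", "force-stop", package]


# With a device attached, a command can be run like this:
#   import subprocess
#   png = subprocess.run(screenshot_cmd(), capture_output=True, check=True).stdout
```

Returning argument lists instead of shell strings avoids quoting problems and keeps the helpers testable without a connected device.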
mobile-parser (MCP Server)
├── server.py → FastMCP server with 16 tools
├── coordinator.py → Screenshot → OmniParser → coordinate conversion
├── mobile_client.py → Device control (iOS: WDA, Android: adb)
├── mobilecli.py → mobilecli wrapper (npx auto-download)
├── wda.py → WebDriverAgent HTTP client
└── parser.py → OmniParser (YOLO + Florence-2 + EasyOCR)
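For the iOS side, a thin WebDriverAgent client can be sketched with the standard library alone. The code below is illustrative and is not the contents of wda.py. It assumes WDA listens on http://localhost:8100 and that GET /screenshot returns a JSON object whose value field holds a base64-encoded PNG; both function names are hypothetical.

```python
import base64
import json
import urllib.request

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def decode_wda_screenshot(body: bytes) -> bytes:
    """Extract PNG bytes from a WDA /screenshot JSON response body."""
    payload = json.loads(body)
    png = base64.b64decode(payload["value"])
    if not png.startswith(PNG_MAGIC):
        raise ValueError("WDA response did not contain PNG data")
    return png


def fetch_wda_screenshot(base_url: str = "http://localhost:8100") -> bytes:
    """Request a screenshot from a running WDA instance (assumed URL)."""
    with urllib.request.urlopen(f"{base_url}/screenshot", timeout=10) as resp:
        return decode_wda_screenshot(resp.read())


# Offline check with a fabricated response body (no simulator required).
fake_body = json.dumps(
    {"value": base64.b64encode(PNG_MAGIC + b"demo").decode("ascii")}
).encode("utf-8")
decoded = decode_wda_screenshot(fake_body)
```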
Configuration
| Environment Variable | Description | Default |
|---|---|---|
| OMNIPARSER_WEIGHTS_DIR | Model weights directory | ~/.cache/omniparser |
| OMNIPARSER_DEVICE | Inference device (cuda / mps / cpu) | Auto-detect |
| MOBILECLI_PATH | mobilecli binary path | npx auto-download |
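The way these variables and their defaults could be resolved is sketched below. This is an assumption about the behavior, not the server's actual configuration code: load_settings and detect_device are hypothetical names, and the auto-detect order (cuda, then mps, then cpu) is a guess that falls back to cpu when torch is not importable.

```python
import os
from pathlib import Path
from typing import Mapping

VALID_DEVICES = ("cuda", "mps", "cpu")


def detect_device() -> str:
    """Pick an inference device; falls back to cpu if torch is unavailable."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve the documented environment variables and their defaults."""
    weights_dir = Path(
        env.get("OMNIPARSER_WEIGHTS_DIR", "~/.cache/omniparser")
    ).expanduser()
    device = env.get("OMNIPARSER_DEVICE") or detect_device()
    if device not in VALID_DEVICES:
        raise ValueError(f"OMNIPARSER_DEVICE must be one of {VALID_DEVICES}")
    return {
        "weights_dir": weights_dir,
        "device": device,
        # None stands for "let npx download mobilecli".
        "mobilecli_path": env.get("MOBILECLI_PATH"),
    }


settings = load_settings(
    {"OMNIPARSER_WEIGHTS_DIR": "/tmp/weights", "OMNIPARSER_DEVICE": "cpu"}
)
```

Passing the environment as a mapping keeps the function easy to test; calling load_settings() with no arguments reads the real process environment.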
License
AGPL-3.0 — due to the ultralytics (YOLOv8) dependency.