agent-android

Android ADB automation CLI tool - Designed for AI Agents

agent-android is an ADB-based CLI tool for automating Android devices. It follows the design philosophy of agent-browser and gives AI Agents a simple yet capable interface for Android automation.

✨ Features

  • 🤖 AI Friendly - Snapshot + Ref mode, designed specifically for AI
  • 🎯 Simple & Intuitive - Natural language descriptions, no programming knowledge needed
  • ⚡ Fast & Efficient - Pure Python implementation, lightweight dependencies
  • 🔧 Flexible & Powerful - Dual interface: CLI + Python API
  • 🌍 Multi-language - Full Chinese natural language processing
  • 📦 Easy Integration - Simple API, easy to integrate into AI workflows
  • 🚀 Multi-device Support - Parallel operations on multiple Android devices
  • ⏱️ Smart Wait - Automatic element waiting, no manual sleep needed

🚀 Quick Start

Installation

# Clone repository
git clone https://github.com/your-username/agent-android.git
cd agent-android

# Install dependencies
pip install -r requirements.txt

# Ensure ADB is available
adb devices
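
If a script needs to confirm that a device is attached before it runs any commands, the output of `adb devices` can be parsed directly. The following is a minimal sketch, not part of agent-android: the function name `parse_adb_devices` is hypothetical, and the sample text assumes the usual header line followed by tab-separated serial/state rows.

```python
def parse_adb_devices(output):
    """Map each device serial to its state from `adb devices` output."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and adb daemon start-up messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


sample = "List of devices attached\nemulator-5554\tdevice\nR58M123ABC\tunauthorized\n\n"
print(parse_adb_devices(sample))
# {'emulator-5554': 'device', 'R58M123ABC': 'unauthorized'}
```

A state other than `device` (for example `unauthorized` or `offline`) usually means the device is visible but not yet usable for automation.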

Basic Usage

Linux / macOS:

# Connect to device
./agent-android connect

# Start app
./agent-android start_app com.example.app

# Get UI snapshot (with element refs)
./agent-android snapshot -i

# Operate on elements using refs
./agent-android tap @e1

# Take screenshot
./agent-android screenshot screen.png

# Disconnect
./agent-android disconnect

Windows:

REM Use batch wrapper
agent-android.bat connect
agent-android.bat start_app com.example.app
agent-android.bat snapshot -i
agent-android.bat tap @e1

REM Or use Python directly
python agent-android connect

More Operations

# Swipe screen
./agent-android swipe 100,200 300,400

# Input text
./agent-android input "Hello World"

# Press home key
./agent-android press home

# Start/stop app
./agent-android start_app com.example.app
./agent-android stop_app com.example.app

# Get element text
./agent-android get text @e1

# Export UI dump
./agent-android dump ui_dump.xml
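
An exported UI dump can be post-processed outside the tool. The sketch below assumes the dump uses the standard uiautomator XML layout (`node` elements with `text`, `clickable`, and `bounds="[x1,y1][x2,y2]"` attributes); whether `agent-android dump` writes exactly this format is an assumption, and `clickable_centers` is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET

# uiautomator encodes bounds as "[left,top][right,bottom]".
BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def clickable_centers(xml_text):
    """Return the text and center point of every clickable node."""
    root = ET.fromstring(xml_text)
    results = []
    for node in root.iter("node"):
        if node.get("clickable") != "true":
            continue
        match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if match is None:
            continue
        x1, y1, x2, y2 = map(int, match.groups())
        results.append({"text": node.get("text", ""),
                        "center": ((x1 + x2) // 2, (y1 + y2) // 2)})
    return results


SAMPLE = """<hierarchy rotation="0">
  <node text="" class="android.widget.FrameLayout" clickable="false" bounds="[0,0][1080,1920]">
    <node text="Settings" class="android.widget.Button" clickable="true" bounds="[100,200][300,400]" />
  </node>
</hierarchy>"""

print(clickable_centers(SAMPLE))
# [{'text': 'Settings', 'center': (200, 300)}]
```

For a file on disk, `ET.parse("ui_dump.xml").getroot()` can replace the `ET.fromstring` call.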

📖 Core Features

1. Snapshot + Ref Mode (Core)

# Get UI snapshot, auto-generate element refs
./agent-android snapshot -i

# Output:
# - TextView "Learning" [id=com.app:id/tabTV] [ref=e1]
# - Button "Settings" [id=com.app:id/settings] [clickable] [ref=e2]
# - ImageView "Search" [id=com.app:id/search] [ref=e3]

# Use ref to tap
./agent-android tap @e2
./agent-android get text @e1

Advantages:

  • ✅ Deterministic - refs point to elements precisely
  • ✅ Fast - no need to reparse UI
  • ✅ AI friendly - snapshot provides complete context
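
For an agent that consumes the snapshot as text, the ref lines can be turned into a lookup table. This is a minimal sketch that assumes each line has the shape shown in the sample output above (class name, quoted text, then bracketed attributes); `parse_snapshot` is a hypothetical helper, not part of agent-android.

```python
import re

# "- ClassName "visible text" [key=value] [flag] ..."
LINE_RE = re.compile(r'^-\s+(?P<cls>\S+)\s+"(?P<text>[^"]*)"(?P<attrs>.*)$')
ATTR_RE = re.compile(r"\[([^\]=]+)(?:=([^\]]*))?\]")


def parse_snapshot(snapshot):
    """Return {ref: element_info} for every snapshot line that carries a ref."""
    elements = {}
    for raw in snapshot.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue
        attrs = {}
        for attr in ATTR_RE.finditer(match.group("attrs")):
            key, value = attr.group(1), attr.group(2)
            # Bare flags such as [clickable] become True.
            attrs[key] = True if value is None else value
        ref = attrs.pop("ref", None)
        if ref is None:
            continue
        elements[ref] = {"class": match.group("cls"),
                         "text": match.group("text"), **attrs}
    return elements


SNAPSHOT = '''- TextView "Learning" [id=com.app:id/tabTV] [ref=e1]
- Button "Settings" [id=com.app:id/settings] [clickable] [ref=e2]
- ImageView "Search" [id=com.app:id/search] [ref=e3]'''

elements = parse_snapshot(SNAPSHOT)
print(elements["e2"])
# {'class': 'Button', 'text': 'Settings', 'id': 'com.app:id/settings', 'clickable': True}
```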

2. Natural Language Control

from core.android import create_android_device
from core.nlp_icon_helper import NLPIconHelper

device = create_android_device()
nlp = NLPIconHelper(device)

# Use natural language to control
nlp.tap_by_nlp("Click settings button")
nlp.tap_by_nlp("Click menu icon in top right")
nlp.tap_by_nlp("Click learning tab at bottom")

device.close()

Supported Keywords:

  • Position: top-left, top-right, bottom, center, left, right
  • Type: icon, button, text, input
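
To illustrate how keywords like these can be pulled out of an instruction, here is a minimal sketch. It is not the implementation used by `NLPIconHelper`; the function name `extract_keywords` and the whole-word matching rule are assumptions made for the example.

```python
import re

# Keyword lists taken from the "Supported Keywords" section above.
POSITIONS = ("top-left", "top-right", "bottom", "center", "left", "right")
TYPES = ("icon", "button", "text", "input")


def extract_keywords(instruction):
    """Return the first position and type keyword found in an instruction."""
    normalized = instruction.lower()
    # Treat "top right" and "top-right" as the same keyword.
    normalized = normalized.replace("top left", "top-left").replace("top right", "top-right")
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", normalized)
    position = next((p for p in POSITIONS if p in words), None)
    element_type = next((t for t in TYPES if t in words), None)
    return {"position": position, "type": element_type}


print(extract_keywords("Click menu icon in top right"))
# {'position': 'top-right', 'type': 'icon'}
print(extract_keywords("Click settings button"))
# {'position': None, 'type': 'button'}
```

The remaining words ("menu", "settings") would then serve as the label to match against element text or descriptions.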

3. Python API

from core.android import create_android_device

device = create_android_device()

# App management
device.start_app("com.example.app")
device.stop_app("com.example.app")

# Touch operations
device.tap(500, 1000)
device.swipe(500, 1000, 500, 500)

# Input operations
device.input_text("Hello World")
device.press_home()
device.press_back()

# Element finding
element = device.find_element({
    "strategy": "text_contains",
    "value": "Settings"
})
device.tap(element['center']['x'], element['center']['y'])

# Screenshot
device.screenshot("screen.png")

device.close()
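
Because the device object exposes `close()`, the standard library's `contextlib.closing` can guarantee the connection is released even when a step fails. The sketch below uses a stand-in `FakeDevice` class (hypothetical, so the example runs without hardware); with a real device you would pass the object returned by `create_android_device()` instead.

```python
from contextlib import closing


class FakeDevice:
    """Stand-in for a real device object; records calls instead of using ADB."""

    def __init__(self):
        self.closed = False
        self.taps = []

    def tap(self, x, y):
        self.taps.append((x, y))

    def close(self):
        self.closed = True


device = FakeDevice()
try:
    with closing(device) as dev:
        dev.tap(500, 1000)
        raise RuntimeError("simulated failure mid-session")
except RuntimeError:
    pass

print(device.closed)  # True: close() ran despite the exception
```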

4. Multi-Device Management

# Connect all devices
./agent-android multi-connect

# List connected devices
./agent-android multi-list

# Parallel screenshot all devices
./agent-android multi-screenshot

# Parallel tap on all devices
./agent-android multi-tap 500 1000

# Parallel start app on all devices
./agent-android multi-start-app com.example.app

# Disconnect all devices
./agent-android multi-disconnect
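
The same fan-out pattern can be sketched in plain Python with a thread pool. This is an illustration of the idea, not agent-android's internals: `run_on_all` and `tap_command` are hypothetical names, and the only real interface used is the standard `adb -s <serial> shell input tap <x> <y>` command line, which is built here but not executed.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_all(serials, action, max_workers=4):
    """Call action(serial) for each device in parallel.

    Returns {serial: result}; if an action raises, the exception object
    is stored as that device's result so one failure does not hide the rest.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {serial: pool.submit(action, serial) for serial in serials}
        for serial, future in futures.items():
            try:
                results[serial] = future.result()
            except Exception as exc:
                results[serial] = exc
    return results


def tap_command(serial, x=500, y=1000):
    """Build the adb argv that taps (x, y) on one specific device."""
    return ["adb", "-s", serial, "shell", "input", "tap", str(x), str(y)]


print(run_on_all(["emulator-5554", "emulator-5556"], tap_command))
```

To actually run the taps, the action could be `lambda s: subprocess.run(tap_command(s), check=True)`.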

5. Smart Wait

# Wait for element to appear
./agent-android wait-for id com.app:id/button

# Wait for text (10 second timeout)
./agent-android wait-for-text "Welcome" --timeout 10000

# Wait for app to start
./agent-android wait-for-app com.example.app
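
Conceptually, a wait command is a polling loop with a deadline. The following sketch shows that pattern under stated assumptions: `wait_until` is a hypothetical helper, the 200 ms polling interval is arbitrary, and the predicate stands in for whatever check (element present, text visible, app in foreground) the real tool performs.

```python
import time


def wait_until(predicate, timeout_ms=10000, interval_ms=200):
    """Poll predicate() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_ms / 1000)


# Simulated condition that becomes true on the third poll.
attempts = {"count": 0}


def appears_on_third_poll():
    attempts["count"] += 1
    return attempts["count"] >= 3


print(wait_until(appears_on_third_poll, timeout_ms=1000, interval_ms=10))  # True
```

Using `time.monotonic()` rather than `time.time()` keeps the timeout correct even if the system clock is adjusted while waiting.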

📚 Command Reference

Device Management

agent-android devices                     # List devices
agent-android connect [--serial <id>]     # Connect to device
agent-android disconnect                  # Disconnect

Touch Operations

agent-android tap <selector>              # Tap element ref or coordinates
agent-android swipe <start> <end>         # Swipe screen
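
Since a selector can be either an element ref or coordinates, a caller has to tell the two apart. The sketch below is one way to do that; it assumes refs look like `@e1` and that coordinates are written `x,y` as in the swipe examples. `parse_selector` is a hypothetical helper and may not match how agent-android parses its arguments.

```python
import re

REF_RE = re.compile(r"@(e\d+)")
POINT_RE = re.compile(r"(\d+)\s*,\s*(\d+)")


def parse_selector(selector):
    """Classify a selector as ("ref", name) or ("point", (x, y))."""
    ref = REF_RE.fullmatch(selector)
    if ref:
        return ("ref", ref.group(1))
    point = POINT_RE.fullmatch(selector)
    if point:
        return ("point", (int(point.group(1)), int(point.group(2))))
    raise ValueError(f"unrecognized selector: {selector!r}")


print(parse_selector("@e1"))       # ('ref', 'e1')
print(parse_selector("500,1000"))  # ('point', (500, 1000))
```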

Input Operations

agent-android input <text>                # Input text
agent-android press <key>                 # Press key (home/back/enter)

Screenshot & Snapshot

agent-android screenshot [path]           # Take screenshot
agent-android snapshot [-i]               # UI snapshot (-i: interactive elements only)

App Management

agent-android start_app <package>         # Start app
agent-android stop_app <package>          # Stop app

Smart Wait

agent-android wait-for <strategy> <value> # Wait for element
agent-android wait-for-text <text>        # Wait for text
agent-android wait-for-app <package>      # Wait for app start

Multi-Device

agent-android multi-connect [--max <n>]   # Connect all devices
agent-android multi-list                  # List connected devices
agent-android multi-screenshot [path]     # Screenshot all devices
agent-android multi-tap <x> <y>           # Tap all devices
agent-android multi-start-app <package>   # Start app on all devices
agent-android multi-disconnect            # Disconnect all devices

Global Options

--session <name>                         # Session name
--json                                   # JSON output
--debug                                  # Debug mode

🎯 Use Cases

1. AI Agents Control Android Devices

# AI can use natural language directly
nlp.tap_by_nlp("Click settings button")
nlp.tap_by_nlp("Click back button")

2. UI Automation Testing

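# Automated testing workflow
from core.android import create_android_device

device = create_android_device()
device.start_app("com.example.app")

# Wait for the login screen instead of sleeping for a fixed time
assert device.wait_for_text("Login", timeout=5000)

element = device.find_element({"strategy": "text", "value": "Login"})
device.tap_element(element)
device.input_text("test@example.com")
device.press_enter()

# Verify results
assert device.wait_for_text("Welcome", timeout=5000)

device.close()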

3. Data Collection

# Collect app data
import time

device.start_app("com.example.app")

while True:
    # Take screenshot
    device.screenshot(f"data/screenshot_{int(time.time())}.png")

    # Get UI data
    ui_dump = device.get_ui_dump()

    # Check if done
    if device.find_element({"strategy": "text", "value": "Complete"}):
        break

    # Pause between captures so the loop does not busy-wait
    # or overwrite a screenshot taken within the same second
    time.sleep(1)

4. Task Automation

# Automate repetitive tasks
device.start_app("com.example.app")

# Login
device.tap(500, 1000)
device.input_text("username")
device.tap(500, 1200)
device.input_text("password")
device.tap(500, 1400)

# Navigate through app
device.swipe(500, 2000, 500, 500)

5. Multi-Device Testing

from core.multi_device import create_multi_device_manager

manager = create_multi_device_manager()
manager.connect_all()

# Test app on all devices
manager.parallel_start_app("com.example.app")
manager.parallel_screenshot("test_{device_id}.png")
manager.parallel_tap(500, 1000)

manager.disconnect_all()
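
The `{device_id}` placeholder in the screenshot path suggests one output file per device. As a sketch of how such a template could be expanded, the helper below fills in each device ID with `str.format` and replaces characters that are awkward in filenames (network serials such as `192.168.1.20:5555` contain a colon). `screenshot_paths` is a hypothetical name, and whether the manager performs the same substitution or sanitizing is an assumption.

```python
import re


def screenshot_paths(template, device_ids):
    """Expand a "{device_id}" template into one filename per device."""
    paths = {}
    for device_id in device_ids:
        # Keep letters, digits, dot, underscore, and hyphen; replace the rest.
        safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", device_id)
        paths[device_id] = template.format(device_id=safe_id)
    return paths


print(screenshot_paths("test_{device_id}.png", ["emulator-5554", "192.168.1.20:5555"]))
# {'emulator-5554': 'test_emulator-5554.png', '192.168.1.20:5555': 'test_192.168.1.20_5555.png'}
```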

🔧 Requirements

  • Python 3.7+
  • ADB (Android Debug Bridge)
  • Android device or emulator

📄 License

Apache License 2.0 - see LICENSE file

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📞 Support

For issues and questions:

🎉 Summary

agent-android is a complete, production-ready Android automation tool designed for AI Agents.

Core Highlights

  1. Snapshot + Ref Mode - Matches agent-browser design
  2. Natural Language Control - Industry-first innovation
  3. Multi-Device Support - Parallel execution, 10x efficiency
  4. Smart Wait - Automatic element waiting
  5. Production Ready - All features tested
  6. Open Source - Apache 2.0 license
  7. Cross-Platform - Windows | Linux | macOS

Get started now:

git clone https://github.com/your-username/agent-android.git
cd agent-android
./agent-android connect

Made with ❤️ for AI Agents
