humdroid

Programmatically control Android like a Human using computer vision.

This project was meant to automate certain tasks in games, such as Battlecats, as easily as possible. Because the UI is detected by scanning the screen for images, writing scripts becomes much easier: only template images need to be added, not hardcoded positions.

Uses a three-step process:

  • py-scrcpy-client to very quickly capture the screen of the device. Optionally, the captured image can be compressed to reduce the search area.
    • adb + ffmpeg can also be used and is an option in adbAPI.bash. It is much faster than the usual adb shell screencap because PNG conversion is done on your computer's processor, not the device's.
  • OpenCV (CUDA + OpenGL optional) to find specific items based on template matching.
  • py-scrcpy-client to very quickly click and swipe the screen.
    • The alternative is to send touch commands with adb shell input [command]. This approach works, but it is much, much slower: from experience, adb shell input takes around 1 s at best, while py-scrcpy-client takes around 50 ms at worst.
    • Another alternative is monkeyrunner, which is even faster than py-scrcpy-client due to being built on Java. However, the tool is now outdated and requires a full installation of the Android SDK, which is extremely difficult to run on embedded devices.
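
For reference, the slower adb shell input route mentioned above can be scripted roughly as follows. This is only a sketch: the helper names (adb_input_argv, tap_argv, swipe_argv, run_timed) are hypothetical and not part of humdroid. Only the adb shell input tap and swipe subcommands themselves are standard adb. run_timed is included so the latency can be measured on your own device.

```python
import subprocess
import time


def adb_input_argv(*args, serial=None):
    """Return the argv for `adb [-s SERIAL] shell input <args...>`."""
    argv = ["adb"]
    if serial is not None:
        argv += ["-s", serial]  # pick one device when several are attached
    return argv + ["shell", "input", *map(str, args)]


def tap_argv(x, y, serial=None):
    """argv for a single tap at pixel (x, y)."""
    return adb_input_argv("tap", x, y, serial=serial)


def swipe_argv(x1, y1, x2, y2, duration_ms=300, serial=None):
    """argv for a swipe from (x1, y1) to (x2, y2) lasting duration_ms."""
    return adb_input_argv("swipe", x1, y1, x2, y2, duration_ms, serial=serial)


def run_timed(argv):
    """Run the command and return how long it took, in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start
```

With a device attached, run_timed(tap_argv(370, 640)) returns the round-trip time of one tap, which is the number to compare against py-scrcpy-client.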

Program Structure

Python's OpenCV bindings are about as fast as C++ OpenCV because they call C++ under the hood, but humdroid does its matching in C++ anyway. Why? Honestly, I didn't think Python would perform that fast, so I made the foolish error of overcomplicating this entire program. Either way, the C++ server should be faster than the Python equivalent because it uses loops to filter out results, which is something Python is not that fast at in the long run. Nonetheless, the current structure of the program revolves around IPC between the C++ server and the script using TCP sockets and JSON messages.

Localhost port 6069 is reserved for receiving messages from the Python script, and localhost port 6070 for output from the C++ server. Because C sockets are incredibly difficult to work with, the C++ server essentially crashes when a client disconnects and must be restarted. In addition, every JSON message is delimited by $ in case multiple messages arrive at the same time.
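
To illustrate the $ delimiter, here is a minimal sketch of how a client could split the byte stream read from port 6070 back into individual JSON messages. DollarFramer is a hypothetical name, not a class from humdroid, and the sketch assumes that $ never appears inside a JSON string value (for example in a file path).

```python
import json


class DollarFramer:
    """Accumulates raw bytes and returns complete '$'-delimited JSON messages."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        """Add newly received bytes; return the list of messages now complete."""
        self._buffer += data
        # Everything before the last '$' is complete; the tail is kept for later.
        *complete, self._buffer = self._buffer.split(b"$")
        return [json.loads(chunk) for chunk in complete if chunk.strip()]
```

Each chunk returned by recv() on the output socket would be passed to feed(). A message that is split across two reads is only returned once its closing $ has arrived.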

Therefore, the project has two components:

  • opencv - C++ OpenCV server
  • humdroid - Python library for interacting with scrcpy via py-scrcpy-client and the opencv server.

Compiling / Running

To run this project, you'll need to compile the C++ server first. The dependencies for this are:

  • OpenCV: sudo apt install libopencv-dev
    • CUDA compatibility must be compiled into OpenCV in order to use it.
  • sudo apt install adb
  • pip3 install scrcpy-client adbutils
    • sudo apt install ffmpeg if using the alternative method for screenshots

After that, run git submodule update --init.

In opencv, run bash compile.bash to compile the server, humdroid_cpu. If you have CUDA installed and your version of OpenCV supports it, then humdroid_gpu will be built.

To use scrcpy, humdroid_cpu or humdroid_gpu must be running in the background in a separate terminal. If you want to use the included CVServer to start the server, the server binary you built has to be in PATH. This can be done easily by running install.bash in opencv.

Server API - Input Port 6069:

Each message must be sent to the server separately; several messages cannot be combined into a single message.
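
As a rough sketch of what "sent separately" could look like from Python, the helper below writes each message as its own JSON document with its own sendall() call. The function names are hypothetical, and appending $ after each message sent to the server is an assumption based on the delimiter described earlier; set delimiter=b"" if the server does not expect it on input.

```python
import json
import socket

INPUT_PORT = 6069  # input port named in this README


def connect_input(host="127.0.0.1", port=INPUT_PORT, timeout=5.0):
    """Open a TCP connection to the server's input port."""
    return socket.create_connection((host, port), timeout=timeout)


def send_messages(sock, messages, delimiter=b"$"):
    """Send every message on its own; never merge them into one JSON object."""
    for message in messages:
        payload = json.dumps(message).encode("utf-8") + delimiter
        sock.sendall(payload)
```

Because the server is reported to crash when a client disconnects, a script would typically keep the socket returned by connect_input() open for its whole lifetime.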

Load A Template

Loads a single template image into the server. id is an integer used to refer to the image without comparing strings, and group is used to categorize the template image. path must be an absolute path to the image.

{
    "loadTemplate" : {
        "path" : "/home/user/template.png",
        "id" : 3,
        "group" : 0
    }
}
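
A small sketch of building this message from Python, assuming the caller may hand in a relative path. load_template_message is a hypothetical helper, not a function from the humdroid library; it resolves the path with os.path.abspath because the server requires an absolute one.

```python
import json
import os


def load_template_message(path, template_id, group=0):
    """Build a loadTemplate message as a dict ready for json.dumps()."""
    if not isinstance(template_id, int) or not isinstance(group, int):
        raise TypeError("id and group must be integers")
    return {
        "loadTemplate": {
            "path": os.path.abspath(path),  # the server expects an absolute path
            "id": template_id,
            "group": group,
        }
    }


if __name__ == "__main__":
    print(json.dumps(load_template_message("template.png", 3, group=0)))
```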

Compare By ID

Uses the template marked with the given ID and checks whether it appears somewhere in the provided image. If multiple templates have the same ID, they will all be matched. photo is the absolute path to the photo and id is the template ID. minConfidence is a double from 0.0 to 1.0 giving the minimum confidence a template match must have. For example, a minConfidence of 0.95 means that the algorithm must be 95% sure the template is where it says it is.

{
    "compareID" : {
        "photo" : "/home/user/photo.png",
        "minConfidence" : 0.95,
        "id" : 3
    }
}

Compare By Group

Checks whether any of the templates in a group appear somewhere in the provided image. photo is the absolute path to the photo and group is the group number to check. minConfidence is a double from 0.0 to 1.0 giving the minimum confidence a template match must have. For example, a minConfidence of 0.95 means that the algorithm must be 95% sure each template is where it says it is.

{
    "compareGroup" : {
        "photo" : "/home/user/photo.png",
        "minConfidence" : 0.95,
        "group" : 0
    }
}
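
The two compare messages differ only in their key and selector, so they can be built by a pair of thin helpers. The sketch below uses hypothetical function names (not the humdroid library's API) and rejects a minConfidence outside the documented 0.0 to 1.0 range before anything is sent.

```python
import os


def _checked_confidence(min_confidence):
    """Return min_confidence as a float, or raise if it is outside [0.0, 1.0]."""
    value = float(min_confidence)
    if not 0.0 <= value <= 1.0:
        raise ValueError("minConfidence must be between 0.0 and 1.0")
    return value


def compare_id_message(photo, template_id, min_confidence=0.95):
    """Build a compareID message for one template ID."""
    return {
        "compareID": {
            "photo": os.path.abspath(photo),  # absolute path, as required
            "minConfidence": _checked_confidence(min_confidence),
            "id": template_id,
        }
    }


def compare_group_message(photo, group, min_confidence=0.95):
    """Build a compareGroup message for every template in a group."""
    return {
        "compareGroup": {
            "photo": os.path.abspath(photo),
            "minConfidence": _checked_confidence(min_confidence),
            "group": group,
        }
    }
```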

Server API - Output Port 6070:

Matches

This message is sent out every time a compare message is received. Replies are guaranteed to arrive in the order the compare messages were sent. id is the ID matched; x is the x-coordinate measured from the top-left of the compared photo; y is the y-coordinate measured from the top-left of the compared photo (positive is downwards); confidence is the algorithm's confidence that the template is at that specific spot; and origin describes which point of the template x and y refer to. By default, origin will always be "center", though it could be "topleft" as well.

It is important to mention that only one match will be returned. If the template is in multiple spots of the image, it is up to OpenCV to decide which specific match gets returned.

{
    "matches" : [
        {
            "id" : 3,
            "x" : 370,
            "y" : 640,
            "confidence" : 0.966553,
            "origin" : "center"
        }
    ]
}
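
A script usually wants a point to tap, so a reply like this one has to be turned into screen coordinates. The sketch below shows one way to do that. best_match and click_point are hypothetical helpers, and the handling of "topleft" assumes that x and y then mark the top-left corner of the matched region, so the template's width and height must be supplied by the caller.

```python
def best_match(reply, template_id=None):
    """Return the highest-confidence match in a reply, or None if there is none."""
    matches = reply.get("matches", [])
    if template_id is not None:
        matches = [m for m in matches if m["id"] == template_id]
    return max(matches, key=lambda m: m["confidence"], default=None)


def click_point(match, template_size=None):
    """Return the (x, y) centre of a match.

    template_size is (width, height) in pixels and is only needed when the
    match reports origin "topleft".
    """
    origin = match.get("origin", "center")
    x, y = match["x"], match["y"]
    if origin == "center":
        return x, y
    if origin == "topleft":
        if template_size is None:
            raise ValueError("template_size is required for origin 'topleft'")
        width, height = template_size
        return x + width // 2, y + height // 2
    raise ValueError(f"unknown origin: {origin!r}")
```

The resulting point can then be passed to whatever performs the tap, for example py-scrcpy-client.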
