Skip to main content

Parsing LLM-generated GUI action instructions, automatically generating pyautogui scripts, and supporting coordinate conversion and smart image resizing.

Project description

ui-tars

A python package for parsing VLM-generated GUI action instructions into executable pyautogui codes.


Introduction

ui-tars is a Python package for parsing VLM-generated GUI action instructions, automatically generating pyautogui scripts, and supporting coordinate conversion and smart image resizing.

  • Supports multiple VLM output formats (e.g., Qwen-VL, Seed-VL)
  • Automatically handles coordinate scaling and format conversion
  • One-click generation of pyautogui automation scripts

Quick Start

Installation

pip install ui-tars
# or
uv pip install ui-tars

Parse output into structured actions

from ui_tars.action_parser import parse_action_to_structure_output, parsing_response_to_pyautogui_code

response = "Thought: Click the button\nAction: click(point='<point>200 300</point>')"
original_image_width, original_image_height = 1920, 1080
parsed_dict = parse_action_to_structure_output(
    response,
    factor=1000,
    origin_resized_height=original_image_height,
    origin_resized_width=original_image_width,
    model_type="doubao"
)
print(parsed_dict)
parsed_pyautogui_code = parsing_response_to_pyautogui_code(
    responses=parsed_dict,
    image_height=original_image_height,
    image_width=original_image_width
)
print(parsed_pyautogui_code)

Generate pyautogui automation script

from ui_tars.action_parser import parsing_response_to_pyautogui_code

pyautogui_code = parsing_response_to_pyautogui_code(parsed_dict, original_image_height, original_image_width)
print(pyautogui_code)

Visualize coordinates on the image (optional)

from PIL import Image, ImageDraw
import numpy as np
import matplotlib.pyplot as plt

image = Image.open("your_image_path.png")
start_box = parsed_dict[0]["action_inputs"]["start_box"]
coordinates = eval(start_box)
x1 = int(coordinates[0] * original_image_width)
y1 = int(coordinates[1] * original_image_height)
draw = ImageDraw.Draw(image)
radius = 5
draw.ellipse((x1 - radius, y1 - radius, x1 + radius, y1 + radius), fill="red", outline="red")
plt.imshow(np.array(image))
plt.axis("off")
plt.show()

API Documentation

parse_action_to_structure_output

def parse_action_to_structure_output(
    text: str,
    factor: int,
    origin_resized_height: int,
    origin_resized_width: int,
    model_type: str = "qwen25vl",
    max_pixels: int = 16384 * 28 * 28,
    min_pixels: int = 100 * 28 * 28
) -> list[dict]:
    ...

Description: Parses output action instructions into structured dictionaries, automatically handling coordinate scaling and box/point format conversion.

Parameters:

  • text: The output string
  • factor: Scaling factor
  • origin_resized_height/origin_resized_width: Original image height/width
  • model_type: Model type (e.g., "qwen25vl", "doubao")
  • max_pixels/min_pixels: Image pixel upper/lower limits

Returns: A list of structured actions, each as a dict with fields like action_type, action_inputs, thought, etc.


parsing_response_to_pyautogui_code

def parsing_response_to_pyautogui_code(
    responses: dict | list[dict],
    image_height: int,
    image_width: int,
    input_swap: bool = True
) -> str:
    ...

Description: Converts structured actions into a pyautogui script string, supporting click, type, hotkey, drag, scroll, and more.

Parameters:

  • responses: Structured actions (dict or list of dicts)
  • image_height/image_width: Image height/width
  • input_swap: Whether to use clipboard paste for typing (default True)

Returns: A pyautogui script string, ready for automation execution.


Contribution

Contributions, issues, and suggestions are welcome!


License

Apache-2.0 License

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ui_tars-0.4.4.tar.gz (26.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ui_tars-0.4.4-py3-none-any.whl (46.2 kB view details)

Uploaded Python 3

File details

Details for the file ui_tars-0.4.4.tar.gz.

File metadata

  • Download URL: ui_tars-0.4.4.tar.gz
  • Upload date:
  • Size: 26.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for ui_tars-0.4.4.tar.gz
Algorithm Hash digest
SHA256 fda688550e94a579e335d3a80b60a65d0b9381bd4bc1c1ef67f8a266cfb3a1c3
MD5 96a87b830dc5ec88cafc8a456adf3f48
BLAKE2b-256 c07cd5b1b1003f62a848b74b3e0c2f22de2b970ac9c68e724dd86c3e0e2d7ebe

See more details on using hashes here.

File details

Details for the file ui_tars-0.4.4-py3-none-any.whl.

File metadata

  • Download URL: ui_tars-0.4.4-py3-none-any.whl
  • Upload date:
  • Size: 46.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for ui_tars-0.4.4-py3-none-any.whl
Algorithm Hash digest
SHA256 250679c6254189a90887ce5fc6cfb8f5472a7fb6390ad3393ed1fd12147c9d31
MD5 0f95497cf541dbe89098a4e9f76d6e0a
BLAKE2b-256 7a7b3402f1c6e63303b3a3ea6bf4aa5e8f48ae6047421825cefc8dbb21b598a2

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page