Parsing LLM-generated GUI action instructions, automatically generating pyautogui scripts, and supporting coordinate conversion and smart image resizing.
Project description
ui-tars
A Python package for parsing VLM-generated GUI action instructions into executable pyautogui code.
Introduction
ui-tars is a Python package for parsing VLM-generated GUI action instructions, automatically generating pyautogui scripts, and supporting coordinate conversion and smart image resizing.
- Supports multiple VLM output formats (e.g., Qwen-VL, Seed-VL)
- Automatically handles coordinate scaling and format conversion
- Generates pyautogui automation scripts with a single function call
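The sketch below roughly illustrates the coordinate scaling mentioned above. It maps model coordinates expressed on a 0 to `factor` grid onto screen pixels, and the Quick Start uses `factor=1000`. The function name `scale_point` is hypothetical and not part of ui-tars. It also assumes plain linear scaling, which may not match the package's internal handling for every model type.

```python
def scale_point(x: float, y: float, factor: int, width: int, height: int) -> tuple:
    """Map a point on a 0..factor grid to pixel coordinates on a width x height image.

    Hypothetical helper, not part of ui-tars: assumes plain linear scaling.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    # Normalize to [0, 1] first, then stretch to the image size.
    norm_x, norm_y = x / factor, y / factor
    return round(norm_x * width), round(norm_y * height)


if __name__ == "__main__":
    # The Quick Start response clicks <point>200 300</point> on a 1920x1080 screenshot.
    print(scale_point(200, 300, factor=1000, width=1920, height=1080))  # (384, 324)
```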
Quick Start
Installation
pip install ui-tars
# or
uv pip install ui-tars
Parse output into structured actions
from ui_tars.action_parser import parse_action_to_structure_output, parsing_response_to_pyautogui_code

# Raw model output: a "Thought" line followed by an "Action" line.
response = "Thought: Click the button\nAction: click(point='<point>200 300</point>')"
# Size of the original screenshot, in pixels.
original_image_width, original_image_height = 1920, 1080

# Parse the text into a list of structured action dicts.
parsed_dict = parse_action_to_structure_output(
    response,
    factor=1000,
    origin_resized_height=original_image_height,
    origin_resized_width=original_image_width,
    model_type="doubao"
)
print(parsed_dict)

# Turn the structured actions into a pyautogui script string.
parsed_pyautogui_code = parsing_response_to_pyautogui_code(
    responses=parsed_dict,
    image_height=original_image_height,
    image_width=original_image_width
)
print(parsed_pyautogui_code)
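The minimal regex-based sketch below makes the `Thought: ... / Action: ...` layout of `response` concrete. It extracts the thought, the action name, and a `<point>x y</point>` tag. This is not how ui-tars works internally. `split_response` and the patterns are hypothetical and only cover the single-action, point-tag shape used in this example.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for the response layout shown above; not ui-tars internals.
THOUGHT_RE = re.compile(r"Thought:\s*(.*?)\s*\nAction:", re.DOTALL)
ACTION_RE = re.compile(r"Action:\s*(\w+)\((.*)\)\s*$", re.DOTALL)
POINT_RE = re.compile(r"<point>\s*(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s*</point>")


def split_response(text: str) -> Tuple[Optional[str], str, Optional[Tuple[float, float]]]:
    """Split a 'Thought: ...\\nAction: ...' string into (thought, action_name, point)."""
    thought_match = THOUGHT_RE.search(text)
    thought = thought_match.group(1) if thought_match else None

    action_match = ACTION_RE.search(text.strip())
    if action_match is None:
        raise ValueError(f"no action found in: {text!r}")
    name, args = action_match.group(1), action_match.group(2)

    # Pull the first <point>x y</point> tag out of the action arguments, if any.
    point_match = POINT_RE.search(args)
    point = (float(point_match.group(1)), float(point_match.group(2))) if point_match else None
    return thought, name, point


if __name__ == "__main__":
    text = "Thought: Click the button\nAction: click(point='<point>200 300</point>')"
    print(split_response(text))  # ('Click the button', 'click', (200.0, 300.0))
```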
Generate pyautogui automation script
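from ui_tars.action_parser import parsing_response_to_pyautogui_code

# Reuses parsed_dict and the image size from the previous step;
# positional order is responses, image_height, image_width.
pyautogui_code = parsing_response_to_pyautogui_code(parsed_dict, original_image_height, original_image_width)
print(pyautogui_code)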
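The script drives the real mouse and keyboard, so it can be worth checking before execution. One low-cost option is to byte-compile it with Python's built-in `compile()`. This catches syntax errors without running anything. The sketch assumes `pyautogui_code` is a plain Python source string. `check_generated_code` is a hypothetical helper, not part of ui-tars.

```python
def check_generated_code(code: str, label: str = "<pyautogui-script>") -> bool:
    """Return True if `code` is syntactically valid Python, without executing it."""
    try:
        # compile() only parses and byte-compiles; no statement in `code` runs.
        compile(code, label, "exec")
    except SyntaxError as err:
        print(f"generated script has a syntax error: {err}")
        return False
    return True


if __name__ == "__main__":
    # Hypothetical sample; pyautogui need not be installed for a syntax check.
    sample = "import pyautogui\npyautogui.click(384, 324)"
    print(check_generated_code(sample))  # True
```

A syntax check says nothing about whether the coordinates are right. Running the script itself, for example with `exec`, still assumes pyautogui is installed and that the screen matches the screenshot size.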
Visualize coordinates on the image (optional)
import ast

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image, ImageDraw

# Use the same screenshot whose size was passed as original_image_width/height.
image = Image.open("your_image_path.png")

# start_box is a string of normalized (0 to 1) coordinates.
start_box = parsed_dict[0]["action_inputs"]["start_box"]
coordinates = ast.literal_eval(start_box)  # safer than eval() on model output
x1 = int(coordinates[0] * original_image_width)
y1 = int(coordinates[1] * original_image_height)

# Mark the target point with a small red dot.
draw = ImageDraw.Draw(image)
radius = 5
draw.ellipse((x1 - radius, y1 - radius, x1 + radius, y1 + radius), fill="red", outline="red")

plt.imshow(np.array(image))
plt.axis("off")
plt.show()
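The snippet above marks only the first (x, y) pair of `start_box`. You may want the center of a box instead, or need to handle both two-value points and four-value boxes. In that case, a small helper like the following could take over the parsing step. It assumes `start_box` is a string of normalized values, as the multiplication by the image size above implies. `box_center_pixels` is hypothetical.

```python
import ast
from typing import Tuple


def box_center_pixels(start_box: str, width: int, height: int) -> Tuple[int, int]:
    """Turn a normalized box string such as '[0.2, 0.3, 0.2, 0.3]' into a pixel center.

    Assumes two values (x, y) or four values (x1, y1, x2, y2), all normalized to [0, 1].
    """
    coords = ast.literal_eval(start_box)  # parses literals only, never runs code
    if len(coords) == 2:
        x1, y1 = coords
        x2, y2 = x1, y1
    elif len(coords) == 4:
        x1, y1, x2, y2 = coords
    else:
        raise ValueError(f"unexpected box: {start_box!r}")
    # Average the corners, then scale to pixels.
    cx = (x1 + x2) / 2 * width
    cy = (y1 + y2) / 2 * height
    return round(cx), round(cy)


if __name__ == "__main__":
    print(box_center_pixels("[0.2, 0.3, 0.2, 0.3]", 1920, 1080))  # (384, 324)
```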
API Documentation
parse_action_to_structure_output
def parse_action_to_structure_output(
    text: str,
    factor: int,
    origin_resized_height: int,
    origin_resized_width: int,
    model_type: str = "qwen25vl",
    max_pixels: int = 16384 * 28 * 28,
    min_pixels: int = 100 * 28 * 28
) -> list[dict]:
    ...
Description: Parses the model's action instructions into structured dictionaries, automatically handling coordinate scaling and conversion between box and point formats.
Parameters:
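- text: The raw model output string (e.g., "Thought: ...\nAction: ...")
- factor: Scaling factor for the model's coordinates (the Quick Start uses 1000)
- origin_resized_height / origin_resized_width: Original image height/width in pixels
- model_type: Model type (e.g., "qwen25vl", "doubao")
- max_pixels / min_pixels: Upper/lower limits on the image pixel count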
Returns:
A list of structured actions, each a dict with fields such as action_type, action_inputs, and thought.
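The `max_pixels`/`min_pixels` limits and the `28 * 28` terms in their defaults suggest Qwen2-VL-style resizing. In that scheme, both sides are snapped to multiples of 28 and the total pixel count is clamped. The sketch below follows that rule and is modeled on the `smart_resize` function in Qwen's `qwen-vl-utils` package. Treat it as an assumption about how this package resizes, not as its exact code. Dividing a model coordinate by the resized width or height then gives a normalized value. `normalize` is a hypothetical helper.

```python
import math
from typing import Tuple

PATCH = 28  # assumed snapping factor, matching the 28 * 28 in the defaults
MIN_PIXELS = 100 * 28 * 28
MAX_PIXELS = 16384 * 28 * 28


def smart_resize(height: int, width: int, factor: int = PATCH,
                 min_pixels: int = MIN_PIXELS, max_pixels: int = MAX_PIXELS) -> Tuple[int, int]:
    """Snap (height, width) to multiples of `factor` while keeping the area in range."""
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Too many pixels: shrink both sides by the same ratio, rounding down.
        beta = math.sqrt(height * width / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif h_bar * w_bar < min_pixels:
        # Too few pixels: enlarge both sides by the same ratio, rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


def normalize(x: float, y: float, resized_h: int, resized_w: int) -> Tuple[float, float]:
    """Convert a coordinate on the resized image into [0, 1] fractions."""
    return x / resized_w, y / resized_h


if __name__ == "__main__":
    h, w = smart_resize(1080, 1920)
    print(h, w)  # 1092 1932
```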
parsing_response_to_pyautogui_code
def parsing_response_to_pyautogui_code(
    responses: dict | list[dict],
    image_height: int,
    image_width: int,
    input_swap: bool = True
) -> str:
    ...
Description: Converts structured actions into a pyautogui script string, supporting click, type, hotkey, drag, scroll, and more.
Parameters:
- responses: Structured actions (a dict or a list of dicts)
- image_height / image_width: Image height/width in pixels
- input_swap: Whether to type text by pasting it from the clipboard (default True)
Returns: A pyautogui script as a string, ready to execute.
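The simplified generator below shows what converting structured actions into a script involves. It covers two action kinds: a click at the center of a normalized box, and text entry with or without the clipboard, which is the idea behind `input_swap`. All names here are hypothetical, and the exact code that `parsing_response_to_pyautogui_code` emits may differ. The clipboard path assumes the `pyperclip` package and a Ctrl+V paste shortcut.

```python
from typing import List, Tuple


def click_code(box: Tuple[float, float, float, float], width: int, height: int) -> str:
    """Emit a pyautogui click at the center of a normalized (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    x = round((x1 + x2) / 2 * width)
    y = round((y1 + y2) / 2 * height)
    return f"pyautogui.click({x}, {y})"


def type_code(content: str, use_clipboard: bool = True) -> List[str]:
    """Emit lines that type `content`, by pasting (like input_swap=True) or by keystrokes."""
    if use_clipboard:
        # pyautogui.write cannot type most non-ASCII characters, so pasting is a
        # common workaround. Assumes Ctrl+V pastes; macOS would need 'command'.
        return [
            "import pyperclip",
            f"pyperclip.copy({content!r})",
            "pyautogui.hotkey('ctrl', 'v')",
        ]
    return [f"pyautogui.write({content!r})"]


def build_script(lines: List[str]) -> str:
    """Join generated lines into one script string with the pyautogui import on top."""
    return "\n".join(["import pyautogui", *lines])


if __name__ == "__main__":
    script = build_script([click_code((0.2, 0.3, 0.2, 0.3), 1920, 1080), *type_code("hello")])
    print(script)
```

Because code generation here is pure string building, the output is easy to inspect or unit-test before anything touches the real mouse and keyboard.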
Contribution
Contributions, issues, and suggestions are welcome!
License
Apache-2.0 License
Download files
Source Distribution: ui_tars-0.4.6.3.tar.gz
Built Distribution: ui_tars-0.4.6.3-py3-none-any.whl
File details
Details for the file ui_tars-0.4.6.3.tar.gz.
File metadata
- Download URL: ui_tars-0.4.6.3.tar.gz
- Upload date:
- Size: 28.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 667efba8a706ad1cde24a1988f6186bf97b12cd66ab2bf70f8a99414591b8c87 |
| MD5 | e06abf07987020ff3f1e1110cb97026b |
| BLAKE2b-256 | d74304536c4d7940f6ca5d10d80a885d004c98c3ec631698a20f4289bef4f290 |
File details
Details for the file ui_tars-0.4.6.3-py3-none-any.whl.
File metadata
- Download URL: ui_tars-0.4.6.3-py3-none-any.whl
- Upload date:
- Size: 48.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e085058889587e2b5a3e10bd1e8bcbb79069e8be47394b2193418e5512d98dcc |
| MD5 | 377b4edde69cf230d30067b117cf5d32 |
| BLAKE2b-256 | 5869e7c68154d2541baec26cc01c72f4d9d490d72b9877e041aea727e01a4a49 |