Parsing LLM-generated GUI action instructions, automatically generating pyautogui scripts, and supporting coordinate conversion and smart image resizing.

ui-tars

A Python package for parsing VLM-generated GUI action instructions into executable pyautogui code.

Introduction

ui-tars is a Python package for parsing VLM-generated GUI action instructions, automatically generating pyautogui scripts, and supporting coordinate conversion and smart image resizing.

Supports multiple VLM output formats (e.g., Qwen-VL, Seed-VL)
Automatically handles coordinate scaling and format conversion
Generates pyautogui automation scripts with a single function call

Quick Start

Installation

pip install ui-tars
# or
uv pip install ui-tars

Parse output into structured actions

from ui_tars.action_parser import parse_action_to_structure_output, parsing_response_to_pyautogui_code

# Raw model output: a "Thought:" line followed by an "Action:" line
response = "Thought: Click the button\nAction: click(point='<point>200 300</point>')"
# Original screenshot size in pixels
original_image_width, original_image_height = 1920, 1080
# Parse the raw text into a list of structured action dicts
parsed_dict = parse_action_to_structure_output(
    response,
    factor=1000,
    origin_resized_height=original_image_height,
    origin_resized_width=original_image_width,
    model_type="doubao"
)
print(parsed_dict)
# Convert the structured actions into an executable pyautogui script
parsed_pyautogui_code = parsing_response_to_pyautogui_code(
    responses=parsed_dict,
    image_height=original_image_height,
    image_width=original_image_width
)
print(parsed_pyautogui_code)
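To make the coordinate handling concrete, here is a minimal sketch of the parsing step for the `<point>` format used in this example. It is not the library's implementation: `point_to_normalized_box` and the regular expression are hypothetical, and the sketch assumes that `factor=1000` means the model's coordinates are divided by 1000 into 0-1 ratios. That assumption is consistent with the visualization step, which multiplies `start_box` values by the image size.

```python
import re

# Matches "<point>x y</point>" with integer or decimal coordinates
POINT_PATTERN = re.compile(r"<point>\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*</point>")


def point_to_normalized_box(response: str, factor: int = 1000) -> list[float]:
    """Return the first <point> in `response` as a normalized [x1, y1, x2, y2] box.

    Hypothetical helper: assumes raw coordinates are divided by `factor`
    (e.g. coordinates on a 0-1000 scale become 0-1 ratios).
    """
    match = POINT_PATTERN.search(response)
    if match is None:
        raise ValueError("no <point>x y</point> found in response")
    x, y = (float(value) / factor for value in match.groups())
    # A point is represented as a degenerate box whose two corners coincide
    return [x, y, x, y]


response = "Thought: Click the button\nAction: click(point='<point>200 300</point>')"
print(point_to_normalized_box(response))  # [0.2, 0.3, 0.2, 0.3]
```

Representing a point as a box with identical corners lets downstream code treat points and boxes uniformly, for example by always clicking the box center.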

Generate pyautogui automation script

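from ui_tars.action_parser import parsing_response_to_pyautogui_code

# Positional form: (responses, image_height, image_width); parsed_dict comes from the previous step
pyautogui_code = parsing_response_to_pyautogui_code(parsed_dict, original_image_height, original_image_width)
print(pyautogui_code)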
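The conversion from a structured action to a script can be illustrated for the simplest case, a left click. This is a simplified sketch, not the library's code: `click_to_pyautogui` is hypothetical, and it assumes `start_box` is a string of four normalized coordinates whose center is the click target.

```python
import ast


def click_to_pyautogui(action: dict, image_width: int, image_height: int) -> str:
    """Build a small pyautogui click script from a parsed click action.

    Hypothetical helper: assumes action["action_inputs"]["start_box"] is a string
    like "[x1, y1, x2, y2]" holding coordinates normalized to the 0-1 range.
    """
    x1, y1, x2, y2 = ast.literal_eval(action["action_inputs"]["start_box"])
    # Click the center of the box, scaled from ratios to screen pixels
    x = round((x1 + x2) / 2 * image_width, 3)
    y = round((y1 + y2) / 2 * image_height, 3)
    return f"import pyautogui\npyautogui.click({x}, {y}, button='left')"


action = {"action_type": "click", "action_inputs": {"start_box": "[0.2, 0.3, 0.2, 0.3]"}}
print(click_to_pyautogui(action, 1920, 1080))
```

The real function also covers typing, hotkeys, drags, scrolling, and more; this sketch handles only left clicks.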

Visualize coordinates on the image (optional)

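import ast
from PIL import Image, ImageDraw
import numpy as np
import matplotlib.pyplot as plt

# Load the screenshot the action refers to (RGB so a red fill always works)
image = Image.open("your_image_path.png").convert("RGB")
image_width, image_height = image.size
# start_box holds normalized coordinates (ratios of width/height) as a string
start_box = parsed_dict[0]["action_inputs"]["start_box"]
coordinates = ast.literal_eval(start_box)  # parses the literal safely, unlike eval()
x1 = int(coordinates[0] * image_width)
y1 = int(coordinates[1] * image_height)
# Mark the target point with a small red dot
draw = ImageDraw.Draw(image)
radius = 5
draw.ellipse((x1 - radius, y1 - radius, x1 + radius, y1 + radius), fill="red", outline="red")
plt.imshow(np.array(image))
plt.axis("off")
plt.show()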

API Documentation

parse_action_to_structure_output

def parse_action_to_structure_output(
    text: str,
    factor: int,
    origin_resized_height: int,
    origin_resized_width: int,
    model_type: str = "qwen25vl",
    max_pixels: int = 16384 * 28 * 28,
    min_pixels: int = 100 * 28 * 28
) -> list[dict]:
    ...

Description: Parses the model's action instructions into structured dictionaries, automatically handling coordinate scaling and conversion between box and point formats.

Parameters:

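text: The raw model output string (the Thought/Action text)
factor: Scaling factor for the model's coordinates (the Quick Start example uses 1000)
origin_resized_height/origin_resized_width: Original image height/width in pixels
model_type: Model type (e.g., "qwen25vl", "doubao")
max_pixels/min_pixels: Upper/lower limits on the total pixel count used when resizing the image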

Returns: A list of structured actions, each a dict with fields such as action_type, action_inputs, and thought.
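The max_pixels/min_pixels defaults are multiples of 28 x 28, which matches the "smart resize" scheme used in Qwen2-VL-style image preprocessing: both sides are snapped to multiples of 28 and the total pixel count is kept within the limits. Below is a sketch of that scheme modeled on Qwen2-VL's `smart_resize`; treat the function, the constant names, and the idea that ui-tars resizes exactly this way as assumptions rather than documented behavior.

```python
import math

# Assumed constants, mirroring the defaults in the signature above
IMAGE_FACTOR = 28
MIN_PIXELS = 100 * 28 * 28
MAX_PIXELS = 16384 * 28 * 28


def smart_resize(height: int, width: int, factor: int = IMAGE_FACTOR,
                 min_pixels: int = MIN_PIXELS, max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Return (new_height, new_width): both multiples of `factor`, with the
    pixel count kept within [min_pixels, max_pixels] and the aspect ratio
    approximately preserved."""
    # Snap each side to the nearest multiple of factor (at least one factor)
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Too many pixels: shrink both sides by the same ratio, rounding down
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif h_bar * w_bar < min_pixels:
        # Too few pixels: enlarge both sides by the same ratio, rounding up
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


print(smart_resize(1080, 1920))  # (1092, 1932)
```

For a 1920 x 1080 screenshot, rounding alone already gives 1932 x 1092, which falls inside the default limits, so neither rescaling branch runs.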

parsing_response_to_pyautogui_code

def parsing_response_to_pyautogui_code(
    responses: dict | list[dict],
    image_height: int,
    image_width: int,
    input_swap: bool = True
) -> str:
    ...

Description: Converts structured actions into a pyautogui script string, supporting click, type, hotkey, drag, scroll, and more.

Parameters:

responses: Structured actions (dict or list of dicts)
image_height/image_width: Image height/width
input_swap: Whether to use clipboard paste for typing (default True)

Returns: A pyautogui script string that can be executed to carry out the actions.
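Because the returned script drives the real mouse and keyboard, it can help to inspect what it would do before running it. A minimal sketch, not part of ui-tars: the hypothetical `dry_run` executes the script with stand-in `pyautogui` and `pyperclip` modules whose every function records its call instead of performing it. Stubbing `pyperclip` is an assumption about what the clipboard-paste path (`input_swap=True`) may import; adjust `stubbed` to match the modules the generated script actually imports. Calls on other modules, such as `time.sleep`, still run for real, so this is a preview aid, not a sandbox.

```python
import sys
import types


def dry_run(script: str, stubbed: tuple[str, ...] = ("pyautogui", "pyperclip")) -> list[tuple]:
    """Execute `script` against recording stub modules; return the calls made."""
    calls: list[tuple] = []

    def make_stub(module_name: str) -> types.ModuleType:
        stub = types.ModuleType(module_name)

        def __getattr__(attr: str):
            # Leave dunder lookups (e.g. __path__) to normal attribute rules
            if attr.startswith("__"):
                raise AttributeError(attr)

            def record(*args, **kwargs):
                calls.append((f"{module_name}.{attr}", args, kwargs))
            return record

        stub.__getattr__ = __getattr__  # module-level __getattr__ (PEP 562)
        return stub

    saved = {name: sys.modules.get(name) for name in stubbed}
    try:
        for name in stubbed:
            sys.modules[name] = make_stub(name)
        exec(compile(script, "<generated>", "exec"), {})
    finally:
        # Restore whatever was (or was not) imported before the dry run
        for name, module in saved.items():
            if module is None:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = module
    return calls
```

For example, `for call in dry_run(parsed_pyautogui_code): print(call)` lists each pyautogui call the generated script would make.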

Contribution

Contributions, issues, and suggestions are welcome!

License

Apache-2.0 License

Project details

Release history

0.5.1: Apr 9, 2026
0.5.0: Mar 28, 2026
0.4.9: Feb 25, 2026
0.4.8.2: Jan 27, 2026
0.4.8.1: Jan 27, 2026
0.4.8: Jan 26, 2026
0.4.7: Jan 25, 2026
0.4.6.3: Jan 17, 2026
0.4.6.2: Jan 15, 2026
0.4.6.1: Jan 6, 2026
0.4.6: Dec 26, 2025
0.4.5.2: Dec 26, 2025
0.4.5.1: Dec 26, 2025
0.4.5: Dec 26, 2025
0.4.4: Dec 16, 2025
0.4.3: Dec 16, 2025
0.4.2.2: Dec 11, 2025
0.4.2.1: Dec 4, 2025
0.4.2: Dec 4, 2025
0.4.1: Dec 2, 2025
0.4.0: Nov 24, 2025
0.3.10: Nov 18, 2025
0.3.9: Nov 10, 2025
0.3.8.1: Nov 5, 2025
0.3.8: Nov 3, 2025
0.3.7 (this version): Oct 31, 2025
0.3.6: Oct 28, 2025
0.3.4: Oct 18, 2025
0.3.3.1: Oct 15, 2025
0.3.3: Oct 15, 2025
0.3.2: Oct 14, 2025
0.3.1: Oct 10, 2025
0.3.0: Oct 10, 2025
0.2.9: Sep 29, 2025
0.2.8: Sep 29, 2025
0.2.7: Sep 28, 2025
0.2.6: Sep 25, 2025
0.2.5: Sep 25, 2025
0.2.4: Sep 25, 2025
0.2.3: Aug 25, 2025
0.2.2: Aug 22, 2025
0.2.1: Aug 19, 2025
0.1.4: May 21, 2025
0.1.3: May 21, 2025
0.1.2: May 21, 2025
0.1.1: May 21, 2025
0.1.0: May 21, 2025

Download files

Source Distribution

ui_tars-0.3.7.tar.gz (30.9 kB)

Uploaded Oct 31, 2025 Source

Built Distribution

ui_tars-0.3.7-py3-none-any.whl (33.2 kB)

Uploaded Oct 31, 2025 Python 3

File details

Details for the file ui_tars-0.3.7.tar.gz.

File metadata

Download URL: ui_tars-0.3.7.tar.gz
Upload date: Oct 31, 2025
Size: 30.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for ui_tars-0.3.7.tar.gz
Algorithm	Hash digest
SHA256	`02bb22ddba505563da4471661497b5644a7da12115ddf126344a9bbb55e80022`
MD5	`b159f8c172e267d96c638f8d869e6b7e`
BLAKE2b-256	`bafd5851f2301adb88f5d25d174433ce2c438c0927bd297e604f44d007a614db`

File details

Details for the file ui_tars-0.3.7-py3-none-any.whl.

File metadata

Download URL: ui_tars-0.3.7-py3-none-any.whl
Upload date: Oct 31, 2025
Size: 33.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for ui_tars-0.3.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5e8a8d55aca501de5ae9f50caa153bea9054220b78b703f192a34e6a9fd55ba9`
MD5	`4dfac49c4de9550d3fcf03c3fe7a8068`
BLAKE2b-256	`938f6e7adf1a76dfe4aaf6c3519074056bfe97c50edb1b746f6ed64616d67ee6`
