Liao (聊) - Vision-based GUI interaction assistant with LLM integration

These details have not been verified by PyPI

Project links

Project description

Liao

Vision-based GUI interaction assistant with LLM integration.

Liao is a Python application that automates desktop chat applications using OCR and large language models. It captures screen content, recognizes conversation text, generates contextual replies, and simulates user input to send messages automatically.

Supported Platforms: Windows, Linux (X11/Wayland)

Supported LLM Backends: Ollama (local), OpenAI API, Anthropic API, or any OpenAI-compatible endpoint

中文文档

Features

Vision-based Automation: Uses OCR to read chat messages and detect UI elements
Multiple LLM Backends: Supports Ollama for local inference, OpenAI, Anthropic, and compatible APIs
Bilingual Interface: English and Chinese GUI with runtime language switching
Chat App Detection: Auto-detects WeChat, QQ, Telegram, Slack, Discord, and other applications
Area Detection: Automatic and manual chat/input area detection with visual overlay
Reply Gate: Waits for the other party's reply before generating and sending new messages
Cross-platform: Full support on Windows; Linux support via xdotool and Wayland ScreenCast

Installation

Prerequisites

Python Version: 3.9 or higher

LLM Backend: At least one of the following:

Ollama running locally (recommended for privacy)
OpenAI API key
Anthropic API key
Any OpenAI-compatible API endpoint

Windows

# Install from PyPI
pip install liao

# Or install with OCR support (recommended)
pip install liao[ocr]

# Or install from source
git clone https://github.com/cycleuser/Liao.git
cd Liao
pip install -r requirements.txt

Linux

Linux requires additional system packages for input simulation and screenshot capture.

# Install system dependencies
sudo apt install xdotool wl-clipboard xclip tesseract-ocr tesseract-ocr-chi-sim gnome-screenshot

# Optional: For Wayland screenshot support (PyGObject method)
sudo apt install gstreamer1.0-plugins-good pipewire python3-gi python3-dbus

# Install Python package
pip install liao

# Or install from source with Linux-specific dependencies
git clone https://github.com/cycleuser/Liao.git
cd Liao
pip install -r requirements-linux.txt

Note: xdotool works for most chat apps on Wayland since they typically run under XWayland compatibility layer.

OCR Engine Selection

Liao supports three OCR engines. Install at least one:

Engine	Install Command	Notes
EasyOCR	`pip install easyocr`	Best accuracy, requires PyTorch (~2GB download)
RapidOCR	`pip install rapidocr-onnxruntime`	Lightweight, fast, Python <3.13 only
pytesseract	`pip install pytesseract`	Universal fallback, requires tesseract binary

For Linux with pytesseract, install the tesseract binary:

sudo apt install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-eng

Quick Start

Launch the GUI

liao
# or
python -m liao

Command Line Interface

# List available windows
liao list
liao list --chat-only

# Run headless automation
liao auto --title "WeChat" --model llama3 --rounds 5

Usage Guide

Step 1: Launch and Connect to LLM

Start the application and configure the LLM connection. Enter the API endpoint URL and model name. For Ollama, use http://localhost:11434. For cloud APIs, enter your API key.

Launch Interface

Step 2: Configure Language (Optional)

Switch between English and Chinese interface using the language dropdown.

Language Settings

Step 3: Select Model

Choose an LLM model from the available options. Click "Refresh Models" to update the list.

Model Selection

Step 4: Select Target Window

Click "Refresh Windows" to list open applications, then double-click to select the target chat window.

Window Selection

Step 5: Configure Chat Areas

Click "Capture & Detect" to automatically detect the chat and input areas, or manually select regions using the visual overlay.

Area Configuration

Step 6: Start Automation

Enter a system prompt to define the assistant's personality, set the number of conversation rounds, and click "Start Auto Chat" to begin.

Start Conversation

Programmatic Usage

from liao import VisionAgent, LLMClientFactory
from liao.core import WindowManager

# Create LLM client (Ollama local)
llm = LLMClientFactory.create_client(
    provider="ollama",
    base_url="http://localhost:11434",
    model="llama3"
)

# Find target window
wm = WindowManager()
window = wm.find_window_by_title("WeChat")

# Create and run agent
agent = VisionAgent(
    llm_client=llm,
    target_window=window,
    prompt="You are a friendly assistant",
    max_rounds=10,
)

agent.run()

Project Structure

Liao/
├── src/liao/
│   ├── __init__.py           # Version and public API
│   ├── api.py                # Public API (VisionAgent)
│   ├── cli.py                # CLI entry point
│   ├── core/                 # Core modules (window, screenshot, input)
│   ├── llm/                  # LLM client implementations
│   ├── agent/                # Agent workflow and chat parsing
│   ├── gui/                  # PySide6 GUI components
│   └── models/               # Data models
├── tests/                    # Unit tests
├── images/                   # Documentation screenshots
├── requirements.txt          # Cross-platform dependencies
├── requirements-linux.txt    # Linux-specific dependencies
└── pyproject.toml            # Package configuration

Development

Setup Development Environment

git clone https://github.com/cycleuser/Liao.git
cd Liao
pip install -e ".[all,dev]"

Run Tests

pytest tests/ -v

Build and Publish

# Build package
python -m build

# Upload to PyPI
twine upload dist/*

Troubleshooting

Windows

"pywin32 not found": Run pip install pywin32 and restart Python
Screenshot capture fails: Run as administrator if capturing protected windows

Linux

"xdotool not found": Install with sudo apt install xdotool
Input simulation not working: Ensure xdotool is installed and the target window is an X11 or XWayland window
Wayland screenshot fails: Install GStreamer and PipeWire packages, grant screen capture permission when prompted
OCR returns empty results: Install an OCR engine (pip install rapidocr-onnxruntime or pip install pytesseract)

Agent Integration (OpenAI Function Calling)

Liao exposes OpenAI-compatible tools for LLM agents:

from liao.tools import TOOLS, dispatch

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=TOOLS,
)

result = dispatch(
    tool_call.function.name,
    tool_call.function.arguments,
)

CLI Help

License

This project is licensed under the GNU General Public License v3.0. See LICENSE for details.

Contributing

Contributions are welcome. Please submit issues and pull requests on GitHub.

Acknowledgments

EasyOCR and RapidOCR for text recognition
PySide6 for the GUI framework
Ollama for local LLM inference

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.1.5

Mar 13, 2026

0.1.4

Mar 10, 2026

0.1.3

Mar 9, 2026

0.1.2

Mar 9, 2026

This version

0.1.1

Mar 7, 2026

0.1.0

Mar 3, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

liao-0.1.1.tar.gz (92.6 kB view details)

Uploaded Mar 7, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

liao-0.1.1-py3-none-any.whl (97.4 kB view details)

Uploaded Mar 7, 2026 Python 3

File details

Details for the file liao-0.1.1.tar.gz.

File metadata

Download URL: liao-0.1.1.tar.gz
Upload date: Mar 7, 2026
Size: 92.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for liao-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`13f404f84e5d069c065d6c5ed39ef876a245139317f27f94f09818aa2c41589a`
MD5	`230cd7fd8466a77558686e55d91df38b`
BLAKE2b-256	`7a8815bb4cffe0fb0e421caa87739f65a15ad498c8bcf6fad99484d449ecf3ed`

See more details on using hashes here.

File details

Details for the file liao-0.1.1-py3-none-any.whl.

File metadata

Download URL: liao-0.1.1-py3-none-any.whl
Upload date: Mar 7, 2026
Size: 97.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for liao-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`96afad6195c2d48f9a4633289e7d12c0c23a6d18fc8a8f587974c1fc78a020e9`
MD5	`a455bb27c7f1dfe428254f7fe43e6fad`
BLAKE2b-256	`a7a9e7200d56946761c60b11b68be0fb2c4babfa590c38ed261310202a680a51`

See more details on using hashes here.

liao 0.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Liao

Features

Installation

Prerequisites

Windows

Linux

OCR Engine Selection

Quick Start

Launch the GUI

Command Line Interface

Usage Guide

Step 1: Launch and Connect to LLM

Step 2: Configure Language (Optional)

Step 3: Select Model

Step 4: Select Target Window

Step 5: Configure Chat Areas

Step 6: Start Automation

Programmatic Usage

Project Structure

Development

Setup Development Environment

Run Tests

Build and Publish

Troubleshooting

Windows

Linux

Agent Integration (OpenAI Function Calling)

CLI Help

License

Contributing

Acknowledgments

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes