Skip to main content

Liao (聊) - Vision-based GUI interaction assistant with LLM integration

Project description

Liao

Vision-based GUI interaction assistant with LLM integration.

Liao is a Python application that automates desktop chat applications using OCR and large language models. It captures screen content, recognizes conversation text, generates contextual replies, and simulates user input to send messages automatically.

Supported Platforms: Windows, Linux (X11/Wayland)

Supported LLM Backends: Ollama (local), OpenAI API, Anthropic API, or any OpenAI-compatible endpoint

中文文档

Features

  • Vision-based Automation: Uses OCR to read chat messages and detect UI elements
  • Multiple LLM Backends: Supports Ollama for local inference, OpenAI, Anthropic, and compatible APIs
  • Bilingual Interface: English and Chinese GUI with runtime language switching
  • Chat App Detection: Auto-detects WeChat, QQ, Telegram, Slack, Discord, and other applications
  • Area Detection: Automatic and manual chat/input area detection with visual overlay
  • Reply Gate: Waits for the other party's reply before generating and sending new messages
  • Cross-platform: Full support on Windows; Linux support via xdotool and Wayland ScreenCast

Installation

Prerequisites

Python Version: 3.9 or higher

LLM Backend: At least one of the following:

  • Ollama running locally (recommended for privacy)
  • OpenAI API key
  • Anthropic API key
  • Any OpenAI-compatible API endpoint

Windows

# Install from PyPI
pip install liao

# Or install with OCR support (recommended)
pip install liao[ocr]

# Or install from source
git clone https://github.com/cycleuser/Liao.git
cd Liao
pip install -r requirements.txt

Linux

Linux requires additional system packages for input simulation and screenshot capture.

# Install system dependencies
sudo apt install xdotool wl-clipboard xclip tesseract-ocr tesseract-ocr-chi-sim gnome-screenshot

# Optional: For Wayland screenshot support (PyGObject method)
sudo apt install gstreamer1.0-plugins-good pipewire python3-gi python3-dbus

# Install Python package
pip install liao

# Or install from source with Linux-specific dependencies
git clone https://github.com/cycleuser/Liao.git
cd Liao
pip install -r requirements-linux.txt

Note: xdotool works for most chat apps on Wayland since they typically run under XWayland compatibility layer.

OCR Engine Selection

Liao supports three OCR engines. Install at least one:

Engine Install Command Notes
EasyOCR pip install easyocr Best accuracy, requires PyTorch (~2GB download)
RapidOCR pip install rapidocr-onnxruntime Lightweight, fast, Python <3.13 only
pytesseract pip install pytesseract Universal fallback, requires tesseract binary

For Linux with pytesseract, install the tesseract binary:

sudo apt install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-eng

Quick Start

Launch the GUI

liao
# or
python -m liao

Command Line Interface

# List available windows
liao list
liao list --chat-only

# Run headless automation
liao auto --title "WeChat" --model llama3 --rounds 5

Usage Guide

Step 1: Launch and Connect to LLM

Start the application and configure the LLM connection. Enter the API endpoint URL and model name. For Ollama, use http://localhost:11434. For cloud APIs, enter your API key.

Launch Interface

Step 2: Configure Language (Optional)

Switch between English and Chinese interface using the language dropdown.

Language Settings

Step 3: Select Model

Choose an LLM model from the available options. Click "Refresh Models" to update the list.

Model Selection

Step 4: Select Target Window

Click "Refresh Windows" to list open applications, then double-click to select the target chat window.

Window Selection

Step 5: Configure Chat Areas

Click "Capture & Detect" to automatically detect the chat and input areas, or manually select regions using the visual overlay.

Area Configuration

Step 6: Start Automation

Enter a system prompt to define the assistant's personality, set the number of conversation rounds, and click "Start Auto Chat" to begin.

Start Conversation

Programmatic Usage

from liao import VisionAgent, LLMClientFactory
from liao.core import WindowManager

# Create LLM client (Ollama local)
llm = LLMClientFactory.create_client(
    provider="ollama",
    base_url="http://localhost:11434",
    model="llama3"
)

# Find target window
wm = WindowManager()
window = wm.find_window_by_title("WeChat")

# Create and run agent
agent = VisionAgent(
    llm_client=llm,
    target_window=window,
    prompt="You are a friendly assistant",
    max_rounds=10,
)

agent.run()

Project Structure

Liao/
├── src/liao/
│   ├── __init__.py           # Version and public API
│   ├── api.py                # Public API (VisionAgent)
│   ├── cli.py                # CLI entry point
│   ├── core/                 # Core modules (window, screenshot, input)
│   ├── llm/                  # LLM client implementations
│   ├── agent/                # Agent workflow and chat parsing
│   ├── gui/                  # PySide6 GUI components
│   └── models/               # Data models
├── tests/                    # Unit tests
├── images/                   # Documentation screenshots
├── requirements.txt          # Cross-platform dependencies
├── requirements-linux.txt    # Linux-specific dependencies
└── pyproject.toml            # Package configuration

Development

Setup Development Environment

git clone https://github.com/cycleuser/Liao.git
cd Liao
pip install -e ".[all,dev]"

Run Tests

pytest tests/ -v

Build and Publish

# Build package
python -m build

# Upload to PyPI
twine upload dist/*

Troubleshooting

Windows

  • "pywin32 not found": Run pip install pywin32 and restart Python
  • Screenshot capture fails: Run as administrator if capturing protected windows

Linux

  • "xdotool not found": Install with sudo apt install xdotool
  • Input simulation not working: Ensure xdotool is installed and the target window is an X11 or XWayland window
  • Wayland screenshot fails: Install GStreamer and PipeWire packages, grant screen capture permission when prompted
  • OCR returns empty results: Install an OCR engine (pip install rapidocr-onnxruntime or pip install pytesseract)

Agent Integration (OpenAI Function Calling)

Liao exposes OpenAI-compatible tools for LLM agents:

from liao.tools import TOOLS, dispatch

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=TOOLS,
)

result = dispatch(
    tool_call.function.name,
    tool_call.function.arguments,
)

CLI Help

CLI Help

License

This project is licensed under the GNU General Public License v3.0. See LICENSE for details.

Contributing

Contributions are welcome. Please submit issues and pull requests on GitHub.

Acknowledgments

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

liao-0.1.3.tar.gz (92.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

liao-0.1.3-py3-none-any.whl (97.4 kB view details)

Uploaded Python 3

File details

Details for the file liao-0.1.3.tar.gz.

File metadata

  • Download URL: liao-0.1.3.tar.gz
  • Upload date:
  • Size: 92.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for liao-0.1.3.tar.gz
Algorithm Hash digest
SHA256 6d5e40ae3d4ba20e464cfbe0ee803f9823545450369518d8874e3974371b59ec
MD5 6d36beccb7f78b29a84c0ff32411af1b
BLAKE2b-256 25bdd7b029debe515e28fa0923f5b5130319a7434e6d10aab1b9774ddfa358fb

See more details on using hashes here.

File details

Details for the file liao-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: liao-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 97.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for liao-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 60b12eb9bfc49681697f0878b7d4d71a8005e4358ab708ec41e238ac0dddb7d6
MD5 abed7e5e88035c5d7870736e82deff9c
BLAKE2b-256 76ad8a7ab8b568cb97e806e2261d3c1031a40dabbf98b95fa39c1e8afc839c79

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page