Skip to main content

WeClaw-CUA — vision-based WeChat message capture and report CLI for LLMs

Project description

WeClaw-CUA

Vision-based WeChat message capture and report generation from the command line.

License Python Platform PRs Welcome

Installation Guide · 中文文档


Table of Contents

Highlights

Vision-based capture Screenshots + vision LLM to extract messages — no database decryption
Cross-platform macOS (Accessibility API + Quartz) and Windows (UI Automation)
No API key required Stepwise mode lets the calling agent handle all LLM calls
OpenClaw gateway Reuse your local OpenClaw gateway — no separate OpenRouter key
AI-first JSON output by default, designed for LLM agent tool calls
Fully local All UI automation runs on your machine; data never leaves your device
13 commands init, run, capture, finalize, report, build-report-prompt, sessions, history, search, export, stats, unread, new-messages

How It Works

Unlike tools that decrypt WeChat's local SQLite databases, WeClaw-CUA uses a pure vision approach:

  1. Locates the WeChat desktop window via OS-level APIs
  2. Scans the sidebar for chats selected by config or CLI options
  3. Clicks into each chat, scrolls through messages, captures screenshots
  4. Stitches screenshots into long images (OpenCV-based template matching)
  5. Sends stitched images to a vision LLM for structured message extraction
  6. Post-processes and deduplicates extracted messages into clean JSON

This means WeClaw-CUA works with any WeChat version and requires no key extraction or database access.

Capture-all fast path

When groups_to_monitor is ["*"] or [], chat_type is all, and sidebar_unread_only is false (or --chat-type all --unread-mode all is passed), WeClaw uses a faster visual workflow. In this mode it does not need to classify sidebar rows as group/private or unread/read. It treats every visible chat row equally, clicks through the sidebar from top to bottom, and reserves vision-LLM calls for the actual message extraction step.

flowchart TD
    startNode["weclaw capture/run"] --> configCheck["Check config: wildcard, chat_type=all, unread=false"]
    configCheck --> fastPath["Capture-all fast path"]
    fastPath --> topScroll["Scroll sidebar to top once"]
    topScroll --> sidebarOcr["OCR sidebar rows and click boxes"]
    sidebarOcr --> rowLoop["Click each visible row top-to-bottom"]
    rowLoop --> headerOcr["OCR main chat header for full chat title"]
    headerOcr --> captureFrames["Activate chat panel, scroll, capture frames"]
    captureFrames --> stitchFrames["Stitch message screenshots"]
    stitchFrames --> messageVlm["Vision LLM extracts messages"]
    messageVlm --> saveJson["Save deduped JSON"]
    saveJson --> nextRow{"More rows in viewport?"}
    nextRow -->|Yes| rowLoop
    nextRow -->|No| scrollDown["Scroll sidebar down"]
    scrollDown --> repeated{"Repeated viewport or max scrolls?"}
    repeated -->|No| sidebarOcr
    repeated -->|Yes| finishNode["Finished"]

The fast path removes these navigation-time vision-LLM calls:

  • Sidebar classification VLM: row names and click boxes come from OCR (RapidOCR on Windows, native Vision OCR on macOS).
  • Per-chat name re-location: capture-all sweeps visible rows instead of repeatedly searching from the top for a configured name.
  • Post-click current-chat verification VLM: the clicked chat title is read from the main chat header with OCR.
  • Safe-click VLM and new-message-button VLM: the fast path uses deterministic chat-panel activation and skips the optional new-message-button probe.

The normal sidebar classification path is still used for named chats, group-only/private-only wildcard scans, and unread-only scans, because those modes need row semantics such as chat type or unread badge state.


Installation

Requires Python >= 3.10. For a full platform-by-platform walkthrough, see the Installation Guide.

# 1. Create a virtual environment (recommended)
python3 -m venv .venv
source .venv/bin/activate        # macOS / Linux
.venv\Scripts\activate           # Windows PowerShell

# 2. Install
pip install "weclaw-cua[macos,llm]"   # macOS
pip install "weclaw-cua[llm,win-ocr]" # Windows
pip install weclaw-cua                # core only (stepwise, no LLM deps)

# 3. Verify
weclaw-cua --version

Note: The PyPI package weclaw (without -cua) is an unrelated third-party project. Always install weclaw-cua. The weclaw console command is an alias for weclaw-cua.

Install from source (for contributors)
git clone https://github.com/Numaira-Technology/weclaw-cua.git
cd weclaw-cua
python3 -m venv .venv

# macOS
./.venv/bin/pip install -e ".[macos,llm]"

# Windows
.venv\Scripts\pip install -e ".[llm,win-ocr]"

Quick Start

Before You Start

Make sure all of the following are true before the first run:

  • WeChat Desktop is installed, open, and already logged in
  • The WeChat window is visible on your desktop
  • You are running commands from the project directory (or a subdirectory containing access to config/config.json)
  • On macOS, your terminal already has Accessibility permission
  • On Windows, if WeChat is running as administrator, your terminal is elevated too

Step 1 — Initialize

weclaw-cua init

Creates config/config.json from the template and verifies platform prerequisites.

macOS: Grant your terminal app Accessibility access in System Settings > Privacy & Security > Accessibility, then restart the terminal.

Windows: If WeChat is running as administrator, run your terminal elevated too.

Step 2 — Configure

Edit config/config.json:

{
  "wechat_app_name": "WeChat",
  "groups_to_monitor": ["*"],
  "sidebar_unread_only": true,
  "chat_type": "group",
  "sidebar_max_scrolls": 16,
  "chat_max_scrolls": 10,
  "report_custom_prompt": "Summarize key decisions and action items from the captured chat messages.",
  "llm_provider": "openrouter",
  "openrouter_api_key": "",
  "openai_api_key": "",
  "deepseek_api_key": "",
  "kimi_api_key": "",
  "glm_api_key": "",
  "qwen_api_key": "",
  "llm_model": "openai/gpt-4o",
  "output_dir": "output"
}

Set chat_type to group, private, or all. Set sidebar_unread_only to true for unread-badge chats only, or false to process read and unread chats that match the other selectors. groups_to_monitor: ["*"] or [] means wildcard scan for all chats allowed by chat_type; otherwise list exact sidebar chat names.

Set sidebar_max_scrolls to control how many times the sidebar may scroll downward during a scan. WeClaw scrolls back upward with sidebar_max_scrolls + 2 wheel steps before each full scan, so the return-to-top distance is always greater than the downward scan distance. Set chat_max_scrolls to control how many times the active chat panel may scroll upward while collecting history.

Set llm_provider to openrouter, openai, deepseek, kimi, glm, or qwen. Fill the matching API key only when using built-in LLM mode. When llm_provider is openrouter, all model slugs route through OpenRouter. Leave keys empty for OpenClaw gateway mode or stepwise mode.

You can also set API keys via environment variables:

export OPENROUTER_API_KEY="sk-or-v1-your-key"          # macOS
export OPENAI_API_KEY="sk-your-openai-key"             # macOS
export DEEPSEEK_API_KEY="sk-your-deepseek-key"         # macOS
export KIMI_API_KEY="sk-your-kimi-key"                 # macOS
export GLM_API_KEY="sk-your-glm-key"                   # macOS
export QWEN_API_KEY="sk-your-qwen-key"                 # macOS
$env:OPENROUTER_API_KEY = "sk-or-v1-your-key"          # Windows PowerShell
$env:OPENAI_API_KEY = "sk-your-openai-key"             # Windows PowerShell
$env:DEEPSEEK_API_KEY = "sk-your-deepseek-key"         # Windows PowerShell
$env:KIMI_API_KEY = "sk-your-kimi-key"                 # Windows PowerShell
$env:GLM_API_KEY = "sk-your-glm-key"                   # Windows PowerShell
$env:QWEN_API_KEY = "sk-your-qwen-key"                 # Windows PowerShell

Step 3 — Run

weclaw-cua run --openclaw-gateway   # recommended: via local OpenClaw gateway
weclaw-cua run                      # built-in LLM mode
weclaw-cua run --chat-type all --unread-mode all  # triggers capture-all fast path
weclaw-cua capture                  # capture only
weclaw-cua report                   # report from existing captures
weclaw-cua sessions                 # list captured chats
weclaw-cua history "Group A" --limit 20
weclaw-cua search "deadline" --chat "Team"

OpenClaw Gateway Mode

Recommended for users who already run a local OpenClaw gateway — no separate OpenRouter key needed.

One-time gateway setup

Enable the OpenAI-compatible HTTP endpoint in ~/.openclaw/openclaw.json:

{
  gateway: {
    http: {
      endpoints: {
        chatCompletions: { enabled: true },
      },
    },
  },
}

Restart the OpenClaw gateway, then smoke-test it:

curl -sS http://127.0.0.1:18789/v1/models \
  -H "Authorization: Bearer YOUR_GATEWAY_TOKEN"

A JSON response listing model IDs (e.g. openclaw/default) means the gateway is ready.

Run through OpenClaw

weclaw-cua run --openclaw-gateway

Most users do not need to set any OPENCLAW_* variables manually — WeClaw-CUA auto-discovers the gateway from ~/.openclaw/openclaw.json.

Optional environment overrides
# macOS
export OPENCLAW_GATEWAY_URL="http://127.0.0.1:18789/v1"
export OPENCLAW_API_KEY="YOUR_GATEWAY_TOKEN"
export OPENCLAW_MODEL="openclaw/default"
export OPENCLAW_BACKEND_MODEL="openrouter/google/gemini-2.5-flash"
# Windows PowerShell
$env:OPENCLAW_GATEWAY_URL = "http://127.0.0.1:18789/v1"
$env:OPENCLAW_API_KEY = "YOUR_GATEWAY_TOKEN"
$env:OPENCLAW_MODEL = "openclaw/default"
$env:OPENCLAW_BACKEND_MODEL = "openrouter/google/gemini-2.5-flash"

WeClaw-CUA also auto-discovers config/config.json by walking up from the current directory. Set WECLAW_CONFIG_PATH or pass --config <path> only when running from outside the project tree.


Using with AI Agents (Stepwise Mode)

In stepwise mode (--no-llm), WeClaw-CUA handles all UI automation while the agent handles all LLM calls. No API key needed on the WeClaw-CUA side.

Agent                          WeClaw-CUA                    WeChat
  |                              |                              |
  |-- weclaw-cua capture --no-llm -->|                          |
  |                              |-- screenshot, scroll ------->|
  |                              |-- stitch images              |
  |<-- manifest.json + images ---|                              |
  |                              |                              |
  |  (agent reads manifest.json)                                |
  |  (for each task: send .png + .prompt.txt to own LLM)        |
  |  (write response to .response.txt)                          |
  |                              |                              |
  |-- weclaw-cua finalize ------->|                             |
  |<-- messages.json ------------|                              |
  |                              |                              |
  |-- weclaw-cua build-report-prompt                            |
  |<-- prompt text --------------|                              |
  |  (agent sends prompt to own LLM, gets report)               |

Step-by-Step

1. Capture (no LLM needed):

weclaw-cua capture --no-llm --work-dir ./weclaw_work

Outputs manifest.json listing all pending vision tasks, plus .png images and .prompt.txt files.

2. Process vision tasks (agent's responsibility):

For each task in manifest.json:

  • Read the .png image and .prompt.txt
  • Send to the agent's own vision LLM
  • Write the model response to .response.txt

3. Finalize (produce message JSON):

weclaw-cua finalize --work-dir ./weclaw_work

Reads .response.txt files from --work-dir and writes structured message JSON to output_dir (configured in config.json). --work-dir is required.

4. Get report prompt (agent calls own LLM for report):

weclaw-cua build-report-prompt

Reads all *.json capture files from output_dir — the same directory where finalize writes its output.

Claude Code / Cursor configuration snippet

Add to your CLAUDE.md or .cursor/rules/:

## WeClaw-CUA

You can use `weclaw-cua` (or the `weclaw` alias) to capture and query WeChat messages.

Stepwise workflow (you handle LLM calls):
1. `weclaw-cua capture --no-llm` — capture screenshots, no LLM needed
2. Process each task in manifest.json with your vision model
3. `weclaw-cua finalize --work-dir <dir>` — produce message JSON
4. `weclaw-cua build-report-prompt` — get report prompt, call your own LLM

Query commands (work on captured data, no LLM needed):
- `weclaw-cua sessions` — list captured chats
- `weclaw-cua history "NAME" --limit 20 --format text` — view messages
- `weclaw-cua search "KEYWORD" --chat "CHAT_NAME"` — search messages
- `weclaw-cua stats "CHAT" --format text` — statistics
- `weclaw-cua export "CHAT" --format markdown` — export chat
- `weclaw-cua new-messages` — incremental new messages

Command Reference

See docs/cli-reference.md for every command's options and JSON/text output shape.

Command Description
init First-time setup: create config, verify permissions
run Full pipeline: capture selected chats + generate report
capture Vision-capture selected chats
finalize Process agent-provided LLM responses into JSON (--work-dir required)
report Generate LLM report from existing captured JSON
build-report-prompt Output the report prompt (for agent to call own LLM)
sessions List captured chat sessions
history View messages from a specific session
search Search across captured messages
export Export a session to markdown or plain text
stats Message statistics for a session
unread Scan sidebar for unread chats via vision AI
new-messages Incremental new messages since last check

Most commands output JSON by default. Pass --format text where available for human-readable output.

Per-command examples

init

weclaw-cua init                        # create config + verify permissions
weclaw-cua init --force                # overwrite existing config
weclaw-cua init --config-dir /path     # custom config directory

run

weclaw-cua run --openclaw-gateway      # recommended: via local OpenClaw gateway
weclaw-cua run                         # built-in LLM mode
weclaw-cua run --no-llm                # stepwise: capture only, agent handles LLM
weclaw-cua run --format text           # human-readable output
weclaw-cua run --chat-type all --unread-mode all
weclaw-cua run --sidebar-max-scrolls 30 --chat-max-scrolls 20

Capture-selection overrides: --chat-type group|private|all, --unread-mode unread|all, --sidebar-max-scrolls N, --chat-max-scrolls N.

capture

weclaw-cua capture                     # capture with built-in LLM
weclaw-cua capture --no-llm            # stepwise: output images + prompts only
weclaw-cua capture --no-llm --work-dir ./weclaw_work
weclaw-cua capture --format text
weclaw-cua capture --chat-type private --unread-mode unread

Capture-selection overrides are the same as run.

finalize

weclaw-cua finalize --work-dir ./weclaw_work

report

weclaw-cua report                                     # full report (requires API key)
weclaw-cua report --prompt-only                       # output prompt only
weclaw-cua report --input output/GroupA.json          # from specific file
weclaw-cua report --format text

build-report-prompt

weclaw-cua build-report-prompt
weclaw-cua build-report-prompt --input output/A.json

sessions

weclaw-cua sessions                    # all captured chats (JSON)
weclaw-cua sessions --limit 10
weclaw-cua sessions --format text

history

weclaw-cua history "Group A"                          # last 50 messages
weclaw-cua history "Group A" --limit 100 --offset 50  # pagination
weclaw-cua history "Alice" --type text                # text messages only
weclaw-cua history "Alice" --format text

Options: --limit, --offset, --type, --format

search

weclaw-cua search "hello"
weclaw-cua search "hello" --chat "Alice"
weclaw-cua search "meeting" --chat "A" --chat "B"
weclaw-cua search "report" --type text

Options: --chat (repeatable), --limit, --offset, --type, --format

export

weclaw-cua export "Alice" --format markdown
weclaw-cua export "Alice" --format txt --output chat.txt
weclaw-cua export "Team" --limit 1000

Options: --format markdown|txt, --output, --limit

stats

weclaw-cua stats "Group A"
weclaw-cua stats "Alice" --format text

unread

weclaw-cua unread
weclaw-cua unread --limit 10
weclaw-cua unread --format text
weclaw-cua unread --chat-type private --sidebar-max-scrolls 30

Options: --limit, --format, --chat-type, --sidebar-max-scrolls.

new-messages

weclaw-cua new-messages    # first call: save state, return all messages
weclaw-cua new-messages    # subsequent calls: only new since last check

State saved at <output_dir>/last_check.json. Delete to reset.


Message Types

The --type option (on history and search):

Value Description
text Text messages
system System messages
link_card Links and shared content
image Images
file File attachments
recalled Recalled messages
unsupported Unsupported message types

System Requirements

Platform Status Notes
macOS (Apple Silicon) Supported Requires Accessibility permission
macOS (Intel) Supported Requires Accessibility permission
Windows 10 / 11 Supported Match elevation with WeChat if needed
Linux Not supported Relies on macOS/Windows platform APIs
  • Python >= 3.10
  • WeChat Desktop — any version (vision-based, no version lock-in)
  • LLM API key — OpenRouter, OpenAI, DeepSeek, Kimi, GLM, or Qwen for built-in LLM mode; not needed for stepwise or OpenClaw gateway mode

Output Data Format

Capture commands write one JSON file per chat under output_dir. Each file is a JSON array of message objects:

[
  {
    "chat_name": "Team Chat",
    "sender": "Alice",
    "time": "10:15",
    "content": "Please review the proposal.",
    "type": "text"
  }
]

sender can be empty for system messages, and time can be null or empty when WeChat does not show a timestamp. See docs/cli-reference.md for command result schemas.


Configuration

config/config.json

{
  "wechat_app_name": "WeChat",
  "groups_to_monitor": ["*"],
  "sidebar_unread_only": true,
  "chat_type": "group",
  "sidebar_max_scrolls": 16,
  "chat_max_scrolls": 10,
  "report_custom_prompt": "Summarize key decisions and action items from the captured chat messages.",
  "llm_provider": "openrouter",
  "openrouter_api_key": "",
  "openai_api_key": "",
  "deepseek_api_key": "",
  "kimi_api_key": "",
  "glm_api_key": "",
  "qwen_api_key": "",
  "llm_model": "openai/gpt-4o",
  "output_dir": "output"
}
Field Description
wechat_app_name Window title for WeChat — usually "WeChat" for English locale or "微信" for Chinese locale
groups_to_monitor ["*"] or [] = all chats allowed by chat_type, or list specific chat names
sidebar_unread_only true = only process chats with unread badges; false = include read and unread selected chats
chat_type group, private, or all; applies to wildcard scans and named-chat matches
sidebar_max_scrolls Maximum downward sidebar scrolls per scan. Returning to top uses sidebar_max_scrolls + 2 upward scrolls so it is always greater than the scan limit.
chat_max_scrolls Maximum upward scrolls inside a chat panel while capturing history
report_custom_prompt Custom instructions appended to the LLM report prompt
llm_provider Built-in LLM provider: openrouter, openai, deepseek, kimi, glm, or qwen; moonshot aliases to kimi, and zhipu/z-ai alias to glm
openrouter_api_key OpenRouter API key (or use OPENROUTER_API_KEY env var)
openai_api_key OpenAI API key (or use OPENAI_API_KEY env var)
deepseek_api_key, kimi_api_key, glm_api_key, qwen_api_key Native provider API keys; matching env vars take precedence
llm_model LLM model identifier. OpenRouter sends the full slug unchanged; native providers strip a provider/ prefix before calling the provider.
output_dir Directory for output JSON files

Capture options can also be overridden per command:

weclaw-cua capture --chat-type private --unread-mode unread
weclaw-cua run --chat-type all --unread-mode all --sidebar-max-scrolls 30 --chat-max-scrolls 20
weclaw-cua unread --chat-type group --sidebar-max-scrolls 25

Architecture

See docs/architecture.md for directory structure and data flow diagrams.


Contributing

Contributions are welcome. Please see CONTRIBUTING.md for development setup, coding standards, and pull request guidelines.


License

Apache License 2.0 — see LICENSE.


Disclaimer

This project is a local UI automation tool for personal use only:

  • Read-only — captures what is visible on screen, does not modify WeChat data
  • No database access — uses pure vision, no decryption or memory scanning
  • No cloud transmission — all automation runs locally; only LLM API calls leave your machine (to your configured provider)
  • Use at your own risk — for personal learning and research purposes only

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

weclaw_cua-0.2.0.tar.gz (153.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

weclaw_cua-0.2.0-py3-none-any.whl (198.2 kB view details)

Uploaded Python 3

File details

Details for the file weclaw_cua-0.2.0.tar.gz.

File metadata

  • Download URL: weclaw_cua-0.2.0.tar.gz
  • Upload date:
  • Size: 153.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for weclaw_cua-0.2.0.tar.gz
Algorithm Hash digest
SHA256 d531ddb9a66d5358d5ff0fbbc80d345323399b96bd825c647a13e50cf3084310
MD5 dc777e8d59be34e3fe2b85288e87d651
BLAKE2b-256 cc20adc9bbf5b0885959495cad078dd84fd0047083d391a47b2a762726a92e7e

See more details on using hashes here.

File details

Details for the file weclaw_cua-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: weclaw_cua-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 198.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for weclaw_cua-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 971c721ecc944136ccd10e98b6b3b8d59b93fe6cb92aa4337afa66e57b9b3c33
MD5 7407955042517d26f8746527fbc179b1
BLAKE2b-256 3d9a46089e73e0cc7398e3ef83e2b9969bcd4790c6dfa960d906e9da5688d3bf

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page