
WhatsApp automation via vision: screenshot → OCR → click


wavi — WhatsApp Web Automation via Vision

CLI tool for WhatsApp Web automation. It extracts message history using a vision pipeline (screenshot → OCR → bubbles) and handles navigation and sidebar state via DOM scraping.

Commands

Command                          Approach        What it does
wavi connect [session]                           Start Chrome daemon, authenticate via QR
wavi status [session]            DOM             Check if daemon is alive and authenticated
wavi reload [session]                            Safe reload: about:blank flush → WA → verify auth
wavi get <contact>               Vision          Extract full message history from a chat (--grow to page through in chunks)
wavi send <contact> <message>    DOM + keyboard  Send a message
wavi check-updates [session]     DOM             Detect new inbound messages in sidebar
wavi list-contacts [session]     DOM             List all contacts in the "New chat" panel
wavi queue [session]                             Show operation queue status
wavi stop [session]                              Gracefully shut down the Chrome daemon
wavi alias set <name> <session>                  Assign a friendly alias to a session
wavi alias list                                  List all aliases
wavi alias remove <name>                         Remove an alias
wavi install-skill                               Install the Claude Code /wavi skill to ~/.claude/skills/wavi/

Session aliases

All commands accept an alias in place of a phone number. Aliases are stored in data/sessions/aliases.json.

wavi alias set pulpo-bot 5491155612767
wavi alias set mateo 5491122608221
wavi status pulpo-bot       # same as: wavi status 5491155612767
wavi get mateo "Contacto"
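Alias lookup is simple enough to sketch. The snippet below assumes aliases.json is a flat JSON object mapping each alias to its session id (the phone number); that layout and the helper name resolve_session are assumptions for illustration, not wavi's actual code.

```python
import json
from pathlib import Path

# Assumption: aliases.json is a flat {"alias": "session id"} object.
ALIASES_PATH = Path("data/sessions/aliases.json")


def resolve_session(name_or_number: str, aliases_path: Path = ALIASES_PATH) -> str:
    """Hypothetical helper: map an alias to its session id; pass anything else through."""
    if not aliases_path.exists():
        return name_or_number
    aliases = json.loads(aliases_path.read_text(encoding="utf-8"))
    return aliases.get(name_or_number, name_or_number)
```

Passing unknown names through unchanged is what lets every command accept either an alias or a raw phone number with the same argument.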

Architecture

Vision pipeline (wavi get)

Screenshot → Crop chat panel → Color-mask detection → Bbox extraction
    ↓
    OCR (tiled) → Timestamp extraction → Message classification → Bubble list

Vision is used for message content because WhatsApp Web obfuscates the message DOM, which makes direct scraping unreliable.

Key files: element_detector.py, vision.py, runner.py
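The first two stages (color mask, then bounding boxes) can be illustrated in pure Python. This is a simplified stand-in, not wavi's detector: element_detector.py is described as using color-mask morphology, whereas this sketch applies a per-channel color tolerance and then labels 4-connected regions with a flood fill, with no morphological opening or closing. The bubble color (217, 253, 211), roughly the light-theme sent-bubble green, and the tolerance of 12 are assumptions.

```python
from collections import deque

Pixel = tuple[int, int, int]
Box = tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive

SENT_GREEN: Pixel = (217, 253, 211)  # assumption: approximate light-theme sent-bubble color


def color_mask(image: list[list[Pixel]], target: Pixel, tol: int = 12) -> list[list[bool]]:
    """True where every RGB channel is within `tol` of the target color."""
    return [[all(abs(c - t) <= tol for c, t in zip(px, target)) for px in row] for row in image]


def extract_bboxes(mask: list[list[bool]], min_area: int = 1) -> list[Box]:
    """Bounding boxes of 4-connected True regions, found by breadth-first flood fill."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: list[Box] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1, area = x, y, x, y, 0
            while queue:
                cx, cy = queue.popleft()
                area += 1
                x0, y0, x1, y1 = min(x0, cx), min(y0, cy), max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:  # discard specks smaller than a plausible bubble
                boxes.append((x0, y0, x1, y1))
    return boxes
```

On a real screenshot the pixels would come from an imaging library, and min_area would be set high enough to ignore antialiasing fragments; each resulting box is then a candidate region for the OCR stage.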

DOM scraping

Navigation and sidebar state use JavaScript evaluated directly on the page. Each JS constant in session.py has a comment documenting its key selector and the vision-based fallback to implement if that selector breaks after a WA update. When a DOM-scraped feature stops working, start with the "DOM scraping inventory" block at the top of session.py.
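A minimal sketch of that inventory pattern, assuming each scraper is a JS function string run through something like Playwright's page.evaluate. The DomScraper record, the run_scraper helper, and the #pane-side selector are all hypothetical; they only show how a selector and its vision fallback can travel together so that a breakage names its own fix.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class DomScraper:
    """One hypothetical inventory entry: JS, its fragile selector, and the fallback plan."""
    name: str
    js: str               # function source passed to page.evaluate()
    key_selector: str     # the selector most likely to break after a WA update
    vision_fallback: str  # what to build instead if the selector stops matching


# Illustrative entry; this selector is an assumption, not wavi's real one.
SIDEBAR_ROWS = DomScraper(
    name="sidebar_rows",
    js="""() => document.querySelectorAll('#pane-side [role="listitem"]').length""",
    key_selector='#pane-side [role="listitem"]',
    vision_fallback="crop the sidebar and count rows with the color-mask pipeline",
)


def run_scraper(evaluate: Callable[[str], Any], scraper: DomScraper) -> Any:
    """Run the JS (e.g. evaluate=page.evaluate) and treat an empty result as a broken selector."""
    result = evaluate(scraper.js)
    if result in (None, 0, [], ""):
        raise RuntimeError(
            f"{scraper.name}: {scraper.key_selector!r} matched nothing, possibly a WA DOM change. "
            f"Fallback: {scraper.vision_fallback}"
        )
    return result
```

Treating an empty result as failure is a deliberate simplification here; it would misfire on a genuinely empty sidebar, so a real check would likely need to be more specific.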

Chrome daemon

Chrome runs as a long-lived background process (started by wavi connect). Playwright connects and disconnects for each operation without ever killing Chrome. Killing Chrome mid-session corrupts WA's IndexedDB and invalidates the session. Shutdown is done only via wavi stop, which navigates to about:blank first so WA can flush state.
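The attach-then-detach lifecycle can be sketched with subprocess and Playwright's connect_over_cdp. Everything specific here is an assumption rather than wavi's configuration: the Chrome binary name, port 9222, the data/sessions/<session>/chrome-profile layout, and the guess that the WhatsApp tab is the first page of the default context. Playwright is imported inside the function so the rest of the sketch loads without it.

```python
import subprocess
from pathlib import Path

# Assumptions (hypothetical values): binary name, debugging port, profile layout.
CHROME_BIN = "google-chrome"
CDP_PORT = 9222


def chrome_daemon_cmd(session: str, port: int = CDP_PORT, chrome_bin: str = CHROME_BIN) -> list[str]:
    """Command line for a long-lived Chrome with a persistent per-session profile."""
    profile = Path("data/sessions") / session / "chrome-profile"
    return [
        chrome_bin,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={profile}",
        "https://web.whatsapp.com/",
    ]


def start_daemon(session: str) -> subprocess.Popen:
    # start_new_session=True puts Chrome in its own session (POSIX), so it
    # outlives the CLI invocation that launched it.
    return subprocess.Popen(
        chrome_daemon_cmd(session),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )


def read_title_once(port: int = CDP_PORT) -> str:
    """Attach, do one operation, detach. Chrome itself is never closed here."""
    from playwright.sync_api import sync_playwright  # lazy import

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(f"http://localhost:{port}")
        page = browser.contexts[0].pages[0]  # assumption: WA is the first tab
        # No browser.close(): leaving the `with` block only stops the
        # Playwright driver and drops the CDP connection.
        return page.title()
```

The design choice is that only the Chrome process owns the profile directory (and with it WA's IndexedDB); Playwright is a short-lived client that never decides when Chrome exits.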

⚠️ Session safety — critical rules

Never call Page.reload or Storage.clearDataForOrigin on the WhatsApp tab via raw CDP.

WhatsApp Web holds in-flight IndexedDB write transactions while running. Interrupting the page mid-transaction (via Page.reload, Page.navigate, or a storage wipe) corrupts the LevelDB database and forces a full QR re-scan. Storage.clearDataForOrigin is worse: it deletes auth tokens entirely.
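For tooling that does speak CDP (a debugging script, say), a cheap safeguard is to refuse these methods before they are sent. This is a hypothetical guard, not part of wavi: the three method names come from the paragraph above, while the function name, the exception class, and the URL and origin checks are assumptions. It blocks every Page.navigate on the WA tab, including one to about:blank; in this setup that step belongs to wavi stop, not to external tools.

```python
from typing import Optional

# Method names from the rules above; the guard itself is hypothetical.
FORBIDDEN_ON_WA = frozenset({"Page.reload", "Page.navigate", "Storage.clearDataForOrigin"})
WA_ORIGIN = "https://web.whatsapp.com"


class UnsafeCDPCommand(RuntimeError):
    """Raised instead of sending a command that can corrupt or wipe the WA session."""


def check_cdp_command(method: str, target_url: str, params: Optional[dict] = None) -> None:
    """Raise if `method` would reload, navigate, or wipe storage for WhatsApp Web."""
    params = params or {}
    hits_wa_tab = target_url.startswith(WA_ORIGIN)
    # Storage.clearDataForOrigin names its victim in params["origin"], not in the tab URL.
    hits_wa_origin = str(params.get("origin", "")).startswith(WA_ORIGIN)
    if method in FORBIDDEN_ON_WA and (hits_wa_tab or hits_wa_origin):
        raise UnsafeCDPCommand(
            f"{method} against WhatsApp Web can corrupt IndexedDB; use `wavi reload` instead"
        )
```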

If WA becomes unresponsive or throttled, use the safe cycle:

# Option A — soft reload (~15s, Chrome keeps running)
wavi reload pulpo-bot
# → session=restored  ✓
# → session=qr_needed  (auth lost, QR scan required)

# Option B — full restart (~30s)
wavi stop pulpo-bot && wavi connect pulpo-bot

External agents (Pulpo, scripts, automation): never send CDP commands directly to the WA tab. Always go through wavi CLI or the HTTP API (wavi serve). If wavi reload returns qr_needed, alert a human — do not attempt to recover programmatically.
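A script that needs to recover a session can wrap wavi reload instead of touching CDP. This sketch assumes the command prints a session=<status> token (restored or qr_needed) as in the example output above; safe_reload, parse_reload_status, and NeedsHuman are hypothetical names.

```python
import re
import subprocess
from typing import Optional


class NeedsHuman(RuntimeError):
    """Auth was lost; a person has to scan the QR code."""


def parse_reload_status(output: str) -> Optional[str]:
    """Return the value of the first `session=<word>` token, or None if absent."""
    match = re.search(r"session=(\w+)", output)
    return match.group(1) if match else None


def safe_reload(session: str) -> str:
    """Run `wavi reload` and escalate instead of retrying when a QR scan is needed."""
    proc = subprocess.run(["wavi", "reload", session], capture_output=True, text=True)
    status = parse_reload_status(proc.stdout + proc.stderr)
    if status == "qr_needed":
        raise NeedsHuman(f"{session}: auth lost, QR scan required")  # alert; never auto-recover
    if status != "restored":
        raise RuntimeError(f"{session}: unexpected `wavi reload` output: {proc.stdout!r}")
    return status
```

Raising a dedicated exception, rather than retrying, leaves the escalation path (paging, chat alert) to the caller while making programmatic recovery impossible by accident.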

Setup

# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh

git clone <repo> && cd wavi
uv sync

Quick start

# 1. Start daemon and scan QR
wavi connect

# 2. Extract message history
wavi get "Contact Name"

# 2b. Long chat — page through in blocks of 10 iterations
wavi get "Contact Name" --grow --max-iter 10   # block 1
wavi get "Contact Name" --grow --max-iter 10   # block 2 (continues where block 1 stopped)
# repeat until it prints "history is now complete" or there are no more messages

# 3. Poll for new messages
wavi check-updates           # first run: saves baseline
wavi check-updates           # subsequent: no_updates or updates + contact list

wavi get flags

Flag                Behavior
--max-iter N        Stop after N scroll iterations (default 300). In --grow mode, N counts only new-content iterations per run.
--from YYYY-MM-DD   Stop scrolling when the oldest visible day pill is before this date. Drop bubbles older than the date.
--newest            Load the existing history_bubbles.json and stop as soon as a known message is found. Prepends new messages. Goes toward the present.
--grow              Load the existing history, fast-forward past known content, then capture N more iterations toward the past. Saves a grow_checkpoint.json so each run continues where the last one stopped. Incompatible with --newest.
--assets DIR        Override the output directory (default: output/<session>/<contact>/).
--json-out          Print the bubble list as JSON to stdout instead of the summary table.
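To make the bubble-level semantics of --from and --newest concrete, here is a sketch over plain dicts. It assumes each bubble has date, time, and text fields, that (date, time, text) identifies a message, and that lists are ordered newest first so prepending preserves order; none of that is confirmed as the real history_bubbles.json schema. It covers only the bubble-dropping half of --from, not the scroll stop.

```python
from datetime import date

# Hypothetical bubble shape: {"date": "YYYY-MM-DD", "time": "HH:MM", "text": "..."}.
Bubble = dict


def bubble_key(b: Bubble) -> tuple:
    """Assumed identity of a message; real data may need a stronger key."""
    return (b["date"], b["time"], b["text"])


def apply_from(bubbles: list[Bubble], since: date) -> list[Bubble]:
    """--from: drop bubbles dated before `since`."""
    return [b for b in bubbles if date.fromisoformat(b["date"]) >= since]


def merge_newest(history: list[Bubble], captured: list[Bubble]) -> list[Bubble]:
    """--newest: keep captured bubbles up to the first known one, then prepend them.

    Both lists are assumed to be ordered newest first.
    """
    known = {bubble_key(b) for b in history}
    fresh: list[Bubble] = []
    for b in captured:
        if bubble_key(b) in known:
            break  # reached saved history: stop scanning
        fresh.append(b)
    return fresh + history
```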

--grow workflow for long chats

wavi get "Contact" --grow --max-iter 10   # run 1: captures first 10 new-content iterations
wavi get "Contact" --grow --max-iter 10   # run 2: fast-forwards to boundary, captures next 10
# repeat — prints "history is now complete" when scrollTop reaches 0

State is stored in output/<session>/<contact>/grow_checkpoint.json. Delete it to restart from scratch (also delete history_bubbles.json).
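Because grow_checkpoint.json carries state between runs, automating a long chat only requires re-running the same command. A sketch, assuming the completion message appears verbatim in wavi's output; grow_until_complete, run_wavi_get, and the max_runs safety cap are hypothetical.

```python
import subprocess
from typing import Callable

DONE_MARKER = "history is now complete"


def run_wavi_get(contact: str, max_iter: int) -> str:
    """Run one --grow block and return its combined output."""
    proc = subprocess.run(
        ["wavi", "get", contact, "--grow", "--max-iter", str(max_iter)],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout + proc.stderr


def grow_until_complete(contact: str, max_iter: int = 10, max_runs: int = 50,
                        run: Callable[[str, int], str] = run_wavi_get) -> int:
    """Repeat --grow blocks until wavi reports a complete history; return the runs used."""
    for n in range(1, max_runs + 1):
        if DONE_MARKER in run(contact, max_iter):
            return n
    raise RuntimeError(f"history for {contact!r} still incomplete after {max_runs} runs")
```

Injecting `run` keeps the loop testable without a live session, and each subprocess picks up where the previous one stopped through the checkpoint file.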

check-updates behavior

Compares the sidebar snapshot (last message + timestamp per chat) against the previous saved state. Reports a contact as updated only when:

  • its last_message changed, and
  • direction == "inbound" (outbound messages and re-reads are ignored)

Direction is inferred from tick icons (msg-check, msg-dbl-check, etc.): if a tick icon is present, the last message is outbound; if it is absent, the message is inbound.

Limitation: only the last visible message per chat is tracked. If multiple messages arrive between two checks, only the most recent is reported per contact. Use wavi get <contact> to retrieve the full history after detection.
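The comparison can be sketched as a pure function over two sidebar snapshots. The Row fields mirror the description above (last message, timestamp, tick icon present or not), but their names and the choice to treat a chat missing from the previous snapshot as changed are assumptions. As the limitation notes, only the last message per chat is visible to this logic.

```python
from typing import TypedDict


class Row(TypedDict):
    last_message: str
    timestamp: str
    has_ticks: bool  # True if a msg-check / msg-dbl-check style icon is shown


def direction(row: Row) -> str:
    # Tick icon present -> we sent it (outbound); absent -> inbound.
    return "outbound" if row["has_ticks"] else "inbound"


def detect_updates(previous: dict[str, Row], current: dict[str, Row]) -> list[str]:
    """Contacts whose last sidebar message changed and is inbound."""
    updated = []
    for contact, row in current.items():
        before = previous.get(contact)
        # Assumption: a chat absent from the previous snapshot counts as changed.
        changed = before is None or before["last_message"] != row["last_message"]
        if changed and direction(row) == "inbound":
            updated.append(contact)
    return updated
```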

Development

make ocr                  # compile the OCR helper to bin/ocr_vision (arm64, ~4x faster pipeline)
make hooks                # git hooks: ruff on commit, ruff+pytest on push (bypass: --no-verify)
uv run pytest tests/ -v   # unit tests (offline, mocked browser)
make corpus               # vision eval on golden screenshots (real OCR, see tests/corpus/README.md)

WAVI_TIMING=1 prints a per-stage timing breakdown of each analyze() run. Roadmap and audit: docs/plan-mejoras.md, docs/audit-checklist.md.
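A per-stage breakdown like this is typically a timer wrapped around each pipeline step. The sketch shows one way to gate it on the environment variable; the stage helper, the stage names, and the output format are assumptions, not the internals of wavi's analyze().

```python
import os
import time
from contextlib import contextmanager
from typing import Iterator

TIMING_ENABLED = os.environ.get("WAVI_TIMING") == "1"


@contextmanager
def stage(name: str, timings: dict[str, float]) -> Iterator[None]:
    """Add the wall-clock time spent inside the block to timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


def format_timings(timings: dict[str, float]) -> str:
    """One line per stage: name, milliseconds, share of the total."""
    total = sum(timings.values()) or 1e-9  # avoid dividing by zero
    return "\n".join(
        f"{name:<10} {secs * 1000:8.1f} ms {secs / total:7.1%}" for name, secs in timings.items()
    )


if __name__ == "__main__":
    timings: dict[str, float] = {}
    with stage("crop", timings):
        time.sleep(0.01)  # stand-in for real work
    with stage("ocr", timings):
        time.sleep(0.03)
    if TIMING_ENABLED:
        print(format_timings(timings))
```

Accumulating into the dict (rather than overwriting) means a stage that runs once per tile, such as tiled OCR, reports its total cost.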

Key files:

  • session.py — Chrome CDP connection + all DOM scraping JS (see inventory block)
  • runner.py — Orchestration: vision pipeline, check_updates, list_contacts
  • element_detector.py — Color-mask morphology for bubble detection
  • vision.py — OCR, classification, timestamp extraction

Debugging

wavi bubbles /path/to/screenshot.png --debug

Produces screenshot_debug.png with annotated boxes:

  • Green: sent messages
  • Blue: received messages
  • Red crosses: audio play button targets



Download files


Source Distribution

wavi_lib-0.2.6.tar.gz (351.8 kB)

Uploaded Source

Built Distribution


wavi_lib-0.2.6-py3-none-any.whl (66.0 kB)

Uploaded Python 3

File details

Details for the file wavi_lib-0.2.6.tar.gz.

File metadata

  • Download URL: wavi_lib-0.2.6.tar.gz
  • Upload date:
  • Size: 351.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.7 (publish) on macOS

File hashes

Hashes for wavi_lib-0.2.6.tar.gz
Algorithm     Hash digest
SHA256        3201a067d65aa80e87859c3ac56758e7d4e8ada07740f3a8acf35aa97a1dd428
MD5           158a9aa3f6bfe6d7773377c6c0528841
BLAKE2b-256   15b9b50123bab2d09956170a528fe5ad15d07c1677b876e57d84a56fe0962384


File details

Details for the file wavi_lib-0.2.6-py3-none-any.whl.

File metadata

  • Download URL: wavi_lib-0.2.6-py3-none-any.whl
  • Upload date:
  • Size: 66.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.7 (publish) on macOS

File hashes

Hashes for wavi_lib-0.2.6-py3-none-any.whl
Algorithm     Hash digest
SHA256        bdf91034db97a9f9b04ec85fdb7b6e0ec4e5afeade297e461a035f310f366557
MD5           4e23439c70cd9a4b28002db4dd9c1365
BLAKE2b-256   bdc42e167a3587b632cba8bdf138299ad3df52c0681aebc46372401f4467907f

