WhatsApp automation via vision: screenshot → OCR → click

wavi — WhatsApp Web Automation via Vision

wavi is a CLI tool for WhatsApp Web automation. It extracts message history with a vision pipeline (screenshot → OCR → bubbles) and handles navigation and sidebar state via DOM scraping.

Commands

| Command | What it does | Approach |
| --- | --- | --- |
| wavi connect [session] | Start Chrome daemon, authenticate via QR | |
| wavi status [session] | Check if daemon is alive and authenticated | DOM |
| wavi get <contact> | Extract full message history from a chat (--grow to page through in chunks) | Vision |
| wavi send <contact> <message> | Send a message | DOM + keyboard |
| wavi check-updates [session] | Detect new inbound messages in sidebar | DOM |
| wavi list-contacts [session] | List all contacts in the "New chat" panel | DOM |
| wavi queue [session] | Show operation queue status | |
| wavi stop [session] | Gracefully shut down the Chrome daemon | |

Architecture

Vision pipeline (wavi get)

Screenshot → Crop chat panel → Color-mask detection → Bbox extraction
    ↓
    OCR (tiled) → Timestamp extraction → Message classification → Bubble list

Used for message content because WhatsApp Web obfuscates the message DOM in ways that make direct scraping unreliable.
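
The first half of the pipeline (color mask, then bounding boxes) can be illustrated with a minimal, dependency-free sketch. It is not the code in element_detector.py: the function names, the tolerance value, and the example colors are assumptions, and it replaces the morphology step with a plain 4-connected flood fill over a nested list of RGB tuples.

```python
from collections import deque


def color_mask(pixels, target, tol=12):
    """Mark pixels whose R, G and B channels are all within `tol` of `target`."""
    return [
        [all(abs(c - t) <= tol for c, t in zip(px, target)) for px in row]
        for row in pixels
    ]


def bounding_boxes(mask):
    """Return (left, top, right, bottom), inclusive, for each 4-connected region."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill one region and track its extent as we go.
            left = right = x
            top = bottom = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes
```

Boxes come back in scan order (top to bottom, then left to right). A real detector would also need to merge regions split by text pixels inside a bubble, which is presumably what the morphology step is for.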

Key files: element_detector.py, vision.py, runner.py
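
The timestamp extraction step could look roughly like the sketch below. It assumes that the OCR text of a bubble ends with a clock time such as "14:07" or "10:42 PM"; the regular expression and the split_timestamp name are illustrative and not taken from vision.py.

```python
import re
from datetime import time

# Assumed format: "H:MM" or "HH:MM", optionally followed by AM/PM.
_TS = re.compile(r"\b(\d{1,2}):(\d{2})(?:\s*([AaPp])[Mm])?(?!\d)")


def split_timestamp(ocr_text):
    """Return (body, time) using the last valid timestamp; time is None if absent."""
    for m in reversed(list(_TS.finditer(ocr_text))):
        hour, minute = int(m.group(1)), int(m.group(2))
        meridiem = m.group(3)
        if meridiem:
            if not 1 <= hour <= 12:
                continue  # "13:00 PM" is not a valid 12-hour time
            hour = hour % 12 + (12 if meridiem.lower() == "p" else 0)
        if hour > 23 or minute > 59:
            continue  # OCR noise such as "99:99"
        body = (ocr_text[:m.start()] + ocr_text[m.end():]).strip()
        return body, time(hour, minute)
    return ocr_text.strip(), None
```

Taking the last match rather than the first keeps times that appear inside the message body (for example "see you at 9:30") as part of the text.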

DOM scraping

Navigation and sidebar state use JavaScript evaluated directly on the page. Each JS constant in session.py has a comment documenting its key selector and the vision-based fallback to implement if the selector breaks after a WA update. When a DOM-scraped feature stops working, check the "DOM scraping inventory" block at the top of session.py.
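
As an illustration of that convention, one inventory entry might be laid out as in the sketch below. Everything in it is hypothetical: the constant names, the placeholder selector (deliberately not a real WhatsApp Web selector), and the scrape_or_fallback helper, which shows one possible way to wire in a fallback once it exists. The `evaluate` argument stands in for a callable such as a Playwright page's `evaluate` method.

```python
# --- DOM scraping inventory (illustrative entry, not copied from session.py) ---
# SIDEBAR_ROWS_JS
#   Key selector:    [data-hypothetical="chat-row"]  (placeholder selector)
#   Vision fallback: screenshot the sidebar, split it into rows, OCR each row.
SIDEBAR_ROWS_SELECTOR = '[data-hypothetical="chat-row"]'
SIDEBAR_ROWS_JS = f"""
() => Array.from(document.querySelectorAll('{SIDEBAR_ROWS_SELECTOR}'))
        .map(row => row.innerText)
"""


def scrape_or_fallback(evaluate, vision_fallback):
    """Try the DOM path first; call the fallback if it raises or returns nothing."""
    try:
        rows = evaluate(SIDEBAR_ROWS_JS)
    except Exception:
        rows = None
    return rows if rows else vision_fallback()
```

Keeping the selector in its own constant means a WA update that renames it needs a one-line change, and the comment next to it says what to build if no stable selector remains.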

Chrome daemon

Chrome runs as a long-lived background process (started by wavi connect). Playwright connects and disconnects for each operation without ever killing Chrome. Killing Chrome mid-session corrupts WA's IndexedDB and invalidates the session. Shutdown is done only via wavi stop, which navigates to about:blank first so WA can flush state.
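
The attach/detach pattern can be sketched with Playwright's `connect_over_cdp`. This is a minimal sketch, not the code in session.py: the port 9222, the helper names, and the assumption that Chrome was started with remote debugging enabled are all illustrative.

```python
from contextlib import contextmanager


def cdp_endpoint(port=9222, host="127.0.0.1"):
    """Build the CDP URL; 9222 is an assumed port, not necessarily wavi's."""
    return f"http://{host}:{port}"


@contextmanager
def attached(endpoint):
    """Attach to an already-running Chrome, then detach without killing it."""
    # Imported lazily so the module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(endpoint)
        try:
            yield browser
        finally:
            # Per Playwright's docs, close() on a connected (not launched)
            # browser disconnects from it; the Chrome process keeps running.
            browser.close()


def flush_before_stop(browser):
    """Navigate every open page to about:blank before the daemon is shut down."""
    for context in browser.contexts:
        for page in context.pages:
            page.goto("about:blank")
```

A single operation would then run as `with attached(cdp_endpoint()) as browser: ...`, and only the stop path would call `flush_before_stop` before terminating the process.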

Setup

# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh

git clone <repo> && cd wavi
uv sync

Quick start

# 1. Start daemon and scan QR
wavi connect

# 2. Extract message history
wavi get "Contact Name"

# 2b. Long chat — page through in blocks of 10 iterations
wavi get "Contact Name" --grow --max-iter 10   # block 1
wavi get "Contact Name" --grow --max-iter 10   # block 2 (continues where block 1 stopped)
# repeat until "history is now complete" or no more messages

# 3. Poll for new messages
wavi check-updates           # first run: saves baseline
wavi check-updates           # subsequent: no_updates or updates + contact list

wavi get flags

| Flag | Behavior |
| --- | --- |
| --max-iter N | Stop after N scroll iterations (default 300). In --grow mode, N counts only new-content iterations per run. |
| --from YYYY-MM-DD | Stop scrolling when the oldest visible day pill is before this date. Drop bubbles older than the date. |
| --newest | Load existing history_bubbles.json and stop the moment a known message is found. Prepends new messages. Goes toward the present. |
| --grow | Load existing history, fast-forward past known content, then capture N more iterations toward the past. Saves a grow_checkpoint.json so each run continues where the last one stopped. Incompatible with --newest. |
| --assets DIR | Override the output directory (default output/<session>/<contact>/). |
| --json-out | Print the bubble list as JSON to stdout instead of the summary table. |
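
The flag set above, including the rule that --grow and --newest cannot be combined, could be declared with argparse as in the following sketch. It is an assumption about how such a parser might look, not wavi's actual CLI definition; build_get_parser and the from_date destination name are made up for the example.

```python
import argparse
from datetime import date


def build_get_parser():
    """Hypothetical parser mirroring the documented `wavi get` flags."""
    parser = argparse.ArgumentParser(prog="wavi get")
    parser.add_argument("contact")
    parser.add_argument("--max-iter", type=int, default=300,
                        help="stop after N scroll iterations")
    # "from" is a Python keyword, so the value is stored under another name.
    parser.add_argument("--from", dest="from_date", type=date.fromisoformat,
                        metavar="YYYY-MM-DD")
    # argparse rejects any command line that sets both flags of this group.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--newest", action="store_true")
    mode.add_argument("--grow", action="store_true")
    parser.add_argument("--assets", metavar="DIR")
    parser.add_argument("--json-out", action="store_true")
    return parser
```

With this layout, `wavi get Alice --grow --newest` exits with a usage error before any browser work starts, and an invalid --from date is rejected at parse time.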

--grow workflow for long chats

wavi get "Contact" --grow --max-iter 10   # run 1: captures first 10 new-content iterations
wavi get "Contact" --grow --max-iter 10   # run 2: fast-forwards to boundary, captures next 10
# repeat — prints "history is now complete" when scrollTop reaches 0

State is stored in output/<session>/<contact>/grow_checkpoint.json. To restart from scratch, delete it together with history_bubbles.json.
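
The "repeat until complete" step can be scripted. The sketch below shells out to the documented command and stops when the completion message appears. It assumes the message is printed to stdout or stderr exactly as "history is now complete"; the function names and the max_blocks safety cap are illustrative.

```python
import subprocess

DONE_MARKER = "history is now complete"


def run_block(contact, max_iter=10):
    """Run one --grow block and return everything the command printed."""
    result = subprocess.run(
        ["wavi", "get", contact, "--grow", "--max-iter", str(max_iter)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout + result.stderr


def grow_until_complete(contact, run=run_block, max_blocks=50):
    """Return (blocks_run, complete); stops at the marker or after max_blocks."""
    for block in range(1, max_blocks + 1):
        if DONE_MARKER in run(contact):
            return block, True
    return max_blocks, False
```

The `run` parameter exists so the loop can be exercised without a browser; the cap prevents an endless loop if the marker text ever changes.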

check-updates behavior

Compares the sidebar snapshot (last message + timestamp per chat) against the previous saved state. Reports a contact as updated only when:

  • its last_message changed, and
  • direction == "inbound" (outbound messages and re-reads are ignored)

Direction is inferred from tick icons (msg-check, msg-dbl-check, etc.): if one is present, the last message is outbound; if none is present, it is inbound.

Limitation: only the last visible message per chat is tracked. If multiple messages arrive between two checks, only the most recent is reported per contact. Use wavi get <contact> to retrieve the full history after detection.
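
The comparison rule described in this section can be sketched as a pure function over two sidebar snapshots. The snapshot shape (a dict keyed by contact with "last_message" and "icons" fields) is an assumption for illustration, the icon set contains only the two names mentioned above, and treating a contact absent from the previous snapshot as changed is a choice of this sketch rather than documented behavior.

```python
# Only the two icon names the text mentions; the real list is longer ("etc.").
TICK_ICONS = {"msg-check", "msg-dbl-check"}


def infer_direction(icons):
    """Tick icon present means we sent the last message; absent means inbound."""
    return "outbound" if TICK_ICONS & set(icons) else "inbound"


def detect_updates(previous, current):
    """Return contacts whose last message changed and is inbound."""
    updated = []
    for contact, row in current.items():
        before = previous.get(contact)
        changed = before is None or before["last_message"] != row["last_message"]
        if changed and infer_direction(row["icons"]) == "inbound":
            updated.append(contact)
    return updated
```

Because each snapshot holds a single last message per chat, three messages arriving between two checks still produce one entry for that contact, which is the limitation noted above.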

Development

make ocr                  # compile the OCR helper to bin/ocr_vision (arm64, ~4x faster pipeline)
make hooks                # git hooks: ruff on commit, ruff+pytest on push (bypass: --no-verify)
uv run pytest tests/ -v   # unit tests (offline, mocked browser)
make corpus               # vision eval on golden screenshots (real OCR, see tests/corpus/README.md)

Roadmap and audit: docs/plan-mejoras.md, docs/audit-checklist.md.

WAVI_TIMING=1 prints a per-stage timing breakdown of each analyze() run.
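
An environment-gated timing breakdown like the one WAVI_TIMING=1 enables could be built along the lines of the sketch below. The StageTimer class, the stage names, and the report format are hypothetical; only the environment variable name comes from this README.

```python
import os
import time
from contextlib import contextmanager


def timing_enabled(env=os.environ):
    """True when WAVI_TIMING is set to "1"."""
    return env.get("WAVI_TIMING") == "1"


class StageTimer:
    """Collects (stage name, seconds) pairs; records nothing when disabled."""

    def __init__(self, enabled):
        self.enabled = enabled
        self.stages = []

    @contextmanager
    def stage(self, name):
        if not self.enabled:
            yield
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            self.stages.append((name, time.perf_counter() - start))

    def report(self):
        """One line per stage, in the order the stages ran."""
        return "\n".join(
            f"{name:<12}{seconds * 1000:8.1f} ms" for name, seconds in self.stages
        )
```

Usage would be `timer = StageTimer(timing_enabled())`, then `with timer.stage("ocr"): ...` around each pipeline step, and `print(timer.report())` at the end of the run.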

Key files:

  • session.py — Chrome CDP connection + all DOM scraping JS (see inventory block)
  • runner.py — Orchestration: vision pipeline, check_updates, list_contacts
  • element_detector.py — Color-mask morphology for bubble detection
  • vision.py — OCR, classification, timestamp extraction

Debugging

wavi bubbles /path/to/screenshot.png --debug

Produces screenshot_debug.png with annotated boxes:

  • Green: sent messages
  • Blue: received messages
  • Red crosses: audio play button targets
