WhatsApp automation via vision: screenshot → OCR → click
Project description
wavi — WhatsApp Web Automation via Vision
CLI tool for WhatsApp Web automation. Extracts message history using a vision pipeline (screenshot → OCR → bubbles), and handles navigation and sidebar state via DOM scraping.
Commands
| Command | What it does | Approach |
|---|---|---|
wavi connect [session] |
Start Chrome daemon, authenticate via QR | — |
wavi status [session] |
Check if daemon is alive and authenticated | DOM |
wavi get <contact> |
Extract full message history from a chat (--grow to page through in chunks) |
Vision |
wavi send <contact> <message> |
Send a message | DOM + keyboard |
wavi check-updates [session] |
Detect new inbound messages in sidebar | DOM |
wavi list-contacts [session] |
List all contacts in the "New chat" panel | DOM |
wavi queue [session] |
Show operation queue status | — |
wavi stop [session] |
Gracefully shut down the Chrome daemon | — |
wavi alias set <name> <session> |
Assign a friendly alias to a session | — |
wavi alias list |
List all aliases | — |
wavi alias remove <name> |
Remove an alias | — |
Session aliases
All commands accept an alias in place of a phone number. Aliases are stored in data/sessions/aliases.json.
wavi alias set pulpo-bot 5491155612767
wavi alias set mateo 5491122608221
wavi status pulpo-bot # same as: wavi status 5491155612767
wavi get mateo "Contacto"
Architecture
Vision pipeline (wavi get)
Screenshot → Crop chat panel → Color-mask detection → Bbox extraction
↓
OCR (tiled) → Timestamp extraction → Message classification → Bubble list
Used for message content because WhatsApp Web obfuscates the message DOM in ways that make direct scraping unreliable.
Key files: element_detector.py, vision.py, runner.py
DOM scraping
Navigation and sidebar state use JavaScript evaluated directly on the page. Each JS constant in session.py has a comment documenting its key selector and the vision-based fallback to implement if the selector breaks after a WA update. When a DOM-scraped feature stops working, check session.py → "DOM scraping inventory" block at the top.
Chrome daemon
Chrome runs as a long-lived background process (started by wavi connect). Playwright connects and disconnects for each operation without ever killing Chrome. Killing Chrome mid-session corrupts WA's IndexedDB and invalidates the session. Shutdown is done only via wavi stop, which navigates to about:blank first so WA can flush state.
Setup
# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh
git clone <repo> && cd wavi
uv sync
Quick start
# 1. Start daemon and scan QR
wavi connect
# 2. Extract message history
wavi get "Contact Name"
# 2b. Long chat — page through in blocks of 10 iterations
wavi get "Contact Name" --grow --max-iter 10 # block 1
wavi get "Contact Name" --grow --max-iter 10 # block 2 (continues where block 1 stopped)
# repeat until "history is now complete" or no more messages
# 3. Poll for new messages
wavi check-updates # first run: saves baseline
wavi check-updates # subsequent: no_updates or updates + contact list
wavi get flags
| Flag | Behavior |
|---|---|
--max-iter N |
Stop after N scroll iterations (default 300). In --grow mode, N counts only new-content iterations per run. |
--from YYYY-MM-DD |
Stop scrolling when the oldest visible day pill is before this date. Drop bubbles older than the date. |
--newest |
Load existing history_bubbles.json and stop the moment a known message is found. Prepends new messages. Goes toward the present. |
--grow |
Load existing history, fast-forward past known content, then capture N more iterations toward the past. Saves a grow_checkpoint.json so each run continues where the last one stopped. Incompatible with --newest. |
--assets DIR |
Override the output directory (default output/<session>/<contact>/). |
--json-out |
Print the bubble list as JSON to stdout instead of the summary table. |
--grow workflow for long chats
wavi get "Contact" --grow --max-iter 10 # run 1: captures first 10 new-content iterations
wavi get "Contact" --grow --max-iter 10 # run 2: fast-forwards to boundary, captures next 10
# repeat — prints "history is now complete" when scrollTop reaches 0
State is stored in output/<session>/<contact>/grow_checkpoint.json. Delete it to restart from scratch (also delete history_bubbles.json).
check-updates behavior
Compares the sidebar snapshot (last message + timestamp per chat) against the previous saved state. Reports a contact as updated only when:
- its
last_messagechanged, and direction == "inbound"(outbound messages and re-reads are ignored)
Direction is inferred from tick icons (msg-check, msg-dbl-check, etc.) — present → outbound; absent → inbound.
Limitation: only the last visible message per chat is tracked. If multiple messages arrive between two checks, only the most recent is reported per contact. Use wavi get <contact> to retrieve the full history after detection.
Development
make ocr # compile the OCR helper to bin/ocr_vision (arm64, ~4x faster pipeline)
make hooks # git hooks: ruff on commit, ruff+pytest on push (bypass: --no-verify)
uv run pytest tests/ -v # unit tests (offline, mocked browser)
make corpus # vision eval on golden screenshots (real OCR, see tests/corpus/README.md)
WAVI_TIMING=1 prints a per-stage timing breakdown of each analyze() run.
Roadmap and audit: docs/plan-mejoras.md, docs/audit-checklist.md.
Key files:
session.py— Chrome CDP connection + all DOM scraping JS (see inventory block)runner.py— Orchestration: vision pipeline,check_updates,list_contactselement_detector.py— Color-mask morphology for bubble detectionvision.py— OCR, classification, timestamp extraction
Debugging
wavi bubbles /path/to/screenshot.png --debug
Produces screenshot_debug.png with annotated boxes:
- Green: sent messages
- Blue: received messages
- Red crosses: audio play button targets
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file wavi_lib-0.2.5.tar.gz.
File metadata
- Download URL: wavi_lib-0.2.5.tar.gz
- Upload date:
- Size: 349.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.7 {"installer":{"name":"uv","version":"0.11.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
883862da53f87b44aacfd1976d6f17e06c6895264505e264c4efd8b06fcef2cb
|
|
| MD5 |
54d00f32cf88724f4b64e7ae98674b56
|
|
| BLAKE2b-256 |
00caf634f84d38499a9d4bb600eae1be34ecf6aed52c832ed2fa4e0b0bc358dc
|
File details
Details for the file wavi_lib-0.2.5-py3-none-any.whl.
File metadata
- Download URL: wavi_lib-0.2.5-py3-none-any.whl
- Upload date:
- Size: 63.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.7 {"installer":{"name":"uv","version":"0.11.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fd58bd86ad470d05fa71deda55fcfca5e3bf8a4e266e8048a8eb5684ac75effd
|
|
| MD5 |
6bb8d7787cdfda4096e44f1b0e55c55f
|
|
| BLAKE2b-256 |
ea2edccc47c528c57812945ae54b450a1614bacefd1819d1ed7cdeb623cd5fe9
|