Vision extension for nano-wait automation
Nano-Wait-Vision — Visual Execution Extension
nano-wait-vision is the official computer vision extension for nano-wait. It integrates visual awareness (OCR, icon detection, screen states) into the adaptive waiting engine, enabling deterministic, screen-driven automations.
[!IMPORTANT] Critical Dependency: This package DEPENDS on nano-wait. It does not replace nano-wait; it extends it.
🧠 What is Nano-Wait-Vision?
Nano-Wait-Vision is a deterministic vision engine for Python automation. Instead of waiting blindly with sleep(), it allows your code to wait for real visual conditions:
- Text appearing on screen
- Icons becoming visible
- UI states changing
- Any of the above on a specific monitor: observation, text detection, and icon detection can each target a chosen screen in multi-monitor setups.
It is designed to work in strict cooperation with nano-wait:
| Component | Responsibility |
|---|---|
| ⏱️ nano-wait | When to check (adaptive pacing & CPU-aware waiting) |
| 👁️ nano-wait-vision | What to check (screen, OCR, icons) |
🧩 Key Features
nano-wait-vision extends nano-wait with:
- 👁️ OCR (Optical Character Recognition): Read real text directly from the screen.
- 🖼️ Icon Detection: Template matching via OpenCV.
- 🖥️ Automatic HiDPI/Retina Support: Icon templates are scaled automatically so that template matching works on 4K, macOS Retina, and Windows HiDPI displays without user configuration.
- 🖥️ Multi-Monitor Awareness: Target any monitor by index; works seamlessly in multi-screen setups.
- 🧠 Explicit Visual States: Each operation returns a structured VisionState.
- 📚 Persistent & Explainable Diagnostics: No black-box ML models.
- ⚡ QA-Friendly & Plug-and-Play: Zero dependency on web drivers (like Selenium), making corporate and academic adoption seamless.
- 🖥️ Screen-Based Automation: Ideal for RPA and GUI testing.
[!TIP] All waiting logic is delegated to nano-wait.wait(), never time.sleep().
🚀 Quick Start
Installation
pip install nano-wait
pip install nano-wait-vision
Simple Visual Observation (Single or Multi-Monitor)
from nano_wait_vision import VisionMode
# Observe the primary screen
vision_main = VisionMode(screen_index=0)
state_main = vision_main.observe()
# Observe a secondary screen (if available)
vision_second = VisionMode(screen_index=1)
state_second = vision_second.observe()
print(f"Primary screen text: {state_main.text}")
print(f"Secondary screen text: {state_second.text}")
Wait for Text to Appear
from nano_wait_vision import VisionMode
vision = VisionMode(verbose=True, screen_index=0)
# Wait up to 10 seconds for the word "Welcome" on the primary screen
state = vision.wait_text("Welcome", timeout=10)
if state.detected:
    print("Text detected!")
Wait for an Icon
from nano_wait_vision import VisionMode
vision = VisionMode(screen_index=1) # target second monitor
# Wait up to 10 seconds for an icon image on the second monitor
state = vision.wait_icon("ok.png", timeout=10)
if state.detected:
    print("Icon found on screen.")
⚠️ Installation & Dependencies
This library interacts directly with your operating system screen and OCR engine.
Python Dependencies (auto-installed)
- opencv-python
- pytesseract
- pyautogui
- numpy
- Optional for Multi-Monitor: mss (faster and full multi-monitor support)
🧠 Mandatory External Dependency — Tesseract OCR
OCR will not work unless Tesseract is installed and available in your PATH.
| OS | Command / Action |
|---|---|
| macOS | brew install tesseract |
| Ubuntu / Debian | sudo apt install tesseract-ocr |
| Windows | Download from the official Tesseract repo and add to PATH |
[!WARNING] If Tesseract is missing, OCR calls will silently fail or return empty text.
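Because a missing Tesseract binary may only show up as empty OCR results, it can help to check for it before an automation starts. The following is a minimal sketch using only the standard library; the helper name `tesseract_available` is hypothetical and not part of nano-wait-vision.

```python
import shutil


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if the named executable can be found on PATH."""
    return shutil.which(binary) is not None


if tesseract_available():
    print("Tesseract found on PATH; OCR should be usable.")
else:
    print("Tesseract not found on PATH; install it before relying on OCR.")
```

Running a check like this once at startup turns a silent OCR failure into an explicit message.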
🧠 Mental Model — How It Works
Nano-Wait-Vision follows this loop: observe → evaluate → wait → observe.
Two engines cooperate:
| 👁️ Vision Engine | ⏱️ nano-wait |
|---|---|
| OCR / Icons | Adaptive timing |
| Screen capture (multi-monitor aware) | CPU-aware waits |
| Visual states | Smart pacing |
Vision never sleeps. All delays are handled by nano-wait.
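The loop above can be sketched in plain Python. This is an illustration of the division of labor only, not the library's implementation: `wait_until`, `check`, and `pause` are hypothetical names, and `pause` stands in for whatever adaptive wait nano-wait provides.

```python
import time
from typing import Callable, Tuple


def wait_until(
    check: Callable[[], bool],
    pause: Callable[[], None],
    timeout: float = 10.0,
) -> Tuple[bool, int]:
    """Repeat observe -> evaluate -> wait until check() passes or time runs out.

    Returns (detected, attempts).
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        if check():                      # observe + evaluate (vision side)
            return True, attempts
        if time.monotonic() >= deadline:
            return False, attempts
        pause()                          # wait (pacing side, never inside check)


# Stand-in condition that becomes true on the third observation.
observations = iter([False, False, True])
detected, attempts = wait_until(
    check=lambda: next(observations),
    pause=lambda: None,                  # placeholder for an adaptive wait
    timeout=5.0,
)
print(detected, attempts)  # prints: True 3
```

Keeping the pause behind a callable is what lets the vision side stay free of any sleeping logic of its own.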
📦 VisionState — Return Object
Every visual operation returns a VisionState object:
VisionState(
name: str,
detected: bool,
confidence: float,
attempts: int,
elapsed: float,
text: Optional[str],
icon: Optional[str],
diagnostics: dict
)
Always check detected before acting on the result.
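The field list above maps naturally onto a dataclass. The sketch below is a stand-in for illustration, not the library's actual class definition: the name `VisionStateSketch`, the `describe` helper, the `"reason"` diagnostics key, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VisionStateSketch:
    """Illustrative stand-in mirroring the fields listed above."""
    name: str
    detected: bool
    confidence: float
    attempts: int
    elapsed: float
    text: Optional[str] = None
    icon: Optional[str] = None
    diagnostics: dict = field(default_factory=dict)


def describe(state: VisionStateSketch) -> str:
    """Summarize a result, branching on `detected` before reading anything else."""
    if not state.detected:
        reason = state.diagnostics.get("reason", "unknown")  # hypothetical key
        return f"{state.name}: not detected after {state.attempts} attempts ({reason})"
    return f"{state.name}: detected with confidence {state.confidence:.2f}"


hit = VisionStateSketch("wait_text", True, 0.93, 2, 0.8, text="Welcome")
miss = VisionStateSketch("wait_icon", False, 0.0, 5, 10.0, icon="ok.png",
                         diagnostics={"reason": "timeout"})
print(describe(hit))   # wait_text: detected with confidence 0.93
print(describe(miss))  # wait_icon: not detected after 5 attempts (timeout)
```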
🧪 Diagnostics & Debugging
Nano-Wait-Vision supports verbose diagnostics:
vision = VisionMode(verbose=True, screen_index=0)
state = vision.wait_text("Terminal")
Diagnostics include:
- Attempts per phase
- Confidence scores
- Elapsed time
- Reason for failure
🖥️ Platform Notes
Automatic HiDPI/Retina Support (New!)
The library now automatically detects the screen's scaling factor (DPI/Retina) and scales icon templates accordingly. This ensures template matching works reliably on all modern displays (macOS Retina, Windows HiDPI, 4K monitors) without any manual configuration.
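To see why scaling matters: an icon saved at 32x32 pixels on a standard display typically occupies 64x64 physical pixels on a 2x display, so an unscaled template would not line up with the screenshot. The arithmetic can be sketched as follows; the function name and rounding policy are assumptions for illustration, and the library's internal scaling may differ.

```python
from typing import Tuple


def scaled_template_size(width: int, height: int, scale: float) -> Tuple[int, int]:
    """Return the template size in physical pixels for a given display scale."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Round to whole pixels and never collapse below 1x1.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(scaled_template_size(32, 32, 2.0))  # 2x (Retina-style) display -> (64, 64)
print(scaled_template_size(32, 20, 1.5))  # 150% scaling -> (48, 30)
```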
Multi-Monitor Support (New!)
- Target a specific monitor using the screen_index parameter.
- Supports setups with multiple monitors; automatically handles capturing and scaling per screen.
- Optional dependency: mss for faster and full multi-monitor screenshots.
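One way to picture `screen_index` is as an index into a list of monitor rectangles. The sketch below uses hard-coded geometry and a hypothetical `select_monitor` helper; it does not query real displays and is not the library's implementation.

```python
from typing import Dict, List

Monitor = Dict[str, int]

# Hypothetical layout: a 1920x1080 primary with a 2560x1440 screen to its right.
MONITORS: List[Monitor] = [
    {"left": 0, "top": 0, "width": 1920, "height": 1080},
    {"left": 1920, "top": 0, "width": 2560, "height": 1440},
]


def select_monitor(monitors: List[Monitor], screen_index: int) -> Monitor:
    """Return the capture rectangle for screen_index, or raise if it is absent."""
    if not 0 <= screen_index < len(monitors):
        raise IndexError(
            f"screen_index={screen_index} but only {len(monitors)} monitor(s) found"
        )
    return monitors[screen_index]


print(select_monitor(MONITORS, 1))
```

Failing loudly on an out-of-range index is a design choice of this sketch; it avoids silently observing the wrong screen. If you capture with `mss` directly, note that its `monitors` list reserves index 0 for the combined virtual screen, so individual displays start at index 1 there.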
macOS (Important)
- Screen capture requires Screen Recording permission.
- OCR requires RGB images (internally handled by Nano-Wait-Vision).
- Fully tested on macOS Retina displays with automatic scaling.
Windows & Linux
- Works out of the box.
🧪 Ideal Use Cases
Use Nano-Wait-Vision when dealing with:
- RPA (Robotic Process Automation)
- GUI automation and testing
- OCR-driven workflows
- Visual regression tests
- Applications without APIs
- Screen-based alternatives to traditional web drivers.
🧩 Design Philosophy
- Deterministic: Predictable behavior based on visual truth.
- Explainable: Clear diagnostics for every action.
- No opaque ML: Uses reliable computer vision techniques.
- System-aware: Respects system resources via nano-wait.
- Debuggable by design: Built-in tools for troubleshooting.
🧪 QA & Automation Adapters (Pytest & Generic Wait)
The library is now completely driver-agnostic and provides dedicated tools for QA and automation workflows.
Generic Visual Waits (VisionWait)
The VisionWait class provides a Selenium-like adapter for visual waiting without depending on Selenium or any other web driver, so visual checks can be plugged into any automation framework.
from nano_wait_vision import VisionWait
wait = VisionWait(timeout=15)
wait.until_text("Dashboard")
wait.until_icon("ok.png")
Pytest Fixtures (Plug-and-Play)
For immediate adoption in QA projects, the library provides ready-to-use pytest fixtures.
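def test_homepage(vision, wait):
    # Use the global VisionMode instance; assert on .detected, since the
    # returned VisionState object may be truthy even when nothing was found
    assert vision.wait_text("Welcome").detected
    # Use the VisionWait adapter
    wait.until_icon("login_button.png")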
Fixtures are available via nano_wait_vision.pytest_fixture.
📄 License
This project is licensed under the MIT License.