Vision extension for nano-wait automation
Nano-Wait-Vision — Visual Execution Extension
nano-wait-vision is the official computer vision extension for nano-wait. It integrates visual awareness (OCR, icon detection, screen states) into the adaptive waiting engine, enabling deterministic, screen-driven automations.
[!IMPORTANT] Critical Dependency: This package DEPENDS on nano-wait. It does not replace nano-wait; it extends it.
🧠 What is Nano-Wait-Vision?
Nano-Wait-Vision is a deterministic vision engine for Python automation. Instead of waiting blindly with sleep(), it allows your code to wait for real visual conditions:
- Text appearing on screen
- Icons becoming visible
- UI states changing
It is designed to work in strict cooperation with nano-wait:
| Component | Responsibility |
|---|---|
| ⏱️ nano-wait | When to check (adaptive pacing & CPU-aware waiting) |
| 👁️ nano-wait-vision | What to check (screen, OCR, icons) |
🧩 Key Features
nano-wait-vision extends nano-wait with:
- 👁️ OCR (Optical Character Recognition): Read real text directly from the screen.
- 🖼️ Icon Detection: Template matching via OpenCV.
- 🧠 Explicit Visual States: Each operation returns a structured VisionState.
- 📚 Persistent & Explainable Diagnostics: No black-box ML models.
- 🖥️ Screen-Based Automation: Ideal for RPA and GUI testing.
- ⚡ Selenium / Pytest Adapters: Immediate adoption in corporate or academic QA workflows.
[!TIP] All waiting logic is delegated to nano-wait.wait(), never time.sleep().
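The icon detection feature listed above is described as OpenCV template matching. To make the idea concrete, here is a minimal pure-Python sketch that slides a small template over a grayscale grid and scores each position. The function name `match_template` and the scoring formula are illustrative assumptions; this is not the library's code, and OpenCV's own matching methods differ in detail.

```python
# Illustrative only: template matching on tiny grayscale grids (values 0-255).
# The library is described as using OpenCV; this sketch is not its code.

def match_template(screen, template):
    """Return (row, col, score) of the best template position.

    The score is 1.0 for a pixel-perfect match and falls toward 0.0 as the
    mean absolute difference between the patch and the template grows.
    """
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    best = (0, 0, -1.0)
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            # Sum of absolute pixel differences for this placement.
            diff = sum(
                abs(screen[r + i][c + j] - template[i][j])
                for i in range(th)
                for j in range(tw)
            )
            score = 1.0 - diff / (255.0 * th * tw)
            if score > best[2]:
                best = (r, c, score)
    return best


screen = [
    [0, 0, 0, 0],
    [0, 255, 200, 0],
    [0, 180, 255, 0],
    [0, 0, 0, 0],
]
icon = [
    [255, 200],
    [180, 255],
]
row, col, score = match_template(screen, icon)
print(row, col, score)
```

A score threshold then decides whether the icon counts as detected, which is presumably what the confidence field in the result reflects.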
🚀 Quick Start
Installation
pip install nano-wait
pip install nano-wait-vision
Simple Visual Observation
from nano_wait_vision import VisionMode
vision = VisionMode()
state = vision.observe()
print(f"Detected: {state.detected}")
print(f"Text: {state.text}")
Wait for Text to Appear
from nano_wait_vision import VisionMode
vision = VisionMode(verbose=True)
# Wait up to 10 seconds for the word "Welcome"
state = vision.wait_text("Welcome", timeout=10)
if state.detected:
    print("Text detected!")
Wait for an Icon
from nano_wait_vision import VisionMode
vision = VisionMode()
# Wait up to 10 seconds for an icon image
state = vision.wait_icon("ok.png", timeout=10)
if state.detected:
    print("Icon found on screen.")
⚠️ Installation & Dependencies
This library interacts directly with your operating system screen and OCR engine.
Python Dependencies (auto-installed)
- opencv-python
- pytesseract
- pyautogui
- numpy
🧠 Mandatory External Dependency — Tesseract OCR
OCR will not work unless Tesseract is installed and available in your PATH.
| OS | Command / Action |
|---|---|
| macOS | brew install tesseract |
| Ubuntu / Debian | sudo apt install tesseract-ocr |
| Windows | Download from the official Tesseract repo and add to PATH |
[!WARNING] If Tesseract is missing, OCR calls will silently fail or return empty text.
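Because a missing Tesseract binary can go unnoticed, it may help to check for it before running any OCR step. The sketch below uses only the standard library; the helper name `tesseract_available` is hypothetical and not part of nano-wait-vision.

```python
import shutil


def tesseract_available(binary="tesseract"):
    """Return True if the given executable can be found on PATH."""
    return shutil.which(binary) is not None


if not tesseract_available():
    # Warn early instead of debugging empty OCR results later.
    print("Tesseract not found on PATH; OCR results may be empty.")
```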
🧠 Mental Model — How It Works
Nano-Wait-Vision follows this loop: observe → evaluate → wait → observe.
Two engines cooperate:
| 👁️ Vision Engine | ⏱️ nano-wait |
|---|---|
| OCR / Icons | Adaptive timing |
| Screen capture | CPU-aware waits |
| Visual states | Smart pacing |
Vision never sleeps. All delays are handled by nano-wait.
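The observe → evaluate → wait loop can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the library's implementation: `wait_until`, `observe`, `predicate`, and `pause` are hypothetical names, and the `pause` callable stands in for the adaptive waiting that nano-wait provides.

```python
import time


def wait_until(observe, predicate, pause, timeout=10.0, clock=time.monotonic):
    """Poll observe() until predicate(observation) holds or timeout elapses.

    Returns (detected, attempts, elapsed_seconds).
    """
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        observation = observe()        # observe
        if predicate(observation):     # evaluate
            return True, attempts, clock() - start
        if clock() - start >= timeout:
            return False, attempts, clock() - start
        pause()                        # wait (delegated to the pacing engine)


# Simulated screen: the target text shows up on the third observation.
frames = iter(["", "Loading...", "Welcome back"])
detected, attempts, elapsed = wait_until(
    observe=lambda: next(frames),
    predicate=lambda text: "Welcome" in text,
    pause=lambda: None,  # stand-in for nano-wait's adaptive wait
    timeout=5.0,
)
print(detected, attempts)
```

Keeping the pause as an injected callable mirrors the split described above: the vision side decides what counts as a match, while timing stays with the waiting engine.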
📦 VisionState — Return Object
Every visual operation returns a VisionState object:
VisionState(
    name: str,
    detected: bool,
    confidence: float,
    attempts: int,
    elapsed: float,
    text: Optional[str],
    icon: Optional[str],
    diagnostics: dict
)
Always check detected before acting on the result.
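To illustrate the "check detected first" rule without needing a screen, the sketch below uses a stand-in dataclass that mirrors the fields listed above. `FakeVisionState` and `text_or_default` are hypothetical names introduced for this example; the real VisionState class may define defaults and behavior differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeVisionState:
    # Stand-in mirroring the documented fields; not the library's class.
    name: str
    detected: bool
    confidence: float
    attempts: int
    elapsed: float
    text: Optional[str] = None
    icon: Optional[str] = None
    diagnostics: dict = field(default_factory=dict)


def text_or_default(state, default=""):
    """Return state.text only when detection succeeded."""
    if state.detected and state.text is not None:
        return state.text
    return default


hit = FakeVisionState("wait_text", True, 0.93, 2, 0.8, text="Welcome")
miss = FakeVisionState("wait_text", False, 0.0, 12, 10.0)
print(text_or_default(hit))
print(text_or_default(miss, default="<not found>"))
```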
🧪 Diagnostics & Debugging
Nano-Wait-Vision supports verbose diagnostics:
vision = VisionMode(verbose=True)
state = vision.wait_text("Terminal")
Diagnostics include:
- Attempts per phase
- Confidence scores
- Elapsed time
- Reason for failure
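For logging, the items above can be folded into a single line. The following sketch assumes a VisionState-like object with the documented fields; the `summarize` helper and the "reason" key inside diagnostics are hypothetical, since the exact layout of the diagnostics dict is not specified here.

```python
from types import SimpleNamespace


def summarize(state):
    """Build a one-line summary from a VisionState-like object."""
    status = "detected" if state.detected else "not detected"
    line = (
        f"{state.name}: {status} after {state.attempts} attempt(s) "
        f"in {state.elapsed:.2f}s (confidence {state.confidence:.2f})"
    )
    reason = state.diagnostics.get("reason")  # hypothetical key
    if not state.detected and reason:
        line += f"; reason: {reason}"
    return line


# Stand-in object for demonstration; a real run would pass a VisionState.
state = SimpleNamespace(
    name="wait_text", detected=False, confidence=0.41,
    attempts=7, elapsed=10.02, diagnostics={"reason": "timeout"},
)
print(summarize(state))
```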
A full macOS diagnostic test is provided in test_screen.py, generating debug screenshots for inspection.
🖥️ Platform Notes
macOS (Important)
- Screen capture requires Screen Recording permission.
- OCR requires RGB images (internally handled by Nano-Wait-Vision).
- Fully tested on macOS Retina displays.
Windows & Linux
- Works out of the box.
- Ensure correct DPI scaling on Windows for accurate coordinate mapping.
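On Retina displays and on Windows with DPI scaling, a screenshot can have more pixels than the logical screen has points, so a position found in the image may need rescaling before it is used as a screen coordinate. The helper below is a hypothetical sketch of that arithmetic, assuming a uniform scale per axis; it is not part of nano-wait-vision.

```python
def to_logical(x_px, y_px, screenshot_size, screen_size):
    """Map screenshot pixel coordinates to logical screen coordinates."""
    shot_w, shot_h = screenshot_size
    screen_w, screen_h = screen_size
    # Ratio of captured pixels to logical points on each axis.
    scale_x = shot_w / screen_w
    scale_y = shot_h / screen_h
    return round(x_px / scale_x), round(y_px / scale_y)


# A 2x display: 2880x1800 screenshot, 1440x900 logical screen.
print(to_logical(1000, 600, (2880, 1800), (1440, 900)))
```

In practice the two sizes would come from the captured image and from the screen size reported by the automation backend.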
🧪 Ideal Use Cases
Use Nano-Wait-Vision when dealing with:
- RPA (Robotic Process Automation)
- GUI automation and testing
- OCR-driven workflows
- Visual regression tests
- Applications without APIs
- Screen-based alternatives to Selenium
🧩 Design Philosophy
- Deterministic: Predictable behavior based on visual truth.
- Explainable: Clear diagnostics for every action.
- No opaque ML: Uses reliable computer vision techniques.
- System-aware: Respects system resources via nano-wait.
- Debuggable by design: Built-in tools for troubleshooting.
🧪 Selenium / Pytest Integration
Selenium-style Visual Waits
from nano_wait_vision.selenium import VisionWait
wait = VisionWait(timeout=15)
wait.until_text("Dashboard")
wait.until_icon("ok.png")
Pytest Fixture
def test_homepage(vision):
    # Assert on the detected flag rather than on the state object itself.
    assert vision.wait_text("Welcome").detected
The pytest fixture is available via nano_wait_vision.pytest_fixture.vision.
📄 License
This project is licensed under the MIT License.