Vision extension for nano-wait automation

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Nano-Wait-Vision — Visual Execution Extension

nano-wait-vision is the official computer vision extension for nano-wait. It integrates visual awareness (OCR, icon detection, screen states) into the adaptive waiting engine, enabling deterministic, screen-driven automations.

[!IMPORTANT] Critical Dependency: This package DEPENDS on nano-wait. It does not replace nano-wait — it extends it.

🧠 What is Nano-Wait-Vision?

Nano-Wait-Vision is a deterministic vision engine for Python automation. Instead of waiting blindly with sleep(), it lets your code wait for real visual conditions:

Text appearing on screen
Icons becoming visible
UI states changing

With multi-monitor support, you can also target a specific screen for observation, text detection, and icon detection.

It is designed to work in strict cooperation with nano-wait:

Component	Responsibility
⏱️ nano-wait	When to check (adaptive pacing & CPU-aware waiting)
👁️ nano-wait-vision	What to check (screen, OCR, icons)

🧩 Key Features

nano-wait-vision extends nano-wait with:

👁️ OCR (Optical Character Recognition): Read real text directly from the screen.
🖼️ Icon Detection: Template matching via OpenCV.
🖥️ Automatic HiDPI/Retina Support: Icon templates are scaled automatically to match the display's scaling factor on 4K, macOS Retina, and Windows HiDPI displays, with no user configuration.
🖥️ Multi-Monitor Awareness: Target any monitor by index; works seamlessly in multi-screen setups.
🧠 Explicit Visual States: Each operation returns a structured VisionState.
📚 Persistent & Explainable Diagnostics: No black-box ML models.
⚡ QA-Friendly & Plug-and-Play: Zero dependency on web drivers (like Selenium), making corporate and academic adoption seamless.
🖥️ Screen-Based Automation: Ideal for RPA and GUI testing.

[!TIP] All waiting logic is delegated to nano-wait.wait() — never time.sleep().

🚀 Quick Start

Installation

pip install nano-wait
pip install nano-wait-vision

Simple Visual Observation (Single or Multi-Monitor)

from nano_wait_vision import VisionMode

# Observe the primary screen
vision_main = VisionMode(screen_index=0)
state_main = vision_main.observe()

# Observe a secondary screen (if available)
vision_second = VisionMode(screen_index=1)
state_second = vision_second.observe()

print(f"Primary screen text: {state_main.text}")
print(f"Secondary screen text: {state_second.text}")

Wait for Text to Appear

from nano_wait_vision import VisionMode

vision = VisionMode(verbose=True, screen_index=0)

# Wait up to 10 seconds for the word "Welcome" on the primary screen
state = vision.wait_text("Welcome", timeout=10)

if state.detected:
    print("Text detected!")

Wait for an Icon

from nano_wait_vision import VisionMode

vision = VisionMode(screen_index=1)  # target second monitor

# Wait up to 10 seconds for an icon image on the second monitor
state = vision.wait_icon("ok.png", timeout=10)

if state.detected:
    print("Icon found on screen.")

⚠️ Installation & Dependencies

This library interacts directly with your operating system screen and OCR engine.

Python Dependencies (auto-installed)

opencv-python
pytesseract
pyautogui
numpy
mss (optional; enables faster capture and full multi-monitor support)

🧠 Mandatory External Dependency — Tesseract OCR

OCR will not work unless Tesseract is installed and available in your PATH.

OS	Command / Action
macOS	`brew install tesseract`
Ubuntu / Debian	`sudo apt install tesseract-ocr`
Windows	Download from the official Tesseract repo and add to PATH

[!WARNING] If Tesseract is missing, OCR calls will silently fail or return empty text.
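
Because a missing Tesseract binary may only show up as empty OCR text, it can help to check for it before an automation starts. The sketch below uses only the Python standard library; the helper name `tesseract_available` is hypothetical and is not part of nano-wait-vision.

```python
import shutil


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(binary) is not None


if tesseract_available():
    print("Tesseract found on PATH.")
else:
    print("Tesseract not found; OCR calls may return empty text.")
```

Running a check like this once at startup turns a silent OCR failure into an explicit message.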

🧠 Mental Model — How It Works

Nano-Wait-Vision follows this loop: observe → evaluate → wait → observe.

Two engines cooperate:

👁️ Vision Engine	⏱️ nano-wait
OCR / Icons	Adaptive timing
Screen capture (multi-monitor aware)	CPU-aware waits
Visual states	Smart pacing

Vision never sleeps. All delays are handled by nano-wait.
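
The observe → evaluate → wait loop can be sketched in a few lines. This is an illustration of the idea, not the library's implementation: `wait_until` and its parameters are hypothetical names, the screen is faked with a list of strings, and the `pause` callable is a no-op stand-in for the pacing that nano-wait would provide.

```python
import time
from typing import Callable, Optional


def wait_until(
    observe: Callable[[], str],
    evaluate: Callable[[str], bool],
    pause: Callable[[], None],
    timeout: float = 10.0,
) -> Optional[str]:
    """Repeat observe -> evaluate -> wait until success or timeout.

    Returns the observation that satisfied `evaluate`, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        observation = observe()        # observe: capture the current state
        if evaluate(observation):      # evaluate: is the condition met?
            return observation
        if time.monotonic() >= deadline:
            return None
        pause()                        # wait: pacing is supplied by the caller


# Fake screen that shows "Welcome" on the third capture.
frames = iter(["Loading", "Loading", "Welcome back"])
result = wait_until(
    observe=lambda: next(frames),
    evaluate=lambda text: "Welcome" in text,
    pause=lambda: None,  # no-op stand-in for nano-wait's adaptive pacing
)
print(result)  # Welcome back
```

Passing the pause step in as a callable keeps the vision side free of any sleeping logic, which mirrors the split of responsibilities described above.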

📦 VisionState — Return Object

Every visual operation returns a VisionState object:

VisionState(
    name: str,
    detected: bool,
    confidence: float,
    attempts: int,
    elapsed: float,
    text: Optional[str],
    icon: Optional[str],
    diagnostics: dict
)

Always check detected before acting on the result.
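
To make the "check detected first" rule concrete, here is a small sketch that uses a local dataclass mirroring the fields listed above. It is a stand-in for illustration only, not the library's class; the default values for the optional fields and the `describe` helper are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VisionState:
    """Local stand-in mirroring the documented fields (not the library class)."""
    name: str
    detected: bool
    confidence: float
    attempts: int
    elapsed: float
    text: Optional[str] = None   # default is an assumption
    icon: Optional[str] = None   # default is an assumption
    diagnostics: dict = field(default_factory=dict)


def describe(state: VisionState) -> str:
    """Check `detected` first, then report the rest of the state."""
    if not state.detected:
        return f"{state.name}: not detected after {state.attempts} attempts"
    return f"{state.name}: detected (confidence {state.confidence:.2f})"


hit = VisionState("wait_text", True, 0.93, 3, 1.2, text="Welcome")
miss = VisionState("wait_icon", False, 0.0, 8, 10.0, icon="ok.png")
print(describe(hit))   # wait_text: detected (confidence 0.93)
print(describe(miss))  # wait_icon: not detected after 8 attempts
```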

🧪 Diagnostics & Debugging

Nano-Wait-Vision supports verbose diagnostics:

vision = VisionMode(verbose=True, screen_index=0)
state = vision.wait_text("Terminal")

Diagnostics include:

Attempts per phase
Confidence scores
Elapsed time
Reason for failure

🖥️ Platform Notes

Automatic HiDPI/Retina Support (New!)

The library now automatically detects the screen's scaling factor (DPI/Retina) and scales icon templates accordingly. This ensures template matching works reliably on all modern displays (macOS Retina, Windows HiDPI, 4K monitors) without any manual configuration.
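
The arithmetic behind template scaling is simple to illustrate. The sketch below assumes the scaling factor is the ratio of captured (physical) pixels to logical pixels; how the library actually detects that factor is not described here, and both function names are hypothetical.

```python
from typing import Tuple


def scale_factor(physical_width: int, logical_width: int) -> float:
    """Ratio of captured pixels to logical width, e.g. 2.0 on a Retina display."""
    if logical_width <= 0:
        raise ValueError("logical_width must be positive")
    return physical_width / logical_width


def scaled_template_size(size: Tuple[int, int], factor: float) -> Tuple[int, int]:
    """Scale a (width, height) template size, keeping at least 1 pixel per side."""
    width, height = size
    return max(1, round(width * factor)), max(1, round(height * factor))


factor = scale_factor(physical_width=2880, logical_width=1440)
print(factor)                                   # 2.0
print(scaled_template_size((32, 24), factor))   # (64, 48)
```

A template captured at logical resolution would be resized to the scaled dimensions before matching against a full-resolution screenshot.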

Multi-Monitor Support (New!)

Target a specific monitor using the screen_index parameter.
Supports setups with multiple monitors; automatically handles capturing and scaling per screen.
Optional dependency: mss for faster and full multi-monitor screenshots.
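
As an illustration of how a zero-based screen_index could map onto monitor geometry, the sketch below works on an mss-style monitor list, where entry 0 is the combined virtual screen and individual displays follow. The mapping and the helper name `pick_monitor` are assumptions for illustration; the library's internal handling may differ. The monitor list is faked so the example runs without mss installed.

```python
from typing import Dict, List

Monitor = Dict[str, int]


def pick_monitor(monitors: List[Monitor], screen_index: int) -> Monitor:
    """Map a 0-based screen_index onto an mss-style monitor list.

    In mss, monitors[0] is the combined virtual screen and individual
    displays start at monitors[1], so the first entry is skipped.
    """
    physical = monitors[1:]
    if not 0 <= screen_index < len(physical):
        raise IndexError(
            f"screen_index {screen_index} out of range; "
            f"{len(physical)} monitor(s) found"
        )
    return physical[screen_index]


# Fake layout: two 1920x1080 displays side by side.
fake_monitors = [
    {"left": 0, "top": 0, "width": 3840, "height": 1080},     # combined
    {"left": 0, "top": 0, "width": 1920, "height": 1080},     # primary
    {"left": 1920, "top": 0, "width": 1920, "height": 1080},  # secondary
]
print(pick_monitor(fake_monitors, 1)["left"])  # 1920
```

Validating the index up front gives a clear error when a script written for a dual-monitor desk runs on a single-screen machine.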

macOS (Important)

Screen capture requires Screen Recording permission.
OCR requires RGB images (internally handled by Nano-Wait-Vision).
Fully tested on macOS Retina displays with automatic scaling.

Windows & Linux

Works out of the box once Tesseract is installed and available in your PATH.

🧪 Ideal Use Cases

Use Nano-Wait-Vision when dealing with:

RPA (Robotic Process Automation)
GUI automation and testing
OCR-driven workflows
Visual regression tests
Applications without APIs
Screen-based alternatives to traditional web drivers

🧩 Design Philosophy

Deterministic: Predictable behavior based on visual truth.
Explainable: Clear diagnostics for every action.
No opaque ML: Uses reliable computer vision techniques.
System-aware: Respects system resources via nano-wait.
Debuggable by design: Built-in tools for troubleshooting.

🧪 QA & Automation Adapters (Pytest & Generic Wait)

The library is now completely driver-agnostic and provides dedicated tools for QA and automation workflows.

Generic Visual Waits (`VisionWait`)

The VisionWait class provides a "Selenium-like" adapter for visual waiting, but is now completely independent of Selenium or any web driver. It's a clean, plug-and-play way to integrate visual checks into any automation framework.

from nano_wait_vision import VisionWait

wait = VisionWait(timeout=15)
wait.until_text("Dashboard")
wait.until_icon("ok.png")

Pytest Fixtures (Plug-and-Play)

For immediate adoption in QA projects, the library provides ready-to-use pytest fixtures.

def test_homepage(vision, wait):
    # Use the global VisionMode instance; assert on the detected flag,
    # not on the VisionState object itself
    assert vision.wait_text("Welcome").detected

    # Use the VisionWait adapter
    wait.until_icon("login_button.png")

Fixtures are available via nano_wait_vision.pytest_fixture.

📄 License

This project is licensed under the MIT License.

Release history

Version	Release date
0.4.1	Mar 21, 2026
0.4.0	Jan 27, 2026
0.3.2	Jan 25, 2026
0.3.1	Jan 16, 2026
0.3.0	Jan 15, 2026
0.2.0	Jan 9, 2026
0.1.0	Jan 2, 2026
