
Vision extension for nano-wait automation


Nano-Wait-Vision — Visual Execution Extension


nano-wait-vision is the official computer vision extension for nano-wait. It integrates visual awareness (OCR, icon detection, screen states) into the adaptive waiting engine, enabling deterministic, screen-driven automations.

[!IMPORTANT] Critical Dependency: This package DEPENDS on nano-wait. It does not replace nano-wait — it extends it.


🧠 What is Nano-Wait-Vision?

Nano-Wait-Vision is a deterministic vision engine for Python automation. Instead of waiting blindly with sleep(), it allows your code to wait for real visual conditions:

  • Text appearing on screen
  • Icons becoming visible
  • UI states changing

It is designed to work in strict cooperation with nano-wait:

| Component | Responsibility |
|---|---|
| ⏱️ nano-wait | When to check (adaptive pacing & CPU-aware waiting) |
| 👁️ nano-wait-vision | What to check (screen, OCR, icons) |

🧩 Key Features

nano-wait-vision extends nano-wait with:

  • 👁️ OCR (Optical Character Recognition): Read real text directly from the screen.
  • 🖼️ Icon Detection: Template matching via OpenCV.
  • 🖥️ Automatic HiDPI/Retina Support: Icon templates are scaled automatically to the display's scaling factor, so template matching works on 4K, macOS Retina, and Windows HiDPI displays without user configuration.
  • 🧠 Explicit Visual States: Each operation returns a structured VisionState.
  • 📚 Persistent & Explainable Diagnostics: No black-box ML models.
  • ⚡ QA-Friendly & Plug-and-Play: Zero dependency on web drivers (like Selenium), making corporate and academic adoption seamless.
  • 🖥️ Screen-Based Automation: Ideal for RPA and GUI testing.
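To illustrate what "template matching" means in the icon-detection feature above, here is a minimal pure-Python sketch that slides a small template over a grayscale grid and scores each position by the sum of squared differences. It is a teaching aid only: the library itself relies on OpenCV, and the function name `find_template` and the list-of-lists image format are assumptions made for this example.

```python
from typing import List, Tuple

Grid = List[List[int]]  # grayscale image as rows of pixel intensities


def find_template(image: Grid, template: Grid) -> Tuple[int, int, int]:
    """Return (x, y, score) of the best match; a score of 0 is an exact match.

    Hypothetical helper for illustration, not part of nano-wait-vision.
    """
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    if tpl_h > img_h or tpl_w > img_w:
        raise ValueError("template must not be larger than the image")

    best = None
    for y in range(img_h - tpl_h + 1):
        for x in range(img_w - tpl_w + 1):
            # Sum of squared differences between the template and this window.
            score = sum(
                (image[y + dy][x + dx] - template[dy][dx]) ** 2
                for dy in range(tpl_h)
                for dx in range(tpl_w)
            )
            if best is None or score < best[2]:
                best = (x, y, score)
    return best


screen = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 9, 0],
    [0, 0, 0, 0],
]
icon = [[9, 8], [7, 9]]
print(find_template(screen, icon))  # (1, 1, 0)
```

Real screens are noisy, so practical matchers usually accept the best score only when it passes a threshold rather than requiring an exact match.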

[!TIP] All waiting logic is delegated to nano-wait's wait() function, never to time.sleep().


🚀 Quick Start

Installation

pip install nano-wait
pip install nano-wait-vision

Simple Visual Observation

from nano_wait_vision import VisionMode

vision = VisionMode()
state = vision.observe()

print(f"Detected: {state.detected}")
print(f"Text: {state.text}")

Wait for Text to Appear

from nano_wait_vision import VisionMode

vision = VisionMode(verbose=True)

# Wait up to 10 seconds for the word "Welcome"
state = vision.wait_text("Welcome", timeout=10)

if state.detected:
    print("Text detected!")

Wait for an Icon

from nano_wait_vision import VisionMode

vision = VisionMode()

# Wait up to 10 seconds for an icon image
state = vision.wait_icon("ok.png", timeout=10)

if state.detected:
    print("Icon found on screen.")

⚠️ Installation & Dependencies

This library interacts directly with your operating system screen and OCR engine.

Python Dependencies (auto-installed)

  • opencv-python
  • pytesseract
  • pyautogui
  • numpy

🧠 Mandatory External Dependency — Tesseract OCR

OCR will not work unless Tesseract is installed and available in your PATH.

| OS | Command / Action |
|---|---|
| macOS | brew install tesseract |
| Ubuntu / Debian | sudo apt install tesseract-ocr |
| Windows | Download from the official Tesseract repo and add to PATH |

[!WARNING] If Tesseract is missing, OCR calls will silently fail or return empty text.
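Because a missing Tesseract binary may show up only as empty OCR results, it can help to check for it explicitly at startup. The sketch below uses only the standard library; the helper name `tesseract_available` is an assumption for this example and is not part of nano-wait-vision.

```python
import shutil


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(binary) is not None


if not tesseract_available():
    print("Tesseract not found on PATH; OCR results may be empty.")
```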


🧠 Mental Model — How It Works

Nano-Wait-Vision follows this loop: observe → evaluate → wait → observe.

Two engines cooperate:

| 👁️ Vision Engine | ⏱️ nano-wait |
|---|---|
| OCR / Icons | Adaptive timing |
| Screen capture | CPU-aware waits |
| Visual states | Smart pacing |

Vision never sleeps. All delays are handled by nano-wait.
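The observe → evaluate → wait loop described above can be sketched in plain Python. This is not the library's implementation: `poll_until`, `observe`, `is_satisfied`, and `pause` are hypothetical names, and `pause` stands in for the adaptive wait that nano-wait provides.

```python
import time
from typing import Callable, Optional, Tuple


def poll_until(
    observe: Callable[[], Optional[str]],
    is_satisfied: Callable[[Optional[str]], bool],
    pause: Callable[[], None],
    timeout: float,
) -> Tuple[bool, int]:
    """Run observe -> evaluate -> wait until satisfied or timed out.

    Returns (detected, attempts).
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        observation = observe()            # observe
        if is_satisfied(observation):      # evaluate
            return True, attempts
        if time.monotonic() >= deadline:
            return False, attempts
        pause()                            # wait (delegated to the pacing engine)


# Simulated screen: the text appears on the third capture.
frames = iter(["", "Loading", "Welcome"])
detected, attempts = poll_until(
    observe=lambda: next(frames, None),
    is_satisfied=lambda text: text is not None and "Welcome" in text,
    pause=lambda: None,  # placeholder; a real loop would call the pacing engine here
    timeout=5.0,
)
print(detected, attempts)  # True 3
```

Keeping the pause as an injected callable mirrors the split of responsibilities: the vision side decides what counts as success, while the timing side decides how long to wait between checks.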


📦 VisionState — Return Object

Every visual operation returns a VisionState object:

VisionState(
    name: str,
    detected: bool,
    confidence: float,
    attempts: int,
    elapsed: float,
    text: Optional[str],
    icon: Optional[str],
    diagnostics: dict
)

Always check detected before acting on the result.
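One way to enforce this rule is a small guard that raises when a state was not detected. The `FakeState` dataclass below is a stand-in that mirrors some of the fields listed above so that the example runs without the library; `require_detected` is a hypothetical helper, not part of the package.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeState:
    """Stand-in with a subset of the VisionState fields (assumption)."""
    name: str
    detected: bool
    confidence: float = 0.0
    attempts: int = 0
    elapsed: float = 0.0
    text: Optional[str] = None
    diagnostics: dict = field(default_factory=dict)


def require_detected(state):
    """Return the state if it was detected, otherwise raise with context."""
    if not state.detected:
        raise RuntimeError(
            f"{state.name!r} not detected after {state.attempts} attempts "
            f"({state.elapsed:.1f}s)"
        )
    return state


ok = require_detected(FakeState(name="welcome", detected=True, text="Welcome"))
print(ok.text)  # Welcome
```

Because the guard only reads `detected`, `name`, `attempts`, and `elapsed`, it should work with any object exposing those attributes.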


🧪 Diagnostics & Debugging

Nano-Wait-Vision supports verbose diagnostics:

vision = VisionMode(verbose=True)
state = vision.wait_text("Terminal")

Diagnostics include:

  • Attempts per phase
  • Confidence scores
  • Elapsed time
  • Reason for failure

A full macOS diagnostic test is provided in test_screen.py, generating debug screenshots for inspection.


🖥️ Platform Notes

Automatic HiDPI/Retina Support

The library automatically detects the screen's scaling factor (DPI/Retina) and scales icon templates accordingly, so template matching works on modern displays (macOS Retina, Windows HiDPI, 4K monitors) without manual configuration or code changes.
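As a rough illustration of why scaling matters: on a HiDPI display a screenshot can contain more pixels than the logical screen size reported by the OS, so a template captured at one scale has to be resized before matching. The arithmetic below is a sketch under that assumption; the function names are hypothetical, and the library's actual detection logic is not shown in this document.

```python
from typing import Tuple


def scale_factor(logical_width: int, captured_width: int) -> float:
    """Ratio of captured pixels to logical points (2.0 on a typical Retina display)."""
    if logical_width <= 0:
        raise ValueError("logical_width must be positive")
    return captured_width / logical_width


def scaled_template_size(size: Tuple[int, int], factor: float) -> Tuple[int, int]:
    """Template dimensions after applying the display scale factor."""
    width, height = size
    return max(1, round(width * factor)), max(1, round(height * factor))


factor = scale_factor(logical_width=1440, captured_width=2880)
print(factor, scaled_template_size((32, 32), factor))  # 2.0 (64, 64)
```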

macOS (Important)

  • Screen capture requires Screen Recording permission.
  • OCR requires RGB images (internally handled by Nano-Wait-Vision).
  • Fully tested on macOS Retina displays with automatic scaling.

Windows & Linux

  • No platform-specific configuration is required once Tesseract is installed and available in your PATH.

🧪 Ideal Use Cases

Use Nano-Wait-Vision when dealing with:

  • RPA (Robotic Process Automation)
  • GUI automation and testing
  • OCR-driven workflows
  • Visual regression tests
  • Applications without APIs
  • Screen-based alternatives to traditional web drivers

🧩 Design Philosophy

  • Deterministic: Predictable behavior based on visual truth.
  • Explainable: Clear diagnostics for every action.
  • No opaque ML: Uses reliable computer vision techniques.
  • System-aware: Respects system resources via nano-wait.
  • Debuggable by design: Built-in tools for troubleshooting.

🧪 QA & Automation Adapters (Pytest & Generic Wait)

The library is driver-agnostic and provides dedicated tools for QA and automation workflows.

Generic Visual Waits (VisionWait)

The VisionWait class provides a Selenium-like adapter for visual waiting that does not depend on Selenium or any other web driver, so visual checks can be integrated into any automation framework.

from nano_wait_vision import VisionWait

# VisionWait is a generic adapter, not tied to Selenium
wait = VisionWait(timeout=15)
wait.until_text("Dashboard")
wait.until_icon("ok.png")

Pytest Fixtures (Plug-and-Play)

For immediate adoption in QA projects, the library provides ready-to-use pytest fixtures.

# In your conftest.py or test file
# Fixtures 'vision' and 'wait' are automatically available

def test_homepage(vision, wait):
    # Use the global VisionMode instance; assert on .detected,
    # since wait_text() returns a VisionState object
    assert vision.wait_text("Welcome").detected

    # Use the VisionWait adapter
    wait.until_icon("login_button.png")

Fixtures are available via nano_wait_vision.pytest_fixture.


📄 License

This project is licensed under the MIT License.
