Vision extension for nano-wait automation

Nano-Wait-Vision — Visual Execution Extension

nano-wait-vision is the official computer vision extension for nano-wait. It integrates visual awareness (OCR, icon detection, screen states) into the adaptive waiting engine, enabling deterministic, screen-driven automations.

[!IMPORTANT] Critical Dependency: This package DEPENDS on nano-wait. It does not replace nano-wait — it extends it.


🧠 What is Nano-Wait-Vision?

Nano-Wait-Vision is a deterministic vision engine for Python automation. Instead of waiting blindly with sleep(), it allows your code to wait for real visual conditions:

  • Text appearing on screen
  • Icons becoming visible
  • UI states changing
  • Any of these conditions on a specific monitor, by targeting a particular screen for observation, text detection, or icon detection (useful in multi-monitor setups)

It is designed to work in strict cooperation with nano-wait:

| Component | Responsibility |
|---|---|
| ⏱️ nano-wait | When to check (adaptive pacing & CPU-aware waiting) |
| 👁️ nano-wait-vision | What to check (screen, OCR, icons) |

🧩 Key Features

nano-wait-vision extends nano-wait with:

  • 👁️ OCR (Optical Character Recognition): Read real text directly from the screen.
  • 🖼️ Icon Detection: Template matching via OpenCV.
  • 🖥️ Automatic HiDPI/Retina Support: Icon templates are automatically scaled to the display's scaling factor, so template matching works on 4K, macOS Retina, and Windows HiDPI displays with zero user configuration.
  • 🖥️ Multi-Monitor Awareness: Target any monitor by index; works seamlessly in multi-screen setups.
  • 🧠 Explicit Visual States: Each operation returns a structured VisionState.
  • 📚 Persistent & Explainable Diagnostics: Every result reports attempts, confidence, elapsed time, and the reason for failure; no black-box ML models are involved.
  • ⚡ QA-Friendly & Plug-and-Play: Zero dependency on web drivers (like Selenium), making corporate and academic adoption seamless.
  • 🖥️ Screen-Based Automation: Ideal for RPA and GUI testing.
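Icon detection is described only as template matching via OpenCV (in OpenCV the usual entry point is cv2.matchTemplate). As a dependency-free illustration of what that step computes, the sketch below slides a small grayscale template over a grayscale "screen" and scores every position by the sum of squared differences (SSD). The function name, the 2D-list image format, and the 0-to-1 confidence normalization are assumptions for illustration, not the library's implementation.

```python
# Stdlib-only illustration of template matching by sum of squared differences (SSD).
# Images are 2D lists of grayscale values (0-255). Hypothetical helper, not library code.
from typing import List, Tuple

Image = List[List[int]]


def match_template_ssd(screen: Image, template: Image) -> Tuple[Tuple[int, int], float]:
    """Slide `template` over `screen`; return ((x, y), confidence) of the best match.

    Confidence is 1 - SSD / worst-case SSD, so 1.0 means a pixel-perfect match.
    """
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    if th > sh or tw > sw:
        raise ValueError("template is larger than the screen")

    max_ssd = th * tw * 255 ** 2  # largest possible SSD, used to normalize
    best_pos, best_ssd = (0, 0), float("inf")
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            ssd = sum(
                (screen[y + dy][x + dx] - template[dy][dx]) ** 2
                for dy in range(th)
                for dx in range(tw)
            )
            if ssd < best_ssd:  # keep the first position with the lowest SSD
                best_pos, best_ssd = (x, y), ssd
    return best_pos, 1.0 - best_ssd / max_ssd


screen = [
    [0, 0, 0, 0],
    [0, 200, 255, 0],
    [0, 255, 200, 0],
    [0, 0, 0, 0],
]
icon = [[200, 255], [255, 200]]
pos, confidence = match_template_ssd(screen, icon)
print(pos, confidence)  # (1, 1) 1.0
```

A real screenshot has millions of pixels, which is why an optimized implementation such as OpenCV's is used in practice rather than a Python loop like this one.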

[!TIP] All waiting logic is delegated to nano-wait's wait(); Nano-Wait-Vision never calls time.sleep().


🚀 Quick Start

Installation

pip install nano-wait
pip install nano-wait-vision

Simple Visual Observation (Single or Multi-Monitor)

from nano_wait_vision import VisionMode

# Observe the primary screen
vision_main = VisionMode(screen_index=0)
state_main = vision_main.observe()

# Observe a secondary screen (if available)
vision_second = VisionMode(screen_index=1)
state_second = vision_second.observe()

print(f"Primary screen text: {state_main.text}")
print(f"Secondary screen text: {state_second.text}")

Wait for Text to Appear

from nano_wait_vision import VisionMode

vision = VisionMode(verbose=True, screen_index=0)

# Wait up to 10 seconds for the word "Welcome" on the primary screen
state = vision.wait_text("Welcome", timeout=10)

if state.detected:
    print("Text detected!")

Wait for an Icon

from nano_wait_vision import VisionMode

vision = VisionMode(screen_index=1)  # target second monitor

# Wait up to 10 seconds for an icon image on the second monitor
state = vision.wait_icon("ok.png", timeout=10)

if state.detected:
    print("Icon found on screen.")

⚠️ Installation & Dependencies

This library interacts directly with your operating system's screen capture and with an external OCR engine (Tesseract).

Python Dependencies (auto-installed)

  • opencv-python
  • pytesseract
  • pyautogui
  • numpy
  • Optional, for multi-monitor setups: mss (faster capture and full multi-monitor support; being optional, it may need to be installed separately)

🧠 Mandatory External Dependency — Tesseract OCR

OCR will not work unless Tesseract is installed and available in your PATH.

| OS | Command / Action |
|---|---|
| macOS | brew install tesseract |
| Ubuntu / Debian | sudo apt install tesseract-ocr |
| Windows | Download from the official Tesseract repo and add to PATH |

[!WARNING] If Tesseract is missing, OCR calls will silently fail or return empty text.
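Because a missing Tesseract can fail silently, a quick preflight check at the start of a script can save debugging time. pytesseract runs the tesseract executable as a subprocess, so the sketch below simply asks shutil.which whether that executable is on PATH. The helper names are hypothetical, and the check does not cover the case where pytesseract has been pointed at a custom binary location.

```python
# Preflight check: is the Tesseract binary reachable on PATH?
# Hypothetical helpers; they only inspect PATH, matching the install table above.
import shutil
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def require_tesseract() -> str:
    """Fail fast with an actionable message instead of getting empty OCR text later."""
    path = find_tesseract()
    if path is None:
        raise RuntimeError(
            "Tesseract OCR not found on PATH. Install it "
            "(macOS: brew install tesseract; Ubuntu/Debian: sudo apt install tesseract-ocr) "
            "and make sure its directory is on PATH."
        )
    return path


if __name__ == "__main__":
    print(find_tesseract() or "tesseract not found")
```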


🧠 Mental Model — How It Works

Nano-Wait-Vision follows this loop: observe → evaluate → wait → observe.

Two engines cooperate:

| 👁️ Vision Engine | ⏱️ nano-wait |
|---|---|
| OCR / Icons | Adaptive timing |
| Screen capture (multi-monitor aware) | CPU-aware waits |
| Visual states | Smart pacing |

Vision never sleeps. All delays are handled by nano-wait.
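The loop can be pictured in plain Python. This is an illustration, not the library's code: observe, evaluate, and pace are hypothetical callables standing in for the vision engine (capture plus OCR or template matching), the visual condition, and nano-wait's adaptive wait. The structure shows the division of labor described above: the loop itself never sleeps, and every delay is delegated to pace.

```python
# Minimal sketch of observe -> evaluate -> wait -> observe.
# observe/evaluate/pace are hypothetical stand-ins, not nano-wait-vision APIs.
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class LoopResult:
    detected: bool
    attempts: int
    elapsed: float
    last_observation: Any


def wait_for(
    observe: Callable[[], Any],
    evaluate: Callable[[Any], bool],
    pace: Callable[[int], None],
    timeout: float,
) -> LoopResult:
    """Repeat observe -> evaluate -> wait until the condition holds or time runs out."""
    start = time.monotonic()
    attempts = 0
    while True:
        observation = observe()        # observe: capture the screen, run OCR, etc.
        attempts += 1
        elapsed = time.monotonic() - start
        if evaluate(observation):      # evaluate: is the visual condition met?
            return LoopResult(True, attempts, elapsed, observation)
        if elapsed >= timeout:
            return LoopResult(False, attempts, elapsed, observation)
        pace(attempts)                 # wait: delegate the delay (nano-wait's job)


# Usage with canned OCR results instead of a real screen
frames = iter(["Loading...", "Loading...", "Welcome back"])
result = wait_for(
    observe=lambda: next(frames),
    evaluate=lambda text: "Welcome" in text,
    pace=lambda attempt: None,  # placeholder for an adaptive, CPU-aware wait
    timeout=5.0,
)
print(result.detected, result.attempts)  # True 3
```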


📦 VisionState — Return Object

Every visual operation returns a VisionState object:

VisionState(
    name: str,
    detected: bool,
    confidence: float,
    attempts: int,
    elapsed: float,
    text: Optional[str],
    icon: Optional[str],
    diagnostics: dict
)

Always check detected before acting on the result.
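To make the check-before-acting rule concrete, here is a local dataclass with the same fields as the listing, plus a small guard. The class name VisionStateSketch, the defaults for text, icon, and diagnostics, the require_detected helper, and the choice of LookupError are all assumptions for illustration; with the real library, the same guard idea applies to the object that wait_text or wait_icon returns.

```python
# Local stand-in mirroring the VisionState fields listed above (not the library's class).
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class VisionStateSketch:
    name: str
    detected: bool
    confidence: float
    attempts: int
    elapsed: float
    text: Optional[str] = None
    icon: Optional[str] = None
    diagnostics: Dict[str, Any] = field(default_factory=dict)


def require_detected(state: VisionStateSketch) -> VisionStateSketch:
    """Raise instead of letting the caller act on a target that was not found."""
    if not state.detected:
        raise LookupError(
            f"{state.name!r} not detected after {state.attempts} attempt(s) "
            f"({state.elapsed:.2f}s, confidence {state.confidence:.2f})"
        )
    return state


found = VisionStateSketch("welcome", True, 0.93, 2, 1.4, text="Welcome")
missing = VisionStateSketch("ok-button", False, 0.12, 8, 10.0, icon="ok.png")

print(require_detected(found).text)  # Welcome
try:
    require_detected(missing)
except LookupError as err:
    print(err)  # 'ok-button' not detected after 8 attempt(s) (10.00s, confidence 0.12)
```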


🧪 Diagnostics & Debugging

Nano-Wait-Vision supports verbose diagnostics:

vision = VisionMode(verbose=True, screen_index=0)
state = vision.wait_text("Terminal")

Diagnostics include:

  • Attempts per phase
  • Confidence scores
  • Elapsed time
  • Reason for failure

🖥️ Platform Notes

Automatic HiDPI/Retina Support (New!)

The library now automatically detects the screen's scaling factor (DPI/Retina) and scales icon templates accordingly. This ensures template matching works reliably on all modern displays (macOS Retina, Windows HiDPI, 4K monitors) without any manual configuration.
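The README does not say how the scaling factor is detected. One common approach, sketched here as an assumption: compare the screenshot's width in physical pixels with the logical screen width reported by the OS, then resize each template (assumed to be captured at 1x) by that ratio before matching. Both function names are hypothetical.

```python
# Sketch of HiDPI/Retina template scaling under stated assumptions:
# screenshots are in physical pixels, the OS reports logical size, templates are 1x.
from typing import Tuple


def scale_factor(physical_width: int, logical_width: int) -> float:
    """Pixels per logical point, e.g. 2.0 when a 1440-point-wide display captures at 2880 px."""
    if logical_width <= 0:
        raise ValueError("logical_width must be positive")
    return physical_width / logical_width


def scaled_template_size(width: int, height: int, factor: float) -> Tuple[int, int]:
    """Target (width, height) for a 1x template before matching on a scaled screenshot."""
    return max(1, round(width * factor)), max(1, round(height * factor))


factor = scale_factor(2880, 1440)
print(factor, scaled_template_size(32, 24, factor))  # 2.0 (64, 48)
```

With OpenCV, the resize itself would typically be cv2.resize(template, (new_w, new_h)); note that OpenCV expects the target size as (width, height), which is why the helper returns width first.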

Multi-Monitor Support (New!)

  • Target a specific monitor using the screen_index parameter.
  • Supports setups with multiple monitors; automatically handles capturing and scaling per screen.
  • Optional dependency: mss for faster and full multi-monitor screenshots.
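How screen_index is resolved internally is not documented. One plausible model, sketched with hypothetical names: each monitor is a rectangle in a shared virtual-desktop coordinate space, screen_index selects one rectangle (zero-based, with 0 as the primary screen, as in the Quick Start examples), and a match found inside that monitor's capture is offset by the monitor's origin before being used elsewhere, for example to click.

```python
# Hypothetical mapping from a zero-based screen_index to a capture region.
from typing import NamedTuple, Sequence, Tuple


class Monitor(NamedTuple):
    """One screen's rectangle in virtual-desktop coordinates."""
    left: int
    top: int
    width: int
    height: int


def select_monitor(monitors: Sequence[Monitor], screen_index: int) -> Monitor:
    """Zero-based lookup with a readable error instead of a bare IndexError."""
    if not 0 <= screen_index < len(monitors):
        raise IndexError(
            f"screen_index={screen_index}, but only {len(monitors)} monitor(s) found"
        )
    return monitors[screen_index]


def to_desktop(monitor: Monitor, x: int, y: int) -> Tuple[int, int]:
    """Translate a position inside one monitor's capture to desktop coordinates."""
    return monitor.left + x, monitor.top + y


# Hypothetical layout: a 1080p primary screen with a 1440p screen to its right
desk = [Monitor(0, 0, 1920, 1080), Monitor(1920, 0, 2560, 1440)]
second = select_monitor(desk, 1)
print(to_desktop(second, 100, 50))  # (2020, 50)
```

If you capture with mss yourself, note that sct.monitors[0] is the combined virtual screen and individual monitors start at index 1 (each entry is a dict with left, top, width, and height keys), so a zero-based index like the one above maps to sct.monitors[screen_index + 1].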

macOS (Important)

  • Screen capture requires Screen Recording permission.
  • OCR requires RGB images (internally handled by Nano-Wait-Vision).
  • Fully tested on macOS Retina displays with automatic scaling.

Windows & Linux

  • Works out of the box, provided Tesseract is installed and on PATH for OCR.

🧪 Ideal Use Cases

Use Nano-Wait-Vision when dealing with:

  • RPA (Robotic Process Automation)
  • GUI automation and testing
  • OCR-driven workflows
  • Visual regression tests
  • Applications without APIs
  • Screen-based alternatives to traditional web drivers

🧩 Design Philosophy

  • Deterministic: Predictable behavior based on visual truth.
  • Explainable: Clear diagnostics for every action.
  • No opaque ML: Uses reliable computer vision techniques.
  • System-aware: Respects system resources via nano-wait.
  • Debuggable by design: Built-in tools for troubleshooting.

🧪 QA & Automation Adapters (Pytest & Generic Wait)

The library is now completely driver-agnostic and provides dedicated tools for QA and automation workflows.

Generic Visual Waits (VisionWait)

The VisionWait class provides a "Selenium-like" adapter for visual waiting, with no dependency on Selenium or any other web driver. It is a plug-and-play way to integrate visual checks into any automation framework.

from nano_wait_vision import VisionWait

wait = VisionWait(timeout=15) 
wait.until_text("Dashboard")
wait.until_icon("ok.png")
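In Selenium, WebDriverWait.until polls a callable until it returns a truthy value and raises a timeout exception otherwise. The sketch below reproduces that contract without any driver, as a way to picture what "Selenium-like" means here. PollingWait and its pace parameter are hypothetical; the README does not say whether VisionWait raises on timeout or returns a VisionState, so check that before relying on either behavior.

```python
# Selenium-style "until" semantics in plain Python (hypothetical class, not VisionWait).
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class PollingWait:
    """`until` returns the condition's first truthy value or raises TimeoutError."""

    def __init__(self, timeout: float, pace: Callable[[int], None]):
        self.timeout = timeout
        self.pace = pace  # decides the pause between attempts (nano-wait's role)

    def until(self, condition: Callable[[], T], message: str = "") -> T:
        deadline = time.monotonic() + self.timeout
        attempt = 0
        while True:
            attempt += 1
            value = condition()
            if value:
                return value
            if time.monotonic() >= deadline:
                raise TimeoutError(message or f"condition not met after {attempt} attempt(s)")
            self.pace(attempt)


# Usage: the (simulated) condition yields nothing until "Dashboard" shows up
readings = iter([None, None, "Dashboard"])
wait = PollingWait(timeout=5, pace=lambda attempt: None)
result = wait.until(lambda: next(readings), "Dashboard never appeared")
print(result)  # Dashboard
```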

Pytest Fixtures (Plug-and-Play)

For immediate adoption in QA projects, the library provides ready-to-use pytest fixtures.

def test_homepage(vision, wait):
    # Use the global VisionMode instance; assert on .detected,
    # not on the returned VisionState object itself
    assert vision.wait_text("Welcome").detected

    # Use the VisionWait adapter
    wait.until_icon("login_button.png")

Fixtures are available via nano_wait_vision.pytest_fixture.


📄 License

This project is licensed under the MIT License.



Download files

Source Distribution

nano_wait_vision-0.4.1.tar.gz (14.7 kB)

Uploaded Source

Built Distribution

nano_wait_vision-0.4.1-py3-none-any.whl (12.8 kB)

Uploaded Python 3

File details

Details for the file nano_wait_vision-0.4.1.tar.gz.

File metadata

  • Download URL: nano_wait_vision-0.4.1.tar.gz
  • Upload date:
  • Size: 14.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for nano_wait_vision-0.4.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | ecebe863e74d5c2d2592c7f85b6c612566b229311eb73e0c2d3caa536612d9e2 |
| MD5 | 2da12486a280870ac44fafe7ff1238dc |
| BLAKE2b-256 | 4fdfb7f151cae63aab16a7c2ba734c2779286b1789d915d864bdd7e458edb604 |

File details

Details for the file nano_wait_vision-0.4.1-py3-none-any.whl.

File hashes

Hashes for nano_wait_vision-0.4.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | b378986683b1ea0c738bba504c089da1e48a0073b19dab6a107cbdc7d066ff2b |
| MD5 | 839776ca87e94f698b63b069ee2a55fa |
| BLAKE2b-256 | 5ff76eb1697eabdfb0b1ad920ba7376e3d04d3701b1b696583dfb386a2f3c127 |
