Vision extension for nano-wait automation

👁️ Nano-Wait-Vision — Visual Execution Extension

nano-wait-vision is the official computer vision extension for nano-wait. It integrates visual awareness (OCR, icon detection, screen states) into the adaptive waiting engine, enabling deterministic, screen-driven automations.

Important: nano-wait-vision depends on nano-wait. It extends nano-wait; it does not replace it.


🧭 Table of Contents

  1. What is Nano-Wait-Vision?
  2. Added Features
  3. Quick Start
  4. Installation & Dependencies
  5. Mental Model — How It Works
  6. VisionState — Return Object
  7. Diagnostics & Debugging
  8. Platform Notes
  9. Ideal Use Cases
  10. Design Philosophy
  11. Relationship with nano-wait

🧠 What is Nano-Wait-Vision?

Nano-Wait-Vision is a deterministic vision engine for Python automation. Instead of waiting blindly with time.sleep(), your code waits for real visual conditions:

  • Text appearing on screen
  • Icons becoming visible
  • UI states changing

It is designed to work in strict cooperation with nano-wait:

Component | Responsibility
⏱️ nano-wait | When to check (adaptive pacing & CPU-aware waiting)
👁️ nano-wait-vision | What to check (screen, OCR, icons)

🧩 Added Features

nano-wait-vision extends nano-wait with:

  • 👁️ OCR (Optical Character Recognition): Read real text directly from the screen.
  • 🖼️ Icon Detection: Template matching via OpenCV.
  • 🧠 Explicit Visual States: Each operation returns a structured VisionState.
  • 📚 Persistent & Explainable Diagnostics: No black-box ML models.
  • 🖥️ Screen-Based Automation: Ideal for RPA and GUI testing.

Tip: All waiting logic is delegated to nano-wait's wait(), never to time.sleep().
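
To make the icon-detection feature listed above more concrete, here is a minimal pure-Python sketch of template matching by sum of squared differences on tiny grayscale grids. It illustrates the general technique only and is not the library's implementation; OpenCV's cv2.matchTemplate performs the same kind of sliding-window comparison much faster on real screenshots. The function name find_template and the sample data are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # grayscale image as rows of pixel intensities


def find_template(screen: Grid, icon: Grid) -> Optional[Tuple[int, int, int]]:
    """Slide `icon` over `screen`; return (row, col, ssd) of the best match.

    ssd is the sum of squared differences: 0 means a pixel-perfect match.
    Returns None when the icon is larger than the screen.
    """
    sh, sw = len(screen), len(screen[0])
    ih, iw = len(icon), len(icon[0])
    if ih > sh or iw > sw:
        return None

    best: Optional[Tuple[int, int, int]] = None
    for r in range(sh - ih + 1):
        for c in range(sw - iw + 1):
            # Compare the icon with the window whose top-left corner is (r, c).
            ssd = sum(
                (screen[r + dr][c + dc] - icon[dr][dc]) ** 2
                for dr in range(ih)
                for dc in range(iw)
            )
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best


# Toy data: the 2x2 icon sits at row 1, column 1 of the 3x4 screen.
screen = [
    [0, 0, 0, 0],
    [0, 9, 7, 0],
    [0, 5, 9, 0],
]
icon = [
    [9, 7],
    [5, 9],
]
match = find_template(screen, icon)
```

A caller would normally accept a match only when the score passes a threshold; the right threshold depends on the images and is not specified by this document.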


🚀 Quick Start

Installation

pip install nano-wait
pip install nano-wait-vision

Simple Visual Observation

from nano_wait_vision import VisionMode

vision = VisionMode()
state = vision.observe()

print(f"Detected: {state.detected}")
print(f"Text: {state.text}")

Wait for Text to Appear

from nano_wait_vision import VisionMode

vision = VisionMode(verbose=True)

# Wait up to 10 seconds for the word "Welcome"
state = vision.wait_text("Welcome", timeout=10)

if state.detected:
    print("Text detected!")

Wait for an Icon

from nano_wait_vision import VisionMode

vision = VisionMode()

# Wait up to 10 seconds for an icon image
state = vision.wait_icon("ok.png", timeout=10)

if state.detected:
    print("Icon found on screen.")
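
The two waits can also be chained into a single step. The sketch below relies only on the calls shown above (wait_text, wait_icon, and state.detected); the helper name wait_for_screen and the FakeVision test double are hypothetical and exist only so the example can be dry-run without a screen.

```python
from types import SimpleNamespace


def wait_for_screen(vision, text, icon_path, timeout=10):
    """Return True only if the text and then the icon are both detected."""
    text_state = vision.wait_text(text, timeout=timeout)
    if not text_state.detected:
        return False  # skip the icon check if the text never appeared
    icon_state = vision.wait_icon(icon_path, timeout=timeout)
    return icon_state.detected


class FakeVision:
    """Test double (not part of the library) with the same two methods."""

    def __init__(self, text_found, icon_found):
        self.text_found = text_found
        self.icon_found = icon_found

    def wait_text(self, text, timeout=10):
        return SimpleNamespace(detected=self.text_found)

    def wait_icon(self, path, timeout=10):
        return SimpleNamespace(detected=self.icon_found)


ready = wait_for_screen(FakeVision(True, True), "Welcome", "ok.png")
```

With the real library, the same helper would presumably be called as wait_for_screen(VisionMode(), "Welcome", "ok.png").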

⚠️ Installation & Dependencies (READ THIS)

This library interacts directly with your operating system screen and OCR engine.

Python Dependencies (auto-installed)

  • opencv-python
  • pytesseract
  • pyautogui
  • numpy

🧠 Mandatory External Dependency — Tesseract OCR

OCR will not work unless Tesseract is installed and available in your PATH.

OS | Command / Action
macOS | brew install tesseract
Ubuntu / Debian | sudo apt install tesseract-ocr
Windows | Download from the official Tesseract repo and add to PATH

Warning: If Tesseract is missing, OCR calls will silently fail or return empty text.
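
Because a missing Tesseract can go unnoticed, a preflight check at startup may help. The following is a minimal sketch using only the standard library; the function name tesseract_available is hypothetical and is not part of nano-wait-vision.

```python
import shutil


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if the Tesseract executable can be found on PATH."""
    return shutil.which(binary) is not None


if not tesseract_available():
    print("Tesseract not found on PATH; OCR results may be empty.")
```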


🧠 Mental Model — How It Works

Nano-Wait-Vision follows this loop: observe → evaluate → wait → observe.

Two engines cooperate:

👁️ Vision Engine | ⏱️ nano-wait
OCR / Icons | Adaptive timing
Screen capture | CPU-aware waits
Visual states | Smart pacing

Vision never sleeps. All delays are handled by nano-wait.
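
The loop above can be sketched generically. This is an illustration under stated assumptions, not the library's code: poll_until and its parameters are hypothetical names, and the wait callable stands in for the pacing that nano-wait provides in the real system.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def poll_until(
    observe: Callable[[], T],
    is_done: Callable[[T], bool],
    wait: Callable[[], None],
    timeout: float,
) -> Tuple[bool, T, int]:
    """Run observe -> evaluate -> wait until is_done or the timeout expires.

    Returns (succeeded, last_observation, attempts).
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        observation = observe()  # observe
        attempts += 1
        if is_done(observation):  # evaluate
            return True, observation, attempts
        if time.monotonic() >= deadline:
            return False, observation, attempts
        wait()  # wait; the pacing strategy is injected by the caller


# Dry run: the simulated screen shows the target text on the third look.
frames = iter(["", "Loading", "Welcome"])
ok, last, attempts = poll_until(
    observe=lambda: next(frames),
    is_done=lambda text: "Welcome" in text,
    wait=lambda: None,  # stand-in; a real loop would pause here
    timeout=5.0,
)
```

Keeping the pause behind an injected callable mirrors the split described above: the vision side decides what to check, and the waiting side decides when to check again.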


VisionState — Return Object

Every visual operation returns a VisionState object:

VisionState(
    name: str,
    detected: bool,
    confidence: float,
    attempts: int,
    elapsed: float,
    text: Optional[str],
    icon: Optional[str],
    diagnostics: dict
)

Always check detected before acting on the result.
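
As a sketch of that advice, the example below branches on detected before reading any other field. The dataclass is a stand-in that mirrors the fields listed above so the example runs on its own; the default values, the summarize helper, and the sample values are assumptions, not the library's definitions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VisionState:  # stand-in mirroring the documented fields
    name: str
    detected: bool
    confidence: float
    attempts: int
    elapsed: float
    text: Optional[str] = None  # default is an assumption
    icon: Optional[str] = None  # default is an assumption
    diagnostics: dict = field(default_factory=dict)


def summarize(state: VisionState) -> str:
    """Build a one-line summary, checking `detected` first."""
    if not state.detected:
        return (
            f"{state.name}: not detected after "
            f"{state.attempts} attempts ({state.elapsed:.1f}s)"
        )
    target = state.text if state.text is not None else state.icon
    return f"{state.name}: detected {target!r} (confidence {state.confidence:.2f})"


# Hypothetical sample states for a hit and a miss.
hit = VisionState("wait_text", True, 0.93, 2, 1.4, text="Welcome")
miss = VisionState("wait_icon", False, 0.0, 5, 10.0, icon="ok.png")
```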


🧪 Diagnostics & Debugging

Nano-Wait-Vision supports verbose diagnostics:

vision = VisionMode(verbose=True)
state = vision.wait_text("Terminal")

Diagnostics include:

  • Attempts per phase
  • Confidence scores
  • Elapsed time
  • Reason for failure

A full macOS diagnostic test is provided in test_screen.py, generating debug screenshots for inspection.
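
For keeping diagnostics across runs, one option is to write each result as a JSON line. The sketch below reads only the field names documented for VisionState; the helper state_to_json, the stand-in object, and the "reason" key inside diagnostics are hypothetical, since the actual contents of the diagnostics dict are not specified here.

```python
import json
from types import SimpleNamespace

# Field names taken from the VisionState description in this document.
FIELDS = (
    "name", "detected", "confidence", "attempts",
    "elapsed", "text", "icon", "diagnostics",
)


def state_to_json(state) -> str:
    """Serialize the documented VisionState fields to one JSON line."""
    record = {key: getattr(state, key, None) for key in FIELDS}
    # default=str keeps the dump from failing on non-JSON values.
    return json.dumps(record, sort_keys=True, default=str)


# Stand-in object for a dry run; a real VisionState would be passed instead.
sample = SimpleNamespace(
    name="wait_text", detected=False, confidence=0.0, attempts=4,
    elapsed=10.0, text=None, icon=None,
    diagnostics={"reason": "timeout"},  # hypothetical key
)
line = state_to_json(sample)
```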


🖥️ Platform Notes

macOS (Important)

  • Screen capture requires Screen Recording permission.
  • OCR requires RGB images.
  • Nano-Wait-Vision internally converts frames to RGB for compatibility.
  • Fully tested on macOS Retina displays.

Windows & Linux

  • Works out of the box.
  • Ensure correct DPI scaling on Windows for accurate coordinate mapping.
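
On high-DPI displays, a screenshot can contain more pixels than the logical coordinate space used for mouse input, so a position found in the image may need rescaling before it is used. The sketch below shows the arithmetic only; to_logical is a hypothetical helper, and whether nano-wait-vision already applies such a correction internally is not stated here. With pyautogui, the two sizes could be read from pyautogui.screenshot().size and pyautogui.size().

```python
from typing import Tuple


def to_logical(
    point: Tuple[int, int],
    screenshot_size: Tuple[int, int],
    screen_size: Tuple[int, int],
) -> Tuple[int, int]:
    """Map a pixel position in a screenshot to logical screen coordinates."""
    scale_x = screenshot_size[0] / screen_size[0]
    scale_y = screenshot_size[1] / screen_size[1]
    return round(point[0] / scale_x), round(point[1] / scale_y)


# A 2x display: a 2880x1800 screenshot for a 1440x900 logical screen.
click_at = to_logical((1000, 600), (2880, 1800), (1440, 900))
```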

🧪 Ideal Use Cases

Use Nano-Wait-Vision when dealing with:

  • RPA (Robotic Process Automation)
  • GUI automation and testing
  • OCR-driven workflows
  • Visual regression tests
  • Applications without APIs
  • Screen-based alternatives to Selenium

🧩 Design Philosophy

  • Deterministic: Predictable behavior based on visual truth.
  • Explainable: Clear diagnostics for every action.
  • No opaque ML: Uses reliable computer vision techniques.
  • System-aware: Respects system resources via nano-wait.
  • Debuggable by design: Built-in tools for troubleshooting.

📌 Relationship with nano-wait

Package | Role
nano-wait | Adaptive waiting engine
nano-wait-vision | Official visual extension

They are separate PyPI packages, designed to work as one coherent system.
