A Python package for detecting and interacting with screen elements using computer vision and OCR.

Screenwise Framework
====================

A Python framework for screen element detection and interaction using computer vision and machine learning.

Overview
--------

Screenwise provides automated detection and interaction with UI elements through:

  • Screenshot capture and analysis
  • ML-based element detection
  • Coordinate-based interaction
  • OCR capabilities
  • Debug and capture modes
  • Cross-platform support

Installation
------------

.. code-block:: bash

    pip install t_screenwise

Basic Usage
-----------

Initialize Framework
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from t_screenwise.screenwise import Framework

    # Initialize with default settings
    framework = Framework()

    # Initialize with custom settings
    framework = Framework(
        mode="CAPTURE",
        model_path="path/to/model.pth",
        labels="path/to/labels.json",
        device="cpu"
    )
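
The ``device`` argument presumably names the compute device handed to the detection model. The sketch below picks a value at runtime. It assumes the ``.pth`` model is loaded with PyTorch and that ``"cuda"`` is an accepted value; neither is stated in this README, and ``pick_device`` is a hypothetical helper, not part of the package.

.. code-block:: python

    def pick_device() -> str:
        """Return "cuda" when PyTorch reports a usable GPU, else "cpu"."""
        try:
            import torch  # assumed backend; treated as optional here
        except ImportError:
            return "cpu"
        return "cuda" if torch.cuda.is_available() else "cpu"


    device = pick_device()
    # framework = Framework(device=device)  # pass the result to the constructor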

Detect Elements
~~~~~~~~~~~~~~~

.. code-block:: python

    # Get all detected elements
    elements = framework.get()

    # Filter for specific element types
    buttons = framework.get(filter=["button"])
    text = framework.get(filter=["text"])
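
When several element types are needed, a single unfiltered ``get()`` call can be grouped by type afterwards instead of running one filtered call per type. The sketch below assumes only that each detected element exposes a ``label`` attribute, as the OCR example in this README suggests; ``FakeElement`` and ``group_by_label`` are hypothetical stand-ins used so the example runs without a screen or model.

.. code-block:: python

    from collections import defaultdict
    from dataclasses import dataclass


    @dataclass
    class FakeElement:
        """Stand-in for a detected element; only ``label`` is assumed."""
        label: str


    def group_by_label(elements):
        """Map each label to the list of elements carrying it."""
        groups = defaultdict(list)
        for element in elements:
            groups[element.label].append(element)
        return dict(groups)


    # With the real framework this would be: elements = framework.get()
    elements = [FakeElement("button"), FakeElement("text"), FakeElement("button")]
    groups = group_by_label(elements)
    print({label: len(items) for label, items in groups.items()})
    # prints {'button': 2, 'text': 1}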

Interact with Elements
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # `element` is any item returned by framework.get()

    # Click element
    element.click()

    # Click at a named position within the element
    element.click(coords="up_right")

    # Type text
    element.send_keys("Hello World")

    # Click and type
    element.click_and_send_keys("Hello World")
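
The ``coords`` argument selects a named position inside the element's bounding box. This README shows only ``"up_right"`` and does not document how the name is resolved, so the following is a sketch of one plausible mapping, not the package's implementation. It assumes a ``(x1, y1, x2, y2)`` pixel box with the origin at the top-left corner; the other anchor names, the ``margin`` parameter, and ``resolve_anchor`` itself are hypothetical.

.. code-block:: python

    def resolve_anchor(box, anchor="center", margin=0):
        """Return the (x, y) point for a named anchor inside ``box``.

        ``box`` is (x1, y1, x2, y2) with the origin at the top-left corner.
        ``margin`` pulls edge and corner anchors inward by that many pixels;
        the center anchor is unaffected.
        """
        x1, y1, x2, y2 = box
        xs = {"left": x1 + margin, "center": (x1 + x2) // 2, "right": x2 - margin}
        ys = {"up": y1 + margin, "center": (y1 + y2) // 2, "down": y2 - margin}
        if anchor == "center":
            return xs["center"], ys["center"]
        vertical, horizontal = anchor.split("_")  # e.g. "up_right"
        return xs[horizontal], ys[vertical]


    print(resolve_anchor((100, 50, 300, 90), "up_right", margin=5))
    # prints (295, 55)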

Process OCR Elements

.. code-block:: python

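    # OCRElement must be imported from the package as well; its module path
    # is not shown in this README.
    framework = Framework()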
    results = framework.get(image="path/to/image.png", process_ocr=True)

    # Work with both types of elements
    for element in results:
        if isinstance(element, OCRElement):
            print(f"OCR Text: {element.text} (Confidence: {element.confidence})")
        else:
            print(f"Box Label: {element.label}")

OCR Elements
~~~~~~~~~~~~

* Text content extraction
* Confidence scoring
* Spatial relationship analysis
* Text-based element search
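
Text-based search can also be approximated on the caller's side with the ``text`` and ``confidence`` attributes shown above. The sketch below assumes confidence is reported on a 0 to 1 scale, which this README does not confirm; ``FakeOCRElement`` and ``find_text`` are hypothetical names, not part of the package.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class FakeOCRElement:
        """Stand-in carrying the two attributes used in the OCR example."""
        text: str
        confidence: float


    def find_text(elements, query, min_confidence=0.5):
        """Return elements whose text contains ``query`` (case-insensitive)
        and whose confidence is at least ``min_confidence``."""
        needle = query.lower()
        return [
            e for e in elements
            if needle in e.text.lower() and e.confidence >= min_confidence
        ]


    results = [
        FakeOCRElement("Username", 0.97),
        FakeOCRElement("username hint", 0.31),
        FakeOCRElement("Password", 0.92),
    ]
    matches = find_text(results, "username")
    print([m.text for m in matches])  # prints ['Username']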

OCR Spatial Analysis
~~~~~~~~~~~~~~~~~~~~

The ``OCRElement`` class supports spatial analysis through its ``get_nearest_boxes`` method, which returns the closest elements in each direction:

.. code-block:: python

    # Get OCR elements from an image
    ocr_elements = framework.get(image="screenshot.png", process_ocr=True)

    # `ocr_element` is one item taken from ocr_elements.
    # For that element, find the nearest elements in all directions
    nearest = ocr_element.get_nearest_boxes(ocr_elements, n=1)

    # Access nearest elements by direction
    right_element = nearest["right"][0]  # Nearest element to the right
    left_element = nearest["left"][0]    # Nearest element to the left
    above_element = nearest["above"][0]  # Nearest element above
    below_element = nearest["below"][0]  # Nearest element below

Features:

  • Find n nearest elements in each direction (right, left, above, below)
  • Considers spatial overlap when determining nearest elements
  • Returns elements sorted by distance
  • Useful for understanding layout and relationships between text elements
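
To make the overlap rule concrete, here is a minimal sketch of a directional nearest-box search. It is not the package's implementation. It assumes boxes are ``(x1, y1, x2, y2)`` tuples with the origin at the top-left, and it treats a box as a candidate to the left or right only when its vertical extent overlaps the target's (and likewise for above and below with horizontal extent). ``nearest_boxes`` and ``overlaps`` are hypothetical names.

.. code-block:: python

    def overlaps(a_min, a_max, b_min, b_max):
        """True when the open intervals (a_min, a_max) and (b_min, b_max) intersect."""
        return a_min < b_max and b_min < a_max


    def nearest_boxes(target, others, n=1):
        """Return up to ``n`` nearest boxes per direction, closest first."""
        tx1, ty1, tx2, ty2 = target
        found = {"right": [], "left": [], "above": [], "below": []}
        for box in others:
            if box == target:
                continue
            x1, y1, x2, y2 = box
            if overlaps(ty1, ty2, y1, y2):  # shares rows: left/right candidate
                if x1 >= tx2:
                    found["right"].append((x1 - tx2, box))
                elif x2 <= tx1:
                    found["left"].append((tx1 - x2, box))
            if overlaps(tx1, tx2, x1, x2):  # shares columns: above/below candidate
                if y1 >= ty2:
                    found["below"].append((y1 - ty2, box))
                elif y2 <= ty1:
                    found["above"].append((ty1 - y2, box))
        return {
            direction: [box for _, box in sorted(pairs)[:n]]
            for direction, pairs in found.items()
        }


    label = (10, 10, 60, 30)
    boxes = [label, (80, 12, 200, 28), (220, 12, 300, 28), (10, 50, 60, 70)]
    nearest = nearest_boxes(label, boxes, n=1)
    print(nearest["right"])  # prints [(80, 12, 200, 28)]
    print(nearest["below"])  # prints [(10, 50, 60, 70)]

In this sketch a direction with no candidate yields an empty list, so indexing with ``[0]`` should be guarded by a length check.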

Features
--------

Screen Elements
~~~~~~~~~~~~~~~

* Coordinate-based positioning
* Margin calculations
* Drawing capabilities
* Mouse and keyboard interaction
* Debug visualization

Operating Modes
~~~~~~~~~~~~~~~

* CAPTURE: Live interaction with screen elements
* DEBUG: Visualization and testing without actual interaction

Configuration
-------------

Labels
~~~~~~
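Labels are defined in a JSON file mapping element types to numeric IDs. Further element types follow the same pattern; JSON does not allow comments, so the file must contain only the mapping itself:

.. code-block:: json

    {
        "button": 1,
        "text": 2,
        "input": 3
    }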
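
A labels file of this shape can be read with the standard library. The sketch below parses the three entries shown above from a string so that it runs as-is, and inverts the mapping so a numeric class ID can be turned back into a name. The variable names are illustrative, and whether the framework performs this inversion internally is not stated in this README.

.. code-block:: python

    import json

    labels_json = '{"button": 1, "text": 2, "input": 3}'
    labels = json.loads(labels_json)  # for a file, use json.load on the open file

    # Invert the mapping so a predicted class ID can be turned back into a name.
    id_to_label = {class_id: name for name, class_id in labels.items()}

    # Duplicate IDs would silently collapse in the inverted mapping.
    assert len(id_to_label) == len(labels), "label IDs must be unique"

    print(id_to_label[1])  # prints button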

Model
~~~~~
Supports custom trained object detection models:

* Default model trained for common UI elements
* Configurable confidence thresholds
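
How thresholds are configured is not documented in this README. Their effect can be pictured as a filter over scored detections, as in the sketch below. The ``(label, score)`` pairs, the 0 to 1 score scale, the per-label override, and ``apply_thresholds`` are all assumptions made for illustration.

.. code-block:: python

    def apply_thresholds(detections, default=0.5, per_label=None):
        """Keep detections whose score meets the threshold for their label."""
        per_label = per_label or {}
        return [
            (label, score)
            for label, score in detections
            if score >= per_label.get(label, default)
        ]


    detections = [("button", 0.91), ("button", 0.42), ("input", 0.58), ("text", 0.75)]
    kept = apply_thresholds(detections, default=0.5, per_label={"input": 0.6})
    print(kept)  # prints [('button', 0.91), ('text', 0.75)]

A stricter threshold for one label, as for ``input`` here, trades missed detections for fewer false positives on that element type.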

Contributing
------------

1. Clone the repository
2. Create a feature branch
3. Commit changes
4. Push to branch
5. Create Pull Request
