A Python package for detecting and interacting with screen elements using computer vision and OCR.

Screenwise Framework
====================

A Python framework for screen element detection and interaction using computer vision and machine learning.

Overview
--------

Screenwise automates the detection of, and interaction with, UI elements through:

- Screenshot capture and analysis
- ML-based element detection
- Coordinate-based interaction
- OCR capabilities
- Debug and capture modes
- Cross-platform support

Installation
------------

.. code-block:: bash

    pip install t-screenwise


Basic Usage
-----------

Initialize Framework
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from t_screenwise.screenwise import Framework

    # Initialize with default settings
    framework = Framework()

    # Initialize with custom settings
    framework = Framework(
        mode="CAPTURE",                  # or "DEBUG" (see Operating Modes)
        model_path="path/to/model.pth",  # trained detection model
        labels="path/to/labels.json",    # element-type-to-ID mapping
        device="cpu",                    # device to run the model on
    )


Detect Elements
~~~~~~~~~~~~~~~

.. code-block:: python

    # Get all detected elements
    elements = framework.get()

    # Filter for specific element types
    buttons = framework.get(filter=["button"])
    text = framework.get(filter=["text"])

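
Conceptually, ``filter`` narrows the results to the element types named in the labels file. The sketch below models that selection step in plain Python; the ``Detection`` dataclass and ``filter_by_label`` helper are hypothetical stand-ins, not part of the Screenwise API.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Iterable, List, Optional, Sequence, Tuple


    @dataclass
    class Detection:
        """Hypothetical stand-in for a detected screen element."""
        label: str
        box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in screen pixels


    def filter_by_label(
        detections: Iterable[Detection], labels: Optional[Sequence[str]] = None
    ) -> List[Detection]:
        """Keep detections whose label is in ``labels``; keep all if ``labels`` is None."""
        if labels is None:
            return list(detections)
        wanted = set(labels)  # set lookup keeps filtering linear in the input size
        return [d for d in detections if d.label in wanted]


    detections = [
        Detection("button", (10, 10, 90, 40)),
        Detection("text", (10, 50, 200, 70)),
        Detection("input", (10, 80, 200, 110)),
    ]
    buttons = filter_by_label(detections, ["button"])
    print([d.label for d in buttons])  # ['button']
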
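
Interact with Elements
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Pick an element returned by framework.get(), e.g. the first button
    element = framework.get(filter=["button"])[0]

    # Click the element
    element.click()

    # Click at a specific position relative to the element
    element.click(coords="up_right")

    # Type text
    element.send_keys("Hello World")

    # Click, then type
    element.click_and_send_keys("Hello World")
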
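
A named position such as ``"up_right"`` has to be resolved to a pixel coordinate from the element's bounding box before a click can be sent. The sketch below shows one way to do that; apart from ``"up_right"``, the anchor names and the ``anchor_point`` helper are hypothetical and are not taken from Screenwise.

.. code-block:: python

    from typing import Tuple

    Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), y grows downward

    # Hypothetical anchor names mapped to fractions of the box width and height
    ANCHORS = {
        "center": (0.5, 0.5),
        "up_left": (0.0, 0.0),
        "up_right": (1.0, 0.0),
        "down_left": (0.0, 1.0),
        "down_right": (1.0, 1.0),
    }


    def anchor_point(box: Box, position: str = "center") -> Tuple[int, int]:
        """Return the screen pixel for a named position on ``box``."""
        if position not in ANCHORS:
            raise ValueError(f"unknown position: {position!r}")
        x1, y1, x2, y2 = box
        fx, fy = ANCHORS[position]
        # Interpolate between the box edges and round to whole pixels
        return round(x1 + fx * (x2 - x1)), round(y1 + fy * (y2 - y1))


    print(anchor_point((100, 200, 300, 260), "up_right"))  # (300, 200)
    print(anchor_point((100, 200, 300, 260)))              # (200, 230)

Expressing anchors as fractions keeps the table independent of element size, so the same names work for a small icon and a wide text field.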
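
Process OCR Elements
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from t_screenwise.screenwise import Framework
    # Assumption: OCRElement lives in the same module; adjust the import if not
    from t_screenwise.screenwise import OCRElement

    framework = Framework()
    results = framework.get(image="path/to/image.png", process_ocr=True)

    # Results contain both detected boxes and OCR elements
    for element in results:
        if isinstance(element, OCRElement):
            print(f"OCR Text: {element.text} (Confidence: {element.confidence})")
        else:
            print(f"Box Label: {element.label}")
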

OCR Elements
~~~~~~~~~~~~

* Text content extraction
* Confidence scoring
* Spatial relationship analysis
* Text-based element search
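
Text-based search and confidence scoring combine naturally: discard OCR results below a confidence cutoff, then match their text. The sketch below uses a hypothetical ``OCRResult`` record whose ``text`` and ``confidence`` fields mirror the attributes printed in the OCR example; the 0 to 1 confidence scale and the ``find_text`` helper are assumptions, not the library's search API.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Iterable, List


    @dataclass
    class OCRResult:
        """Hypothetical OCR record: recognized text plus a confidence score."""
        text: str
        confidence: float  # assumed to be in [0, 1]


    def find_text(
        results: Iterable[OCRResult], query: str, min_confidence: float = 0.5
    ) -> List[OCRResult]:
        """Case-insensitive substring search over sufficiently confident results."""
        needle = query.casefold()
        return [
            r for r in results
            if r.confidence >= min_confidence and needle in r.text.casefold()
        ]


    results = [
        OCRResult("Username", 0.97),
        OCRResult("Password", 0.95),
        OCRResult("user guide", 0.30),  # low confidence, filtered out
    ]
    print([r.text for r in find_text(results, "user")])  # ['Username']
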

OCR Spatial Analysis
~~~~~~~~~~~~~~~~~~~~

The ``OCRElement`` class provides spatial analysis through its ``get_nearest_boxes`` method:

.. code-block:: python

    # Get elements (boxes and OCR text) from an image
    ocr_elements = framework.get(image="screenshot.png", process_ocr=True)

    # Pick an OCR element (OCRElement as in the OCR example above)
    ocr_element = next(e for e in ocr_elements if isinstance(e, OCRElement))

    # Find its nearest elements in all directions
    nearest = ocr_element.get_nearest_boxes(ocr_elements, n=1)

    # Access nearest elements by direction
    right_element = nearest["right"][0]  # Nearest element to the right
    left_element = nearest["left"][0]    # Nearest element to the left
    above_element = nearest["above"][0]  # Nearest element above
    below_element = nearest["below"][0]  # Nearest element below

Features of ``get_nearest_boxes``:

- Find n nearest elements in each direction (right, left, above, below)
- Considers spatial overlap when determining nearest elements
- Returns elements sorted by distance
- Useful for understanding layout and relationships between text elements
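
The behaviour listed above can be approximated with plain bounding-box geometry. The sketch below is one plausible interpretation, not Screenwise's implementation: a candidate counts as "right" of the reference only if it starts at or beyond the reference's right edge and the two boxes overlap vertically (analogously for the other directions), and candidates are ranked by the gap between facing edges. Boxes are ``(x1, y1, x2, y2)`` tuples with ``y`` growing downward; ``nearest_boxes`` is a hypothetical helper.

.. code-block:: python

    from typing import Dict, List, Sequence, Tuple

    Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), y grows downward


    def _overlap(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> bool:
        """True if the intervals [a_lo, a_hi] and [b_lo, b_hi] overlap."""
        return a_lo < b_hi and b_lo < a_hi


    def nearest_boxes(ref: Box, boxes: Sequence[Box], n: int = 1) -> Dict[str, List[Box]]:
        """Return up to ``n`` nearest boxes per direction, sorted by edge gap."""
        x1, y1, x2, y2 = ref
        found: Dict[str, List[Tuple[int, Box]]] = {
            "right": [], "left": [], "above": [], "below": [],
        }
        for b in boxes:
            if b == ref:
                continue  # skip the reference box (and exact duplicates of it)
            bx1, by1, bx2, by2 = b
            if _overlap(y1, y2, by1, by2):  # boxes share a horizontal band
                if bx1 >= x2:
                    found["right"].append((bx1 - x2, b))
                elif bx2 <= x1:
                    found["left"].append((x1 - bx2, b))
            if _overlap(x1, x2, bx1, bx2):  # boxes share a vertical band
                if by2 <= y1:
                    found["above"].append((y1 - by2, b))
                elif by1 >= y2:
                    found["below"].append((by1 - y2, b))
        # Sort each direction by gap and keep the n closest boxes
        return {
            d: [b for _, b in sorted(c, key=lambda t: t[0])[:n]]
            for d, c in found.items()
        }


    label = (0, 0, 50, 20)
    boxes = [label, (60, 0, 120, 20), (200, 5, 260, 25), (0, 30, 50, 50)]
    print(nearest_boxes(label, boxes)["right"])  # [(60, 0, 120, 20)]
    print(nearest_boxes(label, boxes)["below"])  # [(0, 30, 50, 50)]

Requiring overlap on the perpendicular axis keeps a box that is far up and to the right from being reported as the element "to the right", which is what makes the result useful for pairing a text label with the field beside it.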

Features
--------

Screen Elements
~~~~~~~~~~~~~~~

* Coordinate-based positioning
* Margin calculations
* Drawing capabilities
* Mouse and keyboard interaction
* Debug visualization

Operating Modes
~~~~~~~~~~~~~~~

* CAPTURE: Live interaction with screen elements
* DEBUG: Visualization and testing without actual interaction
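
The split between the two modes can be pictured as a dispatch on the mode before any input event is sent. In the sketch below, a hypothetical illustration rather than Screenwise code, ``CAPTURE`` forwards each click to a caller-supplied function (for example a real mouse backend) while ``DEBUG`` only records the action that would have been performed.

.. code-block:: python

    from typing import Callable, List, Tuple

    Point = Tuple[int, int]


    class ClickDispatcher:
        """Hypothetical mode-aware clicker: CAPTURE acts, DEBUG only records."""

        def __init__(self, mode: str, click_fn: Callable[[int, int], None]):
            if mode not in ("CAPTURE", "DEBUG"):
                raise ValueError(f"unsupported mode: {mode!r}")
            self.mode = mode
            self.click_fn = click_fn
            self.log: List[Point] = []  # every requested click, in order

        def click(self, x: int, y: int) -> None:
            self.log.append((x, y))
            if self.mode == "CAPTURE":
                self.click_fn(x, y)  # perform the real interaction
            # In DEBUG mode nothing is sent; the log can be drawn for inspection


    sent: List[Point] = []
    debug = ClickDispatcher("DEBUG", lambda x, y: sent.append((x, y)))
    debug.click(300, 200)
    print(debug.log, sent)  # [(300, 200)] []
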

Configuration
-------------

Labels
~~~~~~

Labels are defined in a JSON file that maps element type names to numeric IDs, for example (abbreviated):

.. code-block:: json

    {
        "button": 1,
        "text": 2,
        "input": 3
    }

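
Because the mapping is plain JSON, it can be loaded with the standard library. Detection models usually report numeric class IDs, so inverting the mapping is a convenient way to turn predictions back into names. The sketch assumes the flat name-to-ID shape shown above; ``load_labels`` is a hypothetical helper, not part of Screenwise.

.. code-block:: python

    import json
    from typing import Dict, Tuple


    def load_labels(text: str) -> Tuple[Dict[str, int], Dict[int, str]]:
        """Parse a name -> ID mapping and build the reverse ID -> name lookup."""
        name_to_id: Dict[str, int] = json.loads(text)
        id_to_name = {idx: name for name, idx in name_to_id.items()}
        if len(id_to_name) != len(name_to_id):
            raise ValueError("label IDs must be unique")
        return name_to_id, id_to_name


    # For a file: load_labels(pathlib.Path("path/to/labels.json").read_text())
    name_to_id, id_to_name = load_labels('{"button": 1, "text": 2, "input": 3}')
    print(id_to_name[2])  # text
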
Model
~~~~~

Screenwise supports custom-trained object detection models:

* Default model trained for common UI elements
* Configurable confidence thresholds
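
A configurable confidence threshold typically means dropping detections whose score falls below a cutoff, optionally with stricter cutoffs for particular element types. The sketch below applies a default threshold plus per-label overrides to hypothetical ``(label, score)`` pairs; the parameter names and the 0 to 1 score range are assumptions, not Screenwise settings.

.. code-block:: python

    from typing import Dict, List, Optional, Tuple

    Detection = Tuple[str, float]  # (label, score); hypothetical shape


    def apply_thresholds(
        detections: List[Detection],
        default: float = 0.5,
        per_label: Optional[Dict[str, float]] = None,
    ) -> List[Detection]:
        """Keep detections whose score meets the threshold for their label."""
        per_label = per_label or {}
        return [
            (label, score)
            for label, score in detections
            if score >= per_label.get(label, default)
        ]


    raw = [("button", 0.92), ("text", 0.41), ("input", 0.55)]
    print(apply_thresholds(raw))                            # [('button', 0.92), ('input', 0.55)]
    print(apply_thresholds(raw, per_label={"input": 0.6}))  # [('button', 0.92)]
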

Contributing
------------

1. Clone the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a pull request