CV POM
Introduction
CV POM framework provides tools to detect elements in image content and interact with them.
The framework converts any image into a page object model. This model lets you access the elements recognized in the image. Elements contain such properties as labels, coordinates and others. It's also possible to transform the elements into a JSON representation for easy integration with other tools.
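To make the idea concrete, here is a minimal, self-contained sketch of what a page object model built from an image could look like. The `Element` and `PageModel` classes, their field names and the JSON layout are illustrative assumptions, not the data structures CV POM actually uses.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Element:
    """Hypothetical recognized element: a label plus a bounding box in pixels."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    text: str = ""

    @property
    def center(self) -> tuple[int, int]:
        # Midpoint of the bounding box, a natural target for a click.
        return ((self.x_min + self.x_max) // 2, (self.y_min + self.y_max) // 2)


@dataclass
class PageModel:
    """Hypothetical container for every element detected in one image."""
    elements: list[Element] = field(default_factory=list)

    def by_label(self, label: str) -> list[Element]:
        # Keep only the elements whose label matches exactly.
        return [el for el in self.elements if el.label == label]

    def to_json(self) -> str:
        # Serialize the dataclass fields so other tools can consume the model.
        return json.dumps([asdict(el) for el in self.elements])


page = PageModel([
    Element("button", 10, 20, 110, 60, text="Reply"),
    Element("icon", 200, 20, 232, 52),
])
print(page.by_label("button")[0].center)
print(page.to_json())
```

The point of the sketch is only that each detected element carries enough information (label, text, coordinates) to be located and acted upon without any access to the application's internals.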
Installation
pip install cv_pom
CVPOMDrivers
CV POM Driver
CV POM Driver is built on top of CV POM and provides easy integration with any automation framework (like Selenium or Appium). The user just needs to override a few methods of the CVPOMDriver class and then use it as a driver to find elements and interact with them.
Since this approach doesn't require any APIs from the application under test, it works for every platform/app combination, so the same APIs can be used to automate each platform. It also allows workflows to be automated based on the rendered UI, which validates the styling and placement of each element, something most UI automation frameworks lack.
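The override-then-use pattern described above can be sketched without any real automation backend. In the example below, `BaseDriverSketch` is a hypothetical stand-in for a CVPOMDriver-like base class (it is not the real one), and `RecordingDriver` is an assumed backend that records actions instead of performing them, which can be handy for dry runs.

```python
from abc import ABC, abstractmethod


class BaseDriverSketch(ABC):
    """Stand-in for a CVPOMDriver-like base class (NOT the real class)."""

    @abstractmethod
    def _click_coordinates(self, x: int, y: int) -> None:
        """Backend hook: click at (x, y)."""

    @abstractmethod
    def _send_keys(self, keys: str) -> None:
        """Backend hook: type the given keys."""

    def click_box(self, box: tuple[int, int, int, int]) -> None:
        # Shared logic lives in the base class: reduce a bounding box
        # (x_min, y_min, x_max, y_max) to its center, then delegate.
        x_min, y_min, x_max, y_max = box
        self._click_coordinates((x_min + x_max) // 2, (y_min + y_max) // 2)

    def type_into(self, box: tuple[int, int, int, int], keys: str) -> None:
        # Focus the element by clicking it, then send the keys.
        self.click_box(box)
        self._send_keys(keys)


class RecordingDriver(BaseDriverSketch):
    """Backend that records actions instead of performing them."""

    def __init__(self) -> None:
        self.actions: list[tuple] = []

    def _click_coordinates(self, x: int, y: int) -> None:
        self.actions.append(("click", x, y))

    def _send_keys(self, keys: str) -> None:
        self.actions.append(("keys", keys))


driver = RecordingDriver()
driver.type_into((100, 200, 300, 240), "hello")
print(driver.actions)
```

Because only the small backend hooks differ per platform, the same high-level calls can run against Selenium, Appium or OS-level input once each hook is implemented for that backend.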
Create your own Driver
First, override the following methods of CVPOMDriver:
from pathlib import Path

from numpy import ndarray

from cv_pom.cv_pom_driver import CVPOMDriver


class MyCVPOMDriver(CVPOMDriver):
    def __init__(self, model_path: str | Path, your_driver, **kwargs) -> None:
        super().__init__(model_path, **kwargs)
        self._driver = your_driver  # Store your driver so that you can use it later

    def _get_screenshot(self) -> ndarray:
        """Add the code that takes a screenshot"""

    def _click_coordinates(self, x: int, y: int):
        """Add the code that clicks on the (x,y) coordinates"""

    def _send_keys(self, keys: str):
        """Add the code that sends keys"""

    def _swipe_coordinates(self, coords: tuple = None, direction: str = None):
        """Add the code that swipes/scrolls on the coords -> (x,y) and direction (up/down/left/right)"""

    def _hover_coordinates(self, x: int, y: int):
        """Add the code that hovers on the (x,y) coordinates"""

    def _drag_drop(self, x: int, y: int, x_end: int, y_end: int, duration=0.1):
        """Add the code that drags and drops on the (x,y) -> (x_end,y_end) coordinates"""
Then use it for automation
framework_specific_driver = ... # Driver object you create with your automation framework of choice
model_path = "./my-model.pt"
kwargs = {'ocr': {'paragraph': True}} # Optional
cv_pom_driver = MyCVPOMDriver(model_path, framework_specific_driver, **kwargs)
# Find element by label
element = cv_pom_driver.find_element({"label": "reply-main"})
# Click on it
element.click()
# Wait until invisible
element.wait_invisible()
# Methods are also chainable
cv_pom_driver.find_element({"text": "some text"}).click()
# Get all elements to process them manually
cv_pom_driver.find_elements(None)
# Swipe/Scroll by coordinates coords=(x, y, x_end, y_end)
cv_pom_driver.swipe(coords=(10, 10, 400, 400))
# Swipe/Scroll by element
cv_pom_driver.find_element({"label": "reply-main"}).swipe(el=cv_pom_driver.find_element({"label": "rally"}))
# Swipe/Scroll by direction "up", "down", "left" and "right"
cv_pom_driver.find_element({"label": "reply-main"}).swipe(direction="down")
For now, the kwargs in MyCVPOMDriver are only used for ocr, and the values can be any parameters that EasyOCR accepts in self._reader.readtext(**ocr_props_comb); see the EasyOCR documentation for details.
For more info about the query syntax, see the documentation of the POM.get_elements() method (cv_sdk/cv_pom.py).
Drivers Already Implemented
Python-TestUI Driver - Selenium & Appium
To use this driver you will have to install both cv_pom and python-testui:
pip install python-testui
Now you can initialise the driver:
import pytest
from selenium.webdriver.chrome.options import Options
from testui.support.appium_driver import NewDriver, TestUIDriver
from cv_pom.frameworks import TestUICVPOMDriver
from cv_pom.cv_pom_driver import CVPOMDriver


@pytest.fixture(autouse=True)
def testui_driver():
    options = Options()
    options.add_argument("--force-device-scale-factor=1")
    options.page_load_strategy = 'eager'
    driver = NewDriver().set_selenium_driver(chrome_options=options)
    driver.navigate_to("https://jqueryui.com/draggable/")
    yield driver
    driver.quit()


@pytest.fixture(autouse=True)
def cv_pom_driver(testui_driver):
    driver = TestUICVPOMDriver("yolov8n.pt", testui_driver, **{'ocr': {'paragraph': False}})
    yield driver


class TestSuite:
    def test_testdevlab(self, testui_driver: TestUIDriver, cv_pom_driver: CVPOMDriver):
        cv_pom_driver.element(
            {"text": {"value": "me around", "contains": True, "case_sensitive": False}}
        ).drag_drop(delta=(300, 0))
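The `delta=(300, 0)` argument above reads naturally as an offset from the element's position. As a rough sketch of that arithmetic, assuming the drag starts at the element's center and that the end point should stay on screen, a helper could look like this. The function `drag_target` and its `screen_size` parameter are hypothetical and not part of cv_pom.

```python
from typing import Optional


def drag_target(
    center: tuple[int, int],
    delta: tuple[int, int],
    screen_size: Optional[tuple[int, int]] = None,
) -> tuple[int, int]:
    """Return the drop point for a drag that starts at `center`.

    `delta` is the (dx, dy) offset in pixels. If `screen_size` (width, height)
    is given, the result is clamped so it stays inside the screen.
    """
    x_end = center[0] + delta[0]
    y_end = center[1] + delta[1]
    if screen_size is not None:
        width, height = screen_size
        # Valid pixel indices run from 0 to width - 1 and 0 to height - 1.
        x_end = min(max(x_end, 0), width - 1)
        y_end = min(max(y_end, 0), height - 1)
    return x_end, y_end


print(drag_target((150, 80), (300, 0)))
print(drag_target((150, 80), (300, 0), screen_size=(400, 300)))
```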
PyAutoGui Driver - Native Desktop App Automation
This driver lets you control the computer it runs on through OS-level interactions. It is very useful for automating native desktop applications.
To use this driver you will have to install both cv_pom and pyautogui:
pip install pyautogui
Now you can initialise the driver:
import pytest
from cv_pom.frameworks import DesktopCVPOMDriver
from cv_pom.cv_pom_driver import CVPOMDriver


@pytest.fixture(autouse=True)
def cv_pom_driver():
    driver = DesktopCVPOMDriver("yolov8n.pt", **{'ocr': {'paragraph': False, 'canvas_size': 1200}, "resize": 0.5})
    yield driver


class TestSuite:
    def test_test(self, cv_pom_driver: CVPOMDriver):
        page = cv_pom_driver.get_page()
        page.element({"text": {"value": "Project", "contains": True}}).drag_drop(delta=(500, 0))
IMPORTANT NOTE: on macOS you might need to pass "resize": 0.5 in the driver arguments, because screenshots taken on Retina screens have twice the resolution of the screen coordinates.
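The reason a factor of 0.5 helps can be shown with a little arithmetic. The sketch below assumes that the screenshot is captured at twice the resolution of the coordinate space used for mouse events, so a position found in the screenshot must be scaled before it is clicked. The helpers `infer_resize` and `to_screen_point` are hypothetical and only illustrate the calculation; how cv_pom applies "resize" internally may differ.

```python
def infer_resize(screenshot_width: int, logical_width: int) -> float:
    """Scale factor that maps screenshot pixels to logical screen points."""
    return logical_width / screenshot_width


def to_screen_point(x: int, y: int, resize: float = 1.0) -> tuple[int, int]:
    """Convert a position in the screenshot to a position the mouse can use."""
    return round(x * resize), round(y * resize)


# A 2880 px wide screenshot of a display that reports 1440 logical points.
factor = infer_resize(screenshot_width=2880, logical_width=1440)
print(factor)
print(to_screen_point(2000, 1200, resize=factor))
```

Without the scaling, a click aimed at an element found at (2000, 1200) in the screenshot would land well away from the element, or off screen entirely.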
CVPOM usage
Python API
The driver methods are meant to make it possible to automate any workflow in any given app. They are described in the sections above.
Besides those, there are also some useful classes that allow you to interact with and filter elements:
The get_page method parses the whole visible screen and lets you interact with it (clicking, sending keys, etc.).
page = cv_pom_driver.get_page()
page.element({"text": {"value": "Project", "contains": True}}).click()
If the element is not visible when get_page is first called, it will try to parse the elements again (you can specify the timeout, which defaults to 10s).
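The retry behaviour described here is a standard polling loop. A minimal sketch, assuming a lookup function that returns None while the element is missing, is shown below. The name `wait_for` and its `interval` parameter are hypothetical; this is not cv_pom's implementation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(find: Callable[[], Optional[T]], timeout: float = 10.0, interval: float = 0.5) -> T:
    """Call `find` repeatedly until it returns a value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"element not found within {timeout} seconds")
        # Give the UI time to change before parsing the screen again.
        time.sleep(interval)


# Fake lookup that only succeeds on the third attempt.
attempts = {"count": 0}


def fake_find() -> Optional[str]:
    attempts["count"] += 1
    return "element" if attempts["count"] >= 3 else None


print(wait_for(fake_find, timeout=2.0, interval=0.01))
```

Using `time.monotonic()` rather than wall-clock time keeps the timeout correct even if the system clock is adjusted while the test is running.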
For debugging purposes, you can also retrieve all the elements and print them in the terminal or draw them on an image:
import cv2

page = cv_pom_driver.get_page()
print(page._pom.to_json())
# Show the annotated screenshot for one second
cv2.imshow("annotated_image", page._pom.annotated_frame)
cv2.waitKey(1000)
See tests or the CVPOMDriver implementation for examples of how to use the underlying CVPOM class.
Python API: Element search query
select by exact label: {"label": "my-label"}
select by label containing substring: {"label": {"value": "my-label", "contains": True}}
select by label, case-insensitive: {"label": {"value": "my-label", "case_sensitive": False}}
select by exact text: {"text": "my-text"}
select by exact label and text: {"label": "my-label", "text": "my-text"}
search by child element: {"label": "my-label", "child": {"text": "my-text"}}
search by parent element: {"label": "my-label", "text": "my-text", "parent": {"text": "my text"}}
search by element on the left/right/up/down: {"text": "my text", "left": {"text": "my text2"}} (use "right", "up" or "down" as the key for the other directions)
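To illustrate how the label and text conditions above could be evaluated, here is a small matcher over plain dictionaries. It is a sketch under stated assumptions, not the library's implementation: the functions `matches_field` and `select` are hypothetical, elements are modelled as dicts with "label" and "text" keys, matching is assumed to be exact and case-sensitive unless the query says otherwise, and the relational keys (child, parent, left/right/up/down) are deliberately left out.

```python
from typing import Optional, Union

Condition = Union[str, dict]


def matches_field(actual: str, condition: Condition) -> bool:
    """Check one field against a plain string or a {"value": ...} condition."""
    if isinstance(condition, str):
        return actual == condition
    expected = condition["value"]
    if not condition.get("case_sensitive", True):
        actual, expected = actual.lower(), expected.lower()
    if condition.get("contains", False):
        return expected in actual
    return actual == expected


def select(elements: list[dict], query: Optional[dict]) -> list[dict]:
    """Return the elements that satisfy every condition in the query."""
    if query is None:
        return list(elements)
    unsupported = set(query) - {"label", "text"}
    if unsupported:
        # Relational keys are outside the scope of this sketch.
        raise ValueError(f"unsupported query keys: {sorted(unsupported)}")
    return [
        el for el in elements
        if all(matches_field(el.get(key, ""), cond) for key, cond in query.items())
    ]


elements = [
    {"label": "button", "text": "Drag me around"},
    {"label": "icon", "text": ""},
]
query = {"text": {"value": "me around", "contains": True, "case_sensitive": False}}
print(select(elements, query))
```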
REST API Server
You can run a REST API server to use the framework remotely or from other programming languages:
python server.py --model yolov8n.pt
As CLI
You can also inspect the elements in images by using the main.py script:
python main.py --model yolov8n.pt --media test/resources/yolo_test_1.png