CV POM

Introduction

The CV POM framework provides tools to detect elements in image content and interact with them.

The framework converts any image into a page object model that gives you access to the elements recognized in the image. Each element carries properties such as its label and coordinates. Elements can also be converted to a JSON representation for easy integration with other tools.
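To make the idea of elements with properties and a JSON form concrete, here is a minimal sketch using only the standard library. The `Element` class, its field names, and `to_json` are hypothetical stand-ins for illustration, not the classes shipped with cv_pom.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Element:
    """Hypothetical stand-in for a recognized element (not the cv_pom class)."""

    label: str
    text: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    @property
    def center(self) -> tuple[int, int]:
        # Midpoint of the bounding box, a natural target for a click
        return ((self.x_min + self.x_max) // 2, (self.y_min + self.y_max) // 2)


def to_json(elements: list[Element]) -> str:
    # Serialize the elements so that other tools can consume them
    return json.dumps([asdict(e) for e in elements])


elements = [Element("button", "Reply", 10, 20, 110, 60)]
print(elements[0].center)
print(to_json(elements))
```

The real model exposes its own JSON conversion; the sketch only shows the shape of the data such a model might hold.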

Installation

pip install cv_pom

CVPOMDrivers

CV POM Driver

CV POM Driver is built on top of CV POM and integrates with any automation framework (such as Selenium or Appium). You only need to override a few methods of the CVPOMDriver class and then use it as a driver to find elements and interact with them.

Since this approach doesn't require any APIs from the application under test, it is generic across platform/app combinations, so you can automate every platform with the same API. It also lets you automate workflows based on the rendered UI, which validates the styling and placement of each element, something most UI automation frameworks lack.

Create your own Driver

First, override the CVPOMDriver methods shown below:

from pathlib import Path

from numpy import ndarray

from cv_pom.cv_pom_driver import CVPOMDriver


class MyCVPOMDriver(CVPOMDriver):
    def __init__(self, model_path: str | Path, your_driver, **kwargs) -> None:
        super().__init__(model_path, **kwargs)
        self._driver = your_driver  # Store your driver so that you can use it later

    def _get_screenshot(self) -> ndarray:
        """Add the code that takes a screenshot"""

    def _click_coordinates(self, x: int, y: int):
        """Add the code that clicks on the (x,y) coordinates"""

    def _send_keys(self, keys: str):
        """Add the code that sends keys"""

    def _swipe_coordinates(self, coords: tuple = None, direction: str = None):
        """Add the code that swipes/scrolls on the coords -> (x,y) and direction (up/down/left/right)"""

    def _hover_coordinates(self, x: int, y: int):
        """Add the code that hovers on the (x,y) coordinates"""

    def _drag_drop(self, x: int, y: int, x_end: int, y_end: int, duration=0.1):
        """Add the code that drags and drops on the (x,y) -> (x_end,y_end) coordinates"""

Then use it for automation

framework_specific_driver = ... # Driver object you create with your automation framework of choice
model_path = "./my-model.pt"
kwargs = {'ocr': {'paragraph': True}} # Optional
cv_pom_driver = MyCVPOMDriver(model_path, framework_specific_driver, **kwargs)

# Find element by label
element = cv_pom_driver.find_element({"label": "reply-main"})
# Click on it
element.click()
# Wait until invisible
element.wait_invisible()
# Methods are also chainable
cv_pom_driver.find_element({"text": "some text"}).click()
# Get all elements to process them manually
cv_pom_driver.find_elements(None)
# Swipe/Scroll by coordinates coords=(x, y, x_end, y_end)
cv_pom_driver.swipe(coords=(10, 10, 400, 400))
# Swipe/Scroll by element
cv_pom_driver.find_element({"label": "reply-main"}).swipe(el=cv_pom_driver.find_element({"label": "rally"}))
# Swipe/Scroll by direction "up", "down", "left" and "right"
cv_pom_driver.find_element({"label": "reply-main"}).swipe(direction="down")

For now, the kwargs in MyCVPOMDriver are only used for OCR. The values under the ocr key can be any parameters that EasyOCR's readtext accepts; they are passed as self._reader.readtext(**ocr_props_comb).

For more information about the query syntax, see the documentation of the POM.get_elements() method (cv_sdk/cv_pom.py).
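As a rough illustration of how the ocr entry in kwargs could end up as keyword arguments for EasyOCR, the sketch below merges user-supplied values over a set of defaults. The `combine_ocr_props` helper and the default values are assumptions made for this example; only the `ocr` key, the `ocr_props_comb` name, and the `paragraph` and `canvas_size` parameters come from this document and EasyOCR.

```python
def combine_ocr_props(defaults: dict, **kwargs) -> dict:
    """Hypothetical helper: values under the "ocr" key override the defaults."""
    return {**defaults, **kwargs.get("ocr", {})}


defaults = {"paragraph": False}  # assumed defaults, for illustration only
ocr_props_comb = combine_ocr_props(
    defaults, ocr={"paragraph": True, "canvas_size": 1200}
)
print(ocr_props_comb)
```

The combined dictionary is the kind of value that would then be unpacked into the readtext call.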

Drivers Already Implemented

Python-TestUI Driver - Selenium & Appium

To use this driver, install both cv_pom and python-testui:

pip install python-testui

Now you can initialise the driver:

import pytest
from selenium.webdriver.chrome.options import Options
from testui.support.appium_driver import NewDriver, TestUIDriver
from cv_pom.frameworks import TestUICVPOMDriver
from cv_pom.cv_pom_driver import CVPOMDriver

@pytest.fixture(autouse=True)
def testui_driver():
    options = Options()
    options.add_argument("--force-device-scale-factor=1")
    options.page_load_strategy = 'eager'
    driver = NewDriver().set_selenium_driver(chrome_options=options)
    driver.navigate_to("https://jqueryui.com/draggable/")

    yield driver
    driver.quit()

@pytest.fixture(autouse=True)
def cv_pom_driver(testui_driver):
    driver = TestUICVPOMDriver("yolov8n.pt", testui_driver, **{'ocr': {'paragraph': False}})
    yield driver

class TestSuite:
    def test_testdevlab(self, testui_driver: TestUIDriver, cv_pom_driver: CVPOMDriver):
        cv_pom_driver.element(
            {"text": {"value": "me around", "contains": True, "case_sensitive": False}}
        ).drag_drop(delta=(300, 0))
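The drag_drop(delta=...) call above has to be turned into the start and end coordinates that a hook like _drag_drop(x, y, x_end, y_end) receives. A minimal sketch of that arithmetic follows, assuming the delta is an (x, y) pixel offset from the element's center; that interpretation and the `drag_endpoints` name are assumptions, not confirmed library behavior.

```python
def drag_endpoints(
    center: tuple[int, int], delta: tuple[int, int]
) -> tuple[int, int, int, int]:
    """Hypothetical helper: return (x, y, x_end, y_end) for a drag-and-drop."""
    x, y = center
    dx, dy = delta
    # The drop point is the start point shifted by the requested offset
    return (x, y, x + dx, y + dy)


# Drag an element centered at (120, 80) by 300 px to the right
print(drag_endpoints((120, 80), (300, 0)))
```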

PyAutoGui Driver - Native Desktop App Automation

This driver lets you control the computer it runs on through OS-level interactions, which makes it well suited to automating native desktop applications.

To use this driver, install both cv_pom and pyautogui:

pip install pyautogui

Now you can initialise the driver:

import pytest
from cv_pom.frameworks import DesktopCVPOMDriver
from cv_pom.cv_pom_driver import CVPOMDriver

@pytest.fixture(autouse=True)
def cv_pom_driver():
    driver = DesktopCVPOMDriver("yolov8n.pt", **{'ocr': {'paragraph': False, 'canvas_size': 1200}, "resize": 0.5})
    yield driver


class TestSuite:
    def test_test(self, cv_pom_driver: CVPOMDriver):
        page = cv_pom_driver.get_page()
        page.element({"text": {"value": "Project", "contains": True}}).drag_drop(delta=(500, 0))

IMPORTANT NOTE: on macOS you might need to pass "resize": 0.5 in the driver arguments, because Retina displays produce screenshots at twice the resolution of the screen coordinates.
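The arithmetic behind the resize factor can be sketched as follows. This assumes the screenshot is captured in physical pixels while clicks use logical screen coordinates; the function names are hypothetical and the way the driver applies "resize" internally is not shown in this document.

```python
def scaled_size(width: int, height: int, resize: float) -> tuple[int, int]:
    # Size of the screenshot after applying the resize factor
    return (round(width * resize), round(height * resize))


def to_screen_point(x: int, y: int, resize: float) -> tuple[int, int]:
    # Map a point found in the full-resolution screenshot to screen coordinates
    return (round(x * resize), round(y * resize))


# A 2880x1800 Retina screenshot corresponds to a 1440x900 logical screen
print(scaled_size(2880, 1800, 0.5))
print(to_screen_point(1000, 600, 0.5))
```

Without the factor, a point detected at (1000, 600) in the screenshot would be clicked at twice the intended distance from the top-left corner.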

CVPOM usage

Python API

The driver methods are designed to automate any workflow in any app; they are described in the sections above.

Beyond those, a few helpers let you interact with and filter elements:

The get_page method parses the entire visible screen and lets you interact with it: clicking, sending keys, and so on.

page = cv_pom_driver.get_page()
page.element({"text": {"value": "Project", "contains": True}}).click()

If the element is not visible when get_page is first called, the driver parses the screen again until the element appears or the timeout expires (the timeout is configurable and defaults to 10 s).

For debugging purposes, you can also retrieve all the elements and print them in the terminal or render them on an image:
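The retry behavior can be pictured as a polling loop. The sketch below is a generic illustration with a hypothetical `wait_for` helper and a fake finder; it is not the driver's actual implementation, and the polling interval is an assumption.

```python
import time


def wait_for(find, timeout: float = 10.0, interval: float = 0.5):
    """Call find() repeatedly until it returns a value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"element not found within {timeout} s")
        time.sleep(interval)


# Fake finder that only succeeds on the third attempt
attempts = {"count": 0}


def fake_find():
    attempts["count"] += 1
    return "element" if attempts["count"] >= 3 else None


found = wait_for(fake_find, timeout=2.0, interval=0.01)
print(found, attempts["count"])
```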

page = cv_pom_driver.get_page()
print(page._pom.to_json())

import cv2
cv2.imshow("annotated_image", page._pom.annotated_frame)
cv2.waitKey(1000)

See tests or CVPOMDriver implementation for examples of how to use the underlying CVPOM class.

Python API: Element search query

  select by exact label:                           {"label": "my-label"}
  select by label containing substring:            {"label": {"value": "my-label", "contains": True}}
  select by label not case sensitive:              {"label": {"value": "my-label", "case_sensitive": False}}
  select by exact text:                            {"text": "my-text"}
  select by exact label and text:                  {"label": "my-label", "text": "my-text"}
  search by child element:                         {"label": "my-label", "child": {"text": "my-text"}}
  search by parent element:                        {"label": "my-label", "text": "my-text", "parent": {"text": "my text"}}
  search by element on the left/right/up/down:     {"text": "my text", "left/right/up/down": {"text": "my text2"}}

REST API Server

You can run a REST API server to use the framework remotely or from other programming languages:

python server.py --model yolov8n.pt

As CLI

You can also inspect the elements in an image with the main.py script:

python main.py --model yolov8n.pt --media test/resources/yolo_test_1.png
