CV POM

Introduction

The CV POM framework provides tools to detect elements in image content and interact with them.

The framework converts any image into a page object model, which gives you access to the elements recognized in the image. Each element carries properties such as its label and coordinates. Elements can also be converted to a JSON representation for easy integration with other tools.

Installation

pip install cv_pom

CVPOMDrivers

CV POM Driver

CV POM Driver is built on top of CV POM and integrates easily with any automation framework (such as Selenium or Appium). You only need to override a few methods of the CVPOMDriver class and can then use it as a driver to find elements and interact with them.

Because this approach requires no APIs from the application under test, it works the same way for every platform/app combination, so you can automate every platform with the same APIs. It also lets you automate workflows based on the UI as rendered, which validates the styling and placement of each element, something most UI automation frameworks lack.

Create your own Driver

First, override the following methods of CVPOMDriver:

from pathlib import Path

from numpy import ndarray  # screenshots are returned as image arrays

from cv_pom.cv_pom_driver import CVPOMDriver


class MyCVPOMDriver(CVPOMDriver):
    def __init__(self, model_path: str | Path, your_driver, **kwargs) -> None:
        super().__init__(model_path, **kwargs)
        self._driver = your_driver  # Store your driver so that you can use it later

    def _get_screenshot(self) -> ndarray:
        """Add the code that takes a screenshot"""

    def _click_coordinates(self, x: int, y: int):
        """Add the code that clicks on the (x,y) coordinates"""

    def _send_keys(self, keys: str):
        """Add the code that sends keys"""

    def _swipe_coordinates(self, coords: tuple = None, direction: str = None):
        """Add the code that swipes/scrolls on the coords -> (x,y) and direction (up/down/left/right)"""

    def _hover_coordinates(self, x: int, y: int):
        """Add the code that hovers on the (x,y) coordinates"""

    def _drag_drop(self, x: int, y: int, x_end: int, y_end: int, duration=0.1):
        """Add the code that drags and drops on the (x,y) -> (x_end,y_end) coordinates"""

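When implementing _swipe_coordinates, one decision is how to turn a direction into concrete start and end points. The following is only a sketch under stated assumptions: direction_to_coords is a hypothetical helper (not part of cv_pom), the direction is taken to be the way the pointer travels, and the swipe is centred on a screen of the given size. Your framework may expect the opposite convention for scrolling.

```python
from typing import Tuple


def direction_to_coords(
    direction: str, width: int, height: int, fraction: float = 0.5
) -> Tuple[int, int, int, int]:
    """Return (x, y, x_end, y_end) for a swipe centred on the screen.

    Hypothetical helper: `fraction` is the share of the screen the swipe
    covers, and y grows downwards as in usual screen coordinates.
    """
    cx, cy = width // 2, height // 2
    dx = int(width * fraction / 2)
    dy = int(height * fraction / 2)
    moves = {
        "up": (cx, cy + dy, cx, cy - dy),
        "down": (cx, cy - dy, cx, cy + dy),
        "left": (cx + dx, cy, cx - dx, cy),
        "right": (cx - dx, cy, cx + dx, cy),
    }
    if direction not in moves:
        raise ValueError(f"unknown direction: {direction!r}")
    return moves[direction]


print(direction_to_coords("down", 800, 600))  # (400, 150, 400, 450)
```
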
Then use it for automation

framework_specific_driver = ... # Driver object you create with your automation framework of choice
model_path = "./my-model.pt"
kwargs = {'ocr': {'paragraph': True}} # Optional
cv_pom_driver = MyCVPOMDriver(model_path, framework_specific_driver, **kwargs)

# Find element by label
element = cv_pom_driver.find_element({"label": "reply-main"})
# Click on it
element.click()
# Wait until invisible
element.wait_invisible()
# Methods are also chainable
cv_pom_driver.find_element({"text": "some text"}).click()
# Get all elements to process them manually
cv_pom_driver.find_elements(None)
# Swipe/Scroll by coordinates coords=(x, y, x_end, y_end)
cv_pom_driver.swipe(coords=(10, 10, 400, 400))
# Swipe/Scroll by element
cv_pom_driver.find_element({"label": "reply-main"}).swipe(el=cv_pom_driver.find_element({"label": "rally"}))
# Swipe/Scroll by direction "up", "down", "left" and "right"
cv_pom_driver.find_element({"label": "reply-main"}).swipe(direction="down")

For now, the kwargs in MyCVPOMDriver are only used for ocr, and the values can be any parameters that EasyOCR accepts in self._reader.readtext(**ocr_props_comb); see the EasyOCR documentation for the full list.

For more information about the query syntax, see the documentation of the POM.get_elements() method (cv_sdk/cv_pom.py).

Drivers Already Implemented

Python-TestUI Driver - Selenium & Appium

To use this driver you need to install both cv_pom and python-testui:

pip install python-testui

Now you can initialise the driver:

import pytest
from selenium.webdriver.chrome.options import Options
from testui.support.appium_driver import NewDriver, TestUIDriver
from cv_pom.frameworks import TestUICVPOMDriver
from cv_pom.cv_pom_driver import CVPOMDriver

@pytest.fixture(autouse=True)
def testui_driver():
    options = Options()
    options.add_argument("--force-device-scale-factor=1")
    options.page_load_strategy = 'eager'
    driver = NewDriver().set_selenium_driver(chrome_options=options)
    driver.navigate_to("https://jqueryui.com/draggable/")

    yield driver
    driver.quit()

@pytest.fixture(autouse=True)
def cv_pom_driver(testui_driver):
    driver = TestUICVPOMDriver("yolov8n.pt", testui_driver, **{'ocr': {'paragraph': False}})
    yield driver

class TestSuite:
    def test_testdevlab(self, testui_driver: TestUIDriver, cv_pom_driver: CVPOMDriver):
        cv_pom_driver.element(
            {"text": {"value": "me around", "contains": True, "case_sensitive": False}}
        ).drag_drop(delta=(300, 0))
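
The delta argument above and the _drag_drop(x, y, x_end, y_end) hook can be related with a little arithmetic. The sketch below is an illustration only: drag_points is a hypothetical helper, and it assumes that an element is described by a bounding box (x_min, y_min, x_max, y_max) and that the drag starts at the box centre. How CV POM computes these points internally is not shown in this document.

```python
from typing import Tuple

# Assumed bounding-box layout: (x_min, y_min, x_max, y_max)
Box = Tuple[int, int, int, int]


def drag_points(box: Box, delta: Tuple[int, int]) -> Tuple[int, int, int, int]:
    """Return (x, y, x_end, y_end): start at the box centre, end `delta` away."""
    x_min, y_min, x_max, y_max = box
    x = (x_min + x_max) // 2
    y = (y_min + y_max) // 2
    return x, y, x + delta[0], y + delta[1]


print(drag_points((100, 200, 180, 240), delta=(300, 0)))  # (140, 220, 440, 220)
```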

PyAutoGui Driver - Native Desktop App Automation

This driver lets you control the computer it runs on through OS-level interactions. It is very useful for automating native desktop applications.

To use this driver you need to install both cv_pom and pyautogui:

pip install pyautogui

Now you can initialise the driver:

import pytest
from cv_pom.frameworks import DesktopCVPOMDriver
from cv_pom.cv_pom_driver import CVPOMDriver

@pytest.fixture(autouse=True)
def cv_pom_driver():
    driver = DesktopCVPOMDriver("yolov8n.pt", **{'ocr': {'paragraph': False, 'canvas_size': 1200}, "resize": 0.5})
    yield driver


class TestSuite:
    def test_test_unicaja(self, cv_pom_driver: CVPOMDriver):
        page = cv_pom_driver.get_page()
        page.element({"text": {"value": "Project", "contains": True}}).drag_drop(delta=(500, 0))

IMPORTANT NOTE: on macOS you might need to pass "resize": 0.5 in the driver arguments, because Retina screens double the resolution of the screen.
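
The arithmetic behind a 0.5 factor can be sketched as follows. On a Retina display a screenshot typically has twice as many pixels per axis as the logical coordinates that OS-level input works with, so a point found in the screenshot has to be scaled before it is clicked. to_logical_point is a hypothetical helper written for this illustration; it is an assumption about why the factor is needed, not a description of how CV POM applies resize internally.

```python
from typing import Tuple


def to_logical_point(x_px: float, y_px: float, resize: float = 0.5) -> Tuple[int, int]:
    """Scale a point from screenshot pixels to logical screen coordinates."""
    return round(x_px * resize), round(y_px * resize)


# A point at (1440, 900) in a double-resolution screenshot
print(to_logical_point(1440, 900))  # (720, 450)
```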

CVPOM usage

Python API

The driver methods described in the sections above are meant to be able to automate any workflow in any given app.

Besides those, there are also some useful classes that allow you to interact with and filter elements:

The get_page method parses the whole visible screen and lets you interact with it: clicking, sending keys, etc.

page = cv_pom_driver.get_page()
page.element({"text": {"value": "Project", "contains": True}}).click()

If the element is not visible when get_page is first called, the driver will try to parse the elements again (you can specify the timeout, which defaults to 10s).
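
The retry behaviour described above follows a common polling pattern. The sketch below is generic and does not reproduce CV POM's code: wait_for, its interval parameter, and the find callable are all hypothetical names chosen for illustration; only the 10s default comes from the text.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(find: Callable[[], Optional[T]], timeout: float = 10.0, interval: float = 0.5) -> T:
    """Call `find` repeatedly until it returns a value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"element not found within {timeout}s")
        time.sleep(interval)


# Usage: a lookup that only succeeds on the third attempt
attempts = {"count": 0}


def lookup() -> Optional[str]:
    attempts["count"] += 1
    return "element" if attempts["count"] >= 3 else None


print(wait_for(lookup, timeout=2.0, interval=0.01))  # element
```

A monotonic clock is used for the deadline so that system clock adjustments during a test run do not shorten or extend the wait.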

For debugging purposes, you can also retrieve all the elements and print them in the terminal or draw them on an image:

page = cv_pom_driver.get_page()
print(page._pom.to_json())

import cv2
cv2.imshow("annotated_image", page._pom.annotated_frame)
cv2.waitKey(1000)

Query examples:

    - select by exact label:                {"label": "my-label"}
    - select by label containing substring: {"label": {"value": "my-label", "contains": True}}
    - select by label not case sensitive:   {"label": {"value": "my-label", "case_sensitive": False}}
    - select by exact text:                 {"text": "my-text"}
    - select by exact label and text:       {"label": "my-label", "text": "my-text"}
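
To make the semantics of these queries concrete, here is a minimal pure-Python sketch of how such a filter could behave. It is not the CV POM implementation: matches_field and filter_elements are hypothetical names, elements are modelled as plain dictionaries, and the defaults (exact match, case sensitive) are assumptions inferred from the examples above.

```python
from typing import Any, Dict, List, Optional, Union

Criterion = Union[str, Dict[str, Any]]


def matches_field(actual: str, criterion: Criterion) -> bool:
    """Check one field against a plain string or a {"value": ...} criterion."""
    if isinstance(criterion, str):
        return actual == criterion  # plain string means exact match
    expected = criterion["value"]
    if not criterion.get("case_sensitive", True):
        actual, expected = actual.lower(), expected.lower()
    if criterion.get("contains", False):
        return expected in actual
    return actual == expected


def filter_elements(
    elements: List[Dict[str, str]], query: Optional[Dict[str, Criterion]]
) -> List[Dict[str, str]]:
    """Return the elements that satisfy every key in the query (all if no query)."""
    if not query:
        return list(elements)
    return [
        el
        for el in elements
        if all(matches_field(el.get(key, ""), crit) for key, crit in query.items())
    ]


elements = [
    {"label": "button", "text": "Reply"},
    {"label": "button-main", "text": "Drag me around"},
]
query = {"text": {"value": "me around", "contains": True, "case_sensitive": False}}
print(filter_elements(elements, query))
```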

See tests or CVPOMDriver implementation for examples of how to use the underlying CVPOM class.

REST API Server

You can run a REST API server to use the framework remotely or from other programming languages:

python server.py --model yolov8n.pt

As CLI

You can also inspect the elements in images by using the main.py script:

python main.py --model yolov8n.pt --media test/resources/yolo_test_1.png
