A simple library to set up Selenium processes

seleniumprocessor

Description

This library lets you set up a browser automation process based on Selenium. A process is defined in a specific format, as a list of actions, which the library then executes through Selenium.

Installation

pip install seleniumprocessor

Install a Selenium web driver, e.g., the Chrome WebDriver

Available methods

initiate_connection(webdriverfile, url, to, loginrequired=True, headless=False), returning a selenium.webdriver.chrome.webdriver.WebDriver object allowing browser control

  • webdriverfile is the path of the Selenium web driver file
  • url is the url to open
  • to is the timeout to wait for the page to load
  • loginrequired specifies if a manual login from the user is required (True) or not (False)
  • headless specifies if the browser has to be executed in headless mode (True) or not (False)
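
As a minimal sketch of how the connection could be wrapped so that the browser is always closed, the context manager below calls initiate_connection with the parameters listed above and calls quit() on the returned WebDriver when the block ends. The names browser_session and connect are hypothetical and not part of seleniumprocessor; the connect argument exists only so that the helper can be exercised without a real browser.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(webdriverfile, url, to, connect=None, **options):
    """Open a browser and make sure it is closed afterwards.

    Hypothetical helper, not part of seleniumprocessor.
    `options` is forwarded as keyword arguments (e.g. loginrequired, headless).
    """
    if connect is None:
        # Imported lazily so the helper can be defined without Selenium installed.
        import seleniumprocessor
        connect = seleniumprocessor.initiate_connection
    brw = connect(webdriverfile, url, to, **options)
    try:
        yield brw
    finally:
        # WebDriver.quit() closes the browser and ends the driver session.
        brw.quit()
```

A call such as `with browser_session('./chromedriver', url, 3, loginrequired=False, headless=True) as brw:` would then give the body of the block a ready-to-use browser object.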

run_process(brw, url_home, to, p, backtohome_begin=True, backtohome_end=True, checkfilterpassed_callback=None), returning an object, as specified in the process p

  • brw is the selenium.webdriver.chrome.webdriver.WebDriver object used to control the browser
  • url_home is the home page url
  • to is the timeout used to wait for the home page to load
  • p is the list of actions in the current process
  • backtohome_begin specifies if the browser should be redirected to the home page at the beginning of the method (True) or not (False)
  • backtohome_end specifies if the browser should be redirected to the home page at the end of the method (True) or not (False)
  • checkfilterpassed_callback is a callback function used to check the filters defined in the process p, returning a boolean value (True if the filter is passed, False otherwise)
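
A filter callback could look like the sketch below. It assumes that the callback receives the filter string of an action as its only argument, which the description above suggests but does not state explicitly; what the library does when the callback returns False is likewise not specified here. The names check_filter_passed and ENABLED_FILTERS, the filter labels, and the class names are all hypothetical.

```python
# Hypothetical set of filter labels considered "passed" for this run.
ENABLED_FILTERS = {'logged_in', 'has_results'}


def check_filter_passed(filter_value):
    """Return True if the given filter label is passed, False otherwise.

    Assumption: the callback is called with the action's `filter` string.
    """
    return filter_value in ENABLED_FILTERS


# A process in which the second action carries a filter label.
p = [
    {'class_name': 'search-button', 'action': 'click'},
    {'class_name': 'result-title', 'filter': 'has_results',
     'action': 'store_text', 'action_parameters': 'title'},
]

# The callback would then be passed by keyword, for example:
# seleniumprocessor.run_process(brw, url_home, to, p,
#                               checkfilterpassed_callback=check_filter_passed)
```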

Objects structure

The main process object is a list of actions that are executed sequentially. Each action is represented by a dictionary with the following fields:

  • name: the name identifying the DOM objects to find
  • class_name: the class name identifying the DOM objects to find
  • index (optional): when several DOM objects share the same class, or when the object to consider is not the first one, the index of the DOM object in the list of objects using that class
  • sleep (optional): the sleep timeout used after the action is performed
  • filter: a string passed to the checkfilterpassed_callback for filtering actions
  • action_parameters (optional): its definition depends on the action field
  • action: the action to execute:
    • click: to perform a click on the DOM object
    • click-repeated: to perform a repeated click on the DOM object, as long as the object is present (useful with sleep, e.g., for pages loading portions of a list, with a final button to load additional results); the optional action_parameters parameter represents the class name of the objects to count: when the count is unchanged, repeated clicks will be interrupted
    • navigate: to navigate by clicking a specific sequence of objects, by their text value; the action_parameters parameter represents the > separated navigation path
    • scroll_to: to scroll to the specific element
    • empty_value: to empty the value property of the DOM object
    • store_text: to store data on the object returned by the run_process method; the action_parameters parameter represents the name of the property on the object
    • send_keys: to send a key input to a specific DOM object
    • select: to select a specific value of a specific combo-box DOM object, where the value is specified in the action_parameters parameter
    • foreach: to loop on all the DOM objects retrieved to execute repeated actions
  • context (optional): when the foreach action is used, all sub-items are searched within the parent DOM object of the current loop iteration; to search the whole page instead, specify whole_page as context
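
Because a process is plain data, it can be checked before any browser is opened. The sketch below is a hypothetical helper, not part of seleniumprocessor: it verifies that every action name is one of those listed above and descends into the sub-actions of foreach, assuming (as in the samples further down) that foreach carries them as a list in action_parameters. The class names in the example process are made up.

```python
# Action names as listed in the fields description.
KNOWN_ACTIONS = {
    'click', 'click-repeated', 'navigate', 'scroll_to', 'empty_value',
    'store_text', 'send_keys', 'select', 'foreach',
}


def validate_process(process):
    """Hypothetical helper: return a list of problems found in a process."""
    problems = []
    for position, step in enumerate(process):
        action = step.get('action')
        if action not in KNOWN_ACTIONS:
            problems.append('step {}: unknown action {!r}'.format(position, action))
        elif action == 'foreach':
            # foreach holds its sub-actions in action_parameters.
            nested = validate_process(step.get('action_parameters', []))
            problems.extend('step {} > {}'.format(position, msg) for msg in nested)
    return problems


process = [
    {'class_name': 'tab', 'index': 1, 'action': 'click', 'sleep': 3},
    {'class_name': 'row', 'action': 'foreach', 'action_parameters': [
        {'class_name': 'title', 'action': 'store_text', 'action_parameters': 'title'},
        {'class_name': 'more', 'action': 'clik'},  # deliberate typo, reported below
    ]},
]

for problem in validate_process(process):
    print(problem)
```

Only the misspelled action in the nested list is reported; the remaining steps use documented action names.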

Sample usage

Get all repositories of @auino

# import the library
import seleniumprocessor

# define initial variables
URL_HOME = 'https://github.com/auino'
SLEEP_TO = 3

# initiate a connection on auino GitHub page (not requiring a login)
brw = seleniumprocessor.initiate_connection('./chromedriver', URL_HOME, 3, False)

# define the process to be executed
p = [
	{'class_name':'UnderlineNav-item', 'index':1, 'action':'click', 'sleep':SLEEP_TO}, # clicking on the Repository tab, the second one, on top of the page
	{'class_name':'source', 'action':'foreach', 'action_parameters':[ # looping on all repositories
		{'class_name':'wb-break-all', 'action':'store_text', 'action_parameters':'name'}, # storing the repository name
		{'class_name':'color-text-secondary', 'action':'store_text', 'action_parameters':'description'} # storing the repository description
	]}
]

# run the process
data = seleniumprocessor.run_process(brw, URL_HOME, SLEEP_TO, p, backtohome_begin=False)

# showing resulting data
print(data)

Get all publications of a given user from Google Scholar

import seleniumprocessor

# define initial variables
USERPROFILE = 'UlbGEQwAAAAJ'
URL_HOME = 'https://scholar.google.com/citations?user={}'.format(USERPROFILE)
SLEEP_TO = 3

# initiate a connection on the Google Scholar profile page (not requiring a login)
brw = seleniumprocessor.initiate_connection('./chromedriver', URL_HOME, 3, False)

# define the process to be executed
p = [
    {'id':'gsc_prf_in', 'action':'store_text', 'action_parameters':'name'}, # storing researcher's name
    {'class_name':'gs_lbl', 'index':-1, 'action':'click-repeated', 'action_parameters':'gsc_a_tr', 'sleep':SLEEP_TO}, # clicking the button at the end of the page, to extend the list of publications
    {'class_name':'gsc_a_tr', 'action':'foreach', 'action_parameters':[ # looping on all publications
        {'class_name':'gsc_a_at', 'action':'store_text', 'action_parameters':'title'}, # storing the publication name
        {'class_name':'gs_gray', 'index':0, 'action':'store_text', 'action_parameters':'authors'}, # storing the authors of the publication
        {'class_name':'gs_gray', 'index':1, 'action':'store_text', 'action_parameters':'venue'}, # storing the venue of the publication
        {'class_name':'gsc_a_ac', 'action':'store_text', 'action_parameters':'citations'}, # storing the number of citations of the publication
        {'class_name':'gsc_a_h', 'action':'store_text', 'action_parameters':'year'}, # storing the year of the publication
    ]}
]

# run the process
data = seleniumprocessor.run_process(brw, URL_HOME, SLEEP_TO, p, backtohome_begin=False)

# showing resulting data
print(data)

TODO

  • Improve code readability
  • Extend supported objects structure

Contacts

You can find me on Twitter as @auino.
