seleniumprocessor

A simple library to set up Selenium processes

Description

This library lets you set up a Selenium-based process with little code. A process is defined as a list of actions written in a specific format (see Objects structure), which the library then executes through Selenium.

Installation

pip install seleniumprocessor

Install a Selenium web driver, e.g., the Chrome WebDriver

Available methods

initiate_connection(webdriverfile, url, to, loginrequired=True, headless=False), returning a selenium.webdriver.chrome.webdriver.WebDriver object allowing browser control

webdriverfile is the path to the Selenium web driver file
url is the URL to open
to is the timeout to wait for the page to load
loginrequired specifies whether a manual login by the user is required (True) or not (False)
headless specifies whether the browser runs in headless mode (True) or not (False)
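
As an illustration of the parameters above, the following sketch wraps initiate_connection in a small helper that opens a page in headless mode without a manual login. The helper name open_headless and its default values are hypothetical; the call itself uses only the signature documented above. The import is deferred so that the helper can be defined before the library and a web driver are installed.

```python
def open_headless(url, webdriverfile='./chromedriver', to=3):
    """Hypothetical convenience wrapper (not part of seleniumprocessor).

    Opens `url` in a headless browser that does not wait for a manual login.
    """
    # Imported lazily: needs `pip install seleniumprocessor` and a web driver file.
    import seleniumprocessor

    return seleniumprocessor.initiate_connection(
        webdriverfile, url, to, loginrequired=False, headless=True
    )
```

Passing loginrequired and headless as keyword arguments keeps the call readable, since both are booleans and are easy to confuse when given positionally.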

run_process(brw, url_home, to, p, backtohome_begin=True, backtohome_end=True, checkfilterpassed_callback=None), returning an object, as specified in the process p

brw is the selenium.webdriver.chrome.webdriver.WebDriver object used to control the browser
url_home is the home page URL
to is the timeout used to wait for the home page to load
p is the list of actions in the current process
backtohome_begin specifies whether the browser should be redirected to the home page at the beginning of the method (True) or not (False)
backtohome_end specifies whether the browser should be redirected to the home page at the end of the method (True) or not (False)
checkfilterpassed_callback is a callback function used to check the filters defined in the process p, returning a boolean value (True if the filter is passed, False otherwise)
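
A minimal sketch of a filter callback follows. It assumes, and this is not confirmed by the description above, that run_process calls the callback with the action's filter string as its only argument. The names ENABLED_FILTERS and check_filter_passed are hypothetical.

```python
# Hypothetical set of filter strings that should currently pass.
ENABLED_FILTERS = {'logged_in', 'has_results'}


def check_filter_passed(filter_value):
    """Return True if the given filter is passed, False otherwise.

    Assumption: the library passes the action's 'filter' string
    as the single positional argument.
    """
    return filter_value in ENABLED_FILTERS


# Intended use (needs a live browser, so it is left commented out):
# data = seleniumprocessor.run_process(
#     brw, URL_HOME, SLEEP_TO, p,
#     checkfilterpassed_callback=check_filter_passed)
```

If the real callback receives additional arguments, the function signature would need to be adjusted accordingly.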

Objects structure

The main process object is a list of actions that are executed sequentially. Each action is represented by a dictionary with the following fields:

name: the name identifying the DOM objects to find
class_name: the class name identifying the DOM objects to find
id: the id identifying the DOM object to find (as used in the Google Scholar sample below)
index (optional): when several DOM objects share the same class (or when a DOM object other than the first one is needed), the index of the DOM object within the list of objects using that class; indexes start at 0, so 1 selects the second object, as in the first sample below
sleep (optional): the time to sleep after the action is performed
filter: a string passed to the checkfilterpassed_callback, used to filter actions
action_parameters (optional): its definition depends on the action field
action: the action to execute:
- click: to perform a click on the DOM object
- click-repeated: to click the DOM object repeatedly, as long as the object is present (useful with sleep, e.g., for pages that load a list in portions, with a final button to load additional results); the optional action_parameters parameter is the class name of the objects to count: when that count no longer changes, the repeated clicks stop
- navigate: to navigate by clicking a specific sequence of objects, identified by their text value; the action_parameters parameter is the navigation path, with items separated by >
- scroll_to: to scroll to the specific element
- empty_value: to empty the value property of the DOM object
- store_text: to store data on the returning object generated by the run_process method; the action_parameters parameter represents the name of the property on the object
- send_keys: to send a key input to a specific DOM object
- select: to select a specific value of a specific combo-box DOM object, where the value is specified in the action_parameters parameter
- foreach: to loop over all the retrieved DOM objects and execute, on each of them, the list of actions given in action_parameters
context (optional): when the foreach action is used, all sub-items are searched within the parent DOM object of the current loop iteration; to search the whole page instead, specify whole_page as context
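
To make the structure above concrete, the following sketch checks action dictionaries against the action names listed in this section before they are handed to run_process. The function validate_action is a hypothetical helper, not part of seleniumprocessor, and the rule that foreach carries a list of sub-actions in action_parameters is inferred from the samples below rather than taken from the library.

```python
# Action names as listed in the "Objects structure" section.
KNOWN_ACTIONS = {
    'click', 'click-repeated', 'navigate', 'scroll_to', 'empty_value',
    'store_text', 'send_keys', 'select', 'foreach',
}


def validate_action(action):
    """Hypothetical pre-flight check: return a list of problems (empty if none)."""
    problems = []
    name = action.get('action')
    if name not in KNOWN_ACTIONS:
        problems.append('unknown action: {!r}'.format(name))
    if name == 'foreach':
        sub_actions = action.get('action_parameters')
        if not isinstance(sub_actions, list):
            problems.append('foreach expects a list of sub-actions in action_parameters')
        else:
            # Check nested actions recursively.
            for sub in sub_actions:
                problems.extend(validate_action(sub))
    return problems


process = [
    {'class_name': 'source', 'action': 'foreach', 'action_parameters': [
        {'class_name': 'wb-break-all', 'action': 'store_text', 'action_parameters': 'name'},
    ]},
    # 'btn' is a made-up class name; 'clik' is a deliberate typo.
    {'class_name': 'btn', 'action': 'clik'},
]

for step in process:
    print(validate_action(step))
```

A check like this catches misspelled action names early, before a browser session is started.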

Sample usage

Get all repositories of @auino

# import the library
import seleniumprocessor

# define initial variables
URL_HOME = 'https://github.com/auino'
SLEEP_TO = 3

# initiate a connection to the auino GitHub page (no login required)
brw = seleniumprocessor.initiate_connection('./chromedriver', URL_HOME, SLEEP_TO, False)

# define the process to be executed
p = [
	{'class_name':'UnderlineNav-item', 'index':1, 'action':'click', 'sleep':SLEEP_TO}, # clicking on the Repository tab, the second one, on top of the page
	{'class_name':'source', 'action':'foreach', 'action_parameters':[ # looping on all repositories
		{'class_name':'wb-break-all', 'action':'store_text', 'action_parameters':'name'}, # storing the repository name
		{'class_name':'color-text-secondary', 'action':'store_text', 'action_parameters':'description'} # storing the repository description
	]}
]

# run the process
data = seleniumprocessor.run_process(brw, URL_HOME, SLEEP_TO, p, backtohome_begin=False)

# showing resulting data
print(data)

Get all publications of a given user from Google Scholar

import seleniumprocessor

# define initial variables
USERPROFILE = 'UlbGEQwAAAAJ'
URL_HOME = 'https://scholar.google.com/citations?user={}'.format(USERPROFILE)
SLEEP_TO = 3

# initiate a connection to the Google Scholar profile page (no login required)
brw = seleniumprocessor.initiate_connection('./chromedriver', URL_HOME, SLEEP_TO, False)

# define the process to be executed
p = [
    {'id':'gsc_prf_in', 'action':'store_text', 'action_parameters':'name'}, # storing researcher's name
    {'class_name':'gs_lbl', 'index':-1, 'action':'click-repeated', 'action_parameters':'gsc_a_tr', 'sleep':SLEEP_TO}, # clicking the button at the end of the page, to extend the list of publications
    {'class_name':'gsc_a_tr', 'action':'foreach', 'action_parameters':[ # looping on all publications
        {'class_name':'gsc_a_at', 'action':'store_text', 'action_parameters':'title'}, # storing the publication name
        {'class_name':'gs_gray', 'index':0, 'action':'store_text', 'action_parameters':'authors'}, # storing the authors of the publication
        {'class_name':'gs_gray', 'index':1, 'action':'store_text', 'action_parameters':'venue'}, # storing the venue of the publication
        {'class_name':'gsc_a_ac', 'action':'store_text', 'action_parameters':'citations'}, # storing the number of citations of the publication
        {'class_name':'gsc_a_h', 'action':'store_text', 'action_parameters':'year'}, # storing the year of the publication
    ]}
]

# run the process
data = seleniumprocessor.run_process(brw, URL_HOME, SLEEP_TO, p, backtohome_begin=False)

# showing resulting data
print(data)

TODO

Improve code readability
Extend supported objects structure

Contacts

You can find me on Twitter as @auino.

Release history

0.1.5 (this version): Feb 17, 2022
0.1.4: Feb 17, 2022
0.1.3: Sep 30, 2021
0.1.2: Aug 26, 2021
0.1.1: Aug 25, 2021
0.1.0: Aug 25, 2021
0.0.8: Aug 25, 2021
0.0.7: Aug 25, 2021
0.0.6: Aug 20, 2021
0.0.5: Aug 20, 2021
0.0.4: Aug 20, 2021
0.0.3: Aug 20, 2021

Download files

Source Distribution: seleniumprocessor-0.1.5.tar.gz (4.8 kB), uploaded Feb 17, 2022
Built Distribution: seleniumprocessor-0.1.5-py3-none-any.whl (5.1 kB, Python 3), uploaded Feb 17, 2022

Hashes for seleniumprocessor-0.1.5.tar.gz

Algorithm	Hash digest
SHA256	`f2fc2fde904077aaa41ab9ae11be5e88488010ccf965b52c4e3b5273828eae8d`
MD5	`bf72d5fdf33fbb2a74e35ea811d57c07`
BLAKE2b-256	`8e342ada0404cdb67aed944a9b7098cef2737c5412cf9f9d71c1e3ab5824f4f4`

Hashes for seleniumprocessor-0.1.5-py3-none-any.whl

Algorithm	Hash digest
SHA256	`6350e977b3b4abf4830567fd2974d3c790fc316e1ad580ac48adceed3f65945c`
MD5	`9c109ac86a6f7efd8c58fe6d1dd44979`
BLAKE2b-256	`3982dbd39b6d69a3141220b79074a19fb56a68b761a63f51f0eee216d339d6a0`