A simple library to set up Selenium processes
seleniumprocessor
Description
This library lets you set up a Selenium-based process with little code. A process is defined as a list of actions in a simple, declarative format, which the library then executes through Selenium.
Installation
pip install seleniumprocessor
Install a Selenium web driver, e.g., the Chrome WebDriver
Available methods
initiate_connection(webdriverfile, url, to, loginrequired=True, headless=False), returning a selenium.webdriver.chrome.webdriver.WebDriver object allowing browser control
- webdriverfile is the path of the Selenium web driver file
- url is the URL to open
- to is the timeout to wait for page loading
- loginrequired specifies whether a manual login by the user is required (True) or not (False)
- headless specifies whether the browser is run in headless mode (True) or not (False)
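As a minimal sketch of how these parameters combine, the helper below opens a page in headless mode without a manual login and always closes the browser afterwards. The helper name `fetch_title` and its default timeout are assumptions made for illustration; `initiate_connection` is called with the signature listed above, while `title` and `quit()` belong to the standard Selenium WebDriver object it returns.

```python
def fetch_title(webdriverfile, url, to=3):
    """Open `url` headlessly and return the page title (hypothetical helper)."""
    # Imported lazily so the helper can be defined without the package installed.
    import seleniumprocessor

    brw = seleniumprocessor.initiate_connection(
        webdriverfile, url, to, loginrequired=False, headless=True
    )
    try:
        return brw.title  # standard Selenium WebDriver attribute
    finally:
        brw.quit()  # always release the browser process
```

With a Chrome WebDriver in the current directory, this could be called as `fetch_title('./chromedriver', 'https://github.com/auino')`.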
run_process(brw, url_home, to, p, backtohome_begin=True, backtohome_end=True, checkfilterpassed_callback=None), returning an object, as specified in the process p
- brw is the selenium.webdriver.chrome.webdriver.WebDriver object used to control the browser
- url_home is the home page URL
- to is the timeout used to wait for the home page to load
- p is the list of actions in the current process
- backtohome_begin specifies whether the browser should be redirected to the home page at the beginning of the method (True) or not (False)
- backtohome_end specifies whether the browser should be redirected to the home page at the end of the method (True) or not (False)
- checkfilterpassed_callback is a callback function used to check the filters defined in the process p, returning a boolean value (True if the filter is passed, False otherwise)
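The exact arguments passed to `checkfilterpassed_callback` are not spelled out here, so the following is only a sketch under the assumption that the callback receives the `filter` string of an action and returns a boolean. The `ENABLED_FILTERS` set, the filter labels, and the class name are hypothetical.

```python
# Hypothetical set of filter labels whose actions should be allowed to run.
ENABLED_FILTERS = {"logged_in", "desktop"}


def check_filter_passed(filter_value):
    """Return True if the action tagged with `filter_value` should run.

    Assumption: the library calls this with the action's `filter` string.
    """
    return filter_value in ENABLED_FILTERS


# An action that would only be executed when its filter passes.
action = {"class_name": "menu-item", "action": "click", "filter": "logged_in"}
print(check_filter_passed(action["filter"]))  # True
print(check_filter_passed("mobile"))          # False
```

Under that assumption, the function would be handed over as `run_process(..., checkfilterpassed_callback=check_filter_passed)`.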
Objects structure
The main process object is a list of actions to execute sequentially. Each action is represented by a dictionary with the following fields:
- name: the name identifying the DOM objects to find
- class_name: the class name identifying the DOM objects to find
- index (optional): when several DOM objects share the same class (or when a DOM object other than the first one is needed), the index of the DOM object within the list of DOM objects using that class
- sleep (optional): the sleep timeout applied after the action is performed
- filter: a string passed to the checkfilterpassed_callback for filtering actions
- action_parameters (optional): its definition depends on the action field
- action: the action to execute:
  - click: to perform a click on the DOM object
  - click-repeated: to click the DOM object repeatedly for as long as it is present (useful with sleep, e.g., for pages that load a list in portions, with a final button to load additional results); the optional action_parameters parameter is the class name of the objects to count: when that count stops changing, the repeated clicks are interrupted
  - navigate: to navigate by clicking a specific sequence of objects, identified by their text value; the action_parameters parameter is the >-separated navigation path
  - scroll_to: to scroll to the specific element
  - empty_value: to empty the value property of the DOM object
  - store_text: to store data in the object returned by the run_process method; the action_parameters parameter is the name of the property on that object
  - send_keys: to send a key input to a specific DOM object
  - select: to select a specific value of a combo-box DOM object, where the value is specified in the action_parameters parameter
  - foreach: to loop over all the DOM objects retrieved, executing repeated actions on each
- context (optional): when the foreach action is used, all sub-items are searched for within the parent DOM object of the current loop iteration; to search the whole page instead, specify whole_page as context
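To make the structure above concrete, here is a small, library-independent sketch that checks a process definition before it is handed to `run_process`. The `validate_process` function is hypothetical and not part of seleniumprocessor; the set of action names is taken from the list above, and the rule that `foreach` carries a list of sub-actions in `action_parameters` is inferred from the usage samples. The class names and the `hover` action are made up for the demonstration.

```python
KNOWN_ACTIONS = {
    "click", "click-repeated", "navigate", "scroll_to", "empty_value",
    "store_text", "send_keys", "select", "foreach",
}


def validate_process(process, path="p"):
    """Return a list of human-readable problems found in `process`."""
    problems = []
    for i, step in enumerate(process):
        where = "{}[{}]".format(path, i)
        action = step.get("action")
        if action not in KNOWN_ACTIONS:
            problems.append("{}: unknown action {!r}".format(where, action))
        elif action == "foreach":
            sub = step.get("action_parameters")
            if not isinstance(sub, list):
                problems.append("{}: foreach needs a list of sub-actions".format(where))
            else:
                # Recurse into the nested actions of the loop.
                problems.extend(validate_process(sub, where + ".action_parameters"))
    return problems


process = [
    {"class_name": "menu", "action": "navigate", "action_parameters": "Settings>Profile"},
    {"class_name": "row", "action": "foreach", "action_parameters": [
        {"class_name": "title", "action": "store_text", "action_parameters": "title"},
        {"class_name": "title", "action": "hover"},  # not a documented action
    ]},
]
print(validate_process(process))
# ["p[1].action_parameters[1]: unknown action 'hover'"]
```

Checking the definition up front is a design choice of this sketch: it reports typos in action names before any browser is started.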
Sample usage
Get all repositories of @auino
# import the library
import seleniumprocessor
# define initial variables
URL_HOME = 'https://github.com/auino'
SLEEP_TO = 3
# initiate a connection on auino GitHub page (not requiring a login)
brw = seleniumprocessor.initiate_connection('./chromedriver', URL_HOME, SLEEP_TO, loginrequired=False)
# define the process to be executed
p = [
    {'class_name':'UnderlineNav-item', 'index':1, 'action':'click', 'sleep':SLEEP_TO}, # clicking on the Repository tab, the second one, on top of the page
    {'class_name':'source', 'action':'foreach', 'action_parameters':[ # looping on all repositories
        {'class_name':'wb-break-all', 'action':'store_text', 'action_parameters':'name'}, # storing the repository name
        {'class_name':'color-text-secondary', 'action':'store_text', 'action_parameters':'description'} # storing the repository description
    ]}
]
# run the process
data = seleniumprocessor.run_process(brw, URL_HOME, SLEEP_TO, p, backtohome_begin=False)
# showing resulting data
print(data)
Get all publications of a given user from Google Scholar
import seleniumprocessor
# define initial variables
USERPROFILE = 'UlbGEQwAAAAJ'
URL_HOME = 'https://scholar.google.com/citations?user={}'.format(USERPROFILE)
SLEEP_TO = 3
# initiate a connection on the Google Scholar profile page (not requiring a login)
brw = seleniumprocessor.initiate_connection('./chromedriver', URL_HOME, SLEEP_TO, loginrequired=False)
# define the process to be executed
p = [
    {'id':'gsc_prf_in', 'action':'store_text', 'action_parameters':'name'}, # storing researcher's name
    {'class_name':'gs_lbl', 'index':-1, 'action':'click-repeated', 'action_parameters':'gsc_a_tr', 'sleep':SLEEP_TO}, # clicking the button at the end of the page, to extend the list of publications
    {'class_name':'gsc_a_tr', 'action':'foreach', 'action_parameters':[ # looping on all publications
        {'class_name':'gsc_a_at', 'action':'store_text', 'action_parameters':'title'}, # storing the publication name
        {'class_name':'gs_gray', 'index':0, 'action':'store_text', 'action_parameters':'authors'}, # storing the authors of the publication
        {'class_name':'gs_gray', 'index':1, 'action':'store_text', 'action_parameters':'venue'}, # storing the venue of the publication
        {'class_name':'gsc_a_ac', 'action':'store_text', 'action_parameters':'citations'}, # storing the number of citations of the publication
        {'class_name':'gsc_a_h', 'action':'store_text', 'action_parameters':'year'}, # storing the year of the publication
    ]}
]
# run the process
data = seleniumprocessor.run_process(brw, URL_HOME, SLEEP_TO, p, backtohome_begin=False)
# showing resulting data
print(data)
TODO
- Improve code readability
- Extend supported objects structure
Contacts
File details
Details for the file seleniumprocessor-0.1.5.tar.gz.
File metadata
- Download URL: seleniumprocessor-0.1.5.tar.gz
- Upload date:
- Size: 4.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.1 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f2fc2fde904077aaa41ab9ae11be5e88488010ccf965b52c4e3b5273828eae8d |
| MD5 | bf72d5fdf33fbb2a74e35ea811d57c07 |
| BLAKE2b-256 | 8e342ada0404cdb67aed944a9b7098cef2737c5412cf9f9d71c1e3ab5824f4f4 |
File details
Details for the file seleniumprocessor-0.1.5-py3-none-any.whl.
File metadata
- Download URL: seleniumprocessor-0.1.5-py3-none-any.whl
- Upload date:
- Size: 5.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.1 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6350e977b3b4abf4830567fd2974d3c790fc316e1ad580ac48adceed3f65945c |
| MD5 | 9c109ac86a6f7efd8c58fe6d1dd44979 |
| BLAKE2b-256 | 3982dbd39b6d69a3141220b79074a19fb56a68b761a63f51f0eee216d339d6a0 |