A simple library to set up Selenium processes
seleniumprocessor
Description
This library lets you set up a Selenium-based process with little code. A process is defined in a specific format, as a list of actions, and the library passes those actions to Selenium for execution.
Installation
pip install seleniumprocessor
Install a Selenium web driver, e.g., the Chrome WebDriver
Available methods
initiate_connection(webdriverfile, url, to, loginrequired=True, headless=False), returning a selenium.webdriver.chrome.webdriver.WebDriver object allowing browser control
- webdriverfile is the path of the Selenium web driver file
- url is the URL to open
- to is the timeout to wait for page loading
- loginrequired specifies whether a manual login by the user is required (True) or not (False)
- headless specifies whether the browser is run in headless mode (True) or not (False)
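As a minimal sketch of the call described above, the following wraps initiate_connection in a small helper that opens a page in headless mode without a manual login. The helper name open_headless and its default values are assumptions made for illustration; only the initiate_connection signature comes from this document. The import sits inside the function, so defining the helper does not require Selenium or a web driver to be installed.

```python
def open_headless(url, timeout=3, webdriverfile='./chromedriver'):
    """Open `url` in a headless browser, without asking for a manual login.

    Hypothetical convenience wrapper, not part of seleniumprocessor.
    """
    # imported lazily: the library and a web driver are needed only when called
    import seleniumprocessor
    return seleniumprocessor.initiate_connection(
        webdriverfile, url, timeout, loginrequired=False, headless=True
    )
```

Calling open_headless('https://github.com/auino') would then return the browser object to pass to run_process.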
run_process(brw, url_home, to, p, backtohome_begin=True, backtohome_end=True, checkfilterpassed_callback=None), returning an object, as specified in the process p
- brw is the selenium.webdriver.chrome.webdriver.WebDriver object used to control the browser
- url_home is the home page URL
- to is the timeout used to wait for the home page to load
- p is the list of actions in the current process
- backtohome_begin specifies whether the browser should be redirected to the home page at the beginning of the method (True) or not (False)
- backtohome_end specifies whether the browser should be redirected to the home page at the end of the method (True) or not (False)
- checkfilterpassed_callback is a callback function used to check the filters defined in the process p, returning a boolean value (True if the filter is passed, False otherwise)
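A filter callback could be sketched as follows. This is an illustration under stated assumptions: the callback is assumed to receive the filter string of an action as its only argument (the exact signature is not given here, so check the library source), and the ENABLED_FILTERS set and the filter names in it are hypothetical.

```python
# Hypothetical set of filter names considered "passed" for this run
ENABLED_FILTERS = {'logged_in', 'premium'}

def check_filter_passed(filter_value):
    """Return True if the filter identified by `filter_value` is passed.

    Assumption: the callback gets the action's `filter` string as its
    only argument.
    """
    # in this sketch, a missing or empty filter is treated as passed
    if not filter_value:
        return True
    return filter_value in ENABLED_FILTERS

print(check_filter_passed('premium'))  # True
print(check_filter_passed('admin'))    # False
```

Under the same assumption, the function would be handed to run_process as checkfilterpassed_callback=check_filter_passed.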
Objects structure
The main process object is a list of actions to execute sequentially. Each action is represented by a map (a Python dictionary in the samples) with the following fields:
- name: the name identifying the DOM objects to find
- class_name: the class name identifying the DOM objects to find
- index (optional): when several DOM objects share the same class (or when the DOM object to consider is not the first one), the index of the DOM object within the list of DOM objects using that class
- sleep (optional): the sleep timeout applied after the action is performed
- filter: a string passed to the checkfilterpassed_callback for filtering actions
- action_parameters (optional): its definition depends on the action field
- action: the action to execute:
  - click: to perform a click on the DOM object
  - click-repeated: to click the DOM object repeatedly, for as long as the object is present (useful with sleep, e.g., for pages that load a list in portions, with a final button to load additional results); the optional action_parameters parameter is the class name of the objects to count: when that count stops changing, the repeated clicks stop
  - navigate: to navigate by clicking a specific sequence of objects, identified by their text value; the action_parameters parameter is the navigation path, with items separated by >
  - scroll_to: to scroll to the specific element
  - empty_value: to empty the value property of the DOM object
  - store_text: to store data on the object returned by the run_process method; the action_parameters parameter is the name of the property on that object
  - send_keys: to send a key input to a specific DOM object
  - select: to select a specific value of a combo-box DOM object, where the value is specified in the action_parameters parameter
  - foreach: to loop over all the DOM objects retrieved, to execute repeated actions
- context (optional): when the foreach action is used, all sub-items are searched for within the parent DOM object used in the loop; to search the whole page instead, specify whole_page as context
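Since a process is plain data, it can be checked before a browser is started. The following is a minimal sketch of such a check: validate_process is a hypothetical helper, not part of seleniumprocessor, and it only verifies the shape described above (known action names, and a list of sub-actions in action_parameters for foreach, as in the samples). The class names and the navigation path in the example process are made up for illustration.

```python
# Action names listed in this document
KNOWN_ACTIONS = {
    'click', 'click-repeated', 'navigate', 'scroll_to', 'empty_value',
    'store_text', 'send_keys', 'select', 'foreach',
}

def validate_process(process):
    """Return a list of human-readable problems found in `process`.

    Hypothetical helper: it checks the documented shape only, not what
    the library really accepts.
    """
    problems = []
    for position, step in enumerate(process):
        action = step.get('action')
        if action not in KNOWN_ACTIONS:
            problems.append('step {}: unknown action {!r}'.format(position, action))
        elif action == 'foreach':
            # in the samples, foreach carries its sub-actions in action_parameters
            sub_steps = step.get('action_parameters')
            if not isinstance(sub_steps, list):
                problems.append('step {}: foreach needs a list of sub-actions'.format(position))
            else:
                # recurse and prefix each nested problem with the parent position
                problems.extend(
                    'step {} > {}'.format(position, problem)
                    for problem in validate_process(sub_steps)
                )
    return problems

# Illustrative process: class names and navigation path are hypothetical
process = [
    {'class_name': 'menu', 'action': 'navigate', 'action_parameters': 'Settings>Profile'},
    {'class_name': 'row', 'action': 'foreach', 'action_parameters': [
        {'class_name': 'title', 'action': 'store_text', 'action_parameters': 'title'},
        {'class_name': 'more', 'action': 'clik'},  # deliberate typo, reported below
    ]},
]
print(validate_process(process))
```

The check reports the misspelled action nested inside the foreach step and returns an empty list for a process that uses only the documented action names.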
Sample usage
Get all repositories of @auino
# import the library
import seleniumprocessor
# define initial variables
URL_HOME = 'https://github.com/auino'
SLEEP_TO = 3
# initiate a connection on auino GitHub page (not requiring a login)
brw = seleniumprocessor.initiate_connection('./chromedriver', URL_HOME, SLEEP_TO, False)
# define the process to be executed
p = [
    {'class_name':'UnderlineNav-item', 'index':1, 'action':'click', 'sleep':SLEEP_TO}, # clicking on the Repository tab, the second one, on top of the page
    {'class_name':'source', 'action':'foreach', 'action_parameters':[ # looping on all repositories
        {'class_name':'wb-break-all', 'action':'store_text', 'action_parameters':'name'}, # storing the repository name
        {'class_name':'color-text-secondary', 'action':'store_text', 'action_parameters':'description'} # storing the repository description
    ]}
]
# run the process
data = seleniumprocessor.run_process(brw, URL_HOME, SLEEP_TO, p, backtohome_begin=False)
# showing resulting data
print(data)
Get all publications of a given user from Google Scholar
import seleniumprocessor
# define initial variables
USERPROFILE = 'UlbGEQwAAAAJ'
URL_HOME = 'https://scholar.google.com/citations?user={}'.format(USERPROFILE)
SLEEP_TO = 3
# initiate a connection on the Google Scholar profile page (not requiring a login)
brw = seleniumprocessor.initiate_connection('./chromedriver', URL_HOME, SLEEP_TO, False)
# define the process to be executed
p = [
    {'id':'gsc_prf_in', 'action':'store_text', 'action_parameters':'name'}, # storing researcher's name
    {'class_name':'gs_lbl', 'index':-1, 'action':'click-repeated', 'action_parameters':'gsc_a_tr', 'sleep':SLEEP_TO}, # clicking the button at the end of the page, to extend the list of publications
    {'class_name':'gsc_a_tr', 'action':'foreach', 'action_parameters':[ # looping on all publications
        {'class_name':'gsc_a_at', 'action':'store_text', 'action_parameters':'title'}, # storing the publication name
        {'class_name':'gs_gray', 'index':0, 'action':'store_text', 'action_parameters':'authors'}, # storing the authors of the publication
        {'class_name':'gs_gray', 'index':1, 'action':'store_text', 'action_parameters':'venue'}, # storing the venue of the publication
        {'class_name':'gsc_a_ac', 'action':'store_text', 'action_parameters':'citations'}, # storing the number of citations of the publication
        {'class_name':'gsc_a_h', 'action':'store_text', 'action_parameters':'year'}, # storing the year of the publication
    ]}
]
# run the process
data = seleniumprocessor.run_process(brw, URL_HOME, SLEEP_TO, p, backtohome_begin=False)
# showing resulting data
print(data)
TODO
- Improve code readability
- Extend supported objects structure
Contacts