Skip to main content

Like WebDriverWait - visibility_of_element_located, but much easier

Project description

Like WebDriverWait - visibility_of_element_located, but much easier

pip install selenium2df-locate-element

Tested against Windows 10 / Python 3.10 / Anaconda 3

This module simplifies the process of locating and extracting elements from web pages using Selenium, while leveraging the power and flexibility of Pandas DataFrames. It saves time and effort by providing a streamlined approach to web data extraction and facilitates further data processing and analysis tasks.

Advantages of using the locate_element function

Simplified element location:

The function abstracts away the complexities of locating elements using CSS selectors in Selenium. You can specify a CSS selector query to target the desired elements without dealing with low-level Selenium operations.

Conversion to Pandas DataFrame:

The located elements are converted into a Pandas DataFrame, providing a convenient data structure for data manipulation and analysis. This allows you to leverage Pandas' extensive functionality for working with tabular data.

Support for conditional modification:

The function accepts a checkdf parameter, which allows you to provide a callable function for modifying the DataFrame based on specific conditions. This can be useful when you want to filter or transform the located elements based on custom criteria.

Timeout handling:

The function includes a timeout mechanism to wait for the desired elements to appear on the page. You can specify the maximum time to wait, and the function will continue to retry until the timeout is reached or the elements are located. This helps handle scenarios where elements may take some time to load or appear dynamically on the page.

    Locates elements on a web page using Selenium and converts them into a Pandas DataFrame.

    Args:
        checkdf: A callable function that takes a Pandas DataFrame as input and returns a modified DataFrame based on certain conditions. Defaults to None.
        query: A string representing the CSS selector query for the desired elements. Defaults to "*".
        timeout: The maximum time in seconds to wait for the desired elements. Defaults to 30.
        withmethods: A boolean indicating whether to include methods for interacting with the elements in the resulting DataFrame. Defaults to True.

    Returns:
        If checkdf is None, returns a single Pandas DataFrame containing the located elements.
        If checkdf is provided, returns a tuple of two Pandas DataFrames: the first one contains all located elements, and the second one contains the modified DataFrame returned by checkdf.

import undetected_chromedriver as uc
from selenium2df_locate_element import selenium2dfwait, locate_element

if __name__ == "__main__":
	driver = uc.Chrome()
	selenium2dfwait.driver = driver
	driver.get(r"https://testpages.herokuapp.com/styled/index.html")
	df3 = locate_element()
	df_all, df = locate_element(checkdf=lambda df:df.loc[df.aa_localName == 'a'], query='*', timeout=30, withmethods=True)
	print(df_all, df)

	print(df3)
										   element  ...                    aa_window_switch
0    <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
1    <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
2    <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
3    <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
4    <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
..                                                 ...  ...                                 ...
181  <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
182  <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
183  <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
184  <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
185  <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
[186 rows x 136 columns]
print(df)
										   element  ...                    aa_window_switch
15   <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
17   <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
19   <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
23   <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
25   <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
..                                                 ...  ...                                 ...
171  <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
173  <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
179  <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
180  <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
185  <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
[66 rows x 136 columns]
print(df_all)
										   element  ...                    aa_window_switch
0    <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
1    <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
2    <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
3    <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
4    <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
..                                                 ...  ...                                 ...
181  <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
182  <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
183  <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
184  <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
185  <undetected_chromedriver.webelement.WebElement...  ...  0A3ADC1FAEED9B036EB136728D29F595()
[186 rows x 136 columns]

Project details


Release history Release notifications | RSS feed

This version

0.10

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

selenium2df_locate_element-0.10.tar.gz (6.7 kB view details)

Uploaded Source

Built Distribution

File details

Details for the file selenium2df_locate_element-0.10.tar.gz.

File metadata

File hashes

Hashes for selenium2df_locate_element-0.10.tar.gz
Algorithm Hash digest
SHA256 9180a3386359aae90fecc5518f60e24e5df5ad8636b5d7e5e6f8c857d4cbebdf
MD5 8d7dac9b4f41a25af481323f1f5e6a91
BLAKE2b-256 32c77334e21ee20d0759db8f498c988b2189dd7788c9d63c2c25f4025ba103e6

See more details on using hashes here.

File details

Details for the file selenium2df_locate_element-0.10-py3-none-any.whl.

File metadata

File hashes

Hashes for selenium2df_locate_element-0.10-py3-none-any.whl
Algorithm Hash digest
SHA256 196de23fbc37975c3f08b75fec43d4c54fc0646e405459c78fcd005d139d49ef
MD5 b54f13dc6860d7160cb6438999f883ca
BLAKE2b-256 a7edfcb61ff4da618428e3887de75e2459da893f619e59cdb7018dfebc620e27

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page