
fastbots

Fastbots is a simple library designed for rapid bot and scraper development using Selenium and the POM (Page Object Model) design.
It enhances productivity by letting developers focus solely on scraping, reducing boilerplate code and eliminating driver-management code thanks to browser-independent settings.
If site locators change, the code doesn't need to be modified; adjustments can be made solely in the configuration.

fastbots is also fully compatible with all Selenium functions; refer to the official Selenium documentation for more details.
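
For example, here is a minimal sketch of mixing fastbots with a plain Selenium helper (ActionChains) through the underlying driver; the MenuPage class, the menu_page name and the menu_locator entry are illustrative, not part of the library:

# Plain Selenium calls work on the driver exposed by the bot
from fastbots import Bot, Page, WebElement, EC, ActionChains

class MenuPage(Page):

    def __init__(self, bot: Bot, page_name: str = 'menu_page'):
        super().__init__(bot, page_name)

    def forward(self) -> None:
        # self.bot.driver is a regular Selenium WebDriver, so standard
        # Selenium APIs (ActionChains, Select, Alert, ...) can be used on it
        menu_element: WebElement = self.bot.wait.until(
            EC.visibility_of_element_located(self.__locator__('menu_locator')))
        ActionChains(self.bot.driver).move_to_element(menu_element).click().perform()
        return None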

Installation

The installation process is straightforward using pip from the PyPI repository.

pip install fastbots

Showcase

Check out the full example at cookiecutter-fastbots.

Main Code

Here's the main code example:

-- main.py
# Import the logging module to handle logging in the script
import logging

# Import necessary classes and modules from the fastbots library
from fastbots import Task, Bot, Page, Payload, EC, WebElement, Keys, ActionChains, Select, Alert, TimeoutException, NoSuchElementException

# Define a ProductPage class, which is a subclass of the Page class
class ProductPage(Page):

    # Constructor to initialize the ProductPage instance
    # The page_name is used in the locators file; default is 'product_page'
    def __init__(self, bot: Bot, page_name: str = 'product_page'): 
        super().__init__(bot, page_name)

    # Define the forward method for navigating to the next page
    def forward(self) -> None:
        # Log information about the current action
        logging.info('DO THINGS')

        # Use locators specified in the file for flexibility and fewer code changes
        # name_element: WebElement = self.bot.driver.find_element(*self.__locator__('name_locator'))
        #name_element: WebElement = self.bot.wait.until(EC.element_to_be_clickable(self.__locator__('name_locator')))
        
        # Store data in the payload section for future retrieval on success
        #self.bot.payload.input_data['element_name'] = name_element.text

        # Example: download the product PNG image and rename it (check the download folder settings)
        # name_element.click()  # e.g. click on the element's download button
        # self.bot.wait_downloaded_file_path("png", new_name_file=self.bot.payload.input_data['element_name'])
        # The download path is stored in the payload.downloads datastore once the file is downloaded and renamed

        # End the chain of page interactions
        return None

# Define a SearchPage class, which is a subclass of the Page class
class SearchPage(Page):

    # Constructor to initialize the SearchPage instance
    # The page_name is used in the locators file; default is 'search_page'
    def __init__(self, bot: Bot, page_name: str = 'search_page'):
        super().__init__(bot, page_name)

    # Define the forward method for navigating to the next page (ProductPage)
    def forward(self) -> ProductPage:
        # Log information about the current action
        logging.info('DO THINGS')

        # Use locators specified in the file for flexibility and fewer code changes
        search_element: WebElement = self.bot.wait.until(EC.element_to_be_clickable(self.__locator__('search_locator')))
        
        # Enter a search query and submit (using the loaded data in the task)
        search_element.send_keys(self.bot.payload.input_data['element_name'])
        search_element.send_keys(Keys.ENTER)

        # Locate the product element and click on it
        #product_element: WebElement = self.bot.wait.until(EC.element_to_be_clickable(self.__locator__('product_locator')))
        #product_element.click()

        # Continue the chain of interaction on the next page (ProductPage)
        return ProductPage(bot=self.bot)

# Define a TestTask class, which is a subclass of the Task class
class TestTask(Task):

    # Main task code to be executed when running the script
    def run(self, bot: Bot) -> bool:
        # Log information about the current action
        logging.info('DO THINGS')

        # Load all the data needed by the page interactions (e.g. login credentials loaded from a file using pandas)
        bot.payload.input_data = {'username': 'test', 'password': 'test', 'element_name': 'My book'}

        # Open the search page, perform actions, and go forward
        page: Page = SearchPage(bot=bot)

        # For every page found, perform actions and go forward
        while page:
            page = page.forward()

        # By default, return True so the task is marked as successful
        return True

    # Method executed on bot success, with its payload
    def on_success(self, payload: Payload):
        logging.info(f'SUCCESS {payload.downloads}')
    
    # Method executed on bot failure
    def on_failure(self, payload: Payload):
        logging.info(f'FAILED {payload.output_data}')

# Check if the script is executed as the main program
if __name__ == '__main__':
    # Start the above TestTask
    TestTask()()

Locators File

In the locators configuration file, all required locator configurations are defined. This can be easily changed without rebuilding or making modifications to the code.

-- locators.ini
[pages_url] # required URL settings
start_url=https://www.amazon.com/ # the first page opened with driver.get(); can also be None
search_page=https://www.amazon.com/ # *_page: URL associated with the page_name parameter; the current URL must match it
product_page=None # None skips the page_url check of the current URL for this single page

[search_page] # *_page: first page_name parameter, with its related locators
search_locator=(By.ID, "twotabsearchtextbox")
product_locator=(By.XPATH, '//*[@id="search"]/div[1]/div[1]/div/span[1]/div[1]/div[2]')

[product_page] # *_page: second page_name parameter, with its related locators
name_locator=(By.ID, "title")
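
The page_name passed to a Page subclass selects the matching section of this file, and self.__locator__('...') returns the locator defined there. A minimal sketch of how the entries above are consumed, based on the main example:

# Inside SearchPage (page_name='search_page'), self.__locator__('search_locator')
# resolves to the locator defined in the [search_page] section,
# i.e. (By.ID, "twotabsearchtextbox"):
search_element: WebElement = self.bot.wait.until(
    EC.element_to_be_clickable(self.__locator__('search_locator')))

# The same locator can be unpacked for the plain driver API:
search_element = self.bot.driver.find_element(*self.__locator__('search_locator'))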

Settings

Browser and Drivers (Optional)

By default, the selected browser is Firefox, but it can be changed in the config file:

-- settings.ini
[settings]
#BOT_DRIVER_TYPE=FIREFOX
BOT_DRIVER_TYPE=CHROME

The browser matching the selected driver must be installed. Its installation path is auto-detected from system environment variables, and the driver download process and installation path are managed automatically.

Retry and Debug (Optional)

By default, every task is retried 2 times, with a 10-second delay between attempts. If both attempts fail, the task executes the on_failure method; otherwise, it executes the on_success method. This behaviour can be modified in the settings file:

-- settings.ini
[settings]
BOT_MAX_RETRIES=2 # number of retries, default
BOT_RETRY_DELAY=10 # seconds, default

When a task fails, the library stores a screenshot and the HTML of the page in the debug folder, which is useful for debugging. All logs are also written to the log.log file.

Page Url Check (Automatic)

Every defined page must have a page URL. When a page is instantiated and reached by the bot, the library checks that the URL specified in the config matches the page reached during navigation, which reduces navigation errors. If you want to disable this check, see the Global Wait section below. It is also possible to relax the check from strict_page_url (exact match) to a containment check, where the current URL only needs to contain the page URL, by setting strict_page_url=False in the page's init method, after the page name, as shown in the sketch below.
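
A minimal sketch, reusing the ProductPage from the main example and passing strict_page_url as a keyword argument after the page name, as described above:

class ProductPage(Page):

    def __init__(self, bot: Bot, page_name: str = 'product_page'):
        # Relax the URL check: the current URL only needs to contain
        # the configured page URL instead of matching it exactly
        super().__init__(bot, page_name, strict_page_url=False)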

File Download Wait (Functions)

This library provides the bot.wait_downloaded_file_path(file_extension, new_name_file=None) method, which can be used after clicking a download button to wait for the download and get the path of the downloaded file. It can also rename the file. The extension is used to check that the downloaded file is correct and not corrupted. By default, every downloaded file must be waited for in this way before being moved to the download folder; to change this, disable the strict download wait in the config (see the next section).
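
A minimal sketch of a download step inside a page's forward method; the download_button_locator entry and the 'invoice' file name are illustrative:

    def forward(self) -> None:
        # Click the element that triggers the file download
        download_button: WebElement = self.bot.wait.until(
            EC.element_to_be_clickable(self.__locator__('download_button_locator')))
        download_button.click()

        # Wait until a .pdf file lands in the download folder, rename it
        # and get back its final path (also tracked in payload.downloads)
        file_path = self.bot.wait_downloaded_file_path('pdf', new_name_file='invoice')
        logging.info(f'Downloaded file at: {file_path}')
        return None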

Download Folder and other Folders (Optional)

-- settings.ini
[settings]
BOT_DOWNLOAD_FOLDER_PATH='/usr/...' # override the default download path used by the browser
BOT_SCREENSHOT_DOWNLOAD_FOLDER_PATH='/debug' # default
BOT_HTML_DOWNLOAD_FOLDER_PATH='/debug'

BOT_STRICT_DOWNLOAD_WAIT=True # default; False -> downloaded files are always moved to the download folder without the wait check

Global Wait (Optional)

The default configured waits are shown below:

  • The implicit wait used for initial page loading.
  • The wait for the URL check against the URL specified in the locators file.
  • The default wait used by the self.bot.wait function.
  • The timeout for waiting on a downloaded file.
-- settings.ini
[settings]
SELENIUM_GLOBAL_IMPLICIT_WAIT=5 #sec default
SELENIUM_EXPECTED_URL_TIMEOUT=5 #sec default
SELENIUM_DEFAULT_WAIT=5 #sec default
SELENIUM_FILE_DOWNLOAD_TIMEOUT=20 #sec default

SELENIUM_EXPECTED_URL_CHECK=False # disables the automatic page URL check; the default value is True
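
These settings only change the defaults; since the underlying driver is exposed, a one-off longer wait can still be built with plain Selenium. A minimal sketch (the 60-second timeout and the report_locator entry are illustrative):

from selenium.webdriver.support.ui import WebDriverWait

# Inside any Page method: self.bot.wait uses SELENIUM_DEFAULT_WAIT, while a
# custom WebDriverWait can use its own timeout for a particularly slow element
slow_element = WebDriverWait(self.bot.driver, 60).until(
    EC.presence_of_element_located(self.__locator__('report_locator')))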

Proxy (Optional)

Configure the proxy settings.

-- settings.ini
[settings]
BOT_PROXY_ENABLED=True
BOT_HTTP_PROXY=127.0.0.1:8080
BOT_HTTPS_PROXY=127.0.0.1:8080

User Agent (Optional)

Configure the user agent used for the requests.

-- settings.ini
[settings]
BOT_USER_AGENT="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"

Arguments (Optional)

Configure browser arguments and store them in the config file. The format is the same for all the supported drivers; check carefully that each argument is implemented for the selected driver.

Firefox args

-- settings.ini
[settings]
BOT_ARGUMENTS="--headless, --disable-gpu, -profile ./selenium"

Chrome args

-- settings.ini
[settings]
BOT_ARGUMENTS="--no-sandbox, --user-data-dir=./selenium, --profile-directory=selenium"

Store Preferences (Optional)

Store preferences in a JSON file. The format is the same for all the supported drivers; check carefully that each preference key and value is implemented for the selected driver.

Firefox prefs

-- preferences.json 
{
    "browser.download.manager.showWhenStarting": false, # Don't show download
    "browser.helperApps.neverAsk.saveToDisk": "application/pdf", # Automatic save PDF files
    "pdfjs.disabled": true  # Don't show the pdf
}

Chrome prefs

-- preferences.json 
{
    "profile.default_content_setting_values.notifications": 2,  # Disable notifications
    "profile.default_content_settings.popups": 0  # Allow popups
}

References

Fastbots docs
