A high-performance async web scraping and automation framework using Selenium.

These details have not been verified by PyPI

Project description

selenium_swift

selenium_swift is a powerful Python package designed to accelerate and simplify web scraping tasks using Selenium. With a focus on speed, accuracy, and ease of use, selenium_swift offers advanced features that cater to both beginners and experienced developers.

Key Features

Advanced Element Handling: Interact with web elements effortlessly using a high-level API. The Element class supports synchronous and asynchronous operations, making actions like clicking, sending keys, and capturing screenshots straightforward.
Frame Management: The Frame class makes working with iframes easier by providing methods to switch and focus on specific frames, ensuring precise element interactions within complex page structures.
Chrome Extension Integration: Use the ChromeExtension class to manage and interact with Chrome extensions directly within your scraping tasks.
Flexible WebDriver Options: Configure WebDriver settings with the WebOption class, including headless mode, proxy settings, and custom profiles. Tailor your WebDriver to suit specific scraping needs.
Automatic Driver Management: The WebService class handles WebDriver installations for Chrome, Firefox, and Edge browsers, leveraging webdriver-manager for seamless driver management.
Asynchronous and Synchronous Support: Choose between async programming with asyncio or traditional synchronous methods to optimize performance and flexibility.
User-Friendly API: Designed for simplicity and efficiency, selenium_swift abstracts complex Selenium operations, making web scraping accessible to beginners while offering powerful tools for advanced users.

Installation

Install selenium_swift from PyPI using pip:

pip install selenium-swift

Usage Example

Example 1:

Explanation

This example demonstrates how to handle interactions with pages that open as a result of an event (such as a click or key press) using a custom browser class built on top of ChromeBrowser. The code showcases how to find elements on a page, trigger events to open new pages, and interact with the newly opened pages asynchronously.

Key Concepts:

Browser Class: We define a class MyBrowser that extends from ChromeBrowser to customize browser behavior.
Async Tab Method: Methods that interact with browser tabs should be named with a tab prefix, which the framework recognizes as a tab interaction.
Page Navigation: The example shows how to load a page, find specific elements (in this case, product thumbnails), and handle page transitions when an event (like a click) triggers the opening of a new page.
Handling New Pages: After triggering an event that opens a new page, the script switches focus to the new page and interacts with its contents.
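
The tab-prefix convention described above can be pictured with plain asyncio. The following is a minimal sketch, not selenium_swift's actual implementation: SketchBrowser, collect_tab_methods, and run_tabs are hypothetical names, and the sketch assumes that a framework of this kind collects coroutine methods whose names start with "tab" and runs them concurrently.

```python
import asyncio
import inspect


class SketchBrowser:
    """Hypothetical stand-in for a browser class; not part of selenium_swift."""

    async def tab_1(self):
        await asyncio.sleep(0.01)  # simulate a slow page load
        return "tab_1 done"

    async def tab_2(self):
        await asyncio.sleep(0.01)
        return "tab_2 done"

    async def helper(self):
        # No "tab" prefix, so the sketch never schedules this method.
        return "never scheduled"


def collect_tab_methods(browser):
    """Return bound coroutine methods whose names start with 'tab', sorted by name."""
    methods = []
    for name, member in inspect.getmembers(browser, inspect.iscoroutinefunction):
        if name.startswith("tab"):
            methods.append(member)
    return methods


async def run_tabs(browser):
    """Run every tab method concurrently and gather the results in name order."""
    return await asyncio.gather(*(method() for method in collect_tab_methods(browser)))


if __name__ == "__main__":
    print(asyncio.run(run_tabs(SketchBrowser())))
```

Because the tab methods are gathered rather than awaited one after another, a slow page in one tab does not delay the others.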

Example Code

from selenium_swift.browser import * 

class MyBrowser(ChromeBrowser):
    """
    MyBrowser extends ChromeBrowser to define custom interactions with web pages.
    This class demonstrates how to interact with elements on a page and handle events
    that open new browser tabs or windows.
    """
    
    def __init__(self) -> None:
        # Initialize the browser with Chrome-specific options and service
        super().__init__(ChromeOption(), ChromeService())
    
    async def tab_1(self):
        """
        This method opens a webpage and interacts with its elements. Specifically, it clicks on
        product thumbnails, which open new pages, and interacts with the newly opened page.
        """
        # Open the page at the specified URL
        page = await self.get('https://books.toscrape.com/')
        
        # Find all product elements in page
        products = await page.find_elements('css_selector', '.thumbnail')
        print(f"Found {len(products)} products.")
        
        # Loop through each product, click it to open a new page, and interact with the new page
        for prd in products:
            # Click the product, which opens a new page
            prd.click()
            
            # Switch focus to the newly opened page
            infoPage = page.focus_to_new_page()
            
            # Find the rows in the table on the new page using a CSS selector
            table_rows = await infoPage.find_elements('css_selector', 'table[class*="table-stripe"] tr')
            print("********** Table Content **********")
            
            # Loop through the table rows and print their text content
            for row in table_rows:
                print(row.text)
    
if __name__ == "__main__":
    # Start the browser with an instance of MyBrowser
    Browser.startBrowsers([MyBrowser()])

Breakdown of the Code

MyBrowser Class: This class inherits from ChromeBrowser. Inside, we define the tab_1 method to represent interactions in the first tab of the window opened by the browser.
- The super().__init__(ChromeOption(), ChromeService()) ensures that the browser is initialized with default Chrome options and services.
tab_1 Method:
- This method loads the page https://books.toscrape.com/.
- It finds all product elements on the page using the CSS selector .thumbnail.
- For each product, the script clicks it, causing a new page to open.
- Once the new page is opened, the script switches focus to that page using focus_to_new_page().
- It then locates a table of data on the new page using a CSS selector, iterates through the rows, and prints the content of each row.
Browser Startup: The if __name__ == "__main__": block ensures that the script runs the browser when executed. It calls Browser.startBrowsers([MyBrowser()]) to start the browser and execute the interactions defined in tab_1.

Key Considerations

Async Interactions: The example utilizes async programming to handle potentially slow operations (like loading a page or finding elements) without blocking the main thread.
Scalability: You can extend this by adding more tab methods (e.g., tab_2, tab_3, etc.) to handle different interactions or pages.
Error Handling: In production environments, adding error handling (e.g., for timeouts or missing elements) is important for robustness.
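
As one way to add the error handling mentioned above, a slow step can be wrapped with a timeout and a bounded number of retries. This is a minimal sketch using only asyncio; with_timeout_and_retry and make_flaky_step are hypothetical helpers, not part of selenium_swift, and the exceptions the library itself raises are not covered here.

```python
import asyncio


async def with_timeout_and_retry(make_coro, timeout=1.0, retries=3):
    """Await make_coro() with a timeout, retrying on timeout up to `retries` times.

    make_coro is a zero-argument callable that returns a fresh coroutine,
    because a coroutine object cannot be awaited twice.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return await asyncio.wait_for(make_coro(), timeout=timeout)
        except asyncio.TimeoutError as exc:
            last_error = exc
            print(f"attempt {attempt} timed out")
    raise RuntimeError(f"gave up after {retries} attempts") from last_error


def make_flaky_step(slow_calls):
    """Build a coroutine function that is too slow for its first `slow_calls` calls."""
    state = {"calls": 0}

    async def step():
        state["calls"] += 1
        if state["calls"] <= slow_calls:
            await asyncio.sleep(0.5)  # simulate a page that hangs
        return "ok"

    return step


if __name__ == "__main__":
    step = make_flaky_step(slow_calls=2)
    print(asyncio.run(with_timeout_and_retry(step, timeout=0.05, retries=3)))
```

In a tab method, the callable passed in would be a small function or lambda that starts the slow operation again on each attempt.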

This example is designed to demonstrate how to automate interaction with pages that open through events and how to interact with the newly opened page.

Example 2:

Explanation

This example demonstrates how to interact with new pages opened by an event (e.g., a click) using an object-oriented approach. Instead of directly focusing on a new page using focus_to_new_page(), we create a PageInfo class that extends NextPage, a base class designed for handling pages that are opened from other pages.

Key Concepts:

Page Class (PageInfo): This class inherits from NextPage and is used to represent and interact with pages that are opened by user interactions (like clicking on an element).
Separation of Concerns: Each page interaction is encapsulated within its own class, making the code modular and easier to maintain.
Async Page Interactions: The showData method in PageInfo asynchronously finds elements and displays their data, demonstrating how to interact with a newly opened page.

Example Code

from selenium_swift.browser import *  # Import base browser classes
from selenium_swift.web_option import ChromeOption  # Import Chrome options
from selenium_swift.web_service import ChromeService  # Import Chrome services

class PageInfo(NextPage):
    """
    PageInfo is a class that extends NextPage. It is used to handle the
    new page that opens after interacting with an element on the current page.
    This class encapsulates interactions with the new page.
    """
    def __init__(self) -> None:
        super().__init__()  # Initialize the NextPage base class

    async def showData(self):
        """
        This method finds table rows on the newly opened page and prints the content
        of each row. The data is located using a CSS selector.
        """
        # Locate the table rows using the CSS selector
        table_rows = await self.find_elements('css_selector', 'table[class*="table-stripe"] tr')
        
        # Print the content of each table row
        print("********** Table Content **********")
        for row in table_rows:
            print(row.text)

class MyBrowser(ChromeBrowser):
    """
    MyBrowser is a custom browser class that extends ChromeBrowser.
    It contains methods to interact with the main page and handle navigation
    to new pages.
    """
    def __init__(self) -> None:
        # Initialize ChromeBrowser with default Chrome options and services
        super().__init__(ChromeOption(), ChromeService())

    async def tab_1(self):
        """
        This method interacts with the first tab. It opens a webpage, locates product elements,
        and handles navigation to the new page when a product is clicked.
        """
        # Load the main page
        page = await self.get('https://books.toscrape.com/')
        
        # Find all product elements on the page
        products = await page.find_elements('css_selector', '.thumbnail')
        print(f"Found {len(products)} products.")
        
        # Loop through the products and handle interactions with the new page
        for prd in products:
            # Click the product, which opens a new page
            prd.click()

            # Create an instance of PageInfo to represent the new page
            # and interact with it using the showData method
            await PageInfo().showData()

if __name__ == "__main__":
    # Start the browser with an instance of MyBrowser and open the first tab
    Browser.startBrowsers([MyBrowser()])

Breakdown of the Code

PageInfo Class:
- This class extends NextPage, which is designed to represent a page that opens as a result of an interaction (like clicking on an element).
- The method showData asynchronously finds table rows using a CSS selector and prints the content of each row.
MyBrowser Class:
- This class extends ChromeBrowser and defines the tab_1 method for interactions on the main page.
- It opens the main page (https://books.toscrape.com/) and locates all product elements using the .thumbnail CSS selector.
- When a product is clicked, a new page opens. Instead of focusing directly on the new page, an instance of PageInfo is created, and the showData method is called to interact with the new page.
Browser Flow:
- The browser starts by opening the main page, where it finds and clicks on product elements.
- Each click opens a new page, which is handled by PageInfo. This class abstracts the interaction with the newly opened page, making the code cleaner and more modular.
Object-Oriented Design:
- By using a class (PageInfo) to represent the new page, you ensure that all interactions with that page are encapsulated in one place. This separation of concerns makes the code easier to maintain and extend.
- The base class NextPage can be extended further if more features need to be added, and PageInfo can be customized for specific interactions with different pages. This example shows how to manage page interactions using class inheritance, following best practices for code organization and readability.

Example 3:

This example shows how to use selenium_swift to scrape a web page. Follow these steps:

Create your own Scrap class that extends the PageScrape class and defines an async def onResponse method that accepts your keyword arguments (**arg).
Create a MyBrowser class that extends ChromeBrowser, FirefoxBrowser, or EdgeBrowser; this example uses ChromeBrowser. Define async methods whose names begin with "tab" (e.g., tab_1, tab_2). Each tab method opens a tab in your browser.

from selenium_swift.browser import * 

class Scrap(PageScrape):
    async def onResponse(self, **arg):
        quote_elements = await self.find_elements('css_selector','.text')
        for quote in quote_elements:
            print(quote.text)

class MyBrowser(ChromeBrowser):
    def __init__(self) -> None:
        super().__init__(ChromeOption(), ChromeService())

    async def tab_1(self):
        for i in range(1, 3):
            await Scrap(f'https://quotes.toscrape.com/page/{i}/').crawl(my_index=i)

    async def tab_2(self):
        for i in range(3, 6):
            await Scrap(f'https://quotes.toscrape.com/page/{i}/').crawl(my_index=i)

    async def tab_3(self):
        for i in range(6, 9):
            await Scrap(f'https://quotes.toscrape.com/page/{i}/').crawl(my_index=i)

    async def tab_4(self):
        for i in range(9, 11):
            await Scrap(f'https://quotes.toscrape.com/page/{i}/').crawl(my_index=i)

if __name__ == "__main__":
    Browser.startBrowsers([MyBrowser()])
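
The four tab methods above hard-code their page ranges. As a sketch of how the ranges could be computed instead, the hypothetical helper split_pages (not part of selenium_swift) divides an inclusive page range into contiguous chunks, one per tab. Its split is slightly different from the hand-written one above, but it covers the same pages.

```python
def split_pages(first, last, tabs):
    """Split the inclusive range first..last into `tabs` contiguous ranges.

    Earlier chunks receive one extra page when the division is uneven.
    """
    if tabs < 1:
        raise ValueError("tabs must be at least 1")
    total = last - first + 1
    base, extra = divmod(total, tabs)
    chunks = []
    start = first
    for index in range(tabs):
        size = base + (1 if index < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    for number, pages in enumerate(split_pages(1, 10, 4), start=1):
        print(f"tab_{number}: pages {list(pages)}")
```

Each tab method could then loop over its own chunk instead of a literal range.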

Example 4: Concurrent File Upload and Download

This example demonstrates how to concurrently upload and download files using the selenium_swift package with a custom browser class.

Step 1: Create the MyBrowser Class

In this step, we create a class, here named MyBrowser, that extends ChromeBrowser. The class contains two asynchronous methods, tab_download and tab_upload. Each method opens its own browser tab and handles one task: tab_download downloads files and tab_upload uploads them.

from selenium_swift.browser import *

class MyBrowser(ChromeBrowser):
    def __init__(self) -> None:
        # Set the download directory
        self.path_download = r"c:\Users\progr\OneDrive\Bureau\test_download"
        option = ChromeOption('download.default_directory=' + self.path_download)
        super().__init__(option, ChromeService())

Initialization: The __init__ method sets the download directory for downloaded files using the ChromeOption class. This ensures that all downloaded files will be saved to the specified path.

Step 2: Implement the `tab_download` Method

The tab_download method will navigate to a page that contains downloadable files. It will identify links to PDF files and initiate the download process.

    async def tab_download(self):
        # Navigate to the download page
        page = await self.get('https://the-internet.herokuapp.com/download')
        link_list = await page.find_elements('css_selector', 'a')
        
        # Iterate through the links and click on those that end with '.pdf'
        for link in link_list:
            if link.text.endswith('.pdf'):
                link.click()
        
        # Wait for the download to complete (place this statement at the end of the tab method)
        await page.wait_for_Download(self.path_download)

File Download Logic: The method retrieves all links on the page and checks if they end with the .pdf extension. If so, it clicks the link to start the download.
Waiting for Downloads: The await page.wait_for_Download(self.path_download) statement makes the method wait until the downloads have completed before the browser closes all the tabs.
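
One common way to decide that Chrome downloads have finished is to poll the download directory until no temporary .crdownload files remain. The sketch below illustrates that idea with asyncio and pathlib. It is an assumption about how such a wait could work, not the implementation of wait_for_Download, and wait_until_downloads_finish is a hypothetical name.

```python
import asyncio
from pathlib import Path


async def wait_until_downloads_finish(directory, poll_interval=0.5, timeout=60.0):
    """Poll `directory` until it contains no *.crdownload files.

    Returns True when no partial downloads remain, False if the timeout expires.
    """
    folder = Path(directory)
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        if not any(folder.glob("*.crdownload")):
            return True
        await asyncio.sleep(poll_interval)  # yields to other tabs while waiting
    return False
```

A check like this returns immediately if it runs before Chrome has created the first partial file, so it only makes sense after the download clicks have been issued.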

Step 3: Implement the `tab_upload` Method

The tab_upload method will navigate to a file upload page, locate the file input element, and upload a specified file.

    async def tab_upload(self):
        # Navigate to the upload page
        page = await self.get('https://the-internet.herokuapp.com/upload')
        
        # Locate the file input element and upload a file
        input_file = await page.find_element('id', "file-upload")
        input_file.send_file(r'c:\Users\progr\Downloads\DATA_Data_Analysis_2_AR.pdf')
        
        # Optional: wait for a brief period to ensure the file is uploaded
        await page.sleep(3)

File Upload Logic: The method retrieves the file input element by its ID and uses the send_file method to upload a specified file from the local system.
Sleep Function: The await page.sleep(3) statement pauses the execution for 3 seconds, allowing time for the file upload to complete. It’s important to use page.sleep() instead of time.sleep() in asynchronous code. Using time.sleep() will block the entire event loop, preventing other asynchronous tasks from running, which can lead to unresponsive behavior in your application. By using await page.sleep(), the event loop remains active, allowing other tasks to be executed concurrently while waiting.
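
The difference between a blocking and a non-blocking pause can be shown with plain asyncio, independent of selenium_swift. In this sketch, asyncio.sleep stands in for page.sleep (an assumption: the source does not say how page.sleep is implemented), and measure is a hypothetical helper that times two tasks run concurrently.

```python
import asyncio
import time


async def blocking_task(delay):
    time.sleep(delay)  # blocks the whole event loop for `delay` seconds


async def cooperative_task(delay):
    await asyncio.sleep(delay)  # yields control to other tasks while waiting


async def measure(task_factory, delay=0.2):
    """Run two tasks concurrently and return the elapsed wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(task_factory(delay), task_factory(delay))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"time.sleep:    {asyncio.run(measure(blocking_task)):.2f}s")
    print(f"asyncio.sleep: {asyncio.run(measure(cooperative_task)):.2f}s")
```

With the blocking version the two pauses run back to back, so the total is roughly twice the delay; with the cooperative version they overlap and the total stays close to a single delay.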

Step 4: Running the Browser

Finally, we will execute the MyBrowser class to start the browser and perform the file upload and download tasks concurrently.

if __name__ == "__main__":
    Browser.startBrowsers([MyBrowser()])

Summary

This example showcases how to create a custom browser class using selenium_swift for handling file uploads and downloads. By organizing the functionality into methods, you can easily maintain and extend the capabilities of your web scraping tasks.

from selenium_swift.browser import *

class MyBrowser(ChromeBrowser):
    def __init__(self) -> None:
        # Set the download directory
        self.path_download = r"c:\Users\progr\OneDrive\Bureau\test_download"
        option = ChromeOption('download.default_directory=' + self.path_download)
        super().__init__(option, ChromeService())
    async def tab_download(self):
        # Navigate to the download page
        page = await self.get('https://the-internet.herokuapp.com/download')
        link_list = await page.find_elements('css_selector', 'a')
        
        # Iterate through the links and click on those that end with '.pdf'
        for link in link_list:
            if link.text.endswith('.pdf'):
                link.click()
        
        # Wait for the download to complete (place this statement at the end of the tab method)
        await page.wait_for_Download(self.path_download)
    async def tab_upload(self):
        # Navigate to the upload page
        page = await self.get('https://the-internet.herokuapp.com/upload')
        
        # Locate the file input element and upload a file
        input_file = await page.find_element('id', "file-upload")
        input_file.send_file(r'c:\Users\progr\Downloads\DATA_Data_Analysis_2_AR.pdf')
        
        # Optional: wait for a brief period to ensure the file is uploaded
        await page.sleep(3)

Example 5: Custom Page Handling in selenium_swift

This example demonstrates how to create custom page classes that extend the PageEvent class within the selenium_swift framework. This approach allows for modular and organized handling of web interactions, such as downloading and uploading files.

Overview

In this implementation, two separate pages are created:

PageDownload: This class is designed for downloading files from a specific webpage.
PageUpload: This class facilitates uploading files to a designated webpage.

You can create custom page classes to manage complex interactions, such as clicks, file uploads, mouse events, and other interactions.

Implementation

from selenium_swift.browser import *

# Define the PageDownload class to handle file downloads
class PageDownload(PageEvent):
    def __init__(self) -> None:
        super().__init__('https://the-internet.herokuapp.com/download')
    async def download_images(self):
        link_list = await self.find_elements('css_selector', 'a')
        for link in link_list:
            if link.text.endswith(('.png', '.jpg')):
                link.click()

    async def download_pdf(self):
        link_list = await self.find_elements('css_selector', 'a')
        for link in link_list:
            if link.text.endswith('.pdf'):
                link.click()

    async def download_text_files(self):
        link_list = await self.find_elements('css_selector', 'a')
        for link in link_list:
            if link.text.endswith('.txt'):
                link.click()

# Define the PageUpload class to handle file uploads
class PageUpload(PageEvent):
    def __init__(self) -> None:
        super().__init__('https://the-internet.herokuapp.com/upload')

    async def upload_image(self, image_path):
        input_file = await self.find_element('id', "file-upload")
        input_file.send_file(image_path)

    async def upload_pdf(self, pdf_path):
        input_file = await self.find_element('id', "file-upload")
        input_file.send_file(pdf_path)

    async def upload_text_file(self, text_file_path):
        input_file = await self.find_element('id', "file-upload")
        input_file.send_file(text_file_path)

# Define the MyBrowser1 class to manage download and upload actions
class MyBrowser1(ChromeBrowser):
    def __init__(self) -> None:
        self.path_download = r"c:\Users\progr\OneDrive\Bureau\test_download"
        option = ChromeOption('download.default_directory=' + self.path_download)
        super().__init__(option, ChromeService())

    async def tab_download(self):
        page_download = await PageDownload().open()
        await page_download.download_pdf()
        await page_download.download_images()
        await page_download.download_text_files()
        await page_download.wait_for_Download(self.path_download)

    async def tab_upload(self):
        page_upload = await PageUpload().open()
        await page_upload.upload_image(r"c:\Users\progr\Downloads\nature2.jpg")
        await page_upload.upload_pdf(r"c:\Users\progr\Downloads\DATA_Data_Analysis_2_AR.pdf")
        await page_upload.upload_text_file(r'd:\ascii.txt')
        await page_upload.sleep(3)

# Start the browser and run the download and upload tasks
if __name__ == "__main__":
    Browser.startBrowsers([MyBrowser1()])

Explanation

Custom Page Classes:
- PageDownload: This class encapsulates methods to download different file types. Each method fetches all links on the page and clicks on the ones that match the specified file extensions.
  - download_images(): Downloads image files with .png or .jpg extensions.
  - download_pdf(): Downloads files with a .pdf extension.
  - download_text_files(): Downloads files with a .txt extension.
- PageUpload: This class provides methods to upload files. Each method allows for the upload of a specific file type.
  - upload_image(image_path): Uploads an image file.
  - upload_pdf(pdf_path): Uploads a PDF file.
  - upload_text_file(text_file_path): Uploads a text file.
MyBrowser1 Class:
- This class extends ChromeBrowser and manages two separate tabs for downloading and uploading files. The methods prefixed with tab_ signal to the browser that they will open a new tab.
- tab_download(): Opens the download page and executes methods to download various file types, followed by waiting for the download to complete.
- tab_upload(): Opens the upload page and executes methods to upload specified files. The sleep method is called to pause execution for a brief period, allowing the upload to complete.

Conclusion

By extending the PageEvent class, you can create specialized page handling classes that streamline file download and upload processes, making your web scraping tasks more efficient and organized. This structure also enhances readability and maintainability of your code.
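
The three download methods in PageDownload differ only in the extensions they match. As a sketch of how that shared logic could be factored out, the hypothetical helper matching_names below filters plain strings with str.endswith, which accepts a tuple of suffixes. In a page class the strings would come from each link's text. Unlike the original methods, this sketch compares case-insensitively.

```python
def matching_names(names, extensions):
    """Return the names that end with any of the given extensions (case-insensitive)."""
    suffixes = tuple(ext.lower() for ext in extensions)
    return [name for name in names if name.lower().endswith(suffixes)]


if __name__ == "__main__":
    link_texts = ["report.pdf", "photo.PNG", "notes.txt", "scan.jpg"]
    print(matching_names(link_texts, [".png", ".jpg"]))
    print(matching_names(link_texts, [".pdf"]))
```

A single download method taking an extensions argument could then replace download_images, download_pdf, and download_text_files.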

Project details

Release history

0.1.6: Oct 17, 2024
0.1.5: Oct 17, 2024
0.1.4: Oct 17, 2024
0.1.3: Oct 8, 2024
0.1.2: Oct 6, 2024
0.1.1: Oct 6, 2024 (this version)
0.1.0: Sep 30, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

selenium_swift-0.1.1.tar.gz (34.8 kB)

Uploaded Oct 6, 2024 Source

Built Distribution

selenium_swift-0.1.1-py3-none-any.whl (30.6 kB)

Uploaded Oct 6, 2024 Python 3

File details

Details for the file selenium_swift-0.1.1.tar.gz.

File metadata

Download URL: selenium_swift-0.1.1.tar.gz
Upload date: Oct 6, 2024
Size: 34.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.1

File hashes

Hashes for selenium_swift-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`df2c4fc2993e63ff2d7edf28564666ef3babc244c6e49d9daaa90e61cc438050`
MD5	`d34b272ce7527d4064ad4ce114572ac7`
BLAKE2b-256	`a1b717e73621fcaa628177951f51387519df9cf25a70dbe6300c66b8b966b913`

See more details on using hashes here.
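
To check a downloaded file against the SHA256 digest listed above, the digest can be recomputed locally with hashlib. This is a minimal sketch: sha256_of and matches_digest are hypothetical helper names, and the file path is whatever location the archive was saved to.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare the file's SHA256 digest with the expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

Calling matches_digest with the saved selenium_swift-0.1.1.tar.gz and the SHA256 value from the table returns True only if the file is byte-for-byte identical to the one that was hashed.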

File details

Details for the file selenium_swift-0.1.1-py3-none-any.whl.

File metadata

Download URL: selenium_swift-0.1.1-py3-none-any.whl
Upload date: Oct 6, 2024
Size: 30.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.1

File hashes

Hashes for selenium_swift-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2df95a45c782c52b1aa33ce3794ed62c3f7e34717490aa6ee28a6e31da7f9320`
MD5	`0b54283341dd42876ca7485f82eeb18f`
BLAKE2b-256	`799d11df22213b780f67f40fded1ef3eccaa5223611830c7c6a3a2a78fb5df94`

See more details on using hashes here.

selenium-swift 0.1.1
