
PyWebScraping is a Python library for browser automation and web scraping. It supports Chrome, Firefox, Edge, and Yandex, providing a consistent API for managing browser sessions, options, and common actions like scrolling, element interaction, and JavaScript execution. It also facilitates remote webdriver control.

Project description

PyWebScraping simplifies interaction with web browsers for scraping and automation tasks. It currently supports Chrome, Firefox, Edge, and Yandex browsers, providing a consistent interface for managing browser sessions, handling options, and performing common actions.

Key Features:

  • Cross-Browser Support: Seamlessly work with Chrome, Firefox, Edge, and Yandex browsers using a unified API.

  • Remote WebDriver Control: Connect to and manage existing browser sessions remotely.

  • Headless Browsing: Execute tasks discreetly in the background without a visible browser window.

  • Proxy Support: Integrate proxies for managing network requests.

  • User Agent Spoofing: Customize the user agent string to impersonate other browsers or devices.

  • Window Management: Control window size, position, and manage multiple tabs/windows.

  • Simplified API: Perform common actions like scrolling, hovering, finding elements, and executing JavaScript.
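Several of the features above (headless mode, proxies, user agents, window geometry) end up as browser startup arguments. The following is a rough sketch, assuming a Chromium-based browser, of how such settings map onto standard Chrome command-line switches. The class names ChromeLaunchConfig and WindowGeometry are hypothetical and are not PyWebScraping's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WindowGeometry:
    """Hypothetical window rectangle: position (x, y) and size in pixels."""
    x: int = 0
    y: int = 0
    width: int = 800
    height: int = 600


@dataclass
class ChromeLaunchConfig:
    """Hypothetical config object; PyWebScraping's own option classes may differ."""
    headless: bool = False
    proxy: Optional[str] = None        # e.g. "http://127.0.0.1:8080"
    user_agent: Optional[str] = None   # full user agent string to send
    window: WindowGeometry = field(default_factory=WindowGeometry)

    def to_args(self) -> List[str]:
        """Translate the settings into Chrome command-line switches."""
        args = [
            f"--window-position={self.window.x},{self.window.y}",
            f"--window-size={self.window.width},{self.window.height}",
        ]
        if self.headless:
            args.append("--headless=new")  # headless mode in recent Chrome versions
        if self.proxy:
            args.append(f"--proxy-server={self.proxy}")
        if self.user_agent:
            args.append(f"--user-agent={self.user_agent}")
        return args


config = ChromeLaunchConfig(headless=True, proxy="http://127.0.0.1:8080")
print(config.to_args())
# ['--window-position=0,0', '--window-size=800,600', '--headless=new', '--proxy-server=http://127.0.0.1:8080']
```

With Selenium, each of these strings could be passed to `ChromeOptions.add_argument`. PyWebScraping's option and start-argument classes presumably perform a similar translation internally, but their exact behavior is not documented here.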

Installation:

pip install PyWebScraping
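To confirm that the installation succeeded, the installed version can be read with the standard library's `importlib.metadata` (Python 3.8+). This is a minimal sketch; the helper name installed_version is hypothetical and not part of PyWebScraping.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "PyWebScraping") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string such as "1.3.7" if installed, otherwise None.
print(installed_version())
```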

API Reference:

  • BaseDriver: Provides fundamental classes like EmptyWebDriver, BrowserOptionsManager, BrowserStartArgs, and BrowserWebDriver for core browser management functionality.

  • ChromeDriver/EdgeDriver/FirefoxDriver/YandexDriver: Contains specific implementations for each browser, including options management, startup argument handling, and remote webdriver connection classes.

  • browsers_handler: Includes helpers such as the WindowRect class for managing window dimensions and get_installed_browsers/get_browser_version for retrieving information about browsers installed on the system.
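How get_installed_browsers detects browsers is not documented here. As an illustration of one possible approach, and not the library's implementation, the sketch below looks up common browser executable names on PATH with `shutil.which`. The executable names are assumptions that mostly fit Linux; Windows and macOS installations often live outside PATH.

```python
import shutil
from typing import Dict, List, Optional

# Assumed executable names (mostly Linux); not taken from PyWebScraping.
CANDIDATE_EXECUTABLES: Dict[str, List[str]] = {
    "chrome": ["google-chrome", "google-chrome-stable", "chromium", "chromium-browser"],
    "firefox": ["firefox"],
    "edge": ["microsoft-edge", "microsoft-edge-stable"],
    "yandex": ["yandex-browser", "yandex-browser-stable"],
}


def find_browsers_on_path() -> Dict[str, Optional[str]]:
    """Map each browser to the first matching executable found on PATH, or None."""
    found: Dict[str, Optional[str]] = {}
    for browser, names in CANDIDATE_EXECUTABLES.items():
        # shutil.which returns the full path of an executable, or None if missing.
        found[browser] = next((p for p in map(shutil.which, names) if p), None)
    return found


if __name__ == "__main__":
    for browser, path in find_browsers_on_path().items():
        print(f"{browser}: {path or 'not found'}")
```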

Modules Overview:

  • EmptyWebDriver: A base class offering essential methods for interacting with a webdriver.

  • BrowserOptionsManager: Base class for managing browser-specific options. Subclassed for each browser type.

  • BrowserStartArgs: Base class for managing browser startup arguments. Subclassed for each browser.

  • BrowserWebDriver: Base class for managing the lifecycle of a webdriver instance. Subclassed for each browser.

  • Chrome(Remote)WebDriver, Edge(Remote)WebDriver, Firefox(Remote)WebDriver, Yandex(Remote)WebDriver: Concrete implementations for managing local and remote sessions for each browser.
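The overview describes a split between base classes that own shared logic and per-browser subclasses. A minimal sketch of that pattern with `abc`, using hypothetical class names (BaseBrowserSession, ChromeSession, FirefoxSession) and no real browser, might look like this:

```python
from abc import ABC, abstractmethod
from typing import List


class BaseBrowserSession(ABC):
    """Hypothetical base class: owns the lifecycle, delegates browser specifics."""

    def __init__(self) -> None:
        self.running = False
        self.launch_args: List[str] = []

    @abstractmethod
    def build_args(self, headless: bool) -> List[str]:
        """Return browser-specific startup arguments."""

    def start(self, headless: bool = False) -> None:
        # Shared lifecycle logic lives in the base class...
        self.launch_args = self.build_args(headless)
        self.running = True

    def close(self) -> None:
        self.running = False


class ChromeSession(BaseBrowserSession):
    def build_args(self, headless: bool) -> List[str]:
        # ...while each subclass supplies only what differs per browser.
        return ["--headless=new"] if headless else []


class FirefoxSession(BaseBrowserSession):
    def build_args(self, headless: bool) -> List[str]:
        return ["-headless"] if headless else []
```

Because `build_args` is abstract, the base class cannot be instantiated on its own, which mirrors the role of BrowserWebDriver as a base that is only used through its subclasses.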

This library aims to simplify browser automation in Python. Contributions and feedback are welcome!

Usage Examples:

Starting a Chrome Webdriver:

from PyWebScraping.webdrivers.ChromeDriver import ChromeWebDriver
from PyWebScraping.utilities import WindowRect

webdriver = ChromeWebDriver(webdriver_path="/path/to/chromedriver", window_rect=WindowRect(0, 0, 800, 600))
webdriver.start_webdriver(headless_mode=True)
webdriver.driver.get("https://www.example.com")
# ... perform actions ...
webdriver.close_webdriver()
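If an action raises an exception, the final close_webdriver() call in the example above is skipped and the browser stays open. A small context manager can guarantee cleanup. This sketch assumes only the start_webdriver() and close_webdriver() methods used in the example; managed_webdriver is a hypothetical helper, not part of PyWebScraping.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def managed_webdriver(webdriver: Any, **start_kwargs: Any) -> Iterator[Any]:
    """Start a webdriver wrapper and close it even if the body raises."""
    webdriver.start_webdriver(**start_kwargs)
    try:
        yield webdriver
    finally:
        # Runs on normal exit and when an exception propagates.
        webdriver.close_webdriver()


# Usage (assuming the ChromeWebDriver API shown above):
# with managed_webdriver(ChromeWebDriver(webdriver_path="/path/to/chromedriver"),
#                        headless_mode=True) as wd:
#     wd.driver.get("https://www.example.com")
```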

Connecting to a Remote Chrome Instance:

from PyWebScraping.webdrivers.ChromeDriver import ChromeRemoteWebDriver

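# Placeholders: obtain both values from your running remote webdriver instance.
command_executor = "..."  # URL of the remote webdriver's command executor
session_id = "..."  # ID of the existing browser session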
remote_webdriver = ChromeRemoteWebDriver(command_executor, session_id)
remote_webdriver.create_driver()
# ...Interact with the remote browser...
remote_webdriver.close_webdriver()
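Before attaching to a remote session, it can help to check that the command executor is reachable. The W3C WebDriver specification defines a `GET /status` endpoint whose JSON response carries `value.ready`. The sketch below queries it with `urllib`; it assumes command_executor is the plain HTTP base URL of the remote server (for example a Selenium Grid or a chromedriver instance), and the helper names are hypothetical.

```python
import json
import urllib.request


def parse_status(body: bytes) -> bool:
    """Return True if a W3C WebDriver /status response reports ready."""
    payload = json.loads(body)
    value = payload.get("value") if isinstance(payload, dict) else None
    return isinstance(value, dict) and value.get("ready") is True


def is_webdriver_ready(command_executor: str, timeout: float = 5.0) -> bool:
    """Query GET {command_executor}/status; False if unreachable or not ready."""
    url = command_executor.rstrip("/") + "/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_status(response.read())
    except (OSError, ValueError):
        # URLError/HTTPError are OSError subclasses; bad JSON raises ValueError.
        return False
```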

Working with Edge (the same pattern applies to Firefox and Yandex with their respective classes):

from PyWebScraping.webdrivers.EdgeDriver import EdgeWebDriver

webdriver = EdgeWebDriver(webdriver_path="/path/to/msedgedriver")
webdriver.start_webdriver()
# ... interact ...
webdriver.close_webdriver()
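Because every browser follows the same pattern, the driver class can be chosen at runtime with `importlib`. The Chrome and Edge module paths below come from the examples above; the Firefox and Yandex paths are assumed by analogy and may differ in the actual package. load_driver_class is a hypothetical helper.

```python
import importlib
from typing import Dict, Tuple, Type

# Chrome/Edge entries match the examples above; Firefox/Yandex are ASSUMED
# to follow the same naming pattern.
DRIVER_LOCATIONS: Dict[str, Tuple[str, str]] = {
    "chrome": ("PyWebScraping.webdrivers.ChromeDriver", "ChromeWebDriver"),
    "edge": ("PyWebScraping.webdrivers.EdgeDriver", "EdgeWebDriver"),
    "firefox": ("PyWebScraping.webdrivers.FirefoxDriver", "FirefoxWebDriver"),
    "yandex": ("PyWebScraping.webdrivers.YandexDriver", "YandexWebDriver"),
}


def load_driver_class(browser: str) -> Type:
    """Import and return the webdriver wrapper class for a browser name."""
    try:
        module_name, class_name = DRIVER_LOCATIONS[browser.lower()]
    except KeyError:
        raise ValueError(f"unsupported browser: {browser!r}") from None
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Usage (requires PyWebScraping to be installed):
# EdgeWebDriver = load_driver_class("edge")
# webdriver = EdgeWebDriver(webdriver_path="/path/to/msedgedriver")
```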

Future Notes

PyWebScraping is under active development. Planned future enhancements include support for additional browsers, advanced interaction features, and improved handling of dynamic web content. Contributions and suggestions for new features are welcome! Feel free to open issues or submit pull requests on the project’s repository.


Download files


Source Distribution

pywebscraping-1.3.7.tar.gz (14.5 kB)

Uploaded Source

Built Distribution

PyWebScraping-1.3.7-py3-none-any.whl (21.2 kB)

Uploaded Python 3

File details

Details for the file pywebscraping-1.3.7.tar.gz.

File metadata

  • Download URL: pywebscraping-1.3.7.tar.gz
  • Size: 14.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for pywebscraping-1.3.7.tar.gz
  • SHA256: 813e38aa81ece67546166a4b0f80af5e6747af7dc64cc07170968b84111cd07a
  • MD5: 6107aa8997b543eeb5282f7a49045aa4
  • BLAKE2b-256: 6a90ae0b20a07e262494876f1bf7118f5be02543962946e02202803dae1557ab
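To check a downloaded archive against the SHA256 digest listed above, `hashlib` from the standard library is sufficient. This is a minimal sketch; the file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path

EXPECTED_SDIST_SHA256 = (
    "813e38aa81ece67546166a4b0f80af5e6747af7dc64cc07170968b84111cd07a"
)


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """Compare a file's SHA256 hex digest with the expected value."""
    return sha256_of(path) == expected.lower()


# Usage: verify(Path("pywebscraping-1.3.7.tar.gz"))
```

pip can also enforce hashes itself: in hash-checking mode (`pip install --require-hashes -r requirements.txt`), each requirement line carries one or more `--hash=sha256:<digest>` options.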


File details

Details for the file PyWebScraping-1.3.7-py3-none-any.whl.

File hashes

Hashes for PyWebScraping-1.3.7-py3-none-any.whl
  • SHA256: 32cec1ed61a643646fa7845e8db9d9e5c62ca4fa05377f2ef5cab7d9bf22aebc
  • MD5: 59d704ef6b2b52ec444b3405cc187e69
  • BLAKE2b-256: 19fb6ea926be3edb240af67fb9c64130e39eaa7409eb27d0f4bece5c3a02047e
