PyWebScraping is a Python library for browser automation and web scraping. It supports Chrome, Firefox, Edge, and Yandex, providing a consistent API for managing browser sessions, options, and common actions like scrolling, element interaction, and JavaScript execution. It also facilitates remote webdriver control.
Project description
PyWebScraping simplifies interaction with web browsers for scraping and automation tasks. It currently supports Chrome, Firefox, Edge, and Yandex browsers, providing a consistent interface for managing browser sessions, handling options, and performing common actions.
Key Features:
Cross-Browser Support: Seamlessly work with Chrome, Firefox, Edge, and Yandex browsers using a unified API.
Remote WebDriver Control: Connect to and manage existing browser sessions remotely.
Headless Browsing: Run tasks in the background without a visible browser window.
Proxy Support: Integrate proxies for managing network requests.
User Agent Spoofing: Customize the user agent string to impersonate other browsers.
Window Management: Control window size, position, and manage multiple tabs/windows.
Simplified API: Perform common actions like scrolling, hovering, finding elements, and executing JavaScript.
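Several of the features above (headless mode, proxy, user agent, window size and position) correspond to standard Chromium command-line switches. The sketch below shows how such switches could be assembled. Note that `build_chromium_args` is a hypothetical helper written for illustration and is not part of PyWebScraping; the library's own handling lives in its BrowserStartArgs and BrowserOptionsManager classes and may work differently.

```python
from typing import List, Optional, Tuple


def build_chromium_args(
    headless: bool = False,
    proxy: Optional[str] = None,
    user_agent: Optional[str] = None,
    window_size: Optional[Tuple[int, int]] = None,
    window_position: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Assemble Chromium command-line switches (hypothetical helper, not library API)."""
    args: List[str] = []
    if headless:
        args.append("--headless")
    if proxy:
        # Route network requests through the given proxy server.
        args.append(f"--proxy-server={proxy}")
    if user_agent:
        # Override the user agent string the browser reports.
        args.append(f"--user-agent={user_agent}")
    if window_size:
        args.append("--window-size={},{}".format(*window_size))
    if window_position:
        args.append("--window-position={},{}".format(*window_position))
    return args
```

These switches apply to Chromium-based browsers; Firefox is configured through different arguments and preferences.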
Installation:
pip install PyWebScraping
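To confirm that the install succeeded, the standard library's `importlib.metadata` can report the installed version. This is a minimal sketch; `installed_version` is a hypothetical helper name and not something the package provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "PyWebScraping") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

Calling `installed_version()` after installation should return a version string; a result of `None` means pip did not install the package into the current environment.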
API Reference:
BaseDriver: Provides fundamental classes like EmptyWebDriver, BrowserOptionsManager, BrowserStartArgs, and BrowserWebDriver for core browser management functionality.
ChromeDriver/EdgeDriver/FirefoxDriver/YandexDriver: Contains specific implementations for each browser, including options management, startup argument handling, and remote webdriver connection classes.
browsers_handler: Includes helpers such as WindowRect for managing window dimensions and get_installed_browsers/get_browser_version for retrieving system browser information.
Modules Overview:
EmptyWebDriver: A base class offering essential methods for interacting with a webdriver.
BrowserOptionsManager: Base class for managing browser-specific options. Subclassed for each browser type.
BrowserStartArgs: Base class for managing browser startup arguments. Subclassed for each browser.
BrowserWebDriver: Base class for managing the lifecycle of a webdriver instance. Subclassed for each browser.
Chrome(Remote)WebDriver, Edge(Remote)WebDriver, Firefox(Remote)WebDriver, Yandex(Remote)WebDriver: Concrete implementations for managing local and remote sessions for each browser.
This library aims to simplify browser automation in Python. Contributions and feedback are welcome!
Usage Examples:
Starting a Chrome Webdriver:
from PyWebScraping.webdrivers.ChromeDriver import ChromeWebDriver
from PyWebScraping.utilities import WindowRect
webdriver = ChromeWebDriver(webdriver_path="/path/to/chromedriver", window_rect=WindowRect(0, 0, 800, 600))
webdriver.start_webdriver(headless_mode=True)
webdriver.driver.get("https://www.example.com")
# ... perform actions ...
webdriver.close_webdriver()
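If an action raises an exception between start_webdriver() and close_webdriver(), the browser may be left running. One way to guard against that is a small context manager that always closes the session. This is a sketch under the assumption that the wrapper object exposes the start_webdriver() and close_webdriver() methods used above; `running` is a hypothetical name, not part of the library.

```python
from contextlib import contextmanager


@contextmanager
def running(webdriver, **start_kwargs):
    """Start the wrapper, yield it, and close it even if the body raises.

    `webdriver` is any object with start_webdriver() and close_webdriver();
    keyword arguments are forwarded to start_webdriver() unchanged.
    """
    webdriver.start_webdriver(**start_kwargs)
    try:
        yield webdriver
    finally:
        webdriver.close_webdriver()


# Intended use with the class from the example above:
# with running(ChromeWebDriver(webdriver_path="/path/to/chromedriver"), headless_mode=True) as wd:
#     wd.driver.get("https://www.example.com")
```

If start_webdriver() itself fails, nothing is yielded and close_webdriver() is not called, since no session was started.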
Connecting to a Remote Chrome Instance:
from PyWebScraping.webdrivers.ChromeDriver import ChromeRemoteWebDriver
# Obtain both values from your running remote webdriver instance;
# the Ellipsis placeholders below must be replaced with real values.
command_executor, session_id = ..., ...
remote_webdriver = ChromeRemoteWebDriver(command_executor, session_id)
remote_webdriver.create_driver()
# ...Interact with the remote browser...
remote_webdriver.close_webdriver()
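The remote example needs two values from the process that owns the browser session. How to read them from the running instance is not covered here, but once you have them, one simple way to hand them to another script is a small JSON file. The helpers below are hypothetical and assume both values are plain strings (an executor URL and a session ID).

```python
import json
from pathlib import Path
from typing import Tuple


def save_session(path: str, command_executor: str, session_id: str) -> None:
    """Write the two connection values to a JSON file (hypothetical helper)."""
    payload = {"command_executor": command_executor, "session_id": session_id}
    Path(path).write_text(json.dumps(payload), encoding="utf-8")


def load_session(path: str) -> Tuple[str, str]:
    """Read back the values in the order ChromeRemoteWebDriver takes them."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    return payload["command_executor"], payload["session_id"]
```

The connecting script could then call `ChromeRemoteWebDriver(*load_session("session.json"))`, where the file name is only an example.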
Working with Edge (similar for Firefox and Yandex with their respective classes):
from PyWebScraping.webdrivers.EdgeDriver import EdgeWebDriver
webdriver = EdgeWebDriver(webdriver_path="/path/to/msedgedriver")
webdriver.start_webdriver()
# ... interact ...
webdriver.close_webdriver()
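Because the browser-specific classes follow the same naming pattern, a script can pick the class from a browser name at run time. The sketch below assumes that the Firefox and Yandex modules mirror the ChromeDriver and EdgeDriver import paths shown above; that layout is inferred by analogy, and both function names are hypothetical.

```python
import importlib
from typing import Tuple

# Browser names mapped to the prefix used in module and class names.
_BROWSERS = {"chrome": "Chrome", "edge": "Edge", "firefox": "Firefox", "yandex": "Yandex"}


def driver_location(browser: str) -> Tuple[str, str]:
    """Return (module path, class name) for a browser name (assumed layout)."""
    try:
        prefix = _BROWSERS[browser.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported browser: {browser!r}") from None
    return f"PyWebScraping.webdrivers.{prefix}Driver", f"{prefix}WebDriver"


def load_driver_class(browser: str):
    """Import the module lazily and return the class; requires PyWebScraping."""
    module_name, class_name = driver_location(browser)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

With this in place, `load_driver_class("firefox")(webdriver_path="/path/to/geckodriver")` would stand in for the explicit import, provided the assumed module layout holds.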
Future Notes
PyWebScraping is under active development. Planned future enhancements include support for additional browsers, advanced interaction features, and improved handling of dynamic web content. Contributions and suggestions for new features are welcome! Feel free to open issues or submit pull requests on the project’s repository.
File details
Details for the file pywebscraping-1.3.7.tar.gz.
File metadata
- Download URL: pywebscraping-1.3.7.tar.gz
- Upload date:
- Size: 14.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest
---|---
SHA256 | 813e38aa81ece67546166a4b0f80af5e6747af7dc64cc07170968b84111cd07a
MD5 | 6107aa8997b543eeb5282f7a49045aa4
BLAKE2b-256 | 6a90ae0b20a07e262494876f1bf7118f5be02543962946e02202803dae1557ab
File details
Details for the file PyWebScraping-1.3.7-py3-none-any.whl.
File metadata
- Download URL: PyWebScraping-1.3.7-py3-none-any.whl
- Upload date:
- Size: 21.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest
---|---
SHA256 | 32cec1ed61a643646fa7845e8db9d9e5c62ca4fa05377f2ef5cab7d9bf22aebc
MD5 | 59d704ef6b2b52ec444b3405cc187e69
BLAKE2b-256 | 19fb6ea926be3edb240af67fb9c64130e39eaa7409eb27d0f4bece5c3a02047e