An easy-to-use web scraping library driven by JS scripts
Project description
ScrapPyJS Class
The ScrapPyJS class provides web scraping on top of Selenium: you scrape data by running a JavaScript script against the page directly from Python.
Constructor
__init__(script=None, browser=None, show=False, debug=False, strict=False)
The constructor initializes a ScrapPyJS object with the following parameters:
- script (optional): The JavaScript code to be executed by the web browser.
- browser (optional): An existing instance of a Selenium WebDriver. If not provided, a new instance will be created using Chrome.
- show (optional): Boolean value indicating whether to show the browser window. Default is False.
- debug (optional): Boolean value indicating whether to enable debug mode. Default is False.
- strict (optional): Boolean value indicating whether to enable strict mode. Default is False.
Methods
ScrapPyJS.setup_browser()
This method sets up the web browser instance. It creates a new instance of a Chrome WebDriver with the specified options based on the constructor parameters.
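As a rough illustration of what such a setup step could look like with Selenium 4, the sketch below derives Chrome flags from a show flag and applies them to a new driver. The helper names chrome_arguments and build_driver, and the choice of the --headless=new flag (which requires a recent Chrome), are assumptions made for this example; the library's own implementation may differ.

```python
def chrome_arguments(show=False):
    """Return Chrome command-line flags for a visible or a headless browser."""
    args = []
    if not show:
        # Hypothetical choice: run without a window unless show is True.
        args.append("--headless=new")
    return args


def build_driver(show=False):
    """Create a Chrome WebDriver configured from the show flag."""
    # Imported lazily so this module still loads where Selenium is missing.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in chrome_arguments(show):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Keeping the flag logic in a separate pure function makes it possible to check the configuration without launching a browser.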
ScrapPyJS.set_script(script)
This method sets the JavaScript code to be executed by the web browser.
- script: The JavaScript code to be executed.
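The script is an ordinary Python string containing JavaScript. The sketch below builds one that returns the trimmed text of every element matching a CSS selector. The helper text_of_all is hypothetical and not part of ScrapPyJS. The leading return follows the usage example in this document and assumes the script is run as a function body, as Selenium's execute_script does.

```python
import json


def text_of_all(css_selector):
    """Build a JS snippet that returns the trimmed text of every match."""
    # json.dumps produces a valid JavaScript string literal, so quotes inside
    # the selector cannot break out of the generated script.
    selector = json.dumps(css_selector)
    return (
        f"return Array.from(document.querySelectorAll({selector}))"
        ".map(el => el.textContent.trim());"
    )


script = text_of_all("h2.title")
print(script)
```

The resulting string could then be passed to set_script(script) before calling scrap.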
ScrapPyJS.scrap(url, wait=False, wait_for=None, wait_target=None, wait_time=10)
This method performs web scraping on the specified URL.
- url: The URL to scrape.
- wait (optional): Boolean value indicating whether to wait for an element to be present on the page before scraping. Default is False.
- wait_for (optional): The method to use for locating the element to wait for. Possible values are 'class', 'id', 'name', 'tag', 'link', 'part_link', 'css', or 'xp'. Default is None.
- wait_target (optional): The target value to locate the element to wait for. Default is None.
- wait_time (optional): The maximum time (in seconds) to wait for the element to be present. Default is 10.
Returns the result of executing the JavaScript code on the web page.
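The wait_for values read like shorthand for Selenium's locator strategies. The sketch below shows one plausible mapping and how a wait on it could be expressed with Selenium's WebDriverWait. The mapping and the helpers resolve_locator and wait_until_present are assumptions for illustration, not the library's confirmed internals; the right-hand strings are the values of Selenium's By constants.

```python
# Values of Selenium's By constants, e.g. By.CLASS_NAME == "class name".
WAIT_FOR_TO_BY = {
    "class": "class name",
    "id": "id",
    "name": "name",
    "tag": "tag name",
    "link": "link text",
    "part_link": "partial link text",
    "css": "css selector",
    "xp": "xpath",
}


def resolve_locator(wait_for, wait_target):
    """Translate a (wait_for, wait_target) pair into a Selenium locator tuple."""
    if wait_for not in WAIT_FOR_TO_BY:
        raise ValueError(f"unknown wait_for value: {wait_for!r}")
    if not wait_target:
        raise ValueError("wait_target is required when waiting")
    return (WAIT_FOR_TO_BY[wait_for], wait_target)


def wait_until_present(driver, wait_for, wait_target, wait_time=10):
    """Block until the element is present, or raise TimeoutException."""
    # Imported lazily so resolve_locator can be used without Selenium.
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = resolve_locator(wait_for, wait_target)
    return WebDriverWait(driver, wait_time).until(
        EC.presence_of_element_located(locator)
    )
```

Validating the pair up front surfaces a mistyped wait_for immediately instead of after a full wait_time timeout.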
ScrapPyJS.end()
This method terminates the web browser instance if it exists.
How to Use
- Import the ScrapPyJS class:
    from ScrapPy import ScrapPyJS
- Create an instance of the ScrapPyJS class:
    scrappy = ScrapPyJS()
- Set the JavaScript, as a string, that returns a value from the page:
    scrappy.set_script("return 'ScrapPy scrapping!'")
- Use the scrap method to scrape a webpage (url is the address of the page to scrape):
    result = scrappy.scrap(url, wait=True, wait_for='id', wait_target='elementId')
- Retrieve the result of the scraping operation:
    print(result)
- Terminate the web browser instance when finished:
    scrappy.end()
Please note that you will need to have the necessary Selenium and WebDriver dependencies installed to use this code.
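Because end() has to run even when scraping raises, it can help to wrap the steps above in a context manager. The sketch below is one way to do that. scraping_session is a hypothetical helper, not part of ScrapPyJS, and FakeScraper is a stand-in with the same scrap and end methods so the example runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def scraping_session(scraper):
    """Yield the scraper and always call end() afterwards."""
    try:
        yield scraper
    finally:
        # Runs on normal exit and on exceptions, so the browser is not leaked.
        scraper.end()


class FakeScraper:
    """Offline stand-in exposing the scrap/end methods described above."""

    def __init__(self):
        self.ended = False

    def scrap(self, url, **kwargs):
        return f"scraped {url}"

    def end(self):
        self.ended = True


fake = FakeScraper()
with scraping_session(fake) as session:
    result = session.scrap("https://example.com", wait=False)
print(result, fake.ended)
```

With the real class, the same pattern would take a ScrapPyJS instance in place of FakeScraper.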
Hashes for ScrapPyJS-1.0.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 50d41d6ced8a744c730b574a0e45673f3c0dadf387b0861a779108fb190a71de
MD5 | 470745462edd919d2b1974735612055a
BLAKE2b-256 | 72ba4064cadaa72eac2a3ac00125d6e0c0f6a0eac282799b9ab20ad9aa7dcb66