An easy-to-use web scraping library that runs JS scripts
ScrapPyJS
The ScrapPyJS class provides web scraping on top of Selenium: you scrape data by running a JS script on the page directly from Python.
Installing
pip install ScrapPyJS
How to Use
Importing and Initializing
from ScrapPyJS import ScrapPyJS
# initiate ScrapPyJS
scrappy = ScrapPyJS()
# set js script
JS_SCRIPT = "return 'ScrapPy scrapping!'"
scrappy.set_script(JS_SCRIPT)
# rest of the code goes here...
# close ScrapPyJS
scrappy.end()
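If the code between set_script and end raises an exception, end is never reached and the Selenium session presumably stays open. A minimal sketch of one way to guard against that with a small context manager follows. The names closing_scraper and FakeScraper are hypothetical; FakeScraper only stands in for ScrapPyJS so the sketch runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def closing_scraper(scraper):
    """Yield the scraper, then call its end() method even if the body raises."""
    try:
        yield scraper
    finally:
        scraper.end()


class FakeScraper:
    """Hypothetical stand-in exposing only set_script() and end()."""

    def __init__(self):
        self.script = None
        self.ended = False

    def set_script(self, script):
        self.script = script

    def end(self):
        self.ended = True


fake = FakeScraper()
try:
    with closing_scraper(fake) as scrappy:
        scrappy.set_script("return document.title")
        raise RuntimeError("simulated failure while scraping")
except RuntimeError:
    pass  # the error still propagates; end() has already run by this point
```

With the real class, `with closing_scraper(ScrapPyJS()) as scrappy:` would take the place of the explicit scrappy.end() call.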
Simple way
- Use the scrap method to scrape a webpage:
  result = scrappy.scrap(url, wait=True, wait_for='id', wait_target='elementId')
- Retrieve the result of the scraping operation:
  print(result)
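The JS script is what picks the data out of the page. The sample script above starts with return, which suggests the script runs as a function body, as it does with Selenium's execute_script; that is an assumption here, not something this README confirms. Under that assumption, a script that collects the text of every element matching a CSS selector could be built as sketched below. The helper build_text_script and the selector h2.title are hypothetical.

```python
import json


def build_text_script(css_selector):
    """Return a JS snippet that collects the trimmed text of matching elements."""
    # json.dumps produces a valid JS string literal, so quotes inside the
    # selector cannot break out of the generated script.
    selector_literal = json.dumps(css_selector)
    return (
        f"const nodes = document.querySelectorAll({selector_literal});"
        " return Array.from(nodes, (node) => node.textContent.trim());"
    )


# Hypothetical selector; pass the result to scrappy.set_script(JS_SCRIPT).
JS_SCRIPT = build_text_script("h2.title")
```

Building the selector literal with json.dumps rather than plain string formatting keeps selectors such as a[href="x"] from producing broken JS.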
Loop through list of URLs
- Set up a list of target URLs:
  URLS = [
      'https://url1.com/',
      'https://url2.com/homepage/',
      'https://url2.com/about',
  ]
- Use the loop_through method to scrape the target webpages:
  # The result value will be a list if save mode is on, else a JSON string
  result = scrappy.loop_through(URLS, wait=True, wait_for='id', wait_target='elementId')
- Retrieve the result of the scraping operation:
  print(result)
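Because the result can be either a list or a JSON string depending on save mode, downstream code may want a single shape to work with. A minimal sketch, assuming the JSON string decodes to the same kind of data the list holds; the helper normalize_result and the sample records are hypothetical.

```python
import json


def normalize_result(result):
    """Return Python data whether result is already a list or a JSON string."""
    if isinstance(result, str):
        return json.loads(result)
    return result


# Hypothetical sample data in both shapes.
from_json = normalize_result('[{"url": "https://url1.com/", "data": "a"}]')
from_list = normalize_result([{"url": "https://url1.com/", "data": "a"}])
```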
Save results to a file
Activate save mode
- Via toggle:
  scrappy.toggle_save_mode()
  Here, save mode, which is False by default, is toggled to True. The save file settings keep their default values.
- Via the set_save_info method:
  scrappy.set_save_info(save=True)
  Here, we set save mode to True directly, leaving the other settings at their defaults.
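The two approaches differ when called more than once. The sketch below models that difference, assuming toggle_save_mode simply flips the flag, as its name suggests. SaveModeModel is a hypothetical stand-in, not the library's implementation.

```python
class SaveModeModel:
    """Hypothetical model of the save flag only."""

    def __init__(self):
        self.save = False  # save mode is off by default

    def toggle_save_mode(self):
        self.save = not self.save

    def set_save_info(self, save):
        self.save = save


toggled = SaveModeModel()
toggled.toggle_save_mode()
toggled.toggle_save_mode()  # a second call switches save mode back off

explicit = SaveModeModel()
explicit.set_save_info(save=True)
explicit.set_save_info(save=True)  # repeating the call changes nothing
```

If that assumption holds, set_save_info(save=True) is the safer choice in code that may run its setup step more than once.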
Configure save mode
- Via the set_save_info method:
  FILE_NAME = "output"
  FILE_FORMAT = "json"
  SAVE_LOCATION = "path/to/file/"
  scrappy.set_save_info(save=True, file_name=FILE_NAME, file_format=FILE_FORMAT, location=SAVE_LOCATION)
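Whether the library creates a missing save directory is not stated here, so it may be worth creating it up front. A minimal sketch that does so and bundles the settings into keyword arguments; the helper build_save_info and the temporary directory are hypothetical, and only the four keyword names come from the call above.

```python
import os
import tempfile


def build_save_info(file_name, file_format, location):
    """Create the target directory and return keyword arguments for set_save_info."""
    os.makedirs(location, exist_ok=True)
    return {
        "save": True,
        "file_name": file_name,
        "file_format": file_format,
        "location": location,  # passed through unchanged, trailing separator included
    }


with tempfile.TemporaryDirectory() as tmp:
    # The trailing separator mirrors the "path/to/file/" style used above.
    location = os.path.join(tmp, "scraped", "run1", "")
    save_info = build_save_info("output", "json", location)
    directory_created = os.path.isdir(location)
    # With the real class: scrappy.set_save_info(**save_info)
```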
Please note that you need the necessary Selenium and WebDriver dependencies installed to use this code.
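A quick pre-flight check can report what is missing before a scraper is created. This is a sketch only: the helper missing_requirements is hypothetical, and the driver names chromedriver and geckodriver are assumptions, since this README does not say which browser ScrapPyJS drives.

```python
import importlib.util
import shutil


def missing_requirements(modules=("selenium",),
                         driver_names=("chromedriver", "geckodriver")):
    """Return a list of problems found; an empty list means nothing is missing."""
    problems = []
    for name in modules:
        # find_spec returns None when a top-level package cannot be imported.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package not installed: {name}")
    # One driver binary on PATH is enough; skip the check if no names are given.
    if driver_names and not any(shutil.which(name) for name in driver_names):
        problems.append("no WebDriver binary on PATH: " + ", ".join(driver_names))
    return problems


problems = missing_requirements()
```

A missing driver binary is not necessarily fatal, because recent Selenium releases can fetch a matching driver themselves, so treat that message as a hint and not as an error.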
Documentation
Details of the ScrapPyJS class are available in .\CLASS_STRUCTURE.md
License
This code is licensed under the GNU AGPLv3 open source copyleft license.
Author
Name: Hind Sagar Biswas
Website: coderaptors.epizy.com
Hashes for ScrapPyJS-1.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 4770bd9985be81327fd3092385f9cb6df2762125eb2cb6f6115580d4a35e18dd
MD5 | 2c5e1469b30abac80263b97c9f0f8776
BLAKE2b-256 | 3b95d32c12e9ef88d2baa5a11e8359eaa3ea4b8cce132e1fe21364e3e88126fc