An easy-to-use web scraping library driven by JS scripts
Project description
ScrapPyJS
The ScrapPyJS class provides web scraping functionality built on Selenium, letting you scrape data by running a JS script directly from Python.
Installing
pip install ScrapPyJS
How to Use
Including and Initiating
from ScrapPyJS import ScrapPyJS
# initiate ScrapPyJS
scrappy = ScrapPyJS()
# set js script
JS_SCRIPT = "return 'ScrapPy scrapping!'"
scrappy.set_script(JS_SCRIPT)
# rest of the code goes here...
# close ScrapPyJS
scrappy.end()
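For scripts longer than a one-liner, it can help to build the JS source in Python before passing it to set_script. The following is a minimal sketch, assuming the script runs in the page context and that whatever it returns is what the scraping call hands back. The helper build_text_script and the h2.title selector are hypothetical names used for illustration; they are not part of ScrapPyJS.

```python
import json


def build_text_script(css_selector: str) -> str:
    """Return a JS snippet that collects the text of every matching element."""
    # json.dumps produces a quoted, escaped string that is also a valid
    # JS string literal, so quotes inside the selector cannot break the script.
    selector_literal = json.dumps(css_selector)
    return (
        f"const nodes = document.querySelectorAll({selector_literal});\n"
        "const texts = Array.from(nodes, (n) => n.textContent.trim());\n"
        "return JSON.stringify(texts);"
    )


# Hypothetical selector; replace it with one that matches the target page.
JS_SCRIPT = build_text_script("h2.title")
print(JS_SCRIPT)
```

The resulting string can then be passed to scrappy.set_script(JS_SCRIPT) in the same way as the one-line script above.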
Simple way
- Use the scrap method to scrape a webpage:
  result = scrappy.scrap(url, wait=True, wait_for='id', wait_target='elementId')
- Retrieve the result of the scraping operation:
  print(result)
Loop through a list of URLs
- Set up a list of target URLs:
  URLS = [
      'https://url1.com/',
      'https://url2.com/homepage/',
      'https://url2.com/about',
  ]
- Use the loop_through method to scrape the target webpages:
  # The result value will be a list if save mode is on, else a JSON string
  result = scrappy.loop_through(URLS, wait=True, wait_for='id', wait_target='elementId')
- Retrieve the result of the scraping operation:
  print(result)
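Because the result may arrive either as a list or as a JSON string, downstream code may want a single shape to work with. Here is a minimal sketch of one way to normalize it. The helper normalize_result and the sample values are hypothetical, and the exact structure ScrapPyJS returns is an assumption based only on the comment above.

```python
import json
from typing import Any, List, Union


def normalize_result(result: Union[str, List[Any]]) -> List[Any]:
    """Return the scraped data as a Python list regardless of the input shape."""
    if isinstance(result, str):
        parsed = json.loads(result)  # JSON string case
        # Wrap a single decoded value so callers can always iterate.
        return parsed if isinstance(parsed, list) else [parsed]
    return list(result)  # already a list: return a shallow copy


# Hypothetical sample values standing in for real scraping output.
print(normalize_result('[{"title": "Home"}, {"title": "About"}]'))
print(normalize_result([{"title": "Home"}]))
```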
Save results to a file
Activate save mode
- Via toggle:
  scrappy.toggle_save_mode()
  Here, save mode, which is False by default, is toggled to True. The save file settings keep their default values.
- Via the set_save_info method:
  scrappy.set_save_info(save=True)
  Here, we set save mode to True directly, leaving the other settings at their defaults.
Configure save mode
- Via the set_save_info method:
  FILE_NAME = "output"
  FILE_FORMAT = "json"
  SAVE_LOCATION = "path/to/file/"
  scrappy.set_save_info(save=True, file_name=FILE_NAME, file_format=FILE_FORMAT, location=SAVE_LOCATION)
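Before turning on save mode with a custom location, it may be worth making sure the target directory exists. The sketch below does that with pathlib. The helper prepare_save_target is hypothetical, and the assumption that the output file is named file_name plus "." plus file_format is a guess about the naming scheme, not something the ScrapPyJS documentation confirms.

```python
import tempfile
from pathlib import Path


def prepare_save_target(location: str, file_name: str, file_format: str) -> Path:
    """Create the save directory if needed and return the expected file path."""
    directory = Path(location)
    directory.mkdir(parents=True, exist_ok=True)
    # Assumption: the output file is named <file_name>.<file_format>.
    return directory / f"{file_name}.{file_format}"


# Demonstrated in a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    target = prepare_save_target(f"{tmp}/results", "output", "json")
    print(target.name)  # output.json
```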
Please note that you will need to have the necessary Selenium and WebDriver dependencies installed to use this code.
Documentation
Details on the ScrapPyJS class are available in .\CLASS_STRUCTURE.md
License
This code is licensed under the GNU AGPLv3 open-source copyleft license.
Author
Name: Hind Sagar Biswas
Website: coderaptors.epizy.com
Download files
File details
Details for the file ScrapPyJS-1.1.0.tar.gz.
File metadata
- Download URL: ScrapPyJS-1.1.0.tar.gz
- Upload date:
- Size: 5.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f8fa767d2771b0b406b60b81571a304a4b179bf3347c40779f3de1a32f6f110f |
| MD5 | ddf4d6caffa9559f08e1886f8040c109 |
| BLAKE2b-256 | 70f43d84e2cf6aa25ce88e7ee6f7756801aa2f4d75a5bc9afffc4127eefaf62f |
File details
Details for the file ScrapPyJS-1.1.0-py3-none-any.whl.
File metadata
- Download URL: ScrapPyJS-1.1.0-py3-none-any.whl
- Upload date:
- Size: 5.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4770bd9985be81327fd3092385f9cb6df2762125eb2cb6f6115580d4a35e18dd |
| MD5 | 2c5e1469b30abac80263b97c9f0f8776 |
| BLAKE2b-256 | 3b95d32c12e9ef88d2baa5a11e8359eaa3ea4b8cce132e1fe21364e3e88126fc |
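The published digests can be used to check a downloaded file before installing it. Below is a minimal sketch using Python's standard hashlib module. The helper names sha256_of and matches_published_hash are hypothetical, and the commented usage assumes the wheel was downloaded to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and spaces."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example usage (assumes the file exists locally):
# matches_published_hash(
#     Path("ScrapPyJS-1.1.0-py3-none-any.whl"),
#     "4770bd9985be81327fd3092385f9cb6df2762125eb2cb6f6115580d4a35e18dd",
# )
```

pip can perform a similar check itself when requirements are pinned with hashes, which may be more convenient than verifying files by hand.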