URL Scrub
A tool for parsing the webpage at a given URL into JSON + RDF.
Setup
Dependencies
- Python: 3.10
- geckodriver or chromedriver
Installation Process
- Install urlscrub with pip:
  python3.10 -m pip install urlscrub
- Install geckodriver
  - Download Firefox and install.
    - Linux (Ubuntu): sudo apt-get install firefox
  - Unzip the geckodriver / geckodriver.exe file into a preferred directory.
  - Append the directory containing geckodriver to your PATH variable. (Guide)
- Install chromedriver
  - Download Google Chrome and install.
  - Find the version of Google Chrome you have installed.
  - Download the chromedriver.zip whose version number most closely matches it.
    - An exact version match is not required (Ex: chromedriver 102.0.5005.61 w/ Google Chrome 102.0.5005.115).
  - Unzip the chromedriver / chromedriver.exe file into a preferred directory.
  - Append the directory containing chromedriver to your PATH variable. (Guide)
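The version-matching step above can be sketched as a small check. This is only an illustration: the rule of comparing the first three version fields (major, minor, build) is an assumption drawn from the example pair above, and the function names `build_prefix` and `versions_compatible` are hypothetical, not part of urlscrub.

```python
def build_prefix(version: str) -> tuple[int, ...]:
    """Return the first three fields (major, minor, build) of a dotted version."""
    parts = version.strip().split(".")
    return tuple(int(p) for p in parts[:3])


def versions_compatible(chromedriver_version: str, chrome_version: str) -> bool:
    """Assumed rule: the pair is usable when major.minor.build match.

    The last field is allowed to differ, as in the example above.
    """
    return build_prefix(chromedriver_version) == build_prefix(chrome_version)


if __name__ == "__main__":
    # Version pair taken from the installation notes above.
    print(versions_compatible("102.0.5005.61", "102.0.5005.115"))
```

If the check fails for your installed browser, downloading a chromedriver release closer to the Chrome version is the likely fix.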
Command Line Usage
- Command:
  urlscrub --skip-cookies --driver "chrome" -l "https://www.amazon.com/All-new-Kindle-Oasis-now-with-adjustable-warm-light/dp/B07GRSK3HC"
- Response:
  {
    "results": [
      {
        "type": "product",
        "productTitle": "Kindle Oasis \u2013 With adjustable warm light",
        "availability": "In Stock.",
        "rating": "19,734 ratings",
        "imageURL": "https://m.media-amazon.com/images/I/614TlIaYBvL._AC_SX679_.jpg"
      }
    ]
  }
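Once the response is available as text, it can be consumed with Python's standard `json` module. The sketch below works on a copy of the sample response above; the helper names `parse_rating_count` and `summarize` are hypothetical, and the field handling assumes every result looks like the product entry shown.

```python
import json

# Sample response copied from the example above (raw string keeps the JSON escape).
SAMPLE_RESPONSE = r"""
{ "results": [ { "type": "product",
  "productTitle": "Kindle Oasis \u2013 With adjustable warm light",
  "availability": "In Stock.",
  "rating": "19,734 ratings",
  "imageURL": "https://m.media-amazon.com/images/I/614TlIaYBvL._AC_SX679_.jpg" } ] }
"""


def parse_rating_count(rating: str) -> int:
    """Turn a string such as '19,734 ratings' into the integer 19734."""
    return int(rating.split()[0].replace(",", ""))


def summarize(response_text: str) -> list[dict]:
    """Reduce each result to a title, an in-stock flag, and a rating count."""
    data = json.loads(response_text)
    summary = []
    for item in data["results"]:
        summary.append({
            "title": item["productTitle"],
            "in_stock": item["availability"].strip().lower().startswith("in stock"),
            "rating_count": parse_rating_count(item["rating"]),
        })
    return summary


if __name__ == "__main__":
    for row in summarize(SAMPLE_RESPONSE):
        print(row)
```

The text could also come from running the command above with `subprocess.run(..., capture_output=True, text=True)`, assuming urlscrub prints the JSON to standard output; that behavior is not confirmed here.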
Guides
- Appending directories to your PATH environment variable.
  - Windows Guide
  - Linux:
    - Append the path to your .bashrc / .zshrc:
      export PATH="<geckodriver_dir>/:$PATH"
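To confirm that the PATH change took effect, a quick check from Python can help. This is a minimal sketch using only the standard library; `find_drivers` and `prepend_to_path` are hypothetical helper names, and `prepend_to_path` only affects the current process (it mirrors the `export` line above rather than editing .bashrc / .zshrc).

```python
import os
import shutil


def find_drivers(names=("geckodriver", "chromedriver"), path=None):
    """Map each driver name to its resolved location, or None if not found."""
    return {name: shutil.which(name, path=path) for name in names}


def prepend_to_path(directory, environ=None):
    """Prepend a directory to PATH in the given mapping (default: os.environ)."""
    env = os.environ if environ is None else environ
    current = env.get("PATH", "")
    env["PATH"] = directory + os.pathsep + current if current else directory
    return env["PATH"]


if __name__ == "__main__":
    for name, location in find_drivers().items():
        print(f"{name}: {location or 'not found on PATH'}")
```

At least one of the two drivers should resolve to a path; if both report as not found, open a new shell so the updated .bashrc / .zshrc is loaded and try again.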
Source Distribution
File details
Details for the file urlscrub-0.1.0.tar.gz.
File metadata
- Download URL: urlscrub-0.1.0.tar.gz
- Size: 8.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | eb273932060d70d725fedbd916d55bae66ae5685741ba1911f4b44aab0f1c61a
MD5 | c75b63003a86afa4258b614b3acf16ce
BLAKE2b-256 | a211f662281213d63de4926d03877a45ecda2813ff14e7e6df57b2723211bbf2