Language-Agnostic Website Embedding and Classification
Homepage2Vec - Beta
Getting started
Setup:
Step 1: Install the library with pip.
pip install homepage2vec
Step 2: Install the Selenium Chrome web driver and add the folder that contains it to the system $PATH variable.
Note that the web driver requires a local installation of the Chrome browser (see Getting started).
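Before moving on, it can help to confirm that the driver from Step 2 is actually visible on $PATH. The following is a minimal sketch using only the Python standard library; the binary name `chromedriver` is an assumption (it is the usual name, but it may differ on your platform), and `find_on_path` is a hypothetical helper, not part of homepage2vec.

```python
import shutil
from typing import Optional


def find_on_path(executable: str) -> Optional[str]:
    """Return the full path of `executable` if it is on $PATH, else None."""
    return shutil.which(executable)


# Assumption: the driver binary is called "chromedriver" on this system.
driver_path = find_on_path("chromedriver")
if driver_path is None:
    print("chromedriver was not found on $PATH; revisit Step 2.")
else:
    print(f"chromedriver found at {driver_path}")
```

If the driver is reported as missing, the folder added to $PATH in Step 2 is the first thing to check.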
Usage:
from homepage2vec.model import WebsiteClassifier
model = WebsiteClassifier()
webpages = model.embed_and_predict(['www.nsf.gov'])
print(webpages[0].scores)
{'Arts': 0.018672721460461617, 'Business': 0.01062296237796545,
'Computers': 0.017558472231030464, 'Games': 1.1537405953276902e-05,
'Health': 0.021613001823425293, 'Home': 1.8367260054219514e-05,
'Kids_and_Teens': 0.1226280927658081, 'News': 3.7846388295292854e-05,
'Recreation': 0.015628756955266, 'Reference': 0.7092769145965576,
'Science': 0.9873504042625427, 'Shopping': 0.00010123076208401471,
'Society': 0.26334095001220703, 'Sports': 0.0005139540298841894}
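The scores in this example sum to more than 1, which suggests that each category is scored independently and that a site can carry several labels. A minimal sketch of turning the dictionary into a short list of labels follows; the helper `top_categories` and the 0.5 threshold are illustrative assumptions, not part of the library.

```python
from typing import Dict, List, Tuple

# Scores copied from the example output above.
scores: Dict[str, float] = {
    'Arts': 0.018672721460461617, 'Business': 0.01062296237796545,
    'Computers': 0.017558472231030464, 'Games': 1.1537405953276902e-05,
    'Health': 0.021613001823425293, 'Home': 1.8367260054219514e-05,
    'Kids_and_Teens': 0.1226280927658081, 'News': 3.7846388295292854e-05,
    'Recreation': 0.015628756955266, 'Reference': 0.7092769145965576,
    'Science': 0.9873504042625427, 'Shopping': 0.00010123076208401471,
    'Society': 0.26334095001220703, 'Sports': 0.0005139540298841894,
}


def top_categories(
    scores: Dict[str, float], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (label, score) pairs at or above `threshold`, highest first."""
    selected = [(label, p) for label, p in scores.items() if p >= threshold]
    return sorted(selected, key=lambda item: item[1], reverse=True)


for label, p in top_categories(scores):
    print(f"{label}: {p:.3f}")
# Science: 0.987
# Reference: 0.709
```

Lowering the threshold admits weaker labels such as Society; the right cut-off depends on the application.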
Output format:
- url: Website URL
- is_valid: True if the request was successful
- features: Complete feature vector
- embedding: Embedding vector representing the website
- scores: Prediction probabilities
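The fields above can be combined in downstream code, for example by skipping failed requests and comparing the embeddings of the remaining sites. The sketch below uses stand-in objects with made-up values so that it runs without the library or network access. It assumes that an embedding behaves like a sequence of floats and that invalid results carry no usable embedding; neither is confirmed by this page. Cosine similarity is one common choice for comparing embeddings, not something the library prescribes.

```python
import math
from types import SimpleNamespace
from typing import Iterable, List, Sequence


def valid_results(results: Iterable) -> List:
    """Keep only the results whose request succeeded (is_valid is True)."""
    return [r for r in results if r.is_valid]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for the objects returned by embed_and_predict;
# the URLs and vectors are invented for illustration only.
results = [
    SimpleNamespace(url="a.example", is_valid=True, embedding=[1.0, 0.0, 1.0]),
    SimpleNamespace(url="b.example", is_valid=True, embedding=[1.0, 0.0, 0.0]),
    SimpleNamespace(url="c.example", is_valid=False, embedding=None),
]

usable = valid_results(results)
similarity = cosine_similarity(usable[0].embedding, usable[1].embedding)
print([r.url for r in usable], round(similarity, 3))
# ['a.example', 'b.example'] 0.707
```

With real output, the same two helpers would be applied to the list returned by `embed_and_predict`.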
Customization
model = WebsiteClassifier(cpu_threads_count=24, dataloader_workers=4)
Work in progress...