NAICS code business domain classifier and domain utility kit
# usbusiness
The aim of the project is to provide an open-source business classifier that uses website information.
## Research
Web Page Classification: Features and Algorithms (2009) https://www.cs.ucf.edu/~dcm/Teaching/COT4810-Fall%202012/Literature/WebPageClassification.pdf
Automated Text Classification in the DMOZ Hierarchy (2009) http://users.cecs.anu.edu.au/~ssanner/Papers/Lachlan_Report.pdf
Topical Web-page classification of the DMOZ Dataset (2015) https://github.com/kahliloppenheimer/Web-page-classification/blob/master/paper.pdf
## Industries of Weakness
Religious
Oil and Gas
Finance
Large Companies
### Options
Remove stop words (T/F)
Word list (my words selection, None, google_10, google_100k)
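
As a rough illustration of how these two options might affect the text taken from a website, here is a minimal sketch. The function name `tokenize`, the parameter names, and the tiny `STOP_WORDS` set are assumptions made for this example and are not taken from the package; `vocabulary` stands in for whichever word list is selected.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real list would be much longer.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "for", "is", "our", "we"})


def tokenize(text, remove_stop_words=True, vocabulary=None):
    """Lowercase the text, split it into alphabetic tokens, then apply both options."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        # Option 1: drop very common words that carry little topical signal.
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if vocabulary is not None:
        # Option 2: keep only words found in the selected word list.
        tokens = [t for t in tokens if t in vocabulary]
    return tokens


print(tokenize("We drill for oil and gas in Texas"))
# ['drill', 'oil', 'gas', 'texas']
print(tokenize("We drill for oil and gas in Texas", vocabulary={"oil", "gas"}))
# ['oil', 'gas']
```

Passing `vocabulary=None` corresponds to the `None` choice above, where no word list filtering is applied.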
### TO DO
Link depth pull option
Data Set
Training / Validation
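
One possible way to approach the Training / Validation item is a seeded, shuffled split of the labelled records. This is only a sketch: the function name, the 80/20 default, and the `(domain, NAICS code)` record shape are assumptions, not the project's actual data format.

```python
import random


def train_validation_split(records, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the records and hold out a fraction for validation."""
    shuffled = list(records)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical records: (domain, NAICS code) pairs made up for the example.
records = [("example%d.com" % i, "5221") for i in range(10)]
train, validation = train_validation_split(records)
print(len(train), len(validation))
# 8 2
```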
### Components
The data set
The words
The confidence
Link depth
The predictive model
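
Link depth can be read as "how many clicks away from the home page to collect text". The sketch below shows a breadth-first walk limited to a maximum depth. It is an assumption about how such an option could work, not the package's implementation; `get_links` is a hypothetical callable that returns the links found on a page, and here it is backed by an in-memory dictionary instead of real HTTP requests.

```python
from collections import deque


def pages_within_depth(start_url, get_links, max_depth=1):
    """Return {url: depth} for every page at most max_depth links from start_url."""
    depths = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if depths[url] == max_depth:
            continue  # do not follow links beyond the requested depth
        for link in get_links(url):
            if link not in depths:  # skip pages that were already seen
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths


# Hypothetical site structure used in place of a real crawl.
site = {"/": ["/about", "/products"], "/about": ["/team"], "/products": [], "/team": []}
print(pages_within_depth("/", lambda url: site.get(url, []), max_depth=1))
# {'/': 0, '/about': 1, '/products': 1}
```

A depth of 0 uses only the starting page; larger depths pull in more text at the cost of more requests.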
### Ideas
Stemmers
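
To make the stemmer idea concrete, here is a deliberately crude suffix-stripping sketch. It is not the Porter algorithm or any library's stemmer, and the suffix list and minimum stem length are assumptions chosen for the example. The intent is only to show how word variants could be folded together before counting.

```python
# Hypothetical suffix list, checked in order so longer suffixes win over "s".
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word, min_stem_length=3):
    """Strip one common English suffix if enough of the word remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_length:
            return word[: -len(suffix)]
    return word


print([naive_stem(w) for w in ["drilling", "banks", "churches", "gas"]])
# ['drill', 'bank', 'church', 'gas']
```

A rule set this small will mis-stem many words, so an established stemmer would likely be a better fit in practice.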
Source Distribution
File details
Details for the file usbusiness-0.2.1.tar.gz.
File metadata
- Download URL: usbusiness-0.2.1.tar.gz
- Upload date:
- Size: 437.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 22fb7e1a7bd2ec11ad7aaa62e43a7c1cad4971293adaf73620bce39206131add
MD5 | 24b06cfda4f542d9cf885d026f26df40
BLAKE2b-256 | 3ddfa907ecec1b72a69800cb7c0c20aaf18b2ad8e5cb98422fde329aa5b4219e