NAICS code business domain classifier and domain utility kit
# usbusiness
The aim of the project is to provide an open-source business classifier that uses website information.
## Research
Web Page Classification: Features and Algorithms (2009) https://www.cs.ucf.edu/~dcm/Teaching/COT4810-Fall%202012/Literature/WebPageClassification.pdf
Automated Text Classification in the DMOZ Hierarchy (2009) http://users.cecs.anu.edu.au/~ssanner/Papers/Lachlan_Report.pdf
Topical Web-page classification of the DMOZ Dataset (2015) https://github.com/kahliloppenheimer/Web-page-classification/blob/master/paper.pdf
## Industries of Weakness
Religious
Oil and Gas
Architects
Gas Stations
### Options
Remove stop words (T/F)
My words selection, None, google_10, google_100k
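
As a rough illustration of how these two options could interact, here is a minimal sketch. It is not the package's actual API: the names `tokenize`, `remove_stop_words`, `vocabulary` and `STOP_WORDS` are hypothetical, the stop-word set is a tiny placeholder, and `vocabulary` stands in for whichever word list (for example `google_10` or `google_100k`) is selected.

```python
import re

# Tiny illustrative stop-word set (an assumption); a real run would use a fuller list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "we"}

def tokenize(text, remove_stop_words=True, vocabulary=None):
    """Lowercase the text, keep alphabetic tokens, then apply both options.

    vocabulary is a hypothetical stand-in for the selected word list;
    None means no vocabulary filtering is applied.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if vocabulary is not None:
        tokens = [t for t in tokens if t in vocabulary]
    return tokens

page_text = "We are a family dentist in the heart of Austin"
print(tokenize(page_text))
print(tokenize(page_text, vocabulary={"dentist", "family"}))
```

Filtering stop words before applying the word list keeps the two options independent, so each can be toggled without affecting the other.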
### TO DO
Link depth pull option
Data Set
Training / Validation
### Components
The data set
The words
The confidence
Link depth
The predictive model
### Ideas
Stemmers
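
To show what a stemmer would contribute, the sketch below uses a deliberately naive suffix stripper. This is an assumption for illustration only: it is not the Porter algorithm or any stemmer the project has committed to, and `naive_stem` and `SUFFIXES` are hypothetical names. The point is that related word forms on a website collapse to one feature.

```python
# Suffixes are checked in order, so longer ones come before their shorter tails.
SUFFIXES = ("ing", "ers", "er", "es", "ed", "s")

def naive_stem(word):
    """Strip one common English suffix, leaving short words untouched."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Require at least three remaining characters so words like "gas" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([naive_stem(w) for w in ["plumber", "plumbers", "plumbing"]])
```

A production stemmer handles many more cases (for example doubled consonants and irregular plurals), so this only conveys the idea.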