NAICS code business domain classifier and domain utility kit

# usbusiness

The aim of the project is to provide an open-source business classifier that uses website information.

## Research

Web Page Classification: Features and Algorithms (2009) https://www.cs.ucf.edu/~dcm/Teaching/COT4810-Fall%202012/Literature/WebPageClassification.pdf

Automated Text Classification in the DMOZ Hierarchy (2009) http://users.cecs.anu.edu.au/~ssanner/Papers/Lachlan_Report.pdf

Topical Web-page classification of the DMOZ Dataset (2015) https://github.com/kahliloppenheimer/Web-page-classification/blob/master/paper.pdf

## Industries of Weakness

  1. Religious
  2. Oil and Gas
  3. Finance
  4. Large Companies

### Options

  1. Remove stop words (T/F)
  2. Word selection: my words, None, google_10, or google_100k
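
The two options above can be pictured with a minimal sketch. It is not the package's actual code: the function name `tokenize`, the tiny `STOP_WORDS` set, and the `vocabulary` parameter (standing in for a word list such as google_10 or google_100k) are all assumptions made for illustration.

```python
import re

# Hypothetical stop-word list; the project's real list is not shown.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "for"})


def tokenize(text, remove_stop_words=True, vocabulary=None):
    """Lowercase the text and split it into alphabetic word tokens.

    remove_stop_words: drop tokens found in STOP_WORDS when True.
    vocabulary: optional set of allowed words (an assumed stand-in for a
    word list such as google_10 or google_100k); None keeps every token.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if vocabulary is not None:
        tokens = [t for t in tokens if t in vocabulary]
    return tokens


if __name__ == "__main__":
    print(tokenize("The Bank of Texas offers loans"))
    print(tokenize("The Bank of Texas offers loans", vocabulary={"bank", "loans"}))
```

Keeping both options as plain keyword arguments makes it easy to compare classifier accuracy with each setting switched on or off.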

### TO DO

  1. Link depth pull option
  2. Data Set
  3. Training / Validation
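
One possible shape for the link depth option is a breadth-first walk that stops following links past a chosen depth. The sketch below is an assumption about how it could work, not the planned implementation: `pages_within_depth` is a hypothetical name, and the site is modelled as an in-memory dictionary of links rather than fetched over the network.

```python
from collections import deque


def pages_within_depth(links, start, max_depth):
    """Return {page: depth} for every page reachable within max_depth links.

    links: dict mapping a page to the list of pages it links to
    (a hypothetical stand-in for links parsed from fetched HTML).
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        depth = seen[page]
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen[target] = depth + 1
                queue.append(target)
    return seen


if __name__ == "__main__":
    site = {
        "/": ["/about", "/products"],
        "/about": ["/team"],
        "/products": ["/"],
        "/team": ["/careers"],
    }
    print(pages_within_depth(site, "/", 1))
```

A depth of 0 would use only the landing page, while larger depths pull in more text per site at the cost of more requests.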

### Components

  1. The data set
  2. The words
  3. The confidence
  4. Link depth
  5. The predictive model

### Ideas

  1. Stemmers
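
To illustrate what a stemmer would contribute, here is a deliberately naive suffix stripper. It is not the Porter algorithm or any library's stemmer, and the name `naive_stem` and the suffix list are assumptions chosen for the example; a real implementation would more likely use an established stemmer.

```python
# Illustrative suffix list, checked in order; longer suffixes come first.
SUFFIXES = ("ing", "ers", "er", "ed", "es", "s")


def naive_stem(word, min_stem=3):
    """Strip one common English suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ("drilling", "drillers", "drilled", "drills", "gas"):
        print(w, naive_stem(w))
```

Collapsing word variants to one stem shrinks the vocabulary, which may help in industries where the training text is sparse.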

Files for usbusiness, version 0.2.1: usbusiness-0.2.1.tar.gz (437.5 kB), source distribution.