
NAICS code business domain classifier and domain utility kit


# usbusiness

The aim of the project is to provide an open-source business classifier that uses website information.

## Research

Web Page Classification: Features and Algorithms (2009) https://www.cs.ucf.edu/~dcm/Teaching/COT4810-Fall%202012/Literature/WebPageClassification.pdf

Automated Text Classification in the DMOZ Hierarchy (2009) http://users.cecs.anu.edu.au/~ssanner/Papers/Lachlan_Report.pdf

Topical Web-page classification of the DMOZ Dataset (2015) https://github.com/kahliloppenheimer/Web-page-classification/blob/master/paper.pdf

## Industries of Weakness

  1. Religious
  2. Oil and Gas
  3. Finance
  4. Large Companies

### Options

  1. Remove stop words (T/F)
  2. Word selection: my words, None, google_10, google_100k
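
The two options above can be illustrated with a minimal sketch. It is not the package's actual API: the function name `tokenize`, its parameters, and the tiny `STOP_WORDS` set are all hypothetical, and the `vocabulary` argument only assumes that a word list such as google_10 can be loaded as a set of allowed words.

```python
import re

# Hypothetical, deliberately tiny stop-word list, for illustration only.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "for", "is", "we"})


def tokenize(text, remove_stop_words=True, vocabulary=None):
    """Lowercase the text, split it into alphabetic tokens, and filter them.

    remove_stop_words: drop tokens found in STOP_WORDS (option 1).
    vocabulary: optional set of allowed words (option 2); None keeps all words.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if vocabulary is not None:
        tokens = [t for t in tokens if t in vocabulary]
    return tokens


page_text = "We drill for oil and gas in Texas"
print(tokenize(page_text))
print(tokenize(page_text, vocabulary={"oil", "gas"}))
```

Restricting tokens to a fixed word list keeps the feature space small, at the cost of dropping industry-specific terms that are missing from the list.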

### TO DO

  1. Link depth pull option
  2. Data Set
  3. Training / Validation

### Components

  1. The data set
  2. The words
  3. The confidence
  4. Link depth
  5. The predictive model

### Ideas

  1. Stemmers
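
As a rough illustration of the stemming idea, the sketch below strips a few common English suffixes so that related word forms map to the same token. It is a naive stand-in, not the Porter algorithm and not something the project is known to use; `naive_stem`, `SUFFIXES`, and `min_stem` are hypothetical names chosen for this example.

```python
# Suffixes are checked in order, so longer ones come before their shorter tails.
SUFFIXES = ("ing", "ers", "er", "es", "ed", "s")


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ("drilling", "drillers", "drilled", "drills", "gas"):
    print(w, "->", naive_stem(w))
```

A real stemmer handles many more cases (doubled consonants, irregular forms), so this sketch only shows why stemming can shrink the vocabulary a classifier has to learn.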


Release file: usbusiness-0.2.1.tar.gz (437.5 kB, source distribution), uploaded Jul 20, 2017
