
NAICS code business domain classifier and domain utility kit


# usbusiness

The aim of the project is to provide an open-source business classifier that uses website information.

## Research

Web Page Classification: Features and Algorithms (2009) https://www.cs.ucf.edu/~dcm/Teaching/COT4810-Fall%202012/Literature/WebPageClassification.pdf

Automated Text Classification in the DMOZ Hierarchy (2009) http://users.cecs.anu.edu.au/~ssanner/Papers/Lachlan_Report.pdf

Topical Web-page classification of the DMOZ Dataset (2015) https://github.com/kahliloppenheimer/Web-page-classification/blob/master/paper.pdf

## Industries of Weakness

  1. Religious
  2. Oil and Gas
  3. Finance
  4. Large Companies

### Options

  1. Remove stop words (T/F)
  2. My words selection: None, google_10, or google_100k
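The two options above can be pictured as a preprocessing step applied to page text before classification. The sketch below is a minimal illustration, not the package's own code: the function name `preprocess`, the tiny `STOP_WORDS` set, and passing the word list in as a plain `vocabulary` argument are all assumptions, and it does not load the actual google_10 or google_100k lists.

```python
import re
from typing import Iterable, List, Optional

# Tiny illustrative stop-word list; a real one (e.g. NLTK's English list) is far longer.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "for", "is", "on", "with"})


def preprocess(text: str,
               remove_stop_words: bool = True,
               vocabulary: Optional[Iterable[str]] = None) -> List[str]:
    """Lower-case and tokenize page text, then apply the two options.

    remove_stop_words mirrors the T/F option. vocabulary stands in for the
    word selection: None keeps every token, otherwise only tokens found in
    the supplied list (hypothetically the google_10 or google_100k words) are kept.
    """
    # Letters-only tokens; digits and punctuation are dropped.
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if vocabulary is not None:
        allowed = set(vocabulary)
        tokens = [t for t in tokens if t in allowed]
    return tokens


print(preprocess("The Bakery and Cafe of Austin"))  # ['bakery', 'cafe', 'austin']
```

Filtering stop words before the word list keeps the two options independent, so either can be switched off without affecting the other.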

### TO DO

  1. Link depth pull option
  2. Data Set
  3. Training / Validation
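For the link depth pull option, one common approach is a breadth-first crawl that stops after a fixed number of link hops and stays on the starting host. This is a hedged sketch using only the standard library; `pull_site`, its `max_depth` parameter, and the injected `fetch` function are hypothetical names, and `fetch` is left to the caller so the crawl logic can be exercised without network access.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin, urlparse


class _LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def pull_site(start_url: str, fetch: Callable[[str], str], max_depth: int = 1) -> Dict[str, str]:
    """Breadth-first pull of same-host pages, up to max_depth link hops.

    Depth 0 is only the start page; depth 1 adds the pages it links to, and so on.
    fetch is a hypothetical hook returning a page's HTML for a URL.
    """
    host = urlparse(start_url).netloc
    pages: Dict[str, str] = {}
    queue = deque([(start_url, 0)])
    seen = {start_url}
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        pages[url] = html
        if depth >= max_depth:
            continue  # do not follow links beyond the requested depth
        parser = _LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates collapse.
            link = urljoin(url, href).split("#")[0]
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

Passing a function that wraps `urllib.request.urlopen` as `fetch` would make it pull live pages; keeping the depth small limits how many requests a single business site triggers.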

### Components

  1. The data set
  2. The words
  3. The confidence
  4. Link depth
  5. The predictive model
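One way the words, confidence, and predictive model components could fit together is a multinomial naive Bayes classifier whose posterior probability doubles as the confidence score. The source does not say which model the package uses, so treat this as an illustrative stand-in; the class name and the toy training pages with NAICS sector labels in the usage note are assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class NaiveBayesClassifier:
    """Multinomial naive Bayes over word counts with Laplace (add-one) smoothing."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocab: set = set()

    def fit(self, docs: List[List[str]], labels: List[str]) -> "NaiveBayesClassifier":
        for tokens, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens: List[str]) -> Tuple[str, float]:
        """Return (best label, posterior probability used as a confidence score)."""
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + v
            score = math.log(n_docs / total_docs)  # class prior
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[t] + 1) / denom)
            log_scores[label] = score
        # Normalise log scores into probabilities, subtracting the max for stability.
        m = max(log_scores.values())
        exp = {k: math.exp(s - m) for k, s in log_scores.items()}
        z = sum(exp.values())
        best = max(exp, key=exp.get)
        return best, exp[best] / z
```

For example, fitting on `[["restaurant", "menu", "dinner"], ["software", "consulting", "cloud"]]` with labels `["72", "54"]` and then predicting `["menu", "dinner"]` gives label `72` with confidence 0.8. A low confidence could be used to flag sites that need more link depth or manual review.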

### Ideas

  1. Stemmers
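Stemming would let variants such as "bakeries" and "bakery" count as the same feature. A library stemmer (for example NLTK's Porter stemmer) would be the usual choice; the sketch below is a deliberately crude, hypothetical suffix stripper, loosely modelled on the plural rules in Porter's first step, meant only to show where stemming slots into the token pipeline.

```python
from typing import List, Tuple

# (suffix, replacement) rules, checked in order; the first matching suffix decides.
RULES: Tuple[Tuple[str, str], ...] = (
    ("sses", "ss"),  # businesses -> business
    ("ies", "y"),    # bakeries -> bakery
    ("ss", "ss"),    # business stays business
    ("ing", ""),     # consulting -> consult
    ("ed", ""),      # consulted -> consult
    ("s", ""),       # services -> service
)


def crude_stem(word: str, min_stem: int = 3) -> str:
    """Apply the first matching rule, unless it would leave fewer than min_stem letters."""
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            return stem + replacement if len(stem) >= min_stem else word
    return word


def stem_tokens(tokens: List[str]) -> List[str]:
    """Stem every token in a preprocessed token list."""
    return [crude_stem(t) for t in tokens]
```

Rules this simple will mis-stem some words (for instance "running" becomes "runn"), which is one reason a tested stemmer is preferable in practice.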




Files for usbusiness, version 0.2.1: usbusiness-0.2.1.tar.gz (437.5 kB, source distribution).
