NAICS code business domain classifier and domain utility kit

# usbusiness

The aim of the project is to provide an open source business classifier that uses website information.

## Research

Web Page Classification: Features and Algorithms (2009)

Automated Text Classification in the DMOZ Hierarchy (2009)

Topical Web-page classification of the DMOZ Dataset (2015)

## Industries of Weakness

  1. Religious

  2. Oil and Gas

  3. Finance

  4. Large Companies

### Options

  1. Remove stop words (T/F)

  2. My words selection: None, google_10, google_100k
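
The two options above could be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the `tokenize` function, its `remove_stop_words` and `vocabulary` parameters, and the abbreviated `STOP_WORDS` set are all hypothetical names. What `google_10` and `google_100k` contain is not stated here, so the sketch simply assumes a word selection is a set of allowed words and that `None` means no restriction.

```python
import re

# Hypothetical, abbreviated stop-word list; a real run would use a fuller one.
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "our", "we"}
)


def tokenize(text, remove_stop_words=True, vocabulary=None):
    """Split page text into lowercase word tokens.

    remove_stop_words: drop very common words when True (option 1).
    vocabulary: optional set of allowed words; None keeps every word
                (assumed meaning of the word selection in option 2).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if vocabulary is not None:
        tokens = [t for t in tokens if t in vocabulary]
    return tokens
```

For example, `tokenize("We drill for oil and gas in Texas")` keeps only the content words, and passing `vocabulary={"oil", "gas"}` narrows the result further.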

### TO DO

  1. Link depth pull option

  2. Data Set

  3. Training / Validation

### Components

  1. The data set

  2. The words

  3. The confidence

  4. Link depth

  5. The predictive model
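
To make the link depth component more concrete, here is one way a depth-limited page pull might look. It is only a sketch under stated assumptions: the site is represented as an in-memory dictionary of page to outgoing links instead of real HTTP requests, and `pages_within_depth`, `links`, and `max_depth` are hypothetical names that do not come from the package.

```python
from collections import deque


def pages_within_depth(links, start, max_depth):
    """Return pages reachable from `start` in at most `max_depth` link hops.

    links: dict mapping a page to the list of pages it links to
           (a stand-in for fetching and parsing real pages).
    Pages are returned in breadth-first order, each one only once.
    """
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site map used only for illustration.
site = {
    "/": ["/about", "/products"],
    "/about": ["/team"],
    "/products": ["/"],
}
```

With this site map, a depth of 0 returns only the home page, while a depth of 2 also reaches `/team`. A larger depth gives the classifier more text per site at the cost of more pages to fetch.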

### Ideas

  1. Stemmers
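
As a rough illustration of what a stemmer would contribute, the sketch below strips a few common English suffixes so that related word forms map to the same token. It is a deliberately naive stand-in, not the Porter algorithm or any stemmer the project has chosen; `naive_stem`, `SUFFIXES`, and `min_stem` are hypothetical names.

```python
# Suffixes are checked in order, so longer ones come before their shorter tails.
SUFFIXES = ("ing", "ers", "er", "es", "ed", "s")


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word
```

Under this scheme "drilling", "drilled", and "drillers" all reduce to "drill", while a short word such as "gas" is left alone. An established stemmer would handle many more cases, which is the reason to evaluate one here.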

