NAICS code business domain classifier and domain utility kit

# usbusiness

The aim of the project is to provide an open-source business classifier that uses website information.

## Research

Web Page Classification: Features and Algorithms (2009) https://www.cs.ucf.edu/~dcm/Teaching/COT4810-Fall%202012/Literature/WebPageClassification.pdf

Automated Text Classification in the DMOZ Hierarchy (2009) http://users.cecs.anu.edu.au/~ssanner/Papers/Lachlan_Report.pdf

Topical Web-page classification of the DMOZ Dataset (2015) https://github.com/kahliloppenheimer/Web-page-classification/blob/master/paper.pdf

## Industries of Weakness

  1. Religious

  2. Oil and Gas

  3. Finance

  4. Large Companies

### Options

  1. Remove stop words (T/F)

  2. Word list selection (my words, None, google_10, google_100k)
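
As a rough illustration of how these two options might affect the text fed to a classifier, here is a minimal sketch. It is not the package's code: the function name `tokenize`, its parameters, and the tiny `STOP_WORDS` set are all hypothetical, and the `vocabulary` argument only stands in for a word list such as `google_10` or `google_100k`, whose contents are not described here.

```python
import re

# Tiny illustrative stop-word list (an assumption); a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "we"}


def tokenize(text, remove_stop_words=True, vocabulary=None):
    """Lowercase the text, split it into alphabetic tokens, then apply both options."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        # Option 1: drop very common words that carry little topical signal.
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if vocabulary is not None:
        # Option 2: keep only words present in the chosen word list.
        tokens = [t for t in tokens if t in vocabulary]
    return tokens
```

Passing `vocabulary=None` corresponds to the `None` choice above, meaning no word-list filtering is applied.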

### TO DO

  1. Link depth pull option

  2. Data Set

  3. Training / Validation
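
The link depth option is still on the to-do list, so the following is only one possible shape for it: a breadth-first walk that stops following links after a fixed number of hops from the start page. The names `LinkParser`, `extract_links`, `crawl`, and the `fetch` callable are hypothetical. `fetch` is assumed to return a page's HTML as a string, and URL normalization is left out to keep the sketch short.

```python
from collections import deque
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkParser()
    parser.feed(html)
    return parser.links


def crawl(start, fetch, max_depth):
    """Return {url: depth} for every page within max_depth links of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth >= max_depth:
            continue  # do not follow links beyond the requested depth
        for link in extract_links(fetch(url)):
            if link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return seen


# In-memory stand-in for a website, so the sketch runs without network access.
PAGES = {
    "/": '<a href="/about">About</a> <a href="/products">Products</a>',
    "/about": '<a href="/team">Team</a>',
    "/team": '<a href="/deep">Deep</a>',
}
depths = crawl("/", lambda url: PAGES.get(url, ""), max_depth=2)
```

Taking `fetch` as a parameter keeps the depth logic separate from how pages are downloaded, which also makes it easy to test offline.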

### Components

  1. The data set

  2. The words

  3. The confidence

  4. Link depth

  5. The predictive model
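
To show how the words, the confidence, and the predictive model could fit together, here is a small multinomial naive Bayes sketch in plain Python. This is an assumption chosen for illustration, not a description of the model the package uses. The names `train` and `predict` and the example labels are hypothetical; a real data set would use NAICS codes as labels.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (tokens, label) pairs. Returns the counts naive Bayes needs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(model, tokens):
    """Return (best_label, confidence); confidence is the normalized posterior."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, count in label_counts.items():
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(count / total)
        for token in tokens:
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        log_scores[label] = score
    # Convert log scores to probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


# Hypothetical toy data set: one tokenized page per label.
model = train([
    (["church", "worship"], "religious"),
    (["oil", "drilling"], "oil_gas"),
    (["bank", "loans"], "finance"),
])
label, confidence = predict(model, ["oil", "drilling", "rig"])
```

A low confidence value could be used to flag pages where the prediction should not be trusted, for example in the weak industries listed earlier.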

### Ideas

  1. Stemmers
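
For a sense of what a stemmer would contribute, the sketch below strips a few common English suffixes so that related word forms map to the same token. It is a deliberately naive stand-in, not the Porter algorithm or any library stemmer. The name `naive_stem` and the suffix list are assumptions.

```python
# Suffixes are checked in this order; longer ones come before their shorter tails.
SUFFIXES = ("ing", "ers", "er", "ies", "es", "ed", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


stems = [naive_stem(w) for w in ["drilling", "drillers", "drilled", "gas"]]
# stems == ["drill", "drill", "drill", "gas"]
```

A real stemmer handles many more cases. The point here is only that "drilling", "drillers", and "drilled" would count as one feature instead of three.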
