Project description
Automatically Assigning Industry Classifications to Company Descriptions – An Entailment Based Approach
Description
naicskit is a Python package that assigns industry classification codes to company descriptions. The model builds on the Hugging Face library and uses an entailment-based approach to assign taxonomy codes, with the aim of staying robust on taxonomies it did not see during training. In initial results, the model reaches 87% accuracy on taxonomies it was trained on and 81% accuracy on unseen taxonomies.
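To make the entailment framing concrete, here is a minimal sketch of the general idea, not naicskit's actual implementation. Each candidate code is turned into a hypothesis sentence, the company description serves as the premise, and the codes are ranked by how strongly the premise supports each hypothesis. The names `LABELS`, `build_hypothesis`, `toy_entailment_score`, and `rank_codes` are hypothetical, and the scorer is a simple word-overlap stand-in that a real natural language inference model would replace.

```python
import re
from typing import Callable, Dict, List, Tuple

# Small illustrative label set (three NAICS two-digit sectors).
# A real taxonomy has many more codes.
LABELS: Dict[str, str] = {
    "11": "Agriculture, Forestry, Fishing and Hunting",
    "52": "Finance and Insurance",
    "72": "Accommodation and Food Services",
}


def build_hypothesis(title: str) -> str:
    """Turn a taxonomy title into a hypothesis sentence (assumed template)."""
    return f"This company operates in the {title} industry."


def _words(text: str) -> set:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: share of hypothesis words found in the premise."""
    hypothesis_words = _words(hypothesis)
    return len(_words(premise) & hypothesis_words) / len(hypothesis_words)


def rank_codes(
    description: str,
    labels: Dict[str, str],
    score: Callable[[str, str], float] = toy_entailment_score,
) -> List[Tuple[str, float]]:
    """Score every candidate code against the description, best first."""
    scored = [
        (code, score(description, build_hypothesis(title)))
        for code, title in labels.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_codes("A regional bank offering insurance and finance products.", LABELS)
best_code = ranking[0][0]
```

Because the candidate labels are passed in as plain text, the same ranking routine works for a taxonomy that was never used in training, which is the property the entailment approach is aiming for.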
How To Use naicskit
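from naicskit.coder import IndustryCoder
# Text describing the company to classify
description = '...'
# The first argument is a taxonomy identifier from Supported Taxonomies
coder = IndustryCoder('naics.2022', '2')
results = coder.code_records(description)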
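When several companies need to be coded, one cautious option is to repeat the single-description call shown above for each text. The sketch below assumes only that call; `code_descriptions` is a hypothetical helper, not part of naicskit, and since the return format of `code_records` is not documented here, each result is stored as-is.

```python
from typing import Any, Dict, Iterable


def code_descriptions(coder: Any, descriptions: Iterable[str]) -> Dict[str, Any]:
    """Call coder.code_records once per non-empty description.

    `coder` is any object exposing code_records(description), such as the
    IndustryCoder instance created in the usage example.
    """
    results: Dict[str, Any] = {}
    for text in descriptions:
        text = text.strip()
        if not text:
            # Skip blank entries rather than sending them to the model.
            continue
        results[text] = coder.code_records(text)
    return results


# Usage (requires naicskit to be installed):
# from naicskit.coder import IndustryCoder
# coder = IndustryCoder('naics.2022', '2')
# results = code_descriptions(coder, ['...', '...'])
```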
Supported Taxonomies
- International Standard Industrial Classification Rev 5.0 (ISIC 2024): isic.2024
- International Standard Industrial Classification Rev 4.0 (ISIC 2006): isic.2006
- International Standard Industrial Classification Rev 3.1 (ISIC 2002): isic.2002
- International Standard Industrial Classification Rev 3.0 (ISIC 1989): isic.1989
- International Standard Industrial Classification Rev 2.0 (ISIC 1968): isic.1968
- North American Industry Classification System 2022 (NAICS 2022): naics.2022
- North American Industry Classification System 2017 (NAICS 2017): naics.2017
- North American Industry Classification System 2012 (NAICS 2012): naics.2012
- North American Industry Classification System 2007 (NAICS 2007): naics.2007
- Standard Industrial Classification (SIC 1987): sic.1987
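The identifiers listed above follow a `scheme.year` pattern, so they can be checked before a coder is constructed. The following is a small sketch under that assumption; `SUPPORTED_TAXONOMIES`, `parse_taxonomy_id`, and `latest_edition` are hypothetical helpers written for illustration and are not part of naicskit.

```python
from typing import Tuple

# Identifiers copied from the Supported Taxonomies list.
SUPPORTED_TAXONOMIES = (
    "isic.2024", "isic.2006", "isic.2002", "isic.1989", "isic.1968",
    "naics.2022", "naics.2017", "naics.2012", "naics.2007",
    "sic.1987",
)


def parse_taxonomy_id(taxonomy_id: str) -> Tuple[str, int]:
    """Split a supported identifier such as 'naics.2022' into (scheme, year)."""
    if taxonomy_id not in SUPPORTED_TAXONOMIES:
        raise ValueError(f"Unsupported taxonomy identifier: {taxonomy_id!r}")
    scheme, year = taxonomy_id.split(".")
    return scheme, int(year)


def latest_edition(scheme: str) -> str:
    """Return the most recent supported identifier for a scheme, e.g. 'isic'."""
    years = [
        parse_taxonomy_id(tid)[1]
        for tid in SUPPORTED_TAXONOMIES
        if tid.startswith(scheme + ".")
    ]
    if not years:
        raise ValueError(f"Unknown taxonomy scheme: {scheme!r}")
    return f"{scheme}.{max(years)}"
```

Validating the identifier up front gives a clear error message for a typo such as `naics.2020` instead of relying on whatever the library does with an unknown value.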
Download files
Source Distribution
naicskit-0.1.0.tar.gz (203.6 kB)
Built Distribution
naicskit-0.1.0-py3-none-any.whl (213.2 kB)