Assign probabilistic Standard Occupational Classification (SOC) codes to free-text job titles
Project description
Sockit is a self-contained toolkit for assigning probabilistic Standard Occupational Classification (SOC) codes to free-text job titles. It is developed by Research Improving People's Lives (RIPL).
Installation
Requires Python 3.8 or later.
To install from PyPI using pip:
pip install sockit
To install a development version from the current directory:
pip install -e .
Running
There is a single command line script included, sockit, that processes existing CSV files containing free-text job titles in one of the columns:
sockit -h
usage: sockit [-h] [-v] [-q] [-d] -i INPUT [-o OUTPUT] [--record_id RECORD_ID]
              [--title TITLE] [--score SCORE SCORE]

Sockit: assign probabilistic Standard Occupational Classification (SOC) codes to free-text job titles https://github.com/ripl-org/sockit

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -q, --quiet           suppress all logging messages except for errors
  -d, --debug           show all logging messages, including debugging output
  -i INPUT, --input INPUT
                        input CSV file containing the record ID and title fields
  -o OUTPUT, --output OUTPUT
                        output file (default: stdout) containing a JSON record per line: {'record_id': ..., 'title': ..., 'clean_title': ..., 'socs': [{'soc': ..., 'prob': ..., 'desc': ...}, ...]}
  --record_id RECORD_ID
                        field name corresponding to the record ID [default: 1-based index]
  --title TITLE         field name corresponding to the title [default: 'title']
  --score SCORE SCORE   weight likely SOCs by matches to nodes and nouns [default: 1 2]
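The help text above describes the output as one JSON record per line, each with a 'socs' list of candidate codes and probabilities. As a minimal sketch of post-processing that output, the following picks the highest-probability SOC for each record. It assumes each line is valid JSON with the field names shown in the help text; the helper name top_soc and the sample records (including the placeholder SOC codes and descriptions) are hypothetical and exist only for illustration.

```python
import json
from typing import Dict, Optional, Tuple


def top_soc(line: str) -> Tuple[object, Optional[Dict]]:
    """Return (record_id, best SOC entry) for one line of output.

    Assumes the line is a JSON object with 'record_id' and 'socs' keys,
    as described in the help text. Returns None for the SOC entry when
    the 'socs' list is empty or missing.
    """
    record = json.loads(line)
    socs = record.get("socs", [])
    best = max(socs, key=lambda s: s["prob"]) if socs else None
    return record["record_id"], best


# Hypothetical sample lines; the SOC codes and descriptions are
# placeholders, not real classifications produced by sockit.
sample_lines = [
    '{"record_id": 1, "title": "Sr. Nurse", "clean_title": "senior nurse", '
    '"socs": [{"soc": "00-0001", "prob": 0.7, "desc": "placeholder A"}, '
    '{"soc": "00-0002", "prob": 0.3, "desc": "placeholder B"}]}',
    '{"record_id": 2, "title": "???", "clean_title": "", "socs": []}',
]

for line in sample_lines:
    record_id, best = top_soc(line)
    if best is None:
        print(record_id, "no SOC assigned")
    else:
        print(record_id, best["soc"], best["prob"])
```

To apply this to a real run, the same function could be called on each line of the file passed to -o.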
Alternatively, you can load the sockit package in a Python script and process titles one at a time with the search() function:
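import sockit
title = "Registered Nurse"  # example free-text job title
clean_title = sockit.clean(title)  # clean the raw title before searching
result = sockit.search(clean_title)  # look up likely SOC codes for the cleaned title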
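Since search() handles one title at a time, processing a list means looping over it. The sketch below shows one way to do that. The helper classify_titles is hypothetical and not part of sockit; it takes the clean and search callables as arguments so that it can be exercised without the package installed, and it stores whatever search() returns unchanged, because the structure of that return value is not documented here.

```python
from typing import Any, Callable, Dict, Iterable, List


def classify_titles(
    titles: Iterable[str],
    clean: Callable[[str], str],
    search: Callable[[str], Any],
) -> List[Dict[str, Any]]:
    """Apply clean() and then search() to each title, in order.

    Record IDs are assigned as a 1-based index, mirroring the default
    described for the command line script.
    """
    records = []
    for record_id, title in enumerate(titles, start=1):
        clean_title = clean(title)
        records.append(
            {
                "record_id": record_id,
                "title": title,
                "clean_title": clean_title,
                # Stored as-is; its structure is not assumed here.
                "result": search(clean_title),
            }
        )
    return records
```

With sockit installed, this could be called as classify_titles(titles, sockit.clean, sockit.search).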
License
Sockit is freely available for non-commercial use under the license provided in LICENSE.txt. Please contact connect@ripl.org to inquire about commercial use.
Contributors
- Marcelle Goggins
- Ethan Ho
- Nile Dixon
- Mark Howison
- Joe Long
- Karen Shen
File details
Details for the file sockit-0.1.0.zip.
File metadata
- Download URL: sockit-0.1.0.zip
- Upload date:
- Size: 778.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 870623419105057f3a02be27b3892d20d6fee705d88ea5b835fc53d1bbdf3c64
MD5 | a81ff8e3cdbe0eb797d45c4ba93a525b
BLAKE2b-256 | 0428791c5edbd3922581c567433edc8c0bc99082de4ef31bbd324db18094a718
File details
Details for the file sockit-0.1.0-py3-none-any.whl.
File metadata
- Download URL: sockit-0.1.0-py3-none-any.whl
- Upload date:
- Size: 774.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | ffe8a04dfd93e7be1f7730abed0531a6b4acde681714a376488842fdfa487e88
MD5 | 57ec04da929ec9267e68d21267d19137
BLAKE2b-256 | 8ea985e03f0fe019b1d713b2f76a135faa7fc56ff72fbc77704ef0c81d071aee