Assign probabilistic Standard Occupational Classification (SOC) codes to free-text job titles

Project description

Sockit is a self-contained toolkit for assigning probabilistic Standard Occupational Classification (SOC) codes to free-text job titles. It is developed by Research Improving People's Lives (RIPL).

Installation

Requires Python 3.8 or later.

To install from PyPI using pip:

pip install sockit

To install a development version from the current directory:

pip install -e .

Running

Sockit includes a single command-line script, sockit, that processes an existing CSV file containing free-text job titles in one of its columns:

sockit -h

usage: sockit [-h] [-v] [-q] [-d] -i INPUT [-o OUTPUT] [--record_id RECORD_ID] [--title TITLE]

Sockit: assign probabilistic Standard Occupational Classification (SOC) codes to free-text job titles https://github.com/ripl-org/sockit

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -q, --quiet           suppress all logging messages except for errors
  -d, --debug           show all logging messages, including debugging output
  -i INPUT, --input INPUT
                        input CSV file containing the record ID and title fields
  -o OUTPUT, --output OUTPUT
                        output file (default: stdout) containing a JSON record per line: {'record_id': ..., 'title': ..., 'clean_title': ..., 'socs': [{'soc': ..., 'prob': ..., 'desc': ...}, ...]}
  --record_id RECORD_ID
                        field name corresponding to the record ID [default: 1-based index]
  --title TITLE         field name corresponding to the title [default: 'title']
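
The output described above (one JSON record per line) can be consumed with a few lines of Python. The following is a minimal sketch, assuming each output line is valid JSON with the fields listed in the help text. The helper name top_soc is hypothetical and not part of sockit, and the sample record, SOC codes, and probabilities are made up for illustration.

```python
import json


def top_soc(line):
    """Return (record_id, soc, prob) for the most probable SOC code in one output line."""
    record = json.loads(line)
    socs = record.get("socs", [])
    if not socs:
        # No candidate codes were assigned to this title
        return record["record_id"], None, None
    best = max(socs, key=lambda s: s["prob"])
    return record["record_id"], best["soc"], best["prob"]


# Hypothetical output line; the values are illustrative only.
sample = json.dumps({
    "record_id": 1,
    "title": "Sr. Registered Nurse",
    "clean_title": "registered nurse",
    "socs": [
        {"soc": "29-1141", "prob": 0.9, "desc": "Registered Nurses"},
        {"soc": "29-2061", "prob": 0.1, "desc": "Licensed Practical and Licensed Vocational Nurses"},
    ],
})
print(top_soc(sample))
```

To process a whole output file, call top_soc on each line of the file in turn.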

Alternatively, you can import the sockit package in a Python script and process titles one at a time: clean each title with clean(), then pass the result to search():

import sockit

# Example title; replace with your own free-text job title
title = "Registered Nurse"
clean_title = sockit.clean(title)
result = sockit.search(clean_title)
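
To apply the same two calls to every row of a CSV file, a loop along the following lines may be enough. This is a sketch under stated assumptions: classify_titles is a hypothetical helper, not part of sockit, and the clean and search functions are passed in as arguments so the example runs without sockit installed. Rows are numbered from 1, mirroring the command-line default for the record ID.

```python
import csv
import io


def classify_titles(csv_file, clean, search, title_field="title"):
    """Yield (record_id, clean_title, result) for each CSV row, numbering rows from 1."""
    reader = csv.DictReader(csv_file)
    for record_id, row in enumerate(reader, start=1):
        clean_title = clean(row[title_field])
        yield record_id, clean_title, search(clean_title)


# Stand-in functions and in-memory data so the sketch runs on its own;
# they do not reproduce sockit's actual cleaning or search behavior.
demo = io.StringIO("title\nRegistered Nurse\nLine Cook\n")
rows = list(classify_titles(demo, clean=str.lower, search=lambda t: []))
print(rows)
```

With sockit installed, the stand-ins would be replaced by clean=sockit.clean and search=sockit.search, and demo by an open CSV file.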

License

Sockit is freely available for non-commercial use under the license provided in LICENSE.txt. Please contact connect@ripl.org to inquire about commercial use.

Contributors

  • Marcelle Goggins
  • Ethan Ho
  • Nile Dixon
  • Mark Howison
  • Joe Long
  • Karen Shen

Download files

Source Distribution

sockit-0.0.2.zip (768.4 kB)
