Assign probabilistic Standard Occupational Classification (SOC) codes to free-text job titles

Project description

Sockit is a self-contained toolkit for assigning probabilistic Standard Occupational Classification (SOC) codes to free-text job titles. It is developed by Research Improving People's Lives (RIPL).

Installation

Requires Python 3.8 or later.

To install from PyPI using pip:

pip install sockit

To install a development version from the current directory:

pip install -e .

Running

The package includes a single command-line script, sockit, which processes an existing CSV file in which one column holds free-text job titles:

sockit -h

usage: sockit [-h] [-v] [-q] [-d] -i INPUT [-o OUTPUT] [--record_id RECORD_ID] [--title TITLE] [--score SCORE SCORE]

Sockit: assign probabilistic Standard Occupational Classification (SOC) codes to free-text job titles https://github.com/ripl-org/sockit

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -q, --quiet           suppress all logging messages except for errors
  -d, --debug           show all logging messages, including debugging output
  -i INPUT, --input INPUT
                        input CSV file containing the record ID and title fields
  -o OUTPUT, --output OUTPUT
                        output file (default: stdout) containing a JSON record per line: {'record_id': ..., 'title': ..., 'clean_title': ..., 'socs': [{'soc': ..., 'prob': ..., 'desc': ...}, ...]}
  --record_id RECORD_ID
                        field name corresponding to the record ID [default: 1-based index]
  --title TITLE         field name corresponding to the title [default: 'title']
  --score SCORE SCORE   weight likely SOCs by matches to nodes and nouns [default: 1 2]
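
Because the output contains one JSON record per line, it can be post-processed with Python's standard library. The following is a minimal sketch, not part of sockit. It assumes that every line is a valid JSON object laid out as in the --output description above, with a numeric "prob" for each candidate. The function name top_socs is hypothetical. The sketch reports the highest-probability SOC code for each record.

```python
import json
from typing import Any, Iterable, List, Optional, Tuple


def top_socs(lines: Iterable[str]) -> List[Tuple[Any, Optional[str], Optional[float]]]:
    """Return (record_id, best SOC code, its probability) for each JSON line.

    Assumes each line is a JSON object with a 'record_id' key and a 'socs'
    list of {'soc': ..., 'prob': ..., 'desc': ...} entries, as described in
    the --output help text. This helper is a hypothetical sketch.
    """
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        record = json.loads(line)
        socs = record.get("socs") or []
        if socs:
            # Pick the candidate with the highest probability rather than
            # relying on the order of the 'socs' list.
            best = max(socs, key=lambda s: s["prob"])
            results.append((record["record_id"], best["soc"], best["prob"]))
        else:
            # No candidate SOC codes were returned for this title.
            results.append((record["record_id"], None, None))
    return results
```

For example, iterating over top_socs(open("output.jsonl")) would yield one tuple per record. Here output.jsonl stands for whatever file was passed to -o.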

Alternatively, you can import the sockit package in a Python script and process titles one at a time: clean each title with the clean() function, then pass the cleaned title to the search() function:

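import sockit

title = "Senior Software Engineer"   # example free-text job title
clean_title = sockit.clean(title)    # clean the raw title before searching
result = sockit.search(clean_title)  # look up likely SOC codes for the cleaned title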
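
The same two calls can be applied to every row of a CSV file, roughly mirroring what the sockit script does. The sketch below is illustrative and not part of the package. The function classify_csv_rows and its "result" key are hypothetical names. The sketch assumes only that sockit.clean() and sockit.search() work as in the snippet above, and it stores whatever search() returns without assuming its structure. Its defaults follow the script's documented ones: a 'title' column, and a 1-based row index when no record ID field is named. The clean and search parameters let stand-in functions be passed, for instance in tests.

```python
import csv
from typing import Any, Callable, Dict, Iterable, Iterator, Optional


def classify_csv_rows(
    lines: Iterable[str],
    title_field: str = "title",
    record_id_field: Optional[str] = None,
    clean: Optional[Callable[[str], str]] = None,
    search: Optional[Callable[[str], Any]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield one dict per CSV row with the raw title, cleaned title and search result.

    Hypothetical helper: it only chains clean() and search() row by row.
    """
    if clean is None or search is None:
        import sockit  # imported lazily so stand-in functions can be passed instead

        clean = clean or sockit.clean
        search = search or sockit.search
    reader = csv.DictReader(lines)
    # enumerate(..., start=1) reproduces the script's 1-based default record ID.
    for index, row in enumerate(reader, start=1):
        record_id = row[record_id_field] if record_id_field else index
        title = row[title_field]
        clean_title = clean(title)
        yield {
            "record_id": record_id,
            "title": title,
            "clean_title": clean_title,
            "result": search(clean_title),  # stored as returned, structure not assumed
        }
```

For example, records = list(classify_csv_rows(f)) inside with open("titles.csv", newline="") as f: would process a hypothetical titles.csv. The csv module documentation recommends opening files with newline="".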

License

Sockit is freely available for non-commercial use under the license provided in LICENSE.txt. Please contact connect@ripl.org to inquire about commercial use.

Contributors

  • Marcelle Goggins
  • Ethan Ho
  • Nile Dixon
  • Mark Howison
  • Joe Long
  • Karen Shen

Download files


Source Distribution

sockit-0.1.0.zip (778.2 kB)

Uploaded Source

Built Distribution

sockit-0.1.0-py3-none-any.whl (774.1 kB)

Uploaded Python 3

File details

Details for the file sockit-0.1.0.zip.

File metadata

  • Download URL: sockit-0.1.0.zip
  • Upload date:
  • Size: 778.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for sockit-0.1.0.zip
Algorithm     Hash digest
SHA256        870623419105057f3a02be27b3892d20d6fee705d88ea5b835fc53d1bbdf3c64
MD5           a81ff8e3cdbe0eb797d45c4ba93a525b
BLAKE2b-256   0428791c5edbd3922581c567433edc8c0bc99082de4ef31bbd324db18094a718
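
The published digests let you confirm that a download is intact. The following is a minimal sketch using Python's standard hashlib module. The names sha256_of and matches_published are illustrative helpers, and the sketch assumes the file was saved locally under its original name.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published("sockit-0.1.0.zip", "870623419105057f3a02be27b3892d20d6fee705d88ea5b835fc53d1bbdf3c64") should return True for an unmodified download. pip can also enforce digests at install time through its hash-checking mode (--require-hashes).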

File details

Details for the file sockit-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: sockit-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 774.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.5

File hashes

Hashes for sockit-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        ffe8a04dfd93e7be1f7730abed0531a6b4acde681714a376488842fdfa487e88
MD5           57ec04da929ec9267e68d21267d19137
BLAKE2b-256   8ea985e03f0fe019b1d713b2f76a135faa7fc56ff72fbc77704ef0c81d071aee
