
An NLP package to extract skills from job adverts.


Skills Extractor

Welcome to Nesta's Skills Extractor Library

This page explains how to install and use Nesta's skills extractor library. The library extracts skill phrases from job advert text and maps them onto a skills taxonomy of your choice.

We currently support three taxonomies to map onto: the European Commission’s European Skills, Competences, Qualifications and Occupations (ESCO), Lightcast’s Open Skills, and a “toy” taxonomy developed internally for testing.

If you'd like to learn more about the models used in the library, please refer to the model card page.

To learn more about the wider project, see:

  1. Our Introduction blog
  2. Our interactive analysis blog

Installation

To install as a package:

pip install ojd-daps-skills

🐍 NOTE: If you are using a conda environment, you may need to run conda install scipy before installing this library with pip.

NOTE: The first time you import SkillsExtractor in Python, it will take some time (around a minute) to load.
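
Because the first import is slow, one option is to defer it until the extractor is actually needed and then reuse the same instance. The following is a minimal sketch of that idea. `get_extractor` is a hypothetical helper, not part of the library; it assumes only the import path and the `taxonomy_name` argument shown in the usage example on this page.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_extractor(taxonomy_name: str = "toy"):
    """Return one shared SkillsExtractor per taxonomy name (hypothetical helper)."""
    # Importing inside the function postpones the slow first import
    # until the first call, rather than paying for it at program start-up.
    from ojd_daps_skills.extract_skills.extract_skills import SkillsExtractor

    # lru_cache stores the result, so repeated calls with the same
    # taxonomy_name return the instance that was already built.
    return SkillsExtractor(taxonomy_name=taxonomy_name)
```

Calling `get_extractor("toy")` a second time returns the cached object instead of constructing a new one.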

To extract skills from a job advert:

from ojd_daps_skills.extract_skills.extract_skills import SkillsExtractor

sm = SkillsExtractor(taxonomy_name="toy") # Can also use "esco" or "lightcast" here

job_ads = [
    "The job involves communication skills and maths skills",
    "The job involves Excel skills. You will also need good presentation skills",
    "You will need experience in the IT sector.",
]
job_ad_with_skills = sm(job_ads)
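
If the taxonomy name comes from user input or a config file, it may help to check it before constructing the extractor. This is a small sketch under that assumption: `check_taxonomy_name` and `SUPPORTED_TAXONOMIES` are hypothetical names, and the three accepted values are simply those listed in the comment above.

```python
# Taxonomy names taken from the usage example above; this tuple is an
# illustration, not something exported by the library.
SUPPORTED_TAXONOMIES = ("toy", "esco", "lightcast")


def check_taxonomy_name(name: str) -> str:
    """Return a normalised taxonomy name or raise ValueError (hypothetical helper)."""
    normalised = name.strip().lower()
    if normalised not in SUPPORTED_TAXONOMIES:
        raise ValueError(
            f"Unknown taxonomy {name!r}; expected one of {SUPPORTED_TAXONOMIES}"
        )
    return normalised
```

The returned value can then be passed as `taxonomy_name` when creating the `SkillsExtractor`.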

To access the extracted and mapped skills for each input job advert:

for job_ad_with_skills_doc in job_ad_with_skills:
  print(f"Job advert: {job_ad_with_skills_doc}")
  # print raw ents (i.e. multiskills are not split, also include 'BENEFIT' and 'EXPERIENCE' spans)
  print(f"Entities found: {[(ent.text, ent.label_) for ent in job_ad_with_skills_doc.ents]}")
  # print SKILL spans (where SKILL spans are predicted as multiskills, split them)
  print(f"Skill spans: {job_ad_with_skills_doc._.skill_spans}")
  # print mapped skills to the "toy" taxonomy
  print(f"Skills mapped: {job_ad_with_skills_doc._.mapped_skills}")
  print("\n")

This prints:

Job advert: The job involves communication skills and maths skills
Entities found: [('communication skills', 'SKILL'), ('maths skills', 'SKILL')]
Skill spans: [communication skills, maths skills]
Skills mapped: [{'ojo_skill': 'communication skills', 'ojo_skill_id': 3144285826919113, 'match_skill': 'communication, collaboration and creativity', 'match_score': 0.75, 'match_type': 'most_common_level_1', 'match_id': 'S1'}, {'ojo_skill': 'maths skills', 'ojo_skill_id': 1654958883999821, 'match_skill': 'working with computers', 'match_score': 0.6666666666666666, 'match_type': 'most_common_level_1', 'match_id': 'S5'}]


Job advert: The job involves Excel skills. You will also need good presentation skills
Entities found: [('Excel', 'SKILL'), ('presentation skills', 'SKILL')]
Skill spans: [Excel, presentation skills]
Skills mapped: [{'ojo_skill': 'Excel', 'ojo_skill_id': 2576630861021310, 'match_skill': 'use spreadsheets software', 'match_score': 0.7379249334335327, 'match_type': 'skill', 'match_id': 'abcd'}, {'ojo_skill': 'presentation skills', 'ojo_skill_id': 1846141317334203, 'match_skill': 'communication, collaboration and creativity', 'match_score': 0.5, 'match_type': 'most_common_level_1', 'match_id': 'S1'}]


Job advert: You will need experience in the IT sector.
Entities found: [('experience in the IT sector', 'EXPERIENCE')]
Skill spans: []
Skills mapped: []
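
Each mapped skill is a dictionary, so the results can be post-processed with ordinary Python. As an illustration, the sketch below keeps only matches at or above a score threshold. The data is copied from the example output above (with `ojo_skill_id` left out for brevity); `filter_matches` and the 0.7 threshold are assumptions made for this example, not part of the library.

```python
from typing import Any, Dict, List

# Mapped skills from the example output above, one list per job advert.
mapped_skills_per_advert: List[List[Dict[str, Any]]] = [
    [
        {"ojo_skill": "communication skills",
         "match_skill": "communication, collaboration and creativity",
         "match_score": 0.75, "match_type": "most_common_level_1", "match_id": "S1"},
        {"ojo_skill": "maths skills",
         "match_skill": "working with computers",
         "match_score": 0.6666666666666666, "match_type": "most_common_level_1", "match_id": "S5"},
    ],
    [
        {"ojo_skill": "Excel",
         "match_skill": "use spreadsheets software",
         "match_score": 0.7379249334335327, "match_type": "skill", "match_id": "abcd"},
        {"ojo_skill": "presentation skills",
         "match_skill": "communication, collaboration and creativity",
         "match_score": 0.5, "match_type": "most_common_level_1", "match_id": "S1"},
    ],
    [],  # the third advert contained no SKILL spans
]


def filter_matches(mapped_skills: List[Dict[str, Any]], min_score: float = 0.7):
    """Keep only matches whose match_score is at least min_score (illustrative threshold)."""
    return [m for m in mapped_skills if m["match_score"] >= min_score]


for advert_index, mapped in enumerate(mapped_skills_per_advert):
    kept = filter_matches(mapped)
    print(advert_index, [(m["ojo_skill"], m["match_skill"]) for m in kept])
```

With this threshold, only "communication skills" and "Excel" are kept; the right cut-off will depend on the taxonomy and the use case.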

Development

To set up a development environment with Poetry:

pipx install poetry
poetry shell
poetry install

To run tests:

poetry run pytest tests/

Contributor guidelines

The technical and working style guidelines can be found here.

If you contribute, push your changes to a new branch so that our code checks are triggered.


This project was made possible by funding from the Economic Statistics Centre of Excellence.

The project template is based on Nesta's data science project template (read the docs here).
