Extract ESCO skills from texts such as job descriptions or CVs
ESCO Skill Extractor
This is a tool that extracts ESCO skills from texts such as job descriptions or CVs. It embeds both the input text and the ESCO skills with a transformer model and compares the embeddings using cosine similarity.
Installation
pip install esco-skill-extractor
Usage
from esco_skill_extractor import SkillExtractor
# `device` kwarg is optional and defaults to 'cpu'; 'cuda' or other devices can be used.
# `threshold` kwarg is optional and defaults to 0.4; it is the cosine similarity threshold.
skill_extractor = SkillExtractor()
ads = [
    "We are looking for a software engineer with experience in Java and Python.",
    "We are looking for a devops engineer. Containerization tools such as Docker is a must. AWS is a plus.",
    # ...
]
print(skill_extractor.get_skills(ads))
# Output:
# [
#     [
#         "http://data.europa.eu/esco/skill/ccd0a1d9-afda-43d9-b901-96344886e14d"
#     ],
#     [
#         "http://data.europa.eu/esco/skill/f0de4973-0a70-4644-8fd4-3a97080476f4",
#         "http://data.europa.eu/esco/skill/ae4f0cc6-e0b9-47f5-bdca-2fc2e6316dce"
#     ]
# ]
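Since `get_skills` returns one list of ESCO skill URIs per input text, the result can be post-processed with ordinary Python. The following is a minimal sketch, not part of the package: the helper names `skill_frequencies` and `skill_id` are hypothetical, and the sample data is copied from the output shown above so the snippet runs without loading the model.

```python
from collections import Counter

# Sample data copied from the usage output above: one list of URIs per ad.
extracted = [
    ["http://data.europa.eu/esco/skill/ccd0a1d9-afda-43d9-b901-96344886e14d"],
    [
        "http://data.europa.eu/esco/skill/f0de4973-0a70-4644-8fd4-3a97080476f4",
        "http://data.europa.eu/esco/skill/ae4f0cc6-e0b9-47f5-bdca-2fc2e6316dce",
    ],
]


def skill_frequencies(skills_per_text):
    """Count in how many texts each skill URI appears (hypothetical helper)."""
    counts = Counter()
    for uris in skills_per_text:
        counts.update(set(uris))  # set() so a URI counts once per text
    return counts


def skill_id(uri):
    """Return the trailing identifier of an ESCO skill URI (hypothetical helper)."""
    return uri.rsplit("/", 1)[-1]


for uri, count in skill_frequencies(extracted).most_common():
    print(skill_id(uri), count)
```

Counting each URI once per text keeps a long ad that mentions the same skill in several sentences from dominating the totals.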
How it works
- It creates embeddings from the ESCO skills published on the official ESCO website.
- It creates embeddings from the input text (one for each sentence).
- It compares the embeddings of the text with the embeddings of the ESCO skills using cosine similarity.
- It returns the most similar ESCO skill per sentence if its similarity passes a predefined threshold.
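The matching step described above can be sketched in plain Python. This is an illustration under stated assumptions, not the package's actual implementation: the function names are hypothetical, the three-dimensional vectors are toy stand-ins for real transformer embeddings, and treating "passes the threshold" as `>=` is a guess.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def best_skill_per_sentence(sentence_vectors, skill_vectors, threshold=0.4):
    """Keep the most similar skill for each sentence if it reaches the threshold."""
    if not skill_vectors:
        return []
    matches = []
    for vec in sentence_vectors:
        best_id, best_score = max(
            ((sid, cosine_similarity(vec, svec)) for sid, svec in skill_vectors.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:  # assumption: the comparison is inclusive
            matches.append(best_id)
    return matches


# Toy embeddings (assumed for illustration only).
toy_skills = {
    "skill/java": [1.0, 0.0, 0.0],
    "skill/docker": [0.0, 1.0, 0.0],
}
toy_sentences = [
    [0.9, 0.1, 0.0],  # close to "skill/java"
    [0.0, 0.0, 1.0],  # orthogonal to both skills, so it is dropped
    [0.2, 0.8, 0.1],  # close to "skill/docker"
]

print(best_skill_per_sentence(toy_sentences, toy_skills))
# ['skill/java', 'skill/docker']
```

Raising the threshold trades recall for precision: fewer sentences produce a match, but the matches that remain are closer to the ESCO skill description.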
Download files
Hashes for esco-skill-extractor-0.1.5.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 38bdd8ae18bc3b6279d4fb009f89fd6ad20e9b144c8336d2298258d9c2faed52
MD5 | 3103ce17c4b7b6c3d4054bc0d8715601
BLAKE2b-256 | 30ad154649d1e8c18ef63c6ebc2d90cc3bd9b6e5e741b7530c9edb3d22ab6413
Hashes for esco_skill_extractor-0.1.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 94fe0ca1078977210b982cdc3e4719868a780d3de36272607668e910f18db4e0
MD5 | e06648b9ab72022f808880487c049748
BLAKE2b-256 | 106c984d7b954c98bbf8a48c020c5d57cd30c244d96ff59dfc1f1d7c157a7f8d