Skip to main content

Extract ESCO skills from texts such as job descriptions or CVs

Project description

ESCO Skill Extractor

This is a a tool that extract ESCO skills and ISCO occupations from texts such as job descriptions or CVs. It uses a transformer and compares its embedding using cosine similarity.

Installation

pip install esco-skill-extractor

Installation via Docker

docker run -it -p 5000:5000 konstantinospetrakis/esco-skill-extractor

Usage

Via python

from esco_skill_extractor import SkillExtractor

# Don't be scared, the 1st time will take longer to download the model and create the embeddings.
skill_extractor = SkillExtractor()

ads = [
    "We are looking for a software engineer with experience in Java and Python.",
    "We are looking for a devops engineer. Containerization tools such as Docker is a must. AWS is a plus."
    # ...
]

print(skill_extractor.get_skills(ads))
# [
#     [
#         "http://data.europa.eu/esco/skill/19a8293b-8e95-4de3-983f-77484079c389",
#         "http://data.europa.eu/esco/skill/ccd0a1d9-afda-43d9-b901-96344886e14d",
#     ],
#     [
#         "http://data.europa.eu/esco/skill/11430d93-c835-48ed-8e70-285fa69c9ae6",
#         "http://data.europa.eu/esco/skill/ae4f0cc6-e0b9-47f5-bdca-2fc2e6316dce",
#         "http://data.europa.eu/esco/skill/ce8ae6ca-61d8-4174-b457-641de96cbff4",
#         "http://data.europa.eu/esco/skill/f0de4973-0a70-4644-8fd4-3a97080476f4",
#     ],
# ]
print(skill_extractor.get_occupations(ads))
# [
#     [
#         "http://data.europa.eu/esco/occupation/10469d70-78a3-4650-9e29-d04de13c62c1",
#         "http://data.europa.eu/esco/occupation/1c5a896a-e010-4217-a29a-c44db26e25da",
#         "http://data.europa.eu/esco/occupation/4874fa37-0cd1-4a68-aed8-a838851f242d",
#         "http://data.europa.eu/esco/occupation/579254cf-6d69-4889-9000-9c79dc568644",
#         "http://data.europa.eu/esco/occupation/57af9090-55b4-4911-b2d0-86db01c00b02",
#         "http://data.europa.eu/esco/occupation/f2b15a0e-e65a-438a-affb-29b9d50b77d1",
#         "http://data.europa.eu/esco/isco/C2512",
#         "http://data.europa.eu/esco/isco/C2514",
#     ],
#     [
#         "http://data.europa.eu/esco/occupation/2fb96c6c-8d0b-4ef0-b1ee-3e493305e4eb",
#         "http://data.europa.eu/esco/occupation/349ee6f6-c295-4c38-9b98-48765b55280e",
#         "http://data.europa.eu/esco/occupation/781a6350-e686-45b9-b075-e4c8d5a05ff7",
#         "http://data.europa.eu/esco/occupation/93b11f0f-69af-4ece-b9da-f29aab7d38d3",
#         "http://data.europa.eu/esco/occupation/bb609566-3ab6-44dd-8f48-cf0b15b96827",
#         "http://data.europa.eu/esco/occupation/cc867bee-ab5c-427f-9244-f7a204d9574b",
#     ],
# ]

Via GUI

# Visit the URL printed in the console.
# run python -m esco_skill_extractor --help for more options.
python -m esco_skill_extractor

Via API

# Visit the URL printed in the console.
# run python -m esco_skill_extractor --help for more options.
python -m esco_skill_extractor
async function getSkills() {
  const texts = [
    "We are looking for a software engineer with experience in Java and Python.",
    "We are looking for a devops engineer. Containerization tools such as Docker is a must. AWS is a plus.",
    // ...
  ];

  // Default host is localhost, and default port is 8000. Check CLI options for more.
  const response = await fetch("http://localhost:8000/extract-skills", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify(texts),
  });

  const skills = await response.json();
  console.log(skills);
  //  [
  //    [
  //      "http://data.europa.eu/esco/skill/19a8293b-8e95-4de3-983f-77484079c389",
  //      "http://data.europa.eu/esco/skill/ccd0a1d9-afda-43d9-b901-96344886e14d",
  //    ],
  //    [
  //      "http://data.europa.eu/esco/skill/11430d93-c835-48ed-8e70-285fa69c9ae6",
  //      "http://data.europa.eu/esco/skill/ae4f0cc6-e0b9-47f5-bdca-2fc2e6316dce",
  //      "http://data.europa.eu/esco/skill/ce8ae6ca-61d8-4174-b457-641de96cbff4",
  //      "http://data.europa.eu/esco/skill/f0de4973-0a70-4644-8fd4-3a97080476f4",
  //    ],
  //  ]
  const occupations = await fetch("http://localhost:8000/extract-occupations", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify(texts),
  });
}

Possible keyword arguments for SkillExtractor

Keyword Argument Description Default
skill_threshold Skills surpassing this cosine similarity threshold are considered a match. 0.6
occupation_threshold Occupations surpassing this cosine similarity threshold are considered a match. 0.55
device The device where the copulations will take place. AKA torch device. "cuda" if available else "cpu"
model The name of the model to use for embeddings. See available models. "all-MiniLM-L6-v2"

Notes

If you change a model, don't forget to call the remove_embeddings static method to remove the old embeddings from disk.

SkillExtractor.remove_embeddings()
skill_extractor = SkillExtractor(model="new-model-name")

How it works

  1. Generates embeddings for ESCO skills and ISCO occupations.
  2. Creates embeddings for the extracted skills.
  3. Compares the embeddings of ESCO skills or ISCO occupations with those of the extracted skills using cosine similarity and matches them if the similarity is above a certain threshold.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

esco_skill_extractor-0.1.18.tar.gz (2.6 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

esco_skill_extractor-0.1.18-py3-none-any.whl (2.6 MB view details)

Uploaded Python 3

File details

Details for the file esco_skill_extractor-0.1.18.tar.gz.

File metadata

  • Download URL: esco_skill_extractor-0.1.18.tar.gz
  • Upload date:
  • Size: 2.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for esco_skill_extractor-0.1.18.tar.gz
Algorithm Hash digest
SHA256 fad5b1cd74ba0dde659e96db5ac26cf94cd7efd29fc870595cd05ce45ed146b6
MD5 90a39cd0860fc230ca039fc3f7812e6e
BLAKE2b-256 821025ec35798364d1becac521eab0daedc0d804d70c80e42d1ea180614b6f37

See more details on using hashes here.

File details

Details for the file esco_skill_extractor-0.1.18-py3-none-any.whl.

File metadata

File hashes

Hashes for esco_skill_extractor-0.1.18-py3-none-any.whl
Algorithm Hash digest
SHA256 ec6e5daa7d3247634a7950faf33ad9d1ca47fa2d7a09b95677f25a6f91f77af0
MD5 57bd24b1a1d718ef6705c0238d2212a5
BLAKE2b-256 1d2fca8d75798dfeea890f7dc42cd754d58f63c6c1c80e4bee706edc5ba7eb7f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page