
Leveraging Artificial Intelligence for Skills Extraction and Research


CAUTION: LAiSER is currently in development mode, and some features may be experimental.

Leveraging Artificial Intelligence for Skill Extraction & Research (LAiSER)


LAiSER is a tool that helps learners, educators, and employers share trusted and mutually intelligible information about skills.

About

LAiSER uses artificial intelligence to simplify the extraction and analysis of skills. It is designed for learners, educators, and employers who need reliable information about skill sets, and it aims to make that information both trusted and mutually intelligible across sectors.

LAiSER uses large language models to identify skills, knowledge, and tasks in sources such as job descriptions and course syllabi, and then aligns them with bundled skill taxonomies. Automating this step saves time, is intended to improve accuracy, and makes it easier to discover emerging trends and in-demand skills.

The tool emphasizes standardization and transparency: by mapping skills to a common framework, it helps bridge the communication gap between stakeholders. Educators can align their teaching more closely with industry requirements, and employers can more effectively identify the competencies their teams need. The goal is a more efficient and strategic approach to skill development across this ecosystem.

Architecture

LAiSER uses a four-stage extraction and alignment pipeline:

  1. Extraction: Input text is normalized by input type and passed through prompt construction and LLM inference to produce raw concept candidates.
  2. Parsing and deduplication: Model output is parsed into structured concepts and filtered through exact and semantic deduplication.
  3. Taxonomy alignment: Extracted concepts are matched against bundled taxonomy indexes using embedding-based similarity search and threshold filtering.
  4. Output normalization: Alignment results are converted into a unified tabular schema, with optional edge generation for graph-style outputs.
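Stages 2 to 4 can be illustrated with a small, self-contained sketch. It is not LAiSER's implementation: stage 1 (LLM inference) is skipped by starting from a hard-coded list of raw candidates, a bag-of-words `embed` function stands in for a real embedding model, and the function names, thresholds, and output field names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag of lowercase word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def deduplicate(concepts, threshold=0.8):
    """Stage 2: exact dedup on normalized text, then drop semantic near-duplicates."""
    kept, seen = [], set()
    for concept in concepts:
        key = " ".join(concept.lower().split())
        if key in seen:
            continue  # exact duplicate after case/whitespace normalization
        seen.add(key)
        if any(cosine(embed(concept), embed(k)) >= threshold for k in kept):
            continue  # too similar to a concept that was already kept
        kept.append(concept)
    return kept


def align(concepts, taxonomy, threshold=0.3, top_k=1):
    """Stage 3: rank taxonomy labels by similarity and keep matches above the threshold."""
    rows = []
    for concept in concepts:
        scored = sorted(
            ((cosine(embed(concept), embed(label)), label) for label in taxonomy),
            reverse=True,
        )
        for score, label in scored[:top_k]:
            if score >= threshold:
                # Stage 4: emit one flat row per (concept, matched label) pair.
                rows.append({"concept": concept, "taxonomy_label": label, "score": round(score, 3)})
    return rows


raw = ["Python programming", "python  programming", "Python programming skills", "Data visualization"]
concepts = deduplicate(raw)
print(align(concepts, ["Python (computer programming)", "Data visualization", "Project management"]))
```

Exact deduplication runs first because it is cheap; the pairwise semantic check grows quadratically with the number of candidates, which is acceptable for the handful of concepts typically extracted from one document.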

Requirements

  • Python 3.8 or later.
  • The current test matrix covers Python versions up to and including 3.13.
  • A GPU is recommended for heavy local model workflows, but API-backed extraction can run CPU-only.
  • Depending on the backend, a provider-specific environment variable may be required:
    • GEMINI_API_KEY or GOOGLE_API_KEY
    • OPENAI_API_KEY
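The usage examples read the Gemini key with `os.getenv("GEMINI_API_KEY") or os.getenv("GOOGLE_API_KEY")`. A small helper can generalize that lookup to each provider listed above and fail with a readable message when no key is set. `resolve_api_key` and `PROVIDER_ENV_VARS` are hypothetical, not part of LAiSER, and the mapping assumes the `model_id` values `gemini` and `openai` correspond to these variables.

```python
import os
from typing import Mapping, Optional

# Hypothetical mapping from model_id to the environment variables listed above,
# in lookup order. This is an illustrative assumption, not a LAiSER structure.
PROVIDER_ENV_VARS = {
    "gemini": ("GEMINI_API_KEY", "GOOGLE_API_KEY"),
    "openai": ("OPENAI_API_KEY",),
}


def resolve_api_key(model_id: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first non-empty key for the provider, or raise a clear error."""
    env = os.environ if env is None else env
    names = PROVIDER_ENV_VARS.get(model_id)
    if names is None:
        raise ValueError(f"no known API key variables for model_id={model_id!r}")
    for name in names:
        value = env.get(name)
        if value:  # skip unset and empty values, like `getenv(...) or getenv(...)`
            return value
    raise RuntimeError(f"set one of {', '.join(names)} to use model_id={model_id!r}")
```

The return value can be passed as `api_key=` when constructing `SkillExtractorRefactored`.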

Setup and Installation

  • Install LAiSER from PyPI:

    pip install laiser
    
  • Install with GPU extras:

    pip install "laiser[gpu]"
    
  • Install from a source checkout in editable mode, with development dependencies:

    pip install -e ".[dev]"
    

NOTE: Python 3.8 or later is required. Python 3.12 or 3.13 is recommended for current development and CI parity.

To check whether PyTorch can see a CUDA-capable GPU on your machine, run:

python -c "import torch; print(torch.cuda.is_available())"
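The one-liner above raises ImportError when PyTorch is not installed, which may be the case in a CPU-only, API-backed setup. The sketch below picks a value for `use_gpu` that falls back to `False` in that situation. `detect_gpu` is a hypothetical helper, not a LAiSER function; it relies only on `importlib.util.find_spec` and `torch.cuda.is_available()`.

```python
import importlib.util


def detect_gpu() -> bool:
    """Return True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch not installed, so GPU-backed initialization is not possible
    import torch

    return bool(torch.cuda.is_available())


use_gpu = detect_gpu()
print(f"use_gpu={use_gpu}")
```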

Usage

LAiSER is used as a Python package. The recommended API is SkillExtractorRefactored.

Basic job description extraction

import os
import pandas as pd

from laiser.skill_extractor_refactored import SkillExtractorRefactored

data = pd.DataFrame(
    [
        {
            "Research ID": "job-001",
            "description": "Build production machine learning systems in Python.",
        }
    ]
)

extractor = SkillExtractorRefactored(
    model_id="gemini",
    api_key=os.getenv("GEMINI_API_KEY") or os.getenv("GOOGLE_API_KEY"),
    use_gpu=False,
)

results = extractor.extract_concepts(
    data=data,
    id_column="Research ID",
    text_columns=["description"],
    input_type="job_desc",
    concepts=["skills", "knowledge", "tasks"],
)

print(results.head())

Course syllabus extraction

import os
import pandas as pd

from laiser.skill_extractor_refactored import SkillExtractorRefactored

data = pd.DataFrame(
    [
        {
            "Research ID": "course-001",
            "description": "Introduction to data visualization and exploratory analysis.",
            "learning_outcomes": "Create dashboards, explain patterns in data, and evaluate charts.",
        }
    ]
)

extractor = SkillExtractorRefactored(
    model_id="gemini",
    api_key=os.getenv("GEMINI_API_KEY") or os.getenv("GOOGLE_API_KEY"),
    use_gpu=False,
)

results = extractor.extract_concepts(
    data=data,
    id_column="Research ID",
    text_columns=["description", "learning_outcomes"],
    input_type="course_syllabi",
    concepts=["skills"],
)

print(results.head())
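Both examples assume that every row has an ID and some text in the listed columns. The README does not say how LAiSER handles missing columns, duplicate IDs, or empty text, so a pre-flight check on the raw records before building the DataFrame can catch such problems early. `validate_records` is a hypothetical helper, not part of LAiSER; it only uses the `id_column` and `text_columns` values shown in the examples.

```python
from typing import Dict, List, Optional, Sequence

Record = Dict[str, Optional[str]]


def validate_records(records: List[Record], id_column: str, text_columns: Sequence[str]) -> List[Record]:
    """Raise on missing columns or bad IDs; drop rows that contain no usable text."""
    usable: List[Record] = []
    seen_ids = set()
    for i, row in enumerate(records):
        missing = [c for c in (id_column, *text_columns) if c not in row]
        if missing:
            raise KeyError(f"row {i} is missing column(s): {missing}")
        row_id = str(row[id_column] or "").strip()
        if not row_id:
            raise ValueError(f"row {i} has an empty {id_column!r}")
        if row_id in seen_ids:
            raise ValueError(f"duplicate {id_column!r}: {row_id}")
        seen_ids.add(row_id)
        # Keep the row only if at least one text column has non-whitespace content.
        if any(str(row[c] or "").strip() for c in text_columns):
            usable.append(row)
    return usable
```

The cleaned list can be passed directly to `pd.DataFrame(...)` before calling `extract_concepts`.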

Common runtime options

  • model_id: provider or model selector, such as gemini or openai
  • api_key: API key for hosted providers
  • use_gpu: enables GPU-backed initialization where supported
  • allowed_sources: restricts taxonomy alignment to specific sources, such as ["esco"], ["onet"], or ["osn"]
  • top_k: maximum number of matched rows returned per alignment call
  • return_edges: returns {nodes, edges} instead of only the normalized rows
  • output_csv_path: writes CSV output to this path; no file is written unless it is set
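How `allowed_sources`, `top_k`, and `return_edges` interact is easiest to see on a toy result set. The sketch below is a hypothetical post-processing step, not LAiSER's code: the row fields (`id`, `concept`, `source`, `label`, `score`), treating each concept as one alignment call for the `top_k` cap, and the node and edge layout produced by `to_graph` are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence

# Hypothetical alignment row; these field names are illustrative only.
Row = Dict[str, object]


def filter_alignments(rows: List[Row], allowed_sources: Optional[Sequence[str]] = None, top_k: int = 3) -> List[Row]:
    """Keep rows from allowed sources, then the top_k highest-scoring rows per concept."""
    if allowed_sources is not None:
        allowed = {s.lower() for s in allowed_sources}
        rows = [r for r in rows if str(r["source"]).lower() in allowed]
    by_concept: Dict[object, List[Row]] = defaultdict(list)
    for r in rows:
        by_concept[r["concept"]].append(r)
    kept: List[Row] = []
    for concept_rows in by_concept.values():
        concept_rows.sort(key=lambda r: r["score"], reverse=True)
        kept.extend(concept_rows[:top_k])
    return kept


def to_graph(rows: List[Row]) -> Dict[str, list]:
    """Turn flat rows into {nodes, edges}: document -> concept -> taxonomy label."""
    nodes: Dict[str, Dict[str, str]] = {}
    edges: List[Dict[str, object]] = []
    mentioned = set()
    for r in rows:
        doc = f"doc:{r['id']}"
        concept = f"concept:{r['concept']}"
        label = f"{r['source']}:{r['label']}"
        for node_id in (doc, concept, label):
            nodes.setdefault(node_id, {"id": node_id})
        if (doc, concept) not in mentioned:  # one "mentions" edge per document/concept pair
            mentioned.add((doc, concept))
            edges.append({"from": doc, "to": concept, "type": "mentions"})
        edges.append({"from": concept, "to": label, "type": "aligned_to", "score": r["score"]})
    return {"nodes": list(nodes.values()), "edges": edges}
```

Filtering by source before applying `top_k` means a cap of 2 yields two matches from the allowed taxonomy rather than possibly zero after a later filter.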

Additional examples are available in docs/examples.md.



Download files


Source Distribution

laiser-1.0.tar.gz (88.6 MB)

Built Distribution


laiser-1.0-py3-none-any.whl (88.6 MB)

File details

Details for the file laiser-1.0.tar.gz.

File metadata

  • Download URL: laiser-1.0.tar.gz
  • Size: 88.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for laiser-1.0.tar.gz:

  • SHA256: 3b42794d628fc32a7cd6e8f877ac6476af833699b86c2dc0fdb3a0c97aeb0d68
  • MD5: 5ed2534e264d2417f32a9cd52242d048
  • BLAKE2b-256: 5ad77a613d9b473706fe61a23aa5590377bac58075bf5202c6f83e010fa2c942



Details for the file laiser-1.0-py3-none-any.whl.

File metadata

  • Download URL: laiser-1.0-py3-none-any.whl
  • Size: 88.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for laiser-1.0-py3-none-any.whl:

  • SHA256: 3892957419a28a0a3171746557c9b684d15f600cc21fc72102cc9302eb18cb60
  • MD5: 8f68b2f03ff372c7dd17e0c10fbb1f4a
  • BLAKE2b-256: 77ff59efe9c33bd788e3a6e1ed88c811e56cca6a9eb473f04660f99012132026
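A downloaded file can be checked against the published SHA256 digests with the standard library. The sketch below streams the file in chunks so the 88.6 MB archives are not loaded into memory at once; `sha256_of`, `verify`, and `EXPECTED_SHA256` are illustrative names, and the digests are copied from the hash listings for version 1.0.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for the laiser 1.0 release files.
EXPECTED_SHA256 = {
    "laiser-1.0.tar.gz": "3b42794d628fc32a7cd6e8f877ac6476af833699b86c2dc0fdb3a0c97aeb0d68",
    "laiser-1.0-py3-none-any.whl": "3892957419a28a0a3171746557c9b684d15f600cc21fc72102cc9302eb18cb60",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare a downloaded file against its published digest, looked up by file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published hash for {path.name}")
    return sha256_of(path) == expected
```

pip can also enforce this during installation through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).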

