Leveraging Artificial Intelligence for Skills Extraction and Research
CAUTION: LAiSER is currently under active development, and some features may be experimental.
Leveraging Artificial Intelligence for Skill Extraction & Research (LAiSER)
LAiSER is a tool that helps learners, educators and employers share trusted and mutually intelligible information about skills.
About
LAiSER uses artificial intelligence to extract and analyze skills from text. It is designed for learners, educators, and employers who need reliable information about skills that is trusted and mutually intelligible across sectors.
LAiSER uses large language models to identify skills and related concepts, such as knowledge and tasks, in text such as job descriptions and course syllabi, and aligns them with bundled skill taxonomies. Automating this step reduces manual review time and makes it easier to spot emerging and in-demand skills across many documents.
The tool emphasizes standardization and transparency: by mapping extracted skills to shared taxonomies, it gives different stakeholders a common vocabulary. Educators can better align their curricula with industry requirements, and employers can more precisely identify the competencies their teams need. The result is a more coordinated approach to skill development.
Architecture
LAiSER uses a four-stage extraction and alignment pipeline:
- Extraction: Input text is normalized by input type and passed through prompt construction and LLM inference to produce raw concept candidates.
- Parsing and deduplication: Model output is parsed into structured concepts and filtered through exact and semantic deduplication.
- Taxonomy alignment: Extracted concepts are matched against bundled taxonomy indexes using embedding-based similarity search and threshold filtering.
- Output normalization: Alignment results are converted into a unified tabular schema, with optional edge generation for graph-style outputs.
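To make stages 2 to 4 concrete, here is a minimal, self-contained sketch of parsing, exact plus semantic deduplication, threshold-filtered similarity alignment, and flattening into rows. It is not LAiSER's implementation. It assumes the model returns a JSON array of strings, swaps a real embedding model for a toy bag-of-words vector with cosine similarity, and uses a made-up three-entry taxonomy. All names (TAXONOMY, embed, cosine, parse_concepts, dedupe, align, run_pipeline) and both thresholds are hypothetical.

```python
import json
import math
from collections import Counter
from typing import Dict, List

# Hypothetical three-entry taxonomy, standing in for LAiSER's bundled indexes.
TAXONOMY = {
    "T1": "python programming",
    "T2": "machine learning",
    "T3": "data visualization",
}


def embed(text: str) -> Counter:
    """Toy 'embedding': a lowercase bag of words (a real system would use a sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def parse_concepts(raw: str) -> List[str]:
    """Stage 2a: parse model output, assumed here to be a JSON array of strings."""
    return [str(item).strip() for item in json.loads(raw) if str(item).strip()]


def dedupe(concepts: List[str], threshold: float = 0.8) -> List[str]:
    """Stage 2b: drop exact (case-insensitive) duplicates, then semantic near-duplicates."""
    kept = []
    seen = set()
    for concept in concepts:
        key = concept.lower()
        if key in seen:
            continue  # exact duplicate
        if any(cosine(embed(concept), embed(k)) >= threshold for k in kept):
            continue  # too similar to a concept we already kept
        seen.add(key)
        kept.append(concept)
    return kept


def align(concept: str, threshold: float = 0.5) -> List[Dict]:
    """Stage 3: score a concept against every taxonomy entry; keep scores above threshold."""
    vec = embed(concept)
    matches = []
    for tax_id, label in TAXONOMY.items():
        score = cosine(vec, embed(label))
        if score >= threshold:
            matches.append({"taxonomy_id": tax_id, "taxonomy_label": label, "score": round(score, 3)})
    return sorted(matches, key=lambda m: m["score"], reverse=True)


def run_pipeline(doc_id: str, raw_model_output: str) -> List[Dict]:
    """Stage 4: flatten into one row per (document, concept, taxonomy match)."""
    rows = []
    for concept in dedupe(parse_concepts(raw_model_output)):
        for match in align(concept):
            rows.append({"id": doc_id, "concept": concept, **match})
    return rows


raw_output = '["Python programming", "python programming", "Machine learning", "machine learning systems"]'
for row in run_pipeline("job-001", raw_output):
    print(row)
```

In this example the exact duplicate "python programming" and the near-duplicate "machine learning systems" (cosine of about 0.82 to "Machine learning") are dropped before alignment, so only two concepts reach the taxonomy search.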
Requirements
- Python version >=3.8. The current test matrix covers versions through Python 3.13.
- A GPU is recommended for heavy local model workflows, but API-backed extraction can run CPU-only.
- Provider-specific environment variables may be required depending on the backend:
  - Gemini: GEMINI_API_KEY or GOOGLE_API_KEY
  - OpenAI: OPENAI_API_KEY
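If you switch between backends, a small helper can look up the right key and fail early with a readable message. This is a sketch, not part of LAiSER: the helper name resolve_api_key and the PROVIDER_ENV_VARS mapping are hypothetical and simply mirror the variables listed above and the `os.getenv("GEMINI_API_KEY") or os.getenv("GOOGLE_API_KEY")` pattern used in the usage examples.

```python
import os

# Hypothetical mapping from backend name to the environment variables to try, in order.
# Not an API exported by LAiSER.
PROVIDER_ENV_VARS = {
    "gemini": ("GEMINI_API_KEY", "GOOGLE_API_KEY"),
    "openai": ("OPENAI_API_KEY",),
}


def resolve_api_key(provider: str) -> str:
    """Return the first non-empty key for the provider, or raise a clear error."""
    try:
        names = PROVIDER_ENV_VARS[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}") from None
    for name in names:
        value = os.getenv(name)
        if value:  # skip unset or empty variables
            return value
    raise RuntimeError(f"Set one of {', '.join(names)} to use the {provider} backend")
```

The result can be passed as the api_key argument, for example `api_key=resolve_api_key("gemini")`, so a missing key is reported before any model call is attempted.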
Setup and Installation
- Install LAiSER from PyPI:
  pip install laiser
- Install with GPU extras:
  pip install "laiser[gpu]"
- From the root of a source checkout, install in editable mode with development dependencies:
  pip install -e ".[dev]"
NOTE: Python 3.8 or later is required. Python 3.12 or 3.13 is recommended for current development and CI parity.
You can check whether PyTorch detects a CUDA-capable GPU with:
python -c "import torch; print(torch.cuda.is_available())"
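The one-liner above fails with an ImportError when PyTorch is not installed, which is common for CPU-only, API-backed setups. A slightly more defensive sketch, assuming you want to derive the use_gpu argument automatically (the function name cuda_available is hypothetical), checks for the package first:

```python
import importlib.util


def cuda_available() -> bool:
    """True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False  # torch not installed, e.g. a CPU-only, API-backed setup
    import torch

    return bool(torch.cuda.is_available())


use_gpu = cuda_available()
print(f"use_gpu={use_gpu}")
```

The resulting boolean can then be passed as use_gpu when constructing the extractor.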
Usage
LAiSER is used as a Python package. The recommended entry point is the SkillExtractorRefactored class from laiser.skill_extractor_refactored.
Basic job description extraction
import os
import pandas as pd
from laiser.skill_extractor_refactored import SkillExtractorRefactored
data = pd.DataFrame(
    [
        {
            "Research ID": "job-001",
            "description": "Build production machine learning systems in Python.",
        }
    ]
)

extractor = SkillExtractorRefactored(
    model_id="gemini",
    api_key=os.getenv("GEMINI_API_KEY") or os.getenv("GOOGLE_API_KEY"),
    use_gpu=False,
)

results = extractor.extract_concepts(
    data=data,
    id_column="Research ID",
    text_columns=["description"],
    input_type="job_desc",
    concepts=["skills", "knowledge", "tasks"],
)
print(results.head())
Course syllabus extraction
import os
import pandas as pd
from laiser.skill_extractor_refactored import SkillExtractorRefactored
data = pd.DataFrame(
    [
        {
            "Research ID": "course-001",
            "description": "Introduction to data visualization and exploratory analysis.",
            "learning_outcomes": "Create dashboards, explain patterns in data, and evaluate charts.",
        }
    ]
)

extractor = SkillExtractorRefactored(
    model_id="gemini",
    api_key=os.getenv("GEMINI_API_KEY") or os.getenv("GOOGLE_API_KEY"),
    use_gpu=False,
)

results = extractor.extract_concepts(
    data=data,
    id_column="Research ID",
    text_columns=["description", "learning_outcomes"],
    input_type="course_syllabi",
    concepts=["skills"],
)
print(results.head())
Common runtime options
| Option | Description |
|---|---|
| model_id | Provider or model selector, such as gemini or openai |
| api_key | API key for hosted providers |
| use_gpu | Enables GPU-backed initialization where supported |
| allowed_sources | Filters alignment sources, such as ["esco"], ["onet"], or ["osn"] |
| top_k | Per-alignment-call cap on the number of matched rows |
| return_edges | Returns {nodes, edges} instead of only normalized rows |
| output_csv_path | Writes CSV output only when explicitly provided |
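To illustrate what allowed_sources, top_k, and return_edges do to a result set, here is a hedged sketch that applies equivalent logic to a plain list of alignment rows. It is not LAiSER code: the function postprocess, the row fields (concept, source, skill, score), and the node/edge shapes are assumptions for illustration, and the library's actual output schema may differ. Inspect real results before relying on any column names.

```python
from typing import Dict, Iterable, List, Optional, Union

Row = Dict[str, object]


def postprocess(
    rows: List[Row],
    allowed_sources: Optional[Iterable[str]] = None,
    top_k: Optional[int] = None,
    return_edges: bool = False,
) -> Union[List[Row], Dict[str, List[Row]]]:
    """Hypothetical re-creation of the allowed_sources / top_k / return_edges semantics."""
    # allowed_sources: keep only matches from the listed taxonomies.
    if allowed_sources is not None:
        allowed = {s.lower() for s in allowed_sources}
        rows = [r for r in rows if str(r["source"]).lower() in allowed]

    # top_k: keep at most top_k best-scoring matches per extracted concept.
    if top_k is not None:
        kept, per_concept = [], {}
        for r in sorted(rows, key=lambda row: row["score"], reverse=True):
            n = per_concept.get(r["concept"], 0)
            if n < top_k:
                kept.append(r)
                per_concept[r["concept"]] = n + 1
        rows = kept

    if not return_edges:
        return rows

    # return_edges: one node per concept and per matched taxonomy entry, one edge per match.
    nodes = {}
    edges = []
    for r in rows:
        nodes.setdefault(("concept", r["concept"]), {"type": "concept", "label": r["concept"]})
        nodes.setdefault((r["source"], r["skill"]), {"type": r["source"], "label": r["skill"]})
        edges.append({"from": r["concept"], "to": r["skill"], "source": r["source"], "score": r["score"]})
    return {"nodes": list(nodes.values()), "edges": edges}


# Made-up alignment rows for illustration only.
rows = [
    {"concept": "python", "source": "esco", "skill": "Python programming", "score": 0.92},
    {"concept": "python", "source": "onet", "skill": "Python", "score": 0.88},
    {"concept": "python", "source": "esco", "skill": "Scripting", "score": 0.61},
    {"concept": "dashboards", "source": "osn", "skill": "Data visualization", "score": 0.75},
]
print(postprocess(rows, allowed_sources=["esco"], top_k=1))
print(postprocess(rows, top_k=1, return_edges=True))
```

Filtering by source runs before the top_k cap in this sketch, so a cap never discards a match from an allowed source in favor of one that is later filtered out.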
Additional examples are available in docs/examples.md.
Download files
File details
Details for the file laiser-1.0.tar.gz.
File metadata
- Download URL: laiser-1.0.tar.gz
- Upload date:
- Size: 88.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3b42794d628fc32a7cd6e8f877ac6476af833699b86c2dc0fdb3a0c97aeb0d68 |
| MD5 | 5ed2534e264d2417f32a9cd52242d048 |
| BLAKE2b-256 | 5ad77a613d9b473706fe61a23aa5590377bac58075bf5202c6f83e010fa2c942 |
File details
Details for the file laiser-1.0-py3-none-any.whl.
File metadata
- Download URL: laiser-1.0-py3-none-any.whl
- Upload date:
- Size: 88.6 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3892957419a28a0a3171746557c9b684d15f600cc21fc72102cc9302eb18cb60 |
| MD5 | 8f68b2f03ff372c7dd17e0c10fbb1f4a |
| BLAKE2b-256 | 77ff59efe9c33bd788e3a6e1ed88c811e56cca6a9eb473f04660f99012132026 |