OntoLearner: A Modular Python Library for Ontology Learning with LLMs


OntoLearner is a modular and extensible Python library for ontology learning powered by Large Language Models (LLMs). It provides a unified framework covering the full workflow — from loading and modularizing ontologies to training, predicting, and evaluating learner models across multiple ontology learning tasks.

The framework is built around three core components:

  • 🧩 Ontologizers — load, parse, and modularize ontologies from 150+ ready-to-use sources across 20+ domains.
  • 📋 Learning Tasks — support for Term Typing, Taxonomy Discovery, Non-Taxonomic Relation Extraction, and Text2Onto.
  • 🤖 Learner Models — plug-and-play LLM, Retriever, and RAG-based learners with a consistent fit → predict → evaluate interface.
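
To make the consistent fit → predict → evaluate contract concrete, here is a minimal, hypothetical learner skeleton. The class, method, and data shapes below are illustrative only and are not OntoLearner's actual API; see the Learner Models documentation for the real interfaces.

```python
# Hypothetical sketch of the fit -> predict -> evaluate contract.
# Names and data shapes are illustrative, not OntoLearner's real classes.

class MajorityTypeLearner:
    """Toy term-typing learner: always predicts the most frequent type."""

    def fit(self, train_pairs):
        # train_pairs: list of (term, type) tuples
        counts = {}
        for _, type_ in train_pairs:
            counts[type_] = counts.get(type_, 0) + 1
        self.majority = max(counts, key=counts.get)

    def predict(self, terms):
        # Return one predicted type per input term
        return [self.majority for _ in terms]

def evaluate(y_true, y_pred):
    # Simple accuracy over aligned label lists
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

learner = MajorityTypeLearner()
learner.fit([("merlot", "Wine"), ("riesling", "Wine"), ("oak", "Material")])
preds = learner.predict(["syrah", "cork"])
print(evaluate(["Wine", "Material"], preds))  # -> 0.5
```

Every OntoLearner learner follows this same three-step shape, which is what makes the learners interchangeable across tasks.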

🧪 Installation

OntoLearner is available on PyPI and can be installed with pip:

pip install ontolearner

Verify the installation:

import ontolearner

print(ontolearner.__version__)

For additional installation options (e.g., from source, with optional dependencies), see the Installation Guide.


🔗 Essential Resources

  • 📚 Documentation: full documentation website.
  • 🤗 Datasets on Hugging Face: curated, machine-readable ontology datasets.
  • 🚀 Quickstart: get started in minutes.
  • 🕸️ Learning Tasks: Term Typing, Taxonomy Discovery, Relation Extraction, and Text2Onto.
  • 🧠 Learner Models: LLM, Retriever, and RAG-based learner models.
  • 📖 Ontologies Documentation: browse 150+ benchmark ontologies across 20+ domains.
  • 🧩 Ontologizer Guide: how to modularize and preprocess ontologies.
  • 📊 Metrics Dashboard: explore benchmark ontology metrics and complexity scores.

✨ Key Features

  • 150+ Ontologizers across 20+ domains (biology, medicine, agriculture, chemistry, law, finance, and more).
  • Multiple learning tasks: Term Typing, Taxonomy Discovery, Non-Taxonomic Relation Extraction, and Text2Onto.
  • Three learner paradigms: LLM-based, Retriever-based, and Retrieval-Augmented Generation (RAG).
  • Hugging Face integration: auto-download ontologies and models directly from the Hub.
  • Unified API: consistent fit → predict → evaluate interface across all learners.
  • LearnerPipeline: end-to-end pipeline in a single call.
  • Extensible: easily plug in custom ontologies, learners, or retrievers.
  • Text2Onto generation: synthetic document generation now uses a direct transformers backend with ontology-aware context enrichment.

🚀 Quick Tour

Loading an Ontology

Load any of the 150+ built-in ontologies and extract task datasets in just a few lines:

from ontolearner import Wine

# Initialize an ontologizer
ontology = Wine()

# Auto-download from Hugging Face and load
ontology.load()

# Extract learning task datasets
data = ontology.extract()

# Inspect ontology metadata
print(ontology)

Explore 150+ ready-to-use ontologies or learn how to work with ontologizers.


Retriever-Based Learner

Use a dense retriever model to perform non-taxonomic relation extraction:

from ontolearner import AutoRetrieverLearner, AgrO, train_test_split, evaluation_report

# Load and extract ontology data
ontology = AgrO()
ontology.load()
ontological_data = ontology.extract()

# Split into train and test sets
train_data, test_data = train_test_split(ontological_data, test_size=0.2, random_state=42)

# Initialize and load a retriever-based learner
task = 'non-taxonomic-re'
ret_learner = AutoRetrieverLearner(top_k=5)
ret_learner.load(model_id='sentence-transformers/all-MiniLM-L6-v2')

# Fit on training data and predict on test data
ret_learner.fit(train_data, task=task)
predicts = ret_learner.predict(test_data, task=task)

# Evaluate predictions
truth = ret_learner.tasks_ground_truth_former(data=test_data, task=task)
metrics = evaluation_report(y_true=truth, y_pred=predicts, task=task)
print(metrics)
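
Conceptually, a retriever-based learner embeds the candidate labels and, at prediction time, returns the top_k candidates nearest the query by vector similarity. The following dependency-free sketch illustrates that idea with toy 3-dimensional vectors standing in for real sentence-transformer embeddings; the relation names are made up for illustration.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def top_k(query_vec, candidates, k=2):
    # candidates: {label: embedding}; return the k most similar labels
    ranked = sorted(candidates, key=lambda lbl: cosine(query_vec, candidates[lbl]),
                    reverse=True)
    return ranked[:k]

# Toy "embeddings" standing in for model outputs
relations = {
    "hasGrowthStage": [0.9, 0.1, 0.0],
    "hasTrait":       [0.1, 0.9, 0.0],
    "partOf":         [0.0, 0.2, 0.9],
}
print(top_k([0.8, 0.2, 0.1], relations, k=2))  # -> ['hasGrowthStage', 'hasTrait']
```

In the real learner, the embeddings come from the model passed to `load()` (here, `sentence-transformers/all-MiniLM-L6-v2`), and `top_k` controls how many candidates are retrieved per query.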

Other available learners include LLM-based and RAG-based learners; see the Learner Models documentation for details and examples.


LearnerPipeline

LearnerPipeline consolidates the entire workflow — initialization, training, prediction, and evaluation — into a single call:

from ontolearner import LearnerPipeline, AgrO, train_test_split

# Load ontology and extract data
ontology = AgrO()
ontology.load()

train_data, test_data = train_test_split(
    ontology.extract(),
    test_size=0.2,
    random_state=42
)

# Initialize the pipeline with a dense retriever
pipeline = LearnerPipeline(
    retriever_id='sentence-transformers/all-MiniLM-L6-v2',
    batch_size=10,
    top_k=5
)

# Run: fit → predict → evaluate
outputs = pipeline(
    train_data=train_data,
    test_data=test_data,
    evaluate=True,
    task='non-taxonomic-re'
)

print("Metrics:", outputs['metrics'])
print("Elapsed time:", outputs['elapsed_time'])

⭐ Contribution

We welcome contributions of all kinds — bug reports, new features, documentation improvements, or new ontologies!

Please review our contribution guidelines before getting started.

For bugs or questions, please open an issue in the GitHub Issue Tracker.


💡 Acknowledgements

If OntoLearner is useful in your research or work, please consider citing one of our publications:

@inproceedings{babaei2023llms4ol,
  title     = {LLMs4OL: Large Language Models for Ontology Learning},
  author    = {Babaei Giglou, Hamed and D'Souza, Jennifer and Auer, S{\"o}ren},
  booktitle = {International Semantic Web Conference},
  pages     = {408--427},
  year      = {2023},
  organization = {Springer}
}
@software{babaei_giglou_2025_15399783,
  author    = {Babaei Giglou, Hamed and D'Souza, Jennifer and Aioanei, Andrei
               and Mihindukulasooriya, Nandana and Auer, S{\"o}ren},
  title     = {OntoLearner: A Modular Python Library for Ontology Learning with LLMs},
  month     = may,
  year      = 2025,
  publisher = {Zenodo},
  version   = {v1.3.0},
  doi       = {10.5281/zenodo.15399783},
  url       = {https://doi.org/10.5281/zenodo.15399783}
}

This software is archived on Zenodo (DOI: 10.5281/zenodo.15399783) and is licensed under the MIT License.
