OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment.
OntoAligner is a Python library designed to simplify ontology alignment and matching for researchers, practitioners, and developers. With a modular architecture and robust features, OntoAligner provides powerful tools to bridge ontologies effectively.
🧪 Installation
You can install OntoAligner from PyPI using pip:
pip install ontoaligner
Alternatively, to get the latest version directly from the source, use the following commands:
git clone git@github.com:sciknoworg/OntoAligner.git
pip install ./OntoAligner
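After either install route, it can be useful to confirm which version ended up in the active environment. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is introduced here for illustration and is not part of OntoAligner.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# "ontoaligner" is the distribution name used in the pip command above.
version = installed_version("ontoaligner")
print("ontoaligner version:", version if version else "not installed")
```

If the script reports "not installed", the package was most likely installed into a different Python environment than the one running the check.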
📚 Documentation
Comprehensive documentation for OntoAligner, including detailed guides and examples, is available at ontoaligner.readthedocs.io. The table below lists key tutorials, with links to both the documentation and the corresponding example scripts.
| Example | Tutorial | Script |
|---|---|---|
| Lightweight | 📚 Fuzzy Matching | 📝 Code |
| Retrieval | 📚 Retrieval Aligner | 📝 Code |
| Large Language Models | 📚 Large Language Models Aligner | 📝 Code |
| Retrieval Augmented Generation | 📚 Retrieval Augmented Generation | 📝 Code |
| FewShot | 📚 FewShot RAG | 📝 Code |
| In-Context Vectors Learning | 📚 In-Context Vectors RAG | 📝 Code |
| eCommerce | 📚 Product Alignment in eCommerce | 📝 Code |
🚀 Quick Tour
Below is a step-by-step example of ontology matching with Retrieval-Augmented Generation (RAG):
from ontoaligner.ontology import MaterialInformationMatOntoOMDataset
from ontoaligner.utils import metrics, xmlify
from ontoaligner.aligner import MistralLLMBERTRetrieverRAG
from ontoaligner.encoder import ConceptParentRAGEncoder
from ontoaligner.postprocess import rag_hybrid_postprocessor
# Step 1: Initialize the dataset object for MaterialInformation MatOnto dataset
task = MaterialInformationMatOntoOMDataset()
print("Test Task:", task)
# Step 2: Load source and target ontologies along with reference matchings
dataset = task.collect(
    source_ontology_path="assets/MI-MatOnto/mi_ontology.xml",
    target_ontology_path="assets/MI-MatOnto/matonto_ontology.xml",
    reference_matching_path="assets/MI-MatOnto/matchings.xml"
)
# Step 3: Encode the source and target ontologies
encoder_model = ConceptParentRAGEncoder()
encoded_ontology = encoder_model(source=dataset['source'], target=dataset['target'])
# Step 4: Define configuration for retriever and LLM
retriever_config = {"device": "cuda", "top_k": 5}
llm_config = {"device": "cuda", "max_length": 300, "max_new_tokens": 10, "batch_size": 15}
# Step 5: Initialize the RAG-based ontology matcher and generate predictions
model = MistralLLMBERTRetrieverRAG(retriever_config=retriever_config, llm_config=llm_config)
predicts = model.generate(input_data=encoded_ontology)
# Step 6: Apply hybrid postprocessing
hybrid_matchings, hybrid_configs = rag_hybrid_postprocessor(
    predicts=predicts,
    ir_score_threshold=0.1,
    llm_confidence_th=0.8
)
evaluation = metrics.evaluation_report(predicts=hybrid_matchings, references=dataset['reference'])
print("Hybrid Matching Evaluation Report:", evaluation)
# Step 7: Convert matchings to XML format and save the XML representation
xml_str = xmlify.xml_alignment_generator(matchings=hybrid_matchings)
with open("matchings.xml", "w", encoding="utf-8") as xml_file:
    xml_file.write(xml_str)
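Step 6 compares the predicted matchings against the reference alignment. As a rough illustration of what such a comparison typically measures, the sketch below computes precision, recall, and F1 over exact source/target pairs. It is not OntoAligner's implementation of `metrics.evaluation_report`: the function name `alignment_scores` and the dictionary keys `source` and `target` are assumptions made for this example.

```python
def alignment_scores(predicts, references):
    """Compute precision, recall and F1 over exact (source, target) pairs.

    Both arguments are lists of dicts with assumed keys "source" and "target".
    """
    predicted = {(m["source"], m["target"]) for m in predicts}
    expected = {(m["source"], m["target"]) for m in references}
    hits = len(predicted & expected)

    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(expected) if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data: one of the two predicted pairs appears in the reference.
toy_predicts = [{"source": "a", "target": "x"}, {"source": "b", "target": "y"}]
toy_references = [{"source": "a", "target": "x"}, {"source": "c", "target": "z"}]
print(alignment_scores(toy_predicts, toy_references))
```

Treating matchings as sets of pairs means duplicates are counted once and the order of predictions does not affect the scores.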
Alternatively, the RAG method can be run end to end with the OntoAlignerPipeline class:
import ontoaligner
pipeline = ontoaligner.OntoAlignerPipeline(
    # The task class should match the MI-MatOnto ontology files passed below
    task_class=ontoaligner.ontology.MaterialInformationMatOntoOMDataset,
    source_ontology_path="assets/MI-MatOnto/mi_ontology.xml",
    target_ontology_path="assets/MI-MatOnto/matonto_ontology.xml",
    reference_matching_path="assets/MI-MatOnto/matchings.xml",
)
matchings, evaluation = pipeline(
    method="rag",
    encoder_model=ontoaligner.encoder.ConceptRAGEncoder(),
    model_class=ontoaligner.aligner.MistralLLMBERTRetrieverRAG,
    postprocessor=ontoaligner.postprocess.rag_hybrid_postprocessor,
    llm_path='mistralai/Mistral-7B-v0.3',
    retriever_path='all-MiniLM-L6-v2',
    llm_threshold=0.5,
    ir_rag_threshold=0.7,
    top_k=5,
    max_length=512,
    max_new_tokens=10,
    device='cuda',
    batch_size=32,
    return_matching=True,
    evaluate=True
)
print("Matching Evaluation Report:", evaluation)
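Both examples above pass a retrieval-score threshold and an LLM-confidence threshold to the hybrid postprocessor. The library's actual filtering logic is not shown in this README, so the sketch below only illustrates the general idea of two-threshold filtering: a candidate pair is kept when both scores clear their thresholds. The function `filter_candidates`, the keys `ir_score` and `llm_confidence`, and the toy values are all hypothetical.

```python
def filter_candidates(candidates, ir_score_threshold, llm_confidence_threshold):
    """Keep candidates whose retrieval score and LLM confidence both clear their thresholds.

    Illustrative only; the key names "ir_score" and "llm_confidence" are assumed.
    """
    return [
        c for c in candidates
        if c["ir_score"] >= ir_score_threshold
        and c["llm_confidence"] >= llm_confidence_threshold
    ]


# Toy candidates with made-up scores.
candidates = [
    {"source": "s1", "target": "t1", "ir_score": 0.92, "llm_confidence": 0.95},
    {"source": "s2", "target": "t2", "ir_score": 0.40, "llm_confidence": 0.99},
    {"source": "s3", "target": "t3", "ir_score": 0.85, "llm_confidence": 0.30},
]

kept = filter_candidates(candidates, ir_score_threshold=0.7, llm_confidence_threshold=0.5)
print([(c["source"], c["target"]) for c in kept])
```

Raising either threshold generally trades recall for precision, so it is worth re-running the evaluation report after changing them.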
⭐ Contribution
We welcome contributions to enhance OntoAligner and make it even better! Please review our contribution guidelines in CONTRIBUTING.md before getting started. You are also welcome to assist with the ongoing maintenance by referring to MAINTENANCE.md. Your support is greatly appreciated.
If you encounter any issues or have questions, please submit them to the GitHub issue tracker.
💡 Acknowledgements
If you use OntoAligner in your work or research, please cite the following preprint:
@article{giglou2025ontoaligner,
title={Ontoaligner: A comprehensive modular and robust python toolkit for ontology alignment},
author={Giglou, Hamed Babaei and D'Souza, Jennifer and Karras, Oliver and Auer, S{\"o}ren},
journal={arXiv preprint arXiv:2503.21902},
year={2025}
}
This software is archived in Zenodo under the DOI and is licensed under
.
Download files
File details
Details for the file ontoaligner-1.4.0.tar.gz.
File metadata
- Download URL: ontoaligner-1.4.0.tar.gz
- Upload date:
- Size: 118.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.10.17 Linux/6.11.0-1014-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0cc1ef60311e8f02478b7c8e31f9f4ce640b06ec5845af3a87a2fb3734e4e617 |
| MD5 | 376405102eb1ac80a220e180e7c84aef |
| BLAKE2b-256 | fcbff4711926b549815df13e78b18347548e238a8214e45a765cdf50b9effdbd |
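A downloaded archive can be checked against the SHA256 digest listed above for the source distribution. The following is a minimal sketch using only the standard library; the helper names and the assumption that the file sits in the current directory are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the file was saved.
    archive = Path("ontoaligner-1.4.0.tar.gz")
    expected = "0cc1ef60311e8f02478b7c8e31f9f4ce640b06ec5845af3a87a2fb3734e4e617"
    if archive.exists():
        print("digest matches:", matches_published_digest(archive, expected))
    else:
        print("archive not found:", archive)
```

Reading in chunks keeps memory use flat regardless of file size. pip can also enforce digests at install time through its hash-checking mode.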
File details
Details for the file ontoaligner-1.4.0-py3-none-any.whl.
File metadata
- Download URL: ontoaligner-1.4.0-py3-none-any.whl
- Upload date:
- Size: 97.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.10.17 Linux/6.11.0-1014-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d09b5e98cc88308ef7b4803f10acf59c93221cb2b098fbcc9d9b562dce38f654 |
| MD5 | 106b77a59ebe37a74db6910ee7d2b59c |
| BLAKE2b-256 | a2522ecc9dd7c2802edef00cd9bd1fe7d0800d784d702facefe464e1357521b7 |