OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment.

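OntoAligner is a Python library designed to simplify ontology alignment and matching for researchers, practitioners, and developers. Its modular architecture separates dataset loading, encoding, alignment, postprocessing, and evaluation, and its aligners range from lightweight fuzzy matching to retrieval, large language model (LLM), and retrieval-augmented generation (RAG) approaches.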

🧪 Installation

You can install OntoAligner from PyPI using pip:

pip install ontoaligner

Alternatively, to get the latest version directly from the source, use the following commands:

git clone git@github.com:sciknoworg/OntoAligner.git
pip install ./OntoAligner
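
After either installation route, a quick check can confirm that the package is visible to the current interpreter. The sketch below uses only the Python standard library. The helper name `installed_version` is a hypothetical choice for illustration, and the distribution name `ontoaligner` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("ontoaligner")
    print("ontoaligner version:", found if found else "not installed")
```

If the script reports "not installed", the most likely cause is that pip installed into a different environment than the one running the script.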

📚 Documentation

Comprehensive documentation for OntoAligner, including detailed guides and examples, is available at ontoaligner.readthedocs.io. The key tutorials are listed below, each with a link to its documentation page and to the corresponding example code.

Example | Tutorial | Script
Lightweight | 📚 Fuzzy Matching | 📝 Code
Retrieval | 📚 Retrieval Aligner | 📝 Code
Large Language Models | 📚 Large Language Models Aligner | 📝 Code
Retrieval Augmented Generation | 📚 Retrieval Augmented Generation | 📝 Code
FewShot | 📚 FewShot RAG | 📝 Code
In-Context Vectors Learning | 📚 In-Context Vectors RAG | 📝 Code
eCommerce | 📚 Product Alignment in eCommerce | 📝 Code

🚀 Quick Tour

The example below walks through ontology matching step by step using the Retrieval-Augmented Generation (RAG) approach:

from ontoaligner.ontology import MaterialInformationMatOntoOMDataset
from ontoaligner.utils import metrics, xmlify
from ontoaligner.aligner import MistralLLMBERTRetrieverRAG
from ontoaligner.encoder import ConceptParentRAGEncoder
from ontoaligner.postprocess import rag_hybrid_postprocessor

# Step 1: Initialize the dataset object for MaterialInformation MatOnto dataset
task = MaterialInformationMatOntoOMDataset()
print("Test Task:", task)

# Step 2: Load source and target ontologies along with reference matchings
dataset = task.collect(
    source_ontology_path="assets/MI-MatOnto/mi_ontology.xml",
    target_ontology_path="assets/MI-MatOnto/matonto_ontology.xml",
    reference_matching_path="assets/MI-MatOnto/matchings.xml"
)

# Step 3: Encode the source and target ontologies
encoder_model = ConceptParentRAGEncoder()
encoded_ontology = encoder_model(source=dataset['source'], target=dataset['target'])

# Step 4: Define configuration for retriever and LLM
retriever_config = {"device": "cuda", "top_k": 5}
llm_config = {"device": "cuda", "max_length": 300, "max_new_tokens": 10, "batch_size": 15}

# Step 5: Initialize the RAG-based ontology matcher and generate predictions
model = MistralLLMBERTRetrieverRAG(retriever_config=retriever_config, llm_config=llm_config)
predicts = model.generate(input_data=encoded_ontology)

# Step 6: Apply hybrid postprocessing
hybrid_matchings, hybrid_configs = rag_hybrid_postprocessor(predicts=predicts,
                                                            ir_score_threshold=0.1,
                                                            llm_confidence_th=0.8)

evaluation = metrics.evaluation_report(predicts=hybrid_matchings, references=dataset['reference'])
print("Hybrid Matching Evaluation Report:", evaluation)

# Step 7: Convert matchings to XML format and save the XML representation
xml_str = xmlify.xml_alignment_generator(matchings=hybrid_matchings)
# Use a context manager so the file is flushed and closed after writing
with open("matchings.xml", "w", encoding="utf-8") as xml_file:
    xml_file.write(xml_str)
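
To make Step 6 and the evaluation call more concrete, here is a minimal, library-independent sketch of what two-threshold filtering and a precision/recall/F1 report can look like. It is an illustration under stated assumptions, not OntoAligner's implementation. The record layout (`source`, `target`, `ir_score`, `llm_confidence`), the helper names `filter_candidates` and `precision_recall_f1`, and the toy concept names are all hypothetical. The actual `rag_hybrid_postprocessor` and `metrics.evaluation_report` may combine scores and report results differently.

```python
def filter_candidates(candidates, ir_score_threshold, llm_confidence_th):
    """Keep pairs whose retrieval score and LLM confidence both reach their thresholds."""
    return [
        (c["source"], c["target"])
        for c in candidates
        if c["ir_score"] >= ir_score_threshold and c["llm_confidence"] >= llm_confidence_th
    ]


def precision_recall_f1(predicted, reference):
    """Compare predicted and reference (source, target) pairs as sets."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data; real OntoAligner predictions use the library's own format.
candidates = [
    {"source": "mi:Alloy", "target": "matonto:Alloy", "ir_score": 0.92, "llm_confidence": 0.95},
    {"source": "mi:Sample", "target": "matonto:Specimen", "ir_score": 0.40, "llm_confidence": 0.85},
    {"source": "mi:Unit", "target": "matonto:Process", "ir_score": 0.05, "llm_confidence": 0.90},
    {"source": "mi:Phase", "target": "matonto:Step", "ir_score": 0.60, "llm_confidence": 0.30},
]
reference = [
    ("mi:Alloy", "matonto:Alloy"),
    ("mi:Sample", "matonto:Specimen"),
    ("mi:Phase", "matonto:Phase"),
]

kept = filter_candidates(candidates, ir_score_threshold=0.1, llm_confidence_th=0.8)
report = precision_recall_f1(kept, reference)
print("Kept pairs:", kept)
print("Report:", report)
```

In this toy run, two of the four candidates pass both thresholds and both are in the reference set, so precision is 1.0 while recall is 2/3 because one reference pair is never proposed. Raising either threshold generally trades recall for precision.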

The same RAG method can also be run end to end through the ontology alignment pipeline:

import ontoaligner

pipeline = ontoaligner.OntoAlignerPipeline(
    task_class=ontoaligner.ontology.MaterialInformationMatOntoOMDataset,
    source_ontology_path="assets/MI-MatOnto/mi_ontology.xml",
    target_ontology_path="assets/MI-MatOnto/matonto_ontology.xml",
    reference_matching_path="assets/MI-MatOnto/matchings.xml",
)

matchings, evaluation = pipeline(
    method="rag",
    encoder_model=ontoaligner.encoder.ConceptRAGEncoder(),
    model_class=ontoaligner.aligner.MistralLLMBERTRetrieverRAG,
    postprocessor=ontoaligner.postprocess.rag_hybrid_postprocessor,
    llm_path='mistralai/Mistral-7B-v0.3',
    retriever_path='all-MiniLM-L6-v2',
    llm_threshold=0.5,
    ir_rag_threshold=0.7,
    top_k=5,
    max_length=512,
    max_new_tokens=10,
    device='cuda',
    batch_size=32,
    return_matching=True,
    evaluate=True
)

print("Matching Evaluation Report:", evaluation)
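
Both examples hard-code `device='cuda'`, which will not work on a machine without a CUDA-capable GPU. The sketch below shows one way to fall back to the CPU. It assumes the underlying models run on PyTorch, as the `device` argument suggests, and the helper name `pick_device` is hypothetical rather than part of OntoAligner.

```python
import importlib.util


def pick_device(preferred: str = "cuda") -> str:
    """Return `preferred` if it is usable, falling back to "cpu" when CUDA is unavailable."""
    if preferred != "cuda":
        return preferred
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed, so CUDA cannot be queried
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Using device:", device)
```

The result can then be passed as `device=device` in place of the literal `'cuda'`. Note that running a 7B-parameter LLM on the CPU is likely to be very slow, so the fallback is mainly useful for the lighter retrieval components or for smoke tests.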

⭐ Contribution

We welcome contributions to enhance OntoAligner and make it even better! Please review our contribution guidelines in CONTRIBUTING.md before getting started. You are also welcome to assist with the ongoing maintenance by referring to MAINTENANCE.md. Your support is greatly appreciated.

If you encounter any issues or have questions, please submit them to the GitHub issue tracker.

💡 Acknowledgements

If you use OntoAligner in your work or research, please cite the following preprint:

@article{giglou2025ontoaligner,
  title={{OntoAligner}: A Comprehensive Modular and Robust {Python} Toolkit for Ontology Alignment},
  author={Giglou, Hamed Babaei and D'Souza, Jennifer and Karras, Oliver and Auer, S{\"o}ren},
  journal={arXiv preprint arXiv:2503.21902},
  year={2025}
}

This software is archived on Zenodo under a DOI and is released under the license stated in the project repository.
