Skip to main content

OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment.

Project description

OntoAligner Logo

PyPI version PyPI Downloads License pre-commit Documentation Status Maintenance DOI

OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment

OntoAligner is a Python library designed to simplify ontology alignment and matching for researchers, practitioners, and developers. With a modular architecture and robust features, OntoAligner provides powerful tools to bridge ontologies effectively.

📘 Explore the OntoAligner tutorial here.

🧪 Installation

You can install OntoAligner from PyPI using pip:

pip install ontoaligner

Alternatively, to get the latest version directly from the source, use the following commands:

git clone git@github.com:sciknoworg/OntoAligner.git
pip install ./ontoaligner

Next, verify the installation:

import ontoaligner

print(ontoaligner.__version__)

📚 Documentation

Comprehensive documentation for OntoAligner, including detailed guides and examples, is available at ontoaligner.readthedocs.io. Below are some key tutorials with links to both the documentation and the corresponding example codes.

Example Tutorial Script
Lightweight 📚 Fuzzy Matching 📝 Code
Retrieval 📚 Retrieval Aligner 📝 Code
Large Language Models 📚 LLM Aligner 📝 Code
Retrieval Augmented Generation 📚 RAG Aligner 📝 Code
FewShot 📚 FewShot-RAG Aligner 📝 Code
In-Context Vectors Learning 📚 In-Context Vectors RAG 📝 Code
Knowledge Graph Embedding 📚 KGE Aligner 📝 Code
Property Alignment 📚 PropMatch Aligner 📝 Code
eCommerce 📚 Product Alignment in eCommerce 📝 Code

🚀 Quick Tour

Below is an example of using Retrieval-Augmented Generation (RAG) step-by-step approach for ontology matching:

from ontoaligner.ontology import MaterialInformationMatOntoOMDataset
from ontoaligner.utils import metrics, xmlify
from ontoaligner.aligner import MistralLLMBERTRetrieverRAG
from ontoaligner.encoder import ConceptParentRAGEncoder
from ontoaligner.postprocess import rag_hybrid_postprocessor

# Step 1: Initialize the dataset object for MaterialInformation MatOnto dataset
task = MaterialInformationMatOntoOMDataset()
print("Test Task:", task)

# Step 2: Load source and target ontologies along with reference matchings
dataset = task.collect(
    source_ontology_path="assets/MI-MatOnto/mi_ontology.xml",
    target_ontology_path="assets/MI-MatOnto/matonto_ontology.xml",
    reference_matching_path="assets/MI-MatOnto/matchings.xml"
)

# Step 3: Encode the source and target ontologies
encoder_model = ConceptParentRAGEncoder()
encoded_ontology = encoder_model(source=dataset['source'], target=dataset['target'])

# Step 4: Define configuration for retriever and LLM
retriever_config = {"device": 'cuda', "top_k": 5}
llm_config = {"device": "cuda", "max_length": 300, "max_new_tokens": 10, "batch_size": 15}

# Step 5: Initialize Generate predictions using RAG-based ontology matcher
model = MistralLLMBERTRetrieverRAG(retriever_config=retriever_config, llm_config=llm_config)
model.load(llm_path = "mistralai/Mistral-7B-v0.3", ir_path="all-MiniLM-L6-v2")
predicts = model.generate(input_data=encoded_ontology)

# Step 6: Apply hybrid postprocessing
hybrid_matchings, hybrid_configs = rag_hybrid_postprocessor(predicts=predicts,
                                                            ir_score_threshold=0.1,
                                                            llm_confidence_th=0.8)

evaluation = metrics.evaluation_report(predicts=hybrid_matchings, references=dataset['reference'])
print("Hybrid Matching Evaluation Report:", evaluation)

# Step 7: Convert matchings to XML format and save the XML representation
xml_str = xmlify.xml_alignment_generator(matchings=hybrid_matchings)
open("matchings.xml", "w", encoding="utf-8").write(xml_str)

Ontology alignment pipeline using RAG method:

import ontoaligner

pipeline = ontoaligner.OntoAlignerPipeline(
    task_class=ontoaligner.ontology.MouseHumanOMDataset,
    source_ontology_path="assets/MI-MatOnto/mi_ontology.xml",
    target_ontology_path="assets/MI-MatOnto/matonto_ontology.xml",
    reference_matching_path="assets/MI-MatOnto/matchings.xml"
)

matchings, evaluation = pipeline(
    method="rag",
    encoder_model=ontoaligner.encoder.ConceptRAGEncoder(),
    model_class=ontoaligner.aligner.MistralLLMBERTRetrieverRAG,
    postprocessor=ontoaligner.postprocess.rag_hybrid_postprocessor,
    llm_path='mistralai/Mistral-7B-v0.3',
    retriever_path='all-MiniLM-L6-v2',
    llm_threshold=0.5,
    ir_rag_threshold=0.7,
    top_k=5,
    max_length=512,
    max_new_tokens=10,
    device='cuda',
    batch_size=32,
    return_matching=True,
    evaluate=True
)

print("Matching Evaluation Report:", evaluation)

👥 Contact & Contributions

We welcome contributions to enhance OntoAligner and make it even better! Please review our contribution guidelines in CONTRIBUTING.md before getting started. You are also welcome to assist with the ongoing maintenance by referring to MAINTENANCE.md. Your support is greatly appreciated.

If you encounter any issues or have questions, please submit them in the GitHub issues tracker.

📚 Citing this Work

If you use OntoAligner in your work or research, please cite the following preprint:

  • OntoAligner Library:

    Babaei Giglou, H., D’Souza, J., Karras, O., Auer, S. (2025). OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment. In: Curry, E., et al. The Semantic Web. ESWC 2025. Lecture Notes in Computer Science, vol 15719. Springer, Cham. https://doi.org/10.1007/978-3-031-94578-6_10

    📌 BibTeX

    @InProceedings{10.1007/978-3-031-94578-6_10,
        author="Babaei Giglou, Hamed and D'Souza, Jennifer and Karras, Oliver and Auer, S{\"o}ren",
        editor="Curry, Edward and Acosta, Maribel and Poveda-Villal{\'o}n, Maria and van Erp, Marieke and Ojo, Adegboyega and Hose, Katja and Shimizu, Cogan and Lisena, Pasquale",
        title="OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment",
        booktitle="The Semantic Web",
        year="2025",
        publisher="Springer Nature Switzerland",
        address="Cham",
        pages="174--191"
    }
    
  • LLMs4OM (for RAG module)

    Babaei Giglou, H., D’Souza, J., Engel, F., Auer, S. (2025). LLMs4OM: Matching Ontologies with Large Language Models. In: Meroño Peñuela, A., et al. The Semantic Web: ESWC 2024 Satellite Events. ESWC 2024. Lecture Notes in Computer Science, vol 15344. Springer, Cham. https://doi.org/10.1007/978-3-031-78952-6_3

    📌 BibTeX

    @InProceedings{10.1007/978-3-031-78952-6_3,
      author="Babaei Giglou, Hamed and D'Souza, Jennifer and Engel, Felix and Auer, S{\"o}ren",
      editor="Mero{\~{n}}o Pe{\~{n}}uela, Albert and Corcho, Oscar and Groth, Paul and Simperl, Elena and Tamma, Valentina and Nuzzolese, Andrea Giovanni and Poveda-Villal{\'o}n, Maria and Sabou, Marta and Presutti, Valentina and Celino, Irene and Revenko, Artem and Raad, Joe and Sartini, Bruno and Lisena, Pasquale",
      title="LLMs4OM: Matching Ontologies with Large Language Models",
      booktitle="The Semantic Web: ESWC 2024 Satellite Events",
      year="2025",
      publisher="Springer Nature Switzerland",
      address="Cham",
      pages="25--35",
      isbn="978-3-031-78952-6"
      }
    

📃 License

This software is licensed under License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ontoaligner-1.7.0.tar.gz (175.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ontoaligner-1.7.0-py3-none-any.whl (182.5 kB view details)

Uploaded Python 3

File details

Details for the file ontoaligner-1.7.0.tar.gz.

File metadata

  • Download URL: ontoaligner-1.7.0.tar.gz
  • Upload date:
  • Size: 175.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.4.1 CPython/3.10.20 Linux/6.17.0-1013-azure

File hashes

Hashes for ontoaligner-1.7.0.tar.gz
Algorithm Hash digest
SHA256 cce8e7c2a80f32d1e183a708002a97c2b2a74ff08da9c40abed8cfeb89124856
MD5 f62acf35b4619b277a0ed3b4e4287b1c
BLAKE2b-256 e4cfb7ac4a4483c12233f947bd42a394ea5536a04c36835ffcd824f92f625c12

See more details on using hashes here.

File details

Details for the file ontoaligner-1.7.0-py3-none-any.whl.

File metadata

  • Download URL: ontoaligner-1.7.0-py3-none-any.whl
  • Upload date:
  • Size: 182.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.4.1 CPython/3.10.20 Linux/6.17.0-1013-azure

File hashes

Hashes for ontoaligner-1.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 91dc6ef3948c21f500fb27d57bb2aceff9ac6824c296884686c4def6b194bc6f
MD5 217bb6aa4a040a3013aa074467aa7bd9
BLAKE2b-256 20cf7661bcd2d5536ce3389275986048d944a810cb6cf9e8d56207923d92735b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page