OntoAligner

OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment.

OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment

OntoAligner is a Python library that makes ontology alignment and knowledge graph matching easy for researchers, practitioners, and developers. It ships a single, consistent parse → encode → align → postprocess pipeline behind more than a dozen alignment paradigms — from classic fuzzy/lexical matching to retrieval, Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), Knowledge Graph Embeddings (KGE), fuzzy-logic KG alignment, and ensemble learning — so you can go from two raw ontologies to an evaluated alignment in a handful of lines of code.

🏆 OntoAligner was awarded the Best Resource Paper Award at ESWC 2025.

📘 New to OntoAligner? Start with the tutorial notebooks or the full documentation.

🧪 Installation

You can install OntoAligner from PyPI using pip:

pip install ontoaligner

Alternatively, to get the latest version directly from the source, use the following commands:

git clone git@github.com:sciknoworg/OntoAligner.git
pip install ./OntoAligner

Next, verify the installation:

import ontoaligner

print(ontoaligner.__version__)
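
If the import itself fails, it can help to first check whether the distribution is installed in the active environment. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of OntoAligner.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("ontoaligner")
    if version is None:
        print("ontoaligner is not installed in this environment")
    else:
        print("ontoaligner", version)
```

Unlike a plain `import`, this check does not load the package, so it also works when an optional heavy dependency is what is breaking the import.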

🚀 Quick Tour

End-to-end with `OntoAlignerPipeline`

The fastest way to run an alignment: pick a dataset, an encoder, and an aligner, and let the pipeline handle collection, encoding, prediction, postprocessing, and evaluation.

import ontoaligner

pipeline = ontoaligner.OntoAlignerPipeline(
    task_class=ontoaligner.ontology.MouseHumanOMDataset,
    source_ontology_path="assets/mouse-human/source.xml",
    target_ontology_path="assets/mouse-human/target.xml",
    reference_matching_path="assets/mouse-human/reference.xml",
)

matchings, evaluation = pipeline(
    method="rag",
    encoder_model=ontoaligner.encoder.ConceptParentRAGEncoder(),
    model_class=ontoaligner.aligner.MistralLLMBERTRetrieverRAG,
    postprocessor=ontoaligner.postprocess.rag_hybrid_postprocessor,
    llm_path="mistralai/Mistral-7B-v0.3",
    retriever_path="all-MiniLM-L6-v2",
    llm_threshold=0.5,
    ir_rag_threshold=0.7,
    top_k=5,
    max_length=512,
    max_new_tokens=10,
    device="cuda",
    batch_size=32,
    return_matching=True,
    evaluate=True,
)

print("Matching Evaluation Report:", evaluation)
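
To make the evaluation step less of a black box, here is a minimal sketch of how precision, recall, and F1 are commonly computed for an alignment, by comparing predicted and reference (source, target) pairs. It is an illustration under assumptions, not the library's `evaluation_report`: the `"source"` and `"target"` keys and the function names are hypothetical, and the real report may use different fields or scales.

```python
from typing import Any, Dict, Iterable, Mapping, Set, Tuple

Pair = Tuple[str, str]


def pair_set(matchings: Iterable[Mapping[str, Any]]) -> Set[Pair]:
    """Reduce matchings to a set of (source, target) pairs, ignoring scores."""
    return {(m["source"], m["target"]) for m in matchings}


def precision_recall_f1(
    predicts: Iterable[Mapping[str, Any]],
    references: Iterable[Mapping[str, Any]],
) -> Dict[str, float]:
    """Score predicted pairs against reference pairs by exact pair overlap."""
    predicted = pair_set(predicts)
    expected = pair_set(references)
    hits = len(predicted & expected)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(expected) if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy identifiers, made up for illustration only.
    predicts = [
        {"source": "a", "target": "x", "score": 0.9},
        {"source": "b", "target": "y", "score": 0.8},
        {"source": "c", "target": "z", "score": 0.7},
    ]
    references = [
        {"source": "a", "target": "x"},
        {"source": "b", "target": "y"},
        {"source": "d", "target": "w"},
        {"source": "e", "target": "v"},
    ]
    print(precision_recall_f1(predicts, references))
```

In this toy case two of three predictions are correct and two of four reference pairs are found, so precision is 2/3 and recall is 1/2.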

Step-by-step, low-level control

Build the same RAG-based alignment yourself for full control over every stage:

from ontoaligner.ontology import MaterialInformationMatOntoOMDataset
from ontoaligner.utils import metrics, xmlify
from ontoaligner.aligner import MistralLLMBERTRetrieverRAG
from ontoaligner.encoder import ConceptParentRAGEncoder
from ontoaligner.postprocess import rag_hybrid_postprocessor

# Step 1: Initialize the dataset object for the MaterialInformation MatOnto dataset
task = MaterialInformationMatOntoOMDataset()
print("Test Task:", task)

# Step 2: Load source and target ontologies along with reference matchings
dataset = task.collect(
    source_ontology_path="assets/MI-MatOnto/mi_ontology.xml",
    target_ontology_path="assets/MI-MatOnto/matonto_ontology.xml",
    reference_matching_path="assets/MI-MatOnto/matchings.xml",
)

# Step 3: Encode the source and target ontologies
encoder_model = ConceptParentRAGEncoder()
encoded_ontology = encoder_model(source=dataset["source"], target=dataset["target"])

# Step 4: Define configuration for retriever and LLM
retriever_config = {"device": "cuda", "top_k": 5}
llm_config = {"device": "cuda", "max_length": 300, "max_new_tokens": 10, "batch_size": 15}

# Step 5: Generate predictions using the RAG-based ontology matcher
model = MistralLLMBERTRetrieverRAG(retriever_config=retriever_config, llm_config=llm_config)
model.load(llm_path="mistralai/Mistral-7B-v0.3", ir_path="all-MiniLM-L6-v2")
predicts = model.generate(input_data=encoded_ontology)

# Step 6: Apply hybrid postprocessing
hybrid_matchings, hybrid_configs = rag_hybrid_postprocessor(
    predicts=predicts, ir_score_threshold=0.1, llm_confidence_th=0.8
)

evaluation = metrics.evaluation_report(predicts=hybrid_matchings, references=dataset["reference"])
print("Hybrid Matching Evaluation Report:", evaluation)

# Step 7: Convert matchings to XML format and save the XML representation
xml_str = xmlify.xml_alignment_generator(matchings=hybrid_matchings)
with open("matchings.xml", "w", encoding="utf-8") as xml_file:
    xml_file.write(xml_str)
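
The two thresholds passed in Step 6 suggest a filter that requires both a retriever score and an LLM confidence to clear a bar. The sketch below shows one plausible reading of such two-threshold filtering in plain Python. It is not the implementation of `rag_hybrid_postprocessor`: the `Candidate` record, its field names, and the "keep the most confident target per source" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one retrieved-and-judged pair (not the library's schema)."""

    source: str
    target: str
    ir_score: float  # retriever similarity, assumed to lie in [0, 1]
    llm_confidence: float  # LLM confidence that the pair matches, assumed in [0, 1]


def hybrid_filter(
    candidates: Iterable[Candidate],
    ir_score_threshold: float,
    llm_confidence_th: float,
) -> List[Candidate]:
    """Keep pairs that clear both thresholds, then one best target per source."""
    best: Dict[str, Candidate] = {}
    for cand in candidates:
        if cand.ir_score < ir_score_threshold or cand.llm_confidence < llm_confidence_th:
            continue  # rejected by the retriever bar or by the LLM bar
        current = best.get(cand.source)
        if current is None or cand.llm_confidence > current.llm_confidence:
            best[cand.source] = cand
    return list(best.values())


if __name__ == "__main__":
    candidates = [
        Candidate("a", "x", ir_score=0.90, llm_confidence=0.95),
        Candidate("a", "y", ir_score=0.05, llm_confidence=0.99),  # fails the IR bar
        Candidate("b", "z", ir_score=0.50, llm_confidence=0.40),  # fails the LLM bar
    ]
    for kept in hybrid_filter(candidates, ir_score_threshold=0.1, llm_confidence_th=0.8):
        print(kept.source, "->", kept.target)
```

With a low retriever threshold and a high LLM threshold, as in the example above, the retriever mostly prunes obviously unrelated pairs and the LLM makes the final call.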

Advanced `AlignerPipeline`

`AlignerPipeline` provides a reusable execution flow that runs one user-provided encoder and one ontology matching aligner over a collected ontology matching dataset. The example below shows how to define one:

from ontoaligner.ontology import MaterialInformationMatOntoOMDataset
from ontoaligner.utils import metrics
from ontoaligner.encoder import ConceptParentLightweightEncoder
from ontoaligner.aligner import SimpleFuzzySMLightweight
from ontoaligner import AlignerPipeline

task = MaterialInformationMatOntoOMDataset()

dataset = task.collect(
    source_ontology_path="assets/MI-MatOnto/mi_ontology.xml",
    target_ontology_path="assets/MI-MatOnto/matonto_ontology.xml",
    reference_matching_path="assets/MI-MatOnto/matchings.xml",
)

aligner_pipeline = AlignerPipeline(
    encoder=ConceptParentLightweightEncoder(),
    aligner=SimpleFuzzySMLightweight(fuzzy_sm_threshold=0.2),
    om_dataset=dataset,
)
matchings = aligner_pipeline.generate()

evaluation = metrics.evaluation_report(predicts=matchings, references=dataset["reference"])
print("Matching Evaluation Report:", evaluation)
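
For intuition about what a threshold-based lexical aligner does, the following sketch matches labels by string similarity and keeps the best target per source when its score clears a threshold. It uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure; it is not how `SimpleFuzzySMLightweight` is implemented, and the function name and output keys are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Sequence, Union

Matching = Dict[str, Union[str, float]]


def fuzzy_align(
    source_labels: Sequence[str],
    target_labels: Sequence[str],
    threshold: float,
) -> List[Matching]:
    """Pair each source label with its most similar target label, if similar enough."""
    matchings: List[Matching] = []
    for source in source_labels:
        best_target, best_score = None, 0.0
        for target in target_labels:
            # ratio() is a similarity in [0, 1]; labels are compared case-insensitively.
            score = SequenceMatcher(None, source.lower(), target.lower()).ratio()
            if score > best_score:
                best_target, best_score = target, score
        if best_target is not None and best_score >= threshold:
            matchings.append({"source": source, "target": best_target, "score": best_score})
    return matchings


if __name__ == "__main__":
    # Toy labels, made up for illustration only.
    for match in fuzzy_align(["Heart", "Left Lung"], ["heart", "lung", "kidney"], threshold=0.5):
        print(match)
```

A low threshold such as the `0.2` used above favors recall and admits many weak pairs; raising it trades recall for precision.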

Fusing multiple aligners through Ensemble Learning with `AlignerPipeline`

Combine independent aligner branches (lexical, retrieval, KGE, LLM, RAG, ...) into a single, more robust alignment via a voting strategy such as WeightedVoting, BordaVoting, CondorcetVoting, or ReciprocalRankFusionVoting:

from ontoaligner.aligner.ensemble import EnsembleLearningAligner
from ontoaligner.aligner.ensemble.voting import WeightedVoting
from ontoaligner import AlignerPipeline

lexical_pipeline = AlignerPipeline(...)  # define your lexical aligner pipeline
retrieval_pipeline = AlignerPipeline(...)  # define your retrieval aligner pipeline
llm_pipeline = AlignerPipeline(...)  # define your LLM aligner pipeline

ensemble = EnsembleLearningAligner(
    aligners=[
        ("lexical", lexical_pipeline, 0.2),   # each branch is an AlignerPipeline
        ("retrieval", retrieval_pipeline, 0.3),
        ("llm", llm_pipeline, 0.5),
    ],
    voting=WeightedVoting(),
)

final_matchings = ensemble.generate()

See ontoaligner.readthedocs.io/developerguide/pipeline.html for more details on how to define your own AlignerPipeline and EnsembleLearningAligner.
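
As a rough illustration of the idea behind weighted voting, the sketch below sums `weight * score` for every (source, target) pair proposed by any branch and keeps the highest-scoring target per source. It is a simplified stand-in written in plain Python, not the library's `WeightedVoting`: the input layout, the output keys, and the `threshold` parameter are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List, Mapping, Sequence, Tuple

Pair = Tuple[str, str]
Branch = Tuple[float, Sequence[Mapping[str, Any]]]  # (weight, matchings of one aligner)


def weighted_vote(branches: Sequence[Branch], threshold: float = 0.0) -> List[Dict[str, Any]]:
    """Fuse branch outputs by weighted score, keeping one best target per source."""
    totals: Dict[Pair, float] = defaultdict(float)
    for weight, matchings in branches:
        for m in matchings:
            totals[(m["source"], m["target"])] += weight * m["score"]

    best: Dict[str, Tuple[str, float]] = {}
    for (source, target), score in totals.items():
        if score < threshold:
            continue
        if source not in best or score > best[source][1]:
            best[source] = (target, score)

    return [
        {"source": source, "target": target, "score": score}
        for source, (target, score) in sorted(best.items())
    ]


if __name__ == "__main__":
    # Toy branch outputs, made up for illustration; weights mirror the example above.
    lexical = [{"source": "a", "target": "x", "score": 1.0}]
    retrieval = [
        {"source": "a", "target": "x", "score": 0.8},
        {"source": "b", "target": "y", "score": 0.9},
    ]
    llm = [
        {"source": "a", "target": "z", "score": 0.9},
        {"source": "b", "target": "y", "score": 1.0},
    ]
    fused = weighted_vote([(0.2, lexical), (0.3, retrieval), (0.5, llm)], threshold=0.4)
    for match in fused:
        print(match)
```

In this toy run the LLM branch alone (0.5 × 0.9 = 0.45) narrowly outvotes the lexical and retrieval branches combined (0.2 × 1.0 + 0.3 × 0.8 = 0.44) for source `a`, which shows how strongly the weights shape the fused result.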

👉 More end-to-end scripts are available in examples/, including aligner_pipeline.py, ensemble.py, flora.py, olala.py, and many more.

📚 Documentation & Tutorials

Comprehensive documentation, including detailed guides and examples, is available at ontoaligner.readthedocs.io. Below are key tutorials with links to both the documentation and the corresponding example scripts.

Example	Tutorial	Script
Lightweight	📚 Fuzzy Matching	📝 Code
Retrieval	📚 Retrieval Aligner	📝 Code
Large Language Models	📚 LLM Aligner	📝 Code
Retrieval Augmented Generation	📚 RAG Aligner	📝 Code
FewShot	📚 FewShot-RAG Aligner	📝 Code
In-Context Vectors Learning	📚 In-Context Vectors RAG	📝 Code
Knowledge Graph Embedding	📚 KGE Aligner	📝 Code
Property Alignment	📚 PropMatch Aligner	📝 Code
FLORA (Knowledge Graphs)	📚 FLORA Aligner	📝 Code
OLaLa	📚 OLaLa Aligner	📝 Code
Ensemble Learning	📚 Ensemble Learning	📝 Code
eCommerce	📚 Product Alignment in eCommerce	📝 Code
Financial Corporate Actions	📚 FIBO Corporate Actions Alignment	📝 Code

👥 Contact & Contributions

We welcome contributions to enhance OntoAligner and make it even better! Please review our contribution guidelines in CONTRIBUTING.md before getting started. You are also welcome to assist with the ongoing maintenance by referring to MAINTENANCE.md. Your support is greatly appreciated.

If you encounter any issues or have questions, please submit them in the GitHub issues tracker.

📚 Citing this Work

If you use OntoAligner in your work or research, please cite the following works:

OntoAligner Library:

Babaei Giglou, H., D'Souza, J., Karras, O., Auer, S. (2025). OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment. In: Curry, E., et al. The Semantic Web. ESWC 2025. Lecture Notes in Computer Science, vol 15719. Springer, Cham. https://doi.org/10.1007/978-3-031-94578-6_10

📌 BibTeX

@InProceedings{10.1007/978-3-031-94578-6_10,
    author="Babaei Giglou, Hamed and D'Souza, Jennifer and Karras, Oliver and Auer, S{\"o}ren",
    editor="Curry, Edward and Acosta, Maribel and Poveda-Villal{\'o}n, Maria and van Erp, Marieke and Ojo, Adegboyega and Hose, Katja and Shimizu, Cogan and Lisena, Pasquale",
    title="OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment",
    booktitle="The Semantic Web",
    year="2025",
    publisher="Springer Nature Switzerland",
    address="Cham",
    pages="174--191"
}

LLMs4OM (for RAG module)

Babaei Giglou, H., D'Souza, J., Engel, F., Auer, S. (2025). LLMs4OM: Matching Ontologies with Large Language Models. In: Meroño Peñuela, A., et al. The Semantic Web: ESWC 2024 Satellite Events. ESWC 2024. Lecture Notes in Computer Science, vol 15344. Springer, Cham. https://doi.org/10.1007/978-3-031-78952-6_3

📌 BibTeX

@InProceedings{10.1007/978-3-031-78952-6_3,
  author="Babaei Giglou, Hamed and D'Souza, Jennifer and Engel, Felix and Auer, S{\"o}ren",
  editor="Mero{\~{n}}o Pe{\~{n}}uela, Albert and Corcho, Oscar and Groth, Paul and Simperl, Elena and Tamma, Valentina and Nuzzolese, Andrea Giovanni and Poveda-Villal{\'o}n, Maria and Sabou, Marta and Presutti, Valentina and Celino, Irene and Revenko, Artem and Raad, Joe and Sartini, Bruno and Lisena, Pasquale",
  title="LLMs4OM: Matching Ontologies with Large Language Models",
  booktitle="The Semantic Web: ESWC 2024 Satellite Events",
  year="2025",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="25--35",
  isbn="978-3-031-78952-6"
}

Knowledge Graph Embeddings based aligner

Giglou, Hamed Babaei, Jennifer D'Souza, Sören Auer, and Mahsa Sanaei. "OntoAligner Meets Knowledge Graph Embedding Aligners." arXiv preprint arXiv:2509.26417 (2025). https://arxiv.org/abs/2509.26417

📌 BibTeX

@article{babaei2025ontoaligner,
  title={OntoAligner Meets Knowledge Graph Embedding Aligners},
  author={Babaei Giglou, Hamed and D'Souza, Jennifer and Auer, S{\"o}ren and Sanaei, Mahsa},
  journal={arXiv e-prints},
  pages={arXiv--2509},
  year={2025}
}

📃 License

This software is licensed under .

Release history

Version	Release date
1.9.1 (this version)	Jul 2, 2026
1.9.0	Jun 29, 2026
1.8.0	May 22, 2026
1.7.0	May 15, 2026
1.6.0	Jan 2, 2026
1.5.2	Oct 12, 2025
1.5.1	Sep 7, 2025
1.5.0	Jul 29, 2025
1.4.3	Jul 1, 2025
1.4.2	Jun 16, 2025
1.4.1	May 26, 2025
1.4.0	May 22, 2025
1.3.0	Mar 20, 2025
1.2.3	Feb 4, 2025
1.2.2	Dec 20, 2024
1.2.1	Dec 11, 2024
1.2.0	Dec 10, 2024
1.1.0	Dec 8, 2024
1.0.0	Dec 5, 2024
0.1.1	Oct 8, 2024

Download files

Source Distribution

ontoaligner-1.9.1.tar.gz (198.5 kB), uploaded Jul 2, 2026, Source

Built Distribution

ontoaligner-1.9.1-py3-none-any.whl (221.0 kB), uploaded Jul 2, 2026, Python 3

File details

Details for the file ontoaligner-1.9.1.tar.gz.

File metadata

Download URL: ontoaligner-1.9.1.tar.gz
Upload date: Jul 2, 2026
Size: 198.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.4.1 CPython/3.10.20 Linux/6.17.0-1018-azure

File hashes

Hashes for ontoaligner-1.9.1.tar.gz
Algorithm	Hash digest
SHA256	`ecaae6546db1651c5d963ef23e5329a33b8283c6747fa51dab0e743d9112a15b`
MD5	`9d8cdc980622feb11ca111e0c6a48ab5`
BLAKE2b-256	`7fe376ac87060df14d7c432261f713882541ea097775f9a646f6028b655af598`
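
The published digests can be used to check that a downloaded archive was not corrupted or altered. The sketch below uses only the standard library's `hashlib`; the helper names and the assumption that the archive sits in the current directory are illustrative, not part of any official tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA-256 digest with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # SHA256 value copied from the table above; the local path is an assumption.
    archive = Path("ontoaligner-1.9.1.tar.gz")
    expected = "ecaae6546db1651c5d963ef23e5329a33b8283c6747fa51dab0e743d9112a15b"
    if archive.exists():
        print("digest matches:", matches_published_digest(archive, expected))
    else:
        print("archive not found:", archive)
```

pip can perform the same check at install time through hash-checking mode, where a requirements file pins each package with a `--hash=sha256:...` option.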

File details

Details for the file ontoaligner-1.9.1-py3-none-any.whl.

File metadata

Download URL: ontoaligner-1.9.1-py3-none-any.whl
Upload date: Jul 2, 2026
Size: 221.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.4.1 CPython/3.10.20 Linux/6.17.0-1018-azure

File hashes

Hashes for ontoaligner-1.9.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`51c54e0be45cea346c0f7f14219b9e52c23d959c93fb646a97059cc14589ade3`
MD5	`38dfc435f2686fc0b4e8438622a4800c`
BLAKE2b-256	`cc4a171265e9a5057e9905a2a7e9bf0bc396a7fbb6399c5ee157722fab10c77b`
