Project description

Ontology-Transformer

End-to-end ontology embedding via fine-tuning sentence transformers with hyperbolic geometry and role-based rotation for existential restrictions (∃r.C).


Features

  • One-line training: OntologyTransformer.fit("ontology.owl") → fine-tuned embeddings
  • Hyperbolic space: Poincaré ball embeddings for hierarchical structures
  • Role-aware existential restrictions: ∃r.C encoded via learned rotation transformations
  • Automatic data preparation: Converts OWL/OFN axioms to training data (no manual preprocessing)
  • Best lambda auto-tuning: Centripetal weight optimized on evaluation data and saved with model
  • Flexible evaluation: Use training ontology samples or separate eval/test ontologies

Installation

From PyPI

pip install ontology-transformer

From source

git clone https://github.com/your-username/ont-embed.git
cd ont-embed
pip install -e .

Requirements

  • Python ≥ 3.9
  • PyTorch ≥ 2.0 (with CUDA recommended)
  • sentence-transformers, geoopt, deeponto, datasets

Quick Start

1. End-to-end: OWL → Fine-tune → Embeddings

from ont import OntologyTransformer

# Train on any OWL/OFN ontology (all axioms used for training)
model = OntologyTransformer.fit(
    owl_path="path/to/ontology.owl",
    output_dir="./output",
    num_epochs=3,
    batch_size=64,        # training batch size (sentences per step)
    eval_batch_size=32,   # evaluation batch size (queries scored per step)
    eval_ratio=0.1,       # 10% of axioms sampled for evaluation
    max_eval=1000,        # max 1000 eval samples
)

# The best lambda (centripetal weight) is determined during training
print(f"Best lambda: {model.best_lambda}")

# Encode concepts
emb = model.encode("food product")

# Encode ∃r.C (existential restrictions) via role rotation
exist_emb = model.encode_existence("has ingredient", "sugar")
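
Because the embeddings live in the Poincaré ball, hyperbolic distance (not cosine similarity) is the natural way to compare them. A minimal sketch using geoopt (already a declared dependency); converting with torch.as_tensor assumes encode returns array-like vectors lying in the ball, which this page does not confirm:

import torch
import geoopt

ball = geoopt.PoincareBall()  # default curvature

# Smaller hyperbolic distance = closer in the learned hierarchy
d = ball.dist(torch.as_tensor(emb), torch.as_tensor(exist_emb))
print(f"dist(food product, ∃ has ingredient . sugar) = {d.item():.4f}")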

2. Use separate ontologies for evaluation/testing

model = OntologyTransformer.fit(
    owl_path="train_ontology.owl",
    eval_owl_path="eval_ontology.owl",   # optional: separate eval ontology
    test_owl_path="test_ontology.owl",   # optional: separate test ontology
    output_dir="./output",
    num_epochs=3,
)

3. Load a pre-trained model

from ont import OntologyTransformer

# Load model (best_lambda is automatically restored)
model = OntologyTransformer.from_pretrained("./output/final")
print(f"Loaded best_lambda: {model.best_lambda}")

# Encode
emb = model.encode("heart disease")
exist_emb = model.encode_existence("has part", "cell membrane")
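
The same idea extends to subsumption-style ranking over many candidates. A sketch under the assumption that encode also accepts a list of strings (as the underlying sentence-transformers API does); the candidate labels below are purely hypothetical:

import torch
import geoopt

ball = geoopt.PoincareBall()

query = torch.as_tensor(model.encode("heart disease"))
candidates = ["cardiovascular disease", "lung disease", "bone fracture"]
cand_embs = torch.as_tensor(model.encode(candidates))

# Rank candidate superclasses by hyperbolic distance (ascending = more plausible)
dists = ball.dist(query.unsqueeze(0), cand_embs)
for d, label in sorted(zip(dists.tolist(), candidates)):
    print(f"{d:.4f}  {label}")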

4. CLI

# Basic training
ont-train --owl ontology.owl --output ./output --epochs 3

# With explicit batch sizes
ont-train --owl ontology.owl --output ./output \
    --epochs 20 --batch-size 256 --eval-batch-size 64

# With separate eval ontology
ont-train --owl train.owl --eval-owl eval.owl --output ./output --epochs 3

# Balanced mode (adds C_neg contrastive loss)
ont-train --owl ontology.owl --output ./output --balanced --epochs 3

Key training parameters

Parameter (Python) | CLI flag | Default | Description
num_epochs | --epochs | 1 | Number of training epochs
batch_size | --batch-size | 64 | Sentences per training step; increase on larger GPUs (e.g. 256 on 40+ GB)
eval_batch_size | --eval-batch-size | 32 | Queries scored per evaluation step; increase to speed up evaluation when GPU memory allows
learning_rate | --lr | 1e-5 | Learning rate
balanced | --balanced | False | Add C_neg contrastive loss for existential restrictions
balanced_negatives | --balanced-negatives | 1 | Number of negative samples in balanced mode
eval_ratio | (Python API only) | 0.1 | Fraction of axioms sampled for eval
max_eval | (Python API only) | 1000 | Max number of eval samples

Data Preparation Flow

By default (no separate eval/test ontologies):

  1. All axioms from input ontology → training data (train.jsonl, train_exist.jsonl, train_conj.jsonl)
  2. 10% of axioms (max 1000) randomly sampled → evaluation data (val.json)
  3. No test split created (unless test_owl_path is provided)

With external eval/test ontologies:

  • eval_owl_path: evaluation data prepared from this ontology
  • test_owl_path: test evaluation performed after training

This design ensures all available training data is used while still enabling hyperparameter tuning (best lambda) via evaluation.
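
The prepared files are JSON Lines (one record per line), so they are easy to inspect before training. A quick sketch, assuming the default data/ directory mentioned in the changelog; the exact record fields are not documented here, so this only counts and prints them:

import json

# Count and peek at the prepared taxonomy records (default data/ layout assumed)
with open("data/train.jsonl") as f:
    records = [json.loads(line) for line in f]

print(f"{len(records)} taxonomy training records")
print(records[0])  # inspect one record's fields

# The eval split follows the documented rule: 10% of axioms, capped at 1000
expected_eval = min(1000, int(0.1 * len(records)))
print(f"expected eval samples ≈ {expected_eval}")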

Training Modes

Non-balanced (default)

Standard contrastive loss on taxonomy and existential axioms, combining four terms (sketched after the list):

  • Clustering loss: pull the child closer to its parent
  • Centripetal loss: push the child away from non-ancestors
  • Conjunction loss: C₁ ⊓ C₂ ⊑ D
  • Existential loss: ∃r.C encoded via rotation
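
How these terms combine is not spelled out above; a plausible reading, with λ (the auto-tuned best lambda, see Architecture) weighting the centripetal term, is sketched below. The loss_* names, their stand-in values, and the weighting scheme are illustrative assumptions, not the package's internals:

import torch

# Illustrative only: the loss_* values are stand-ins for per-term batch losses
loss_clustering  = torch.tensor(0.7, requires_grad=True)  # pull child toward parent
loss_centripetal = torch.tensor(0.3, requires_grad=True)  # push child away from non-ancestors
loss_conjunction = torch.tensor(0.2, requires_grad=True)  # C1 ⊓ C2 ⊑ D
loss_existential = torch.tensor(0.4, requires_grad=True)  # ∃r.C via role rotation
lam = 0.5  # centripetal weight λ (best_lambda)

total_loss = loss_clustering + lam * loss_centripetal + loss_conjunction + loss_existential
total_loss.backward()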

Balanced

Adds extra contrastive loss with negative concept samples (C_neg) for existential restrictions:

model = OntologyTransformer.fit(
    owl_path="ontology.owl",
    balanced=True,
    balanced_negatives=5,  # number of negative samples
)

Architecture

  • Base model: SentenceTransformer fine-tuned in Poincaré ball (hyperbolic space)
  • Role model: Linear layer mapping role embeddings to rotation angles (rotation or transition mode)
  • Existential encoding: ∃r.C = rotate(embed(C), f_r(embed(r))) (see the sketch below)
  • Best lambda: Centripetal weight λ optimized on eval data, saved in wrapper_config.json
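
A minimal sketch of the rotation mode described above, assuming the common scheme of pairing consecutive coordinates and applying a 2-D (Givens) rotation per pair. The class name, shapes, and the use of a plain (non-hyperbolic) rotation are illustrative assumptions, not the package's actual role model:

import torch
import torch.nn as nn

class RoleRotation(nn.Module):
    """Hypothetical role model: role embedding -> rotation angles -> rotated filler."""
    def __init__(self, dim: int):
        super().__init__()
        assert dim % 2 == 0, "pairing consecutive coordinates needs an even dim"
        self.to_angles = nn.Linear(dim, dim // 2)  # f_r: one angle per coordinate pair

    def forward(self, role_emb: torch.Tensor, concept_emb: torch.Tensor) -> torch.Tensor:
        theta = self.to_angles(role_emb)   # (dim/2,) rotation angles
        x = concept_emb.view(-1, 2)        # (dim/2, 2) coordinate pairs
        cos, sin = torch.cos(theta), torch.sin(theta)
        rotated = torch.stack(
            (cos * x[:, 0] - sin * x[:, 1],
             sin * x[:, 0] + cos * x[:, 1]),
            dim=-1,
        )
        return rotated.reshape(-1)         # plays the role of embed(∃r.C)

# Hypothetical usage with toy dimensions:
rot = RoleRotation(dim=8)
exist = rot(torch.randn(8), torch.randn(8))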

Model Saving & Loading

Models are saved with:

  • Base sentence transformer weights
  • Role model weights (role_model.pt)
  • Configuration (wrapper_config.json) including best_lambda
  • Concept/role vocabularies

# Save
model.save("./my_model")

# Load (best_lambda automatically restored)
loaded = OntologyTransformer.from_pretrained("./my_model")

Running Tests

# Install with test dependencies
pip install -e ".[test]"

# Run all tests
pytest tests/ -v

# Skip integration tests (large ontologies)
pytest tests/ -v -m "not integration"

# Run specific test
pytest tests/test_pipeline.py::TestPipeline::test_fit_tiny_owl -v

Examples

See examples/ directory for:

  • Training on the FoodOn, SNOMED CT, and GALEN ontologies
  • Evaluating embeddings for subsumption prediction
  • Using external eval/test ontologies

Citation

If you use this package, please cite:

@inproceedings{yang2025language,
  title={Language Models as Ontology Encoder},
  author={Yang, Hui and Chen, Jiaoyan and Horrocks, Ian},
  booktitle={International Semantic Web Conference (ISWC)},
  year={2025},
  organization={Springer}
}

GitHub: https://github.com/HuiYang1997/OnT

Changelog

0.1.3 (2026-04-01)

  • Fix: axiom duplication in data preparation. create_dataset() previously counted every axiom twice because getImportsClosure() already includes the ontology itself. The duplicated data inflated training-set size 2–3× and degraded embedding quality.
  • Fix: OOM during evaluation on large ontologies. OnTEvaluator now scores candidates in GPU chunks (cand_chunk_size=4096, configurable) instead of broadcasting the full (batch, N, dim) tensor, eliminating OOM errors for ontologies with 100K+ concepts (e.g. SNOMED CT, ~364K concepts).
  • Improvement: skip repeated data preparation. pipeline.fit() reuses an already-prepared data/ directory on restart, avoiding the 5-minute OWL parsing step when resuming crashed runs.

0.1.2

  • Initial public release.

License

Apache License 2.0 - see LICENSE file for details.

Download files

Download the file for your platform.

Source Distribution

ontology_transformer-0.1.4.tar.gz (33.9 kB)

Built Distribution

ontology_transformer-0.1.4-py3-none-any.whl (31.8 kB)

File details

Details for the file ontology_transformer-0.1.4.tar.gz.

File metadata

  • Download URL: ontology_transformer-0.1.4.tar.gz
  • Size: 33.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for ontology_transformer-0.1.4.tar.gz
Algorithm | Hash digest
SHA256 | 6eb9c2e9241da50981f9088561dbe057a34b53478ee84db1d0e8cc82c08b5405
MD5 | 946c21f6069fd4e0dbbef6e830dafff2
BLAKE2b-256 | 3acc82ede50cce4f6a4b3340b7f480922d3ec89bb1ad73e0fdb7abf6554c5cda


File details

Details for the file ontology_transformer-0.1.4-py3-none-any.whl.

File hashes

Hashes for ontology_transformer-0.1.4-py3-none-any.whl
Algorithm | Hash digest
SHA256 | 63345160edac7fef820ec0f6bf1bf1565abdb390be514e316fef5a15d5b22b61
MD5 | e04082d2e842c0b56ed426abb0872d21
BLAKE2b-256 | e2d21e61ac6692cbef4ed1bf9e0d2432fa84ff9cd610ce881b3907e42ed92a9e
