
Hierarchy Transformers (HiTs)

Code repository for the paper: "Language Models as Hierarchy Encoders".

News :newspaper:

  • Detailed documentation of this work will be added to DeepOnto.
  • Initial release deployed. (v0.0.1)

Installation

Main Dependencies

This repository follows a similar layout as the Sentence Transformers library. It mainly depends on the following libraries:

  • Sentence Transformers for the underlying language models
  • DeepOnto for processing hierarchies and ontologies
  • geoopt for arithmetic on hyperbolic manifolds

Install from PyPI

# requires Python >= 3.8
pip install hierarchy_transformers

Install from GitHub

pip install git+https://github.com/KRR-Oxford/HierarchyTransformers.git

Models on the Hugging Face Hub

Our HiT models are released on the Hugging Face Hub.

Get Started

Use the following code to get started with HiTs:

from hierarchy_transformers import HierarchyTransformer
from hierarchy_transformers.utils import get_torch_device

# set up the device (use cpu if no gpu found)
gpu_id = 0
device = get_torch_device(gpu_id)

# load the model
model = HierarchyTransformer.load_pretrained('Hierarchy-Transformers/HiT-MiniLM-L12-WordNet', device)

# entity names to be encoded
entity_names = ["computer", "personal computer", "fruit", "berry"]

# get the entity embeddings
entity_embeddings = model.encode(entity_names)
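
The returned embeddings live in the model's hyperbolic manifold, where distance from the origin is meaningful: HiT training pushes more general entities (e.g. "fruit") closer to the origin than their descendants (e.g. "berry"). A minimal sketch of inspecting this, reusing `model` and `entity_names` from above together with the same `convert_to_tensor` keyword and `manifold` operations used in the probing example below:

# re-encode as tensors so the manifold operations can be applied directly
entity_embeddings = model.encode(entity_names, convert_to_tensor=True)
print(entity_embeddings.shape)  # (4, hidden_dim), e.g. 384 for a MiniLM-L12 backbone

# hyperbolic norms, i.e. distances from the manifold origin;
# in a trained HiT, parents tend to have smaller norms than their children
entity_norms = model.manifold.dist0(entity_embeddings)
print(entity_norms)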

Default Probing for Subsumption Prediction

Use the entity embeddings to predict subsumption relationships between entities.

# suppose we want to compare "personal computer" and "computer", "berry" and "fruit"
child_entity_embeddings = model.encode(["personal computer", "berry"], convert_to_tensor=True)
parent_entity_embeddings = model.encode(["computer", "fruit"], convert_to_tensor=True)

# compute the hyperbolic distances and norms of entity embeddings
dists = model.manifold.dist(child_entity_embeddings, parent_entity_embeddings)
child_norms = model.manifold.dist0(child_entity_embeddings)
parent_norms = model.manifold.dist0(parent_entity_embeddings)

# use the empirical scoring function for subsumption prediction proposed in the paper;
# `centri_score_weight` and the overall threshold are determined on the validation set
# (see `src/hierarchy_transformers/evaluation` for details of our hyperparameter tuning)
subsumption_scores = - (dists + centri_score_weight * (parent_norms - child_norms))
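
A minimal sketch of turning these scores into binary predictions, with hypothetical values for `centri_score_weight` and the decision threshold (in practice both come from validation-set tuning, as noted above):

# hypothetical hyperparameter values; in practice, tune both on the validation set
centri_score_weight = 1.0
threshold = -5.0

subsumption_scores = - (dists + centri_score_weight * (parent_norms - child_norms))
predictions = subsumption_scores >= threshold

for child, parent, pred in zip(["personal computer", "berry"], ["computer", "fruit"], predictions):
    print(f"'{child}' is subsumed by '{parent}': {bool(pred)}")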

Datasets

Datasets for training and evaluating HiTs are available on Zenodo. They are constructed from the following sources (an illustrative sketch of the kind of entity pairs they encode follows the list):

  • WordNet
  • SNOMED CT
  • Schema.org
  • FoodOn
  • DOID
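
For illustration only: hierarchies like WordNet consist of (child, parent) subsumption edges between named entities, which is the kind of supervision HiTs are trained on. A sketch of extracting such pairs with NLTK's WordNet interface (this is not the authors' dataset construction pipeline, and the exact format of the Zenodo files may differ):

from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

# collect (child, parent) noun pairs from direct hypernym edges
pairs = []
for synset in list(wn.all_synsets(pos="n"))[:1000]:
    for hypernym in synset.hypernyms():
        pairs.append((synset.lemma_names()[0], hypernym.lemma_names()[0]))

print(pairs[:5])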

License

Copyright 2023 Yuan He.
All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at <http://www.apache.org/licenses/LICENSE-2.0>

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Citation

The preprint of our paper is currently available on arXiv.

Yuan He, Zhangdie Yuan, Jiaoyan Chen, Ian Horrocks. Language Models as Hierarchy Encoders. arXiv preprint arXiv:2401.11374 (2024).

@article{he2024language,
  title={Language Models as Hierarchy Encoders},
  author={He, Yuan and Yuan, Zhangdie and Chen, Jiaoyan and Horrocks, Ian},
  journal={arXiv preprint arXiv:2401.11374},
  year={2024}
}

