Hierarchy Transformers (HiTs)
Code repository for the paper: "Language Models as Hierarchy Encoders".
News :newspaper:
- Detailed documentation of this work will be added to DeepOnto.
- Initial release deployed (v0.0.1).
Installation
Main Dependencies
This repository follows a similar layout to the Sentence Transformers library. It mainly depends on the following libraries:
- Sentence Transformers for language models.
- DeepOnto for processing hierarchies and constructing datasets from them.
- Geoopt for arithmetic in hyperbolic space.
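As context for the Geoopt dependency: HiT embeddings live in a Poincaré ball, where distances grow rapidly towards the boundary, which suits tree-like hierarchies. Below is a minimal pure-Python sketch of the unit-curvature Poincaré distance (for intuition only; the library uses Geoopt's manifold implementation, not this function):

```python
import math

def poincare_dist(x, y):
    """Geodesic distance between two points inside the unit Poincare ball.

    x, y: plain lists of floats with Euclidean norm < 1.
    Uses the standard closed form:
      d(x, y) = arcosh(1 + 2*|x - y|^2 / ((1 - |x|^2) * (1 - |y|^2)))
    """
    sq_norm = lambda v: sum(c * c for c in v)
    diff = [a - b for a, b in zip(x, y)]
    numerator = 2.0 * sq_norm(diff)
    denominator = (1.0 - sq_norm(x)) * (1.0 - sq_norm(y))
    return math.acosh(1.0 + numerator / denominator)

# Points nearer the boundary end up exponentially far apart, so a tree's
# many leaves can all be mutually distant while staying close to their parents.
origin = [0.0, 0.0]
point = [0.5, 0.0]
```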
Install from PyPI
# requiring Python>=3.8
pip install hierarchy_transformers
Install from GitHub
pip install git+https://github.com/KRR-Oxford/HierarchyTransformers.git
Models on Huggingface Hub
Our HiT models are released on the Huggingface Hub.
Get Started
Use the following code to get started with HiTs:
from hierarchy_transformers import HierarchyTransformer
from hierarchy_transformers.utils import get_torch_device
# set up the device (use cpu if no gpu found)
gpu_id = 0
device = get_torch_device(gpu_id)
# load the model
model = HierarchyTransformer.load_pretrained('Hierarchy-Transformers/HiT-MiniLM-L12-WordNet', device)
# entity names to be encoded.
entity_names = ["computer", "personal computer", "fruit", "berry"]
# get the entity embeddings
entity_embeddings = model.encode(entity_names)
Default Probing for Subsumption Prediction
Use the entity embeddings to predict the subsumption relationships between them.
# suppose we want to compare "personal computer" and "computer", "berry" and "fruit"
child_entity_embeddings = model.encode(["personal computer", "berry"], convert_to_tensor=True)
parent_entity_embeddings = model.encode(["computer", "fruit"], convert_to_tensor=True)
# compute hyperbolic distances between child and parent embeddings,
# and the hyperbolic norms (distances from the origin) of each
dists = model.manifold.dist(child_entity_embeddings, parent_entity_embeddings)
child_norms = model.manifold.dist0(child_entity_embeddings)
parent_norms = model.manifold.dist0(parent_entity_embeddings)
# empirical subsumption score proposed in the paper;
# `centri_score_weight` and the overall threshold are tuned on the validation set
# (see `src/hierarchy_transformers/evaluation` for our hyperparameter tuning)
subsumption_scores = -(dists + centri_score_weight * (parent_norms - child_norms))
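The scoring rule itself is plain arithmetic: a large score means a small child-parent distance combined with the parent sitting closer to the origin than the child. A minimal pure-Python sketch, using placeholder values for `centri_score_weight` and the decision threshold (in practice both come from validation-set tuning):

```python
def subsumption_score(dist, child_norm, parent_norm, centri_score_weight):
    """Higher score => more plausible that the child is subsumed by the parent.

    Combines the hyperbolic distance with a centripetal term: a true parent
    should have a smaller hyperbolic norm (sit nearer the origin) than its child.
    """
    return -(dist + centri_score_weight * (parent_norm - child_norm))

# Placeholder numbers for illustration only; real distances and norms come
# from the model, and the weight/threshold from validation-set tuning.
weight, threshold = 1.0, -2.0
score = subsumption_score(dist=1.0, child_norm=3.0, parent_norm=2.5,
                          centri_score_weight=weight)
is_subsumption = score >= threshold
```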
Datasets
Datasets for training and evaluating HiTs are available at Zenodo, including those constructed from:
- WordNet
- SNOMED CT
- Schema.org
- FoodOn
- DOID
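These datasets pair each child entity with its true parent plus sampled negative parents. The actual construction is handled by DeepOnto; the core idea can be sketched in plain Python (the helper below is hypothetical, not the library's API):

```python
import random

def build_subsumption_examples(edges, entities, num_negatives=1, seed=0):
    """Turn (child, parent) hierarchy edges into labelled training triples.

    For every true edge we emit one positive triple (label 1) and
    `num_negatives` corrupted triples (label 0) whose parent slot is
    replaced by a randomly sampled non-parent entity.
    """
    rng = random.Random(seed)
    true_parents = {}
    for child, parent in edges:
        true_parents.setdefault(child, set()).add(parent)
    examples = []
    for child, parent in edges:
        examples.append((child, parent, 1))
        for _ in range(num_negatives):
            negative = rng.choice(entities)
            # resample until the candidate is neither a true parent nor the child itself
            while negative in true_parents[child] or negative == child:
                negative = rng.choice(entities)
            examples.append((child, negative, 0))
    return examples

edges = [("berry", "fruit"), ("personal computer", "computer")]
entities = ["berry", "fruit", "personal computer", "computer"]
examples = build_subsumption_examples(edges, entities)
```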
License
Copyright 2023 Yuan He.
All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at <http://www.apache.org/licenses/LICENSE-2.0>
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Citation
The preprint of our paper is currently available on arXiv.
Yuan He, Zhangdie Yuan, Jiaoyan Chen, Ian Horrocks. Language Models as Hierarchy Encoders. arXiv preprint arXiv:2401.11374 (2024).
@article{he2024language,
  title={Language Models as Hierarchy Encoders},
  author={He, Yuan and Yuan, Zhangdie and Chen, Jiaoyan and Horrocks, Ian},
  journal={arXiv preprint arXiv:2401.11374},
  year={2024}
}