A package for training and evaluating multimodal knowledge graph embeddings
PyKEEN
PyKEEN (Python KnowlEdge EmbeddiNgs) is a Python package designed to train and evaluate knowledge graph embedding models (incorporating multi-modal information).
Installation
The latest stable version of PyKEEN can be downloaded and installed from PyPI with:
$ pip install pykeen
The latest version of PyKEEN can be installed directly from the source on GitHub with:
$ pip install git+https://github.com/pykeen/pykeen.git
More information about installation (e.g., development mode, Windows installation, Colab, Kaggle, extras) can be found in the installation documentation.
Quickstart
This section shows how to train a model on a dataset and evaluate it on that dataset's held-out test triples.
The fastest way to get up and running is to use the pipeline function. It provides a high-level entry into the extensible functionality of this package. The following example shows how to train and evaluate the TransE model on the Nations dataset. By default, the training loop uses the stochastic local closed world assumption (sLCWA) training approach and evaluates with rank-based evaluation.
from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',
    dataset='nations',
)
The results are returned in an instance of the PipelineResult dataclass that has attributes for the trained model, the training loop, the evaluation, and more. See the tutorials on using your own dataset, understanding the evaluation, and making novel link predictions.
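The sLCWA approach used by default trains on each positive triple together with negatives generated by corrupting it. As a framework-free illustration (not PyKEEN's implementation), the sketch below corrupts either the head or the tail uniformly at random. corrupt_uniform is a hypothetical helper, and triples are assumed to be integer (head, relation, tail) IDs.

```python
import random


def corrupt_uniform(triple, num_entities, num_negatives, rng):
    """Create negatives for one positive (head, relation, tail) ID triple by
    replacing either its head or its tail with a uniformly random entity."""
    if num_entities < 2:
        raise ValueError("need at least two entities to corrupt a triple")
    head, relation, tail = triple
    negatives = []
    for _ in range(num_negatives):
        corrupt_head = rng.random() < 0.5
        original = head if corrupt_head else tail
        # Resample until the replacement differs from the original entity.
        # Other true triples are NOT filtered out, so a sampled negative
        # may actually be a true triple of the graph.
        replacement = rng.randrange(num_entities)
        while replacement == original:
            replacement = rng.randrange(num_entities)
        if corrupt_head:
            negatives.append((replacement, relation, tail))
        else:
            negatives.append((head, relation, replacement))
    return negatives


rng = random.Random(42)
print(corrupt_uniform((0, 1, 2), num_entities=14, num_negatives=3, rng=rng))
```

Each negative keeps the relation and differs from the positive in exactly one position, which is what makes this sampling cheap compared with scoring every entity.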
PyKEEN is extensible such that:
- Each model has the same API, so anything from pykeen.models can be dropped in
- Each training loop has the same API, so pykeen.training.LCWATrainingLoop can be dropped in
- Triples factories can be generated by the user with pykeen.triples.TriplesFactory
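Conceptually, a triples factory turns labeled (head, relation, tail) triples into the integer IDs that a model indexes into. The sketch below illustrates only that idea in plain Python; build_id_maps is a hypothetical helper rather than the TriplesFactory API, and the toy triples are made up for illustration.

```python
def build_id_maps(labeled_triples):
    """Hypothetical helper: map string-labeled (head, relation, tail) triples
    to contiguous integer IDs, assigned in sorted label order."""
    entity_labels = sorted(
        {h for h, _, _ in labeled_triples} | {t for _, _, t in labeled_triples}
    )
    relation_labels = sorted({r for _, r, _ in labeled_triples})
    entity_to_id = {label: i for i, label in enumerate(entity_labels)}
    relation_to_id = {label: i for i, label in enumerate(relation_labels)}
    mapped = [
        (entity_to_id[h], relation_to_id[r], entity_to_id[t])
        for h, r, t in labeled_triples
    ]
    return entity_to_id, relation_to_id, mapped


triples = [
    ("brazil", "exports", "uk"),
    ("uk", "intergovorgs", "usa"),
    ("usa", "exports", "brazil"),
]
entity_to_id, relation_to_id, mapped = build_id_maps(triples)
print(entity_to_id)    # {'brazil': 0, 'uk': 1, 'usa': 2}
print(relation_to_id)  # {'exports': 0, 'intergovorgs': 1}
print(mapped)          # [(0, 0, 1), (1, 1, 2), (2, 0, 0)]
```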
The full documentation can be found at https://pykeen.readthedocs.io.
Implementation
Below are the datasets, models, losses, regularizers, training loops, negative samplers, stoppers, evaluators, metrics, and trackers implemented in pykeen.
Datasets (34)
The following datasets are built into PyKEEN. The citation for each dataset corresponds to either the paper describing the dataset, the first paper published using the dataset with knowledge graph embedding models, or the URL for the dataset if neither of the first two is available. If you want to use a custom dataset, see the Bring Your Own Dataset tutorial. If you have a suggestion for another dataset to include in PyKEEN, please let us know.
Inductive Datasets
The following inductive datasets are built into PyKEEN.
Name | Documentation | Citation |
---|---|---|
ILPC2022 Large | pykeen.datasets.ILPC2022Large | Galkin et al., 2022 |
ILPC2022 Small | pykeen.datasets.ILPC2022Small | Galkin et al., 2022 |
FB15k-237 | pykeen.datasets.InductiveFB15k237 | Teru et al., 2020 |
NELL | pykeen.datasets.InductiveNELL | Teru et al., 2020 |
WordNet-18 (RR) | pykeen.datasets.InductiveWN18RR | Teru et al., 2020 |
Models (42)
Losses (13)
Name | Reference | Description |
---|---|---|
Binary cross entropy (after sigmoid) | pykeen.losses.BCEAfterSigmoidLoss | A module for the numerically unstable version of explicit Sigmoid + BCE loss. |
Binary cross entropy (with logits) | pykeen.losses.BCEWithLogitsLoss | A module for the binary cross entropy loss. |
Cross entropy | pykeen.losses.CrossEntropyLoss | A module for the cross entropy loss that evaluates the cross entropy after softmax output. |
Double Margin | pykeen.losses.DoubleMarginLoss | A limit-based scoring loss, with separate margins for positive and negative elements from [sun2018]. |
Focal | pykeen.losses.FocalLoss | A module for the focal loss proposed by [lin2018]. |
Margin ranking | pykeen.losses.MarginRankingLoss | A module for the pairwise hinge loss (i.e., margin ranking loss). |
Mean square error | pykeen.losses.MSELoss | A module for the mean square error loss. |
Self-adversarial negative sampling | pykeen.losses.NSSALoss | An implementation of the self-adversarial negative sampling loss function proposed by [sun2019]. |
Pairwise logistic | pykeen.losses.PairwiseLogisticLoss | The pairwise logistic loss. |
Pointwise Hinge | pykeen.losses.PointwiseHingeLoss | A module for the pointwise hinge loss. |
Soft margin ranking | pykeen.losses.SoftMarginRankingLoss | A module for the soft pairwise hinge loss (i.e., soft margin ranking loss). |
Softplus | pykeen.losses.SoftplusLoss | A module for the pointwise logistic loss (i.e., softplus loss). |
Soft Pointwise Hinge | pykeen.losses.SoftPointwiseHingeLoss | A module for the soft pointwise hinge loss. |
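The pairwise and pointwise losses above differ in what they compare: a margin ranking loss compares each positive score with a negative score, while the softplus loss treats each triple on its own against a +1/-1 label. The sketch below shows the two textbook formulas, assuming higher scores mean more plausible triples; it is not PyKEEN's implementation, whose reduction and label conventions may differ.

```python
import math


def margin_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Pairwise hinge loss: penalize each (positive, negative) pair unless the
    positive scores at least `margin` higher than the negative."""
    terms = [max(0.0, margin - p + n) for p, n in zip(pos_scores, neg_scores)]
    return sum(terms) / len(terms)


def softplus(x):
    # Numerically stable log(1 + exp(x)) that does not overflow for large x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_loss(scores, labels):
    """Pointwise logistic loss with labels in {+1, -1}: softplus(-label * score)."""
    terms = [softplus(-y * s) for s, y in zip(scores, labels)]
    return sum(terms) / len(terms)


print(margin_ranking_loss([2.0, 3.0], [1.5, 0.0]))  # 0.25
print(softplus_loss([0.0], [1]))                    # log(2) ≈ 0.6931
```

In the first pair the positive beats the negative by only 0.5, less than the margin of 1, so it contributes 0.5; the second pair clears the margin and contributes nothing.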
Regularizers (5)
Name | Reference | Description |
---|---|---|
combined | pykeen.regularizers.CombinedRegularizer | A convex combination of regularizers. |
lp | pykeen.regularizers.LpRegularizer | A simple L_p norm based regularizer. |
no | pykeen.regularizers.NoRegularizer | A regularizer which does not perform any regularization. |
powersum | pykeen.regularizers.PowerSumRegularizer | A simple x^p based regularizer. |
transh | pykeen.regularizers.TransHRegularizer | A regularizer for the soft constraints in TransH. |
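As a rough illustration of the lp, powersum, and combined rows, the sketch below computes each penalty on a flat list of embedding values. The function names are hypothetical, the sketch takes absolute values before raising to the power p, and a penalty like this is typically scaled by a weight and added to the training loss.

```python
def lp_penalty(values, p=2.0):
    """L_p norm of a flat list of parameters: (sum |x|^p)^(1/p)."""
    return sum(abs(x) ** p for x in values) ** (1.0 / p)


def power_sum_penalty(values, p=2.0):
    """Sum of |x|^p, without taking the p-th root."""
    return sum(abs(x) ** p for x in values)


def combined_penalty(values, regularizers, weights):
    """Convex combination of penalties: weights are normalized to sum to one."""
    total = sum(weights)
    return sum(w / total * reg(values) for reg, w in zip(regularizers, weights))


embedding = [3.0, -4.0]
print(lp_penalty(embedding))         # 5.0
print(power_sum_penalty(embedding))  # 25.0
print(combined_penalty(embedding, [lp_penalty, power_sum_penalty], [1.0, 1.0]))  # 15.0
```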
Training Loops (2)
Name | Reference | Description |
---|---|---|
lcwa | pykeen.training.LCWATrainingLoop | A training loop that is based upon the local closed world assumption (LCWA). |
slcwa | pykeen.training.SLCWATrainingLoop | A training loop that uses the stochastic local closed world assumption training approach. |
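Under the LCWA, each (head, relation) pair seen in training acts as a query: the tails observed with it are positives and every other entity is treated as a negative. A minimal sketch of building these multi-label targets, assuming integer ID triples; lcwa_targets is a hypothetical helper that ignores batching and memory concerns.

```python
from collections import defaultdict


def lcwa_targets(triples, num_entities):
    """Group training triples by (head, relation) and build a 0/1 target
    vector over all entities: observed tails are 1, every other entity is 0."""
    tails = defaultdict(set)
    for h, r, t in triples:
        tails[(h, r)].add(t)
    return {
        hr: [1 if e in observed else 0 for e in range(num_entities)]
        for hr, observed in tails.items()
    }


print(lcwa_targets([(0, 0, 1), (0, 0, 2), (1, 0, 2)], num_entities=3))
# {(0, 0): [0, 1, 1], (1, 0): [0, 0, 1]}
```

By contrast, the sLCWA scores only a few sampled negatives per positive triple, which avoids scoring every entity for every query.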
Negative Samplers (3)
Name | Reference | Description |
---|---|---|
basic | pykeen.sampling.BasicNegativeSampler | A basic negative sampler. |
bernoulli | pykeen.sampling.BernoulliNegativeSampler | An implementation of the Bernoulli negative sampling approach proposed by [wang2014]. |
pseudotyped | pykeen.sampling.PseudoTypedNegativeSampler | A sampler that accounts for which entities co-occur with a relation. |
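The Bernoulli approach picks which side of a triple to corrupt per relation: the head is replaced with probability tph / (tph + hpt), where tph is the average number of tails per head and hpt the average number of heads per tail for that relation. For one-to-many relations this favors corrupting the head, since replacing the tail would often produce another true triple. A plain-Python sketch with hypothetical function names, not PyKEEN's sampler:

```python
import random
from collections import defaultdict


def bernoulli_head_probabilities(triples):
    """Per relation, the probability of corrupting the head: tph / (tph + hpt)."""
    tails_of = defaultdict(lambda: defaultdict(set))  # relation -> head -> tails
    heads_of = defaultdict(lambda: defaultdict(set))  # relation -> tail -> heads
    for h, r, t in triples:
        tails_of[r][h].add(t)
        heads_of[r][t].add(h)
    probabilities = {}
    for r in tails_of:
        tph = sum(len(ts) for ts in tails_of[r].values()) / len(tails_of[r])
        hpt = sum(len(hs) for hs in heads_of[r].values()) / len(heads_of[r])
        probabilities[r] = tph / (tph + hpt)
    return probabilities


def corrupt_bernoulli(triple, probabilities, num_entities, rng):
    """Replace the head with the relation's probability, otherwise the tail.
    For brevity, the random entity may coincide with the original one."""
    h, r, t = triple
    if rng.random() < probabilities[r]:
        return (rng.randrange(num_entities), r, t)
    return (h, r, rng.randrange(num_entities))


# Relation 0 is one-to-many (head 0 has three tails); relation 1 is one-to-one.
triples = [(0, 0, 1), (0, 0, 2), (0, 0, 3), (1, 1, 2)]
probs = bernoulli_head_probabilities(triples)
print(probs)  # {0: 0.75, 1: 0.5}
print(corrupt_bernoulli((0, 0, 1), probs, num_entities=4, rng=random.Random(0)))
```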
Stoppers (2)
Name | Reference | Description |
---|---|---|
early | pykeen.stoppers.EarlyStopper | A harness for early stopping. |
nop | pykeen.stoppers.NopStopper | A stopper that does nothing. |
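An early stopper periodically checks a validation metric and ends training once the metric stops improving. The sketch below is a simplified, hypothetical version rather than PyKEEN's EarlyStopper: patience and relative_delta are illustrative parameter names, and it assumes a positive metric where higher is better (for example, Hits@K).

```python
class SimpleEarlyStopper:
    """Stop after `patience` consecutive checks without a relative improvement
    of at least `relative_delta` over the best value seen so far."""

    def __init__(self, patience=2, relative_delta=0.01):
        self.patience = patience
        self.relative_delta = relative_delta
        self.best = None
        self.remaining = patience

    def should_stop(self, value):
        # Assumes a positive, higher-is-better metric.
        if self.best is None or value > self.best * (1 + self.relative_delta):
            self.best = value
            self.remaining = self.patience
        else:
            self.remaining -= 1
        return self.remaining <= 0


stopper = SimpleEarlyStopper(patience=2, relative_delta=0.01)
print([stopper.should_stop(v) for v in [0.1, 0.2, 0.201, 0.2]])
# [False, False, False, True]
```

The value 0.201 improves on 0.2 by only 0.5%, below the 1% threshold, so it counts as a failed check just like the final 0.2.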
Evaluators (4)
Name | Reference | Description |
---|---|---|
classification | pykeen.evaluation.ClassificationEvaluator | An evaluator that uses classification metrics. |
macrorankbased | pykeen.evaluation.MacroRankBasedEvaluator | Macro-average rank-based evaluation. |
rankbased | pykeen.evaluation.RankBasedEvaluator | A rank-based evaluator for KGE models. |
sampledrankbased | pykeen.evaluation.SampledRankBasedEvaluator | A rank-based evaluator using sampled negatives instead of all negatives. |
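A rank-based evaluator scores every candidate entity for a query and records where the true entity lands. In the commonly used filtered setting, other candidates that are also known true answers are skipped so they do not push the true entity down. A minimal sketch for a single query; filtered_rank is hypothetical, and resolving ties by averaging the optimistic and pessimistic ranks is an assumption of this sketch.

```python
def filtered_rank(scores, true_index, other_true_indices):
    """Rank of the true candidate (higher score = better), skipping other
    candidates known to be true. Ties are resolved by averaging the
    optimistic rank (ties placed below) and pessimistic rank (ties above)."""
    true_score = scores[true_index]
    greater = 0
    equal = 0
    for i, s in enumerate(scores):
        if i == true_index or i in other_true_indices:
            continue
        if s > true_score:
            greater += 1
        elif s == true_score:
            equal += 1
    optimistic = 1 + greater
    pessimistic = 1 + greater + equal
    return (optimistic + pessimistic) / 2


scores = [0.9, 0.8, 0.5, 0.8, 0.1]
print(filtered_rank(scores, 1, {0}))    # 1.5 (candidate 0 is filtered out)
print(filtered_rank(scores, 1, set()))  # 2.5 (unfiltered)
```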
Metrics (44)
Name | Interval | Direction | Description | Type |
---|---|---|---|---|
AUC-ROC | [0, 1] | 📈 | Area Under the ROC Curve | Classification |
Accuracy | [0, 1] | 📈 | (TP + TN) / (TP + TN + FP + FN) | Classification |
Average Precision | [0, 1] | 📈 | A summary statistic over the precision-recall curve | Classification |
Balanced Accuracy | [0, 1] | 📈 | An adjusted version of the accuracy for imbalanced datasets | Classification |
Diagnostic Odds Ratio | [0, ∞) | 📈 | LR+/LR- | Classification |
F1 Score | [0, 1] | 📈 | 2TP / (2TP + FP + FN) | Classification |
False Discovery Rate | [0, 1] | 📉 | FP / (FP + TP) | Classification |
False Negative Rate | [0, 1] | 📉 | FN / (FN + TP) | Classification |
False Omission Rate | [0, 1] | 📉 | FN / (FN + TN) | Classification |
False Positive Rate | [0, 1] | 📉 | FP / (FP + TN) | Classification |
Fowlkes Mallows Index | [0, 1] | 📈 | √PPV x √TPR | Classification |
Informedness | [-1, 1] | 📈 | TPR + TNR - 1 | Classification |
Markedness | [-1, 1] | 📈 | PPV + NPV - 1 | Classification |
Matthews Correlation Coefficient | [-1, 1] | 📈 | A balanced measure applicable even with class imbalance | Classification |
Negative Likelihood Ratio | [0, ∞) | 📉 | FNR / TNR | Classification |
Negative Predictive Value | [0, 1] | 📈 | TN / (TN + FN) | Classification |
Positive Likelihood Ratio | [0, ∞) | 📈 | TPR / FPR | Classification |
Positive Predictive Value | [0, 1] | 📈 | TP / (TP + FP) | Classification |
Prevalence Threshold | [0, 1] | 📉 | √FPR / (√TPR + √FPR) | Classification |
Threat Score | [0, 1] | 📈 | TP / (TP + FN + FP) | Classification |
True Negative Rate | [0, 1] | 📈 | TN / (TN + FP) | Classification |
True Positive Rate | [0, 1] | 📈 | TP / (TP + FN) | Classification |
Adjusted Arithmetic Mean Rank (AAMR) | [0, 2) | 📉 | The mean over all ranks divided by its expected value. | Ranking |
Adjusted Arithmetic Mean Rank Index (AAMRI) | [-1, 1] | 📈 | The re-indexed adjusted mean rank (AAMR) | Ranking |
Adjusted Geometric Mean Rank Index (AGMRI) | (-E[f]/(1-E[f]), 1] | 📈 | The re-indexed adjusted geometric mean rank (AGMR) | Ranking |
Adjusted Hits at K | (-E[f]/(1-E[f]), 1] | 📈 | The re-indexed adjusted hits at K | Ranking |
Adjusted Inverse Harmonic Mean Rank | (-E[f]/(1-E[f]), 1] | 📈 | The re-indexed adjusted MRR | Ranking |
Geometric Mean Rank (GMR) | [1, ∞) | 📉 | The geometric mean over all ranks. | Ranking |
Harmonic Mean Rank (HMR) | [1, ∞) | 📉 | The harmonic mean over all ranks. | Ranking |
Hits @ K | [0, 1] | 📈 | The relative frequency of ranks not larger than a given k. | Ranking |
Inverse Arithmetic Mean Rank (IAMR) | (0, 1] | 📈 | The inverse of the arithmetic mean over all ranks. | Ranking |
Inverse Geometric Mean Rank (IGMR) | (0, 1] | 📈 | The inverse of the geometric mean over all ranks. | Ranking |
Inverse Median Rank | (0, 1] | 📈 | The inverse of the median over all ranks. | Ranking |
Mean Rank (MR) | [1, ∞) | 📉 | The arithmetic mean over all ranks. | Ranking |
Mean Reciprocal Rank (MRR) | (0, 1] | 📈 | The inverse of the harmonic mean over all ranks. | Ranking |
Median Rank | [1, ∞) | 📉 | The median over all ranks. | Ranking |
z-Geometric Mean Rank (zGMR) | (-∞, ∞) | 📈 | The z-scored geometric mean rank | Ranking |
z-Hits at K | (-∞, ∞) | 📈 | The z-scored hits at K | Ranking |
z-Mean Rank (zMR) | (-∞, ∞) | 📈 | The z-scored mean rank | Ranking |
z-Mean Reciprocal Rank (zMRR) | (-∞, ∞) | 📈 | The z-scored mean reciprocal rank | Ranking |
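To connect several of the ranking rows to concrete numbers, the plain-Python sketch below computes MR, MRR, GMR, Hits@K, AAMR, and AAMRI from a list of ranks. It assumes the expected rank of a query with n candidates is (n + 1) / 2 (uniformly random scoring) and uses AAMRI = 1 - (MR - 1) / (E[MR] - 1) as the re-indexing. rank_metrics is a hypothetical helper; a real evaluator additionally handles ties, filtering, and batching.

```python
import math


def rank_metrics(ranks, num_candidates, ks=(1, 3, 10)):
    """Compute rank-based metrics from one 1-based rank per query.

    num_candidates gives, per query, how many candidates it was ranked
    against; assumes at least two candidates so E[MR] > 1.
    """
    n = len(ranks)
    mean_rank = sum(ranks) / n
    expected_mean_rank = sum((c + 1) / 2 for c in num_candidates) / n
    metrics = {
        "MR": mean_rank,
        "MRR": sum(1 / r for r in ranks) / n,
        "GMR": math.exp(sum(math.log(r) for r in ranks) / n),
        "AAMR": mean_rank / expected_mean_rank,
        "AAMRI": 1 - (mean_rank - 1) / (expected_mean_rank - 1),
    }
    for k in ks:
        # Fraction of queries whose rank is not larger than k.
        metrics[f"Hits@{k}"] = sum(r <= k for r in ranks) / n
    return metrics


for name, value in rank_metrics([1, 2, 4, 10], [10, 10, 10, 10]).items():
    print(f"{name}: {value:.4f}")
```

With ten candidates per query, a random model would have an expected mean rank of 5.5, so an MR of 4.25 gives an AAMR below 1, that is, better than random.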
Trackers (8)
Name | Reference | Description |
---|---|---|
console | pykeen.trackers.ConsoleResultTracker | A class that directly prints to console. |
csv | pykeen.trackers.CSVResultTracker | Tracking results to a CSV file. |
json | pykeen.trackers.JSONResultTracker | Tracking results to a JSON lines file. |
mlflow | pykeen.trackers.MLFlowResultTracker | A tracker for MLflow. |
neptune | pykeen.trackers.NeptuneResultTracker | A tracker for Neptune.ai. |
python | pykeen.trackers.PythonResultTracker | A tracker which stores everything in Python dictionaries. |
tensorboard | pykeen.trackers.TensorBoardResultTracker | A tracker for TensorBoard. |
wandb | pykeen.trackers.WANDBResultTracker | A tracker for Weights and Biases. |
Experimentation
Reproduction
PyKEEN includes a set of curated experimental settings for reproducing past landmark experiments. They can be accessed and run like:
$ pykeen experiments reproduce tucker balazevic2019 fb15k
The three arguments are the model name, the reference, and the dataset. The output directory can optionally be set with -d.
Ablation
PyKEEN includes the ability to specify ablation studies using the hyper-parameter optimization module. They can be run like:
$ pykeen experiments ablation ~/path/to/config.json
Large-scale Reproducibility and Benchmarking Study
We used PyKEEN to perform a large-scale reproducibility and benchmarking study, which is described in our article:
@article{ali2020benchmarking,
author={Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Galkin, Mikhail and Sharifzadeh, Sahand and Fischer, Asja and Tresp, Volker and Lehmann, Jens},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models under a Unified Framework},
year={2021},
pages={1-1},
doi={10.1109/TPAMI.2021.3124805}
}
We have made all code, experimental configurations, results, and analyses that led to our interpretations available at https://github.com/pykeen/benchmarking.
Contributing
Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.
Acknowledgements
Supporters
This project has been supported by several organizations (in alphabetical order):
- Bayer
- CoronaWhy
- Enveda Biosciences
- Fraunhofer Institute for Algorithms and Scientific Computing
- Fraunhofer Institute for Intelligent Analysis and Information Systems
- Fraunhofer Center for Machine Learning
- Harvard Program in Therapeutic Science - Laboratory of Systems Pharmacology
- Ludwig-Maximilians-Universität München
- Munich Center for Machine Learning (MCML)
- Siemens
- Smart Data Analytics Research Group (University of Bonn & Fraunhofer IAIS)
- Technical University of Denmark - DTU Compute - Section for Cognitive Systems
- Technical University of Denmark - DTU Compute - Section for Statistics and Data Analysis
- University of Bonn
Funding
The development of PyKEEN has been funded by the following grants:
Funding Body | Program | Grant |
---|---|---|
DARPA | Young Faculty Award (PI: Benjamin Gyori) | W911NF2010255 |
DARPA | Automating Scientific Knowledge Extraction (ASKE) | HR00111990009 |
German Federal Ministry of Education and Research (BMBF) | Maschinelles Lernen mit Wissensgraphen (MLWin) | 01IS18050D |
German Federal Ministry of Education and Research (BMBF) | Munich Center for Machine Learning (MCML) | 01IS18036A |
Innovation Fund Denmark (Innovationsfonden) | Danish Center for Big Data Analytics driven Innovation (DABAI) | Grand Solutions |
Logo
The PyKEEN logo was designed by Carina Steinborn.
Citation
If you have found PyKEEN useful in your work, please consider citing our article:
@article{ali2021pykeen,
author = {Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Sharifzadeh, Sahand and Tresp, Volker and Lehmann, Jens},
journal = {Journal of Machine Learning Research},
number = {82},
pages = {1--6},
title = {{PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings}},
url = {http://jmlr.org/papers/v22/20-825.html},
volume = {22},
year = {2021}
}
Download files
File details
Details for the file pykeen-1.8.1.tar.gz.
File metadata
- Download URL: pykeen-1.8.1.tar.gz
- Upload date:
- Size: 1.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.12
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 49efbd7884ebf3ef130f0a88814ee07e5726b84c72e74d66efe17b76d20c2c72 |
MD5 | c74fb953282f40e9da7a2e8d19a1f09f |
BLAKE2b-256 | 2ebe901a43d31679e8e6a275b1d16a7d9c60b93ed1cfe989d810a68eadbce505 |
File details
Details for the file pykeen-1.8.1-py3-none-any.whl.
File metadata
- Download URL: pykeen-1.8.1-py3-none-any.whl
- Upload date:
- Size: 630.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.12
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | b51068c335f4ece8d869a57de7d1da0d7fa04b657f971fb0149a01a4f43991db |
MD5 | 9b7043e5cbc0fdc422d453ecd336251a |
BLAKE2b-256 | cf82034ea446436db897232adc7496bc0a1321ecc12ba9b281b13e0959710d72 |
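To check a downloaded file against the SHA256 digests listed above, you can hash it locally with Python's standard hashlib module. A minimal sketch; the file name in the commented usage assumes the wheel was downloaded into the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in 1 MiB chunks so large
    files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed above for pykeen-1.8.1-py3-none-any.whl.
EXPECTED = "b51068c335f4ece8d869a57de7d1da0d7fa04b657f971fb0149a01a4f43991db"

# Assumes the wheel has been downloaded into the current directory:
# print(sha256_of("pykeen-1.8.1-py3-none-any.whl") == EXPECTED)
```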