A package for training and evaluating multimodal knowledge graph embeddings

PyKEEN

PyKEEN (Python KnowlEdge EmbeddiNgs) is a Python package designed to train and evaluate knowledge graph embedding models (incorporating multi-modal information).

Installation

The latest stable version of PyKEEN can be downloaded and installed from PyPI with:

pip install pykeen

The latest version of PyKEEN can be installed directly from the source on GitHub with:

pip install git+https://github.com/pykeen/pykeen.git

More information about installation (e.g., development mode, Windows installation, extras) can be found in the installation documentation.

Quickstart

The fastest way to get up and running is to use the pipeline function. It provides a high-level entry into the extensible functionality of this package. The following example shows how to train and evaluate the TransE model on the Nations dataset. By default, the training loop uses the stochastic local closed world assumption (sLCWA) training approach and evaluates with rank-based evaluation.

from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',
    dataset='nations',
)

The results are returned in an instance of the PipelineResult dataclass that has attributes for the trained model, the training loop, the evaluation, and more. See the tutorials on understanding the evaluation and making novel link predictions.

PyKEEN is extensible such that:

  • Each model has the same API, so anything from pykeen.models can be dropped in
  • Each training loop has the same API, so pykeen.training.LCWATrainingLoop can be dropped in
  • Triples factories can be generated by the user with pykeen.triples.TriplesFactory
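
As a hedged sketch of the first two points, the same pipeline call can be given a different model and training loop by name. The training_loop and training_kwargs keywords are recalled from the PyKEEN 1.x pipeline signature and should be checked against the installed version. The helper name run_swapped_pipeline and the epoch count are illustrative assumptions.

```python
# Settings for pykeen.pipeline.pipeline. The string keys match the names
# listed in the tables of this README.
swapped_config = dict(
    model='DistMult',                    # any entry from pykeen.models
    dataset='nations',
    training_loop='lcwa',                # instead of the default sLCWA loop
    training_kwargs=dict(num_epochs=5),  # assumption: a short illustrative run
)


def run_swapped_pipeline(config=swapped_config):
    """Train and evaluate with the settings above (hypothetical helper)."""
    # Imported lazily so the settings can be inspected without PyKEEN installed.
    from pykeen.pipeline import pipeline
    return pipeline(**config)
```

Calling run_swapped_pipeline() should return a PipelineResult, just like the TransE example above.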

The full documentation can be found at https://pykeen.readthedocs.io.

Implementation

Below are the models, datasets, training modes, evaluators, and metrics implemented in PyKEEN.

Datasets (25)

The citation for each dataset corresponds to the paper describing the dataset, the first paper published using the dataset with knowledge graph embedding models, or the URL for the dataset if neither of the first two is available.

Name Documentation Citation Entities Relations Triples
Clinical Knowledge Graph pykeen.datasets.CKG Santos et al., 2020 7617419 11 26691525
CoDEx (large) pykeen.datasets.CoDExLarge Safavi et al., 2020 77951 69 612437
CoDEx (medium) pykeen.datasets.CoDExMedium Safavi et al., 2020 17050 51 206205
CoDEx (small) pykeen.datasets.CoDExSmall Safavi et al., 2020 2034 42 36543
ConceptNet pykeen.datasets.ConceptNet Speer et al., 2017 28370083 50 34074917
Countries pykeen.datasets.Countries ZhenfengLei/KGDatasets 271 2 1158
Commonsense Knowledge Graph pykeen.datasets.CSKG Ilievski et al., 2020 2087833 58 4598728
DB100K pykeen.datasets.DB100K Ding et al., 2018 99604 470 697479
DBpedia50 pykeen.datasets.DBpedia50 Shi et al., 2017 24624 351 34421
Drug Repositioning Knowledge Graph pykeen.datasets.DRKG gnn4dr/DRKG 97238 107 5874257
FB15k pykeen.datasets.FB15k Bordes et al., 2013 14951 1345 592213
FB15k-237 pykeen.datasets.FB15k237 Toutanova et al., 2015 14505 237 310079
Hetionet pykeen.datasets.Hetionet Himmelstein et al., 2017 45158 24 2250197
Kinships pykeen.datasets.Kinships ZhenfengLei/KGDatasets 104 25 10686
Nations pykeen.datasets.Nations ZhenfengLei/KGDatasets 14 55 1992
OGB BioKG pykeen.datasets.OGBBioKG Hu et al., 2020 45085 51 5088433
OGB WikiKG pykeen.datasets.OGBWikiKG Hu et al., 2020 2500604 535 17137181
OpenBioLink pykeen.datasets.OpenBioLink Breit et al., 2020 180992 28 4563407
OpenBioLink (F1) pykeen.datasets.OpenBioLinkF1 PyKEEN/pykeen-openbiolink-benchmark 116425 19 1716703
OpenBioLink (F2) pykeen.datasets.OpenBioLinkF2 PyKEEN/pykeen-openbiolink-benchmark 110628 17 734925
OpenBioLink (LQ) pykeen.datasets.OpenBioLinkLQ Breit et al., 2020 480876 32 27320889
Unified Medical Language System pykeen.datasets.UMLS ZhenfengLei/KGDatasets 135 46 6529
WordNet-18 pykeen.datasets.WN18 Bordes et al., 2014 40943 18 151442
WordNet-18 (RR) pykeen.datasets.WN18RR Toutanova et al., 2015 40559 11 92583
YAGO3-10 pykeen.datasets.YAGO310 Mahdisoltani et al., 2015 123143 37 1089000

Models (25)

Name Reference Citation
ComplEx pykeen.models.ComplEx Trouillon et al., 2016
ComplExLiteral pykeen.models.ComplExLiteral Kristiadi et al., 2018
ConvE pykeen.models.ConvE Dettmers et al., 2018
ConvKB pykeen.models.ConvKB Nguyen et al., 2018
DistMult pykeen.models.DistMult Yang et al., 2014
DistMultLiteral pykeen.models.DistMultLiteral Kristiadi et al., 2018
ERMLP pykeen.models.ERMLP Dong et al., 2014
ERMLPE pykeen.models.ERMLPE Sharifzadeh et al., 2019
HolE pykeen.models.HolE Nickel et al., 2016
KG2E pykeen.models.KG2E He et al., 2015
MuRE pykeen.models.MuRE Balažević et al., 2019
NTN pykeen.models.NTN Socher et al., 2013
PairRE pykeen.models.PairRE Chao et al., 2020
ProjE pykeen.models.ProjE Shi et al., 2017
RESCAL pykeen.models.RESCAL Nickel et al., 2011
RGCN pykeen.models.RGCN Schlichtkrull et al., 2018
RotatE pykeen.models.RotatE Sun et al., 2019
SimplE pykeen.models.SimplE Kazemi et al., 2018
StructuredEmbedding pykeen.models.StructuredEmbedding Bordes et al., 2011
TransD pykeen.models.TransD Ji et al., 2015
TransE pykeen.models.TransE Bordes et al., 2013
TransH pykeen.models.TransH Wang et al., 2014
TransR pykeen.models.TransR Lin et al., 2015
TuckER pykeen.models.TuckER Balažević et al., 2019
UnstructuredModel pykeen.models.UnstructuredModel Bordes et al., 2014
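
To give a feel for what these models compute, here is a minimal sketch of the TransE scoring function (Bordes et al., 2013), which scores a triple by the negative distance between head + relation and tail. It uses plain Python lists and is not PyKEEN's implementation. The function name transe_score and the toy vectors are assumptions for illustration.

```python
def transe_score(head, relation, tail, p=2):
    """Return -||h + r - t||_p, so a higher score means a more plausible triple."""
    diffs = [h + r - t for h, r, t in zip(head, relation, tail)]
    return -sum(abs(d) ** p for d in diffs) ** (1.0 / p)


h = [1.0, 0.0]
r = [0.0, 1.0]
good_tail = [1.0, 1.0]  # exactly h + r, so the distance is zero
bad_tail = [4.0, 5.0]   # offset from h + r by (3, 4), so the L2 distance is 5

good_score = transe_score(h, r, good_tail)
bad_score = transe_score(h, r, bad_tail)
```

The other models in the table differ mainly in how they combine the three embeddings into such a score.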

Losses (7)

Name Reference Description
bceaftersigmoid pykeen.losses.BCEAfterSigmoidLoss A module for the numerically unstable version of explicit Sigmoid + BCE loss.
bcewithlogits pykeen.losses.BCEWithLogitsLoss A module for the binary cross entropy loss.
crossentropy pykeen.losses.CrossEntropyLoss A module for the cross entropy loss that evaluates the cross entropy after softmax output.
marginranking pykeen.losses.MarginRankingLoss A module for the margin ranking loss.
mse pykeen.losses.MSELoss A module for the mean square error loss.
nssa pykeen.losses.NSSALoss An implementation of the self-adversarial negative sampling loss function proposed by Sun et al., 2019.
softplus pykeen.losses.SoftplusLoss A module for the softplus loss.
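
As an illustration of one of these losses, the margin ranking loss can be sketched in plain Python. The sketch assumes that higher scores mean more plausible triples and that each positive score is paired with one negative score. The function name and the default margin of 1.0 are illustrative choices, not values taken from PyKEEN.

```python
def margin_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Mean of max(0, margin + negative - positive) over paired scores."""
    losses = [max(0.0, margin + n - p) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


pos = [2.0, 0.5]  # scores of two true triples
neg = [0.0, 1.0]  # scores of their corrupted counterparts
loss = margin_ranking_loss(pos, neg)
```

The first pair already satisfies the margin and contributes nothing. The second pair is ranked the wrong way round and contributes 1.5.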

Regularizers (5)

Name Reference Description
combined pykeen.regularizers.CombinedRegularizer A convex combination of regularizers.
lp pykeen.regularizers.LpRegularizer A simple L_p norm based regularizer.
no pykeen.regularizers.NoRegularizer A regularizer which does not perform any regularization.
powersum pykeen.regularizers.PowerSumRegularizer A simple x^p based regularizer.
transh pykeen.regularizers.TransHRegularizer A regularizer for the soft constraints in TransH.

Optimizers (6)

Name Reference Description
adadelta torch.optim.Adadelta Implements Adadelta algorithm.
adagrad torch.optim.Adagrad Implements Adagrad algorithm.
adam torch.optim.Adam Implements Adam algorithm.
adamax torch.optim.Adamax Implements Adamax algorithm (a variant of Adam based on infinity norm).
adamw torch.optim.AdamW Implements AdamW algorithm.
sgd torch.optim.SGD Implements stochastic gradient descent (optionally with momentum).

Training Loops (2)

Name Reference Description
lcwa pykeen.training.LCWATrainingLoop A training loop that uses the local closed world assumption training approach.
slcwa pykeen.training.SLCWATrainingLoop A training loop that uses the stochastic local closed world assumption training approach.

Negative Samplers (2)

Name Reference Description
basic pykeen.sampling.BasicNegativeSampler A basic negative sampler.
bernoulli pykeen.sampling.BernoulliNegativeSampler An implementation of the Bernoulli negative sampling approach proposed by Wang et al., 2014.
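
The idea behind basic negative sampling can be sketched as follows: each positive triple is corrupted by replacing either its head or its tail with a different entity chosen uniformly at random. This is a simplified stand-in using integer entity IDs, not PyKEEN's sampler. The function name corrupt_triple, the 50/50 choice between head and tail, and the toy values are assumptions.

```python
import random


def corrupt_triple(triple, num_entities, rng):
    """Replace the head or the tail of (h, r, t) with a different entity ID."""
    head, relation, tail = triple
    corrupt_head = rng.random() < 0.5
    original = head if corrupt_head else tail
    # Draw from the other num_entities - 1 IDs so the entity always changes.
    replacement = rng.randrange(num_entities - 1)
    if replacement >= original:
        replacement += 1
    if corrupt_head:
        return (replacement, relation, tail)
    return (head, relation, replacement)


rng = random.Random(0)
positive = (3, 1, 7)
negatives = [corrupt_triple(positive, num_entities=14, rng=rng) for _ in range(5)]
```

A corrupted triple produced this way may still be a true fact that is simply missing from the training set. This sketch does not filter such cases.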

Stoppers (2)

Name Reference Description
early pykeen.stoppers.EarlyStopper A harness for early stopping.
nop pykeen.stoppers.NopStopper A stopper that does nothing.
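
The general idea of early stopping can be sketched with a patience counter: training stops once a validation metric has failed to improve for a fixed number of consecutive evaluations. The class below is a hypothetical, simplified illustration, not the interface of pykeen.stoppers.EarlyStopper. The names PatienceStopper, patience, and min_delta are assumptions.

```python
class PatienceStopper:
    """Stop once the metric has not improved for `patience` evaluations."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_evaluations = 0

    def should_stop(self, metric):
        """Record a new validation metric (higher is better)."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.bad_evaluations = 0
        else:
            self.bad_evaluations += 1
        return self.bad_evaluations >= self.patience


stopper = PatienceStopper(patience=2)
history = [0.30, 0.35, 0.34, 0.35, 0.40]  # toy validation scores
stopped_at = None
for step, value in enumerate(history):
    if stopper.should_stop(value):
        stopped_at = step
        break
```

With these toy values the loop stops at the fourth evaluation, before the later improvement to 0.40 is seen. This is the trade-off that the patience setting controls.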

Evaluators (2)

Name Reference Description
rankbased pykeen.evaluation.RankBasedEvaluator A rank-based evaluator for KGE models.
sklearn pykeen.evaluation.SklearnEvaluator An evaluator that uses a Scikit-learn metric.

Metrics (6)

Metric Description Evaluator Reference
Adjusted Mean Rank The mean over all chance-adjusted ranks: mean_i (2r_i / (num_entities+1)). Lower is better. rankbased pykeen.evaluation.RankBasedMetricResults
Average Precision Score The area under the precision-recall curve, between [0.0, 1.0]. Higher is better. sklearn pykeen.evaluation.SklearnMetricResults
Hits At K The hits at k for different values of k, i.e. the relative frequency of ranks not larger than k. Higher is better. rankbased pykeen.evaluation.RankBasedMetricResults
Mean Rank The mean over all ranks: mean_i r_i. Lower is better. rankbased pykeen.evaluation.RankBasedMetricResults
Mean Reciprocal Rank The mean over all reciprocal ranks: mean_i (1/r_i). Higher is better. rankbased pykeen.evaluation.RankBasedMetricResults
Roc Auc Score The area under the ROC curve between [0.0, 1.0]. Higher is better. sklearn pykeen.evaluation.SklearnMetricResults
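
The rank-based metrics in the table can be computed directly from a list of ranks. The following sketch applies the formulas as written above. The function name rank_metrics, the result keys, and the toy ranks are illustrative assumptions and not PyKEEN's API.

```python
def rank_metrics(ranks, num_entities, ks=(1, 3, 10)):
    """Compute rank-based metrics from 1-based ranks of the true entities."""
    n = len(ranks)
    results = {
        # Lower is better.
        'mean_rank': sum(ranks) / n,
        # Higher is better.
        'mean_reciprocal_rank': sum(1.0 / r for r in ranks) / n,
        # mean_i (2 r_i / (num_entities + 1)); lower is better.
        'adjusted_mean_rank': sum(2.0 * r / (num_entities + 1) for r in ranks) / n,
    }
    for k in ks:
        # Relative frequency of ranks not larger than k; higher is better.
        results[f'hits_at_{k}'] = sum(r <= k for r in ranks) / n
    return results


metrics = rank_metrics([1, 2, 4, 10], num_entities=14)
```

For these four ranks the mean rank is 4.25, while hits at 3 is 0.5 because two of the four ranks are at most 3.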

Trackers (5)

Name Reference Description
csv pykeen.trackers.CSVResultTracker Tracking results to a CSV file.
json pykeen.trackers.JSONResultTracker Tracking results to a JSON lines file.
mlflow pykeen.trackers.MLFlowResultTracker A tracker for MLflow.
neptune pykeen.trackers.NeptuneResultTracker A tracker for Neptune.ai.
wandb pykeen.trackers.WANDBResultTracker A tracker for Weights and Biases.

Hyper-parameter Optimization

Samplers (3)

Name Reference Description
grid optuna.samplers.GridSampler Sampler using grid search.
random optuna.samplers.RandomSampler Sampler using random sampling.
tpe optuna.samplers.TPESampler Sampler using TPE (Tree-structured Parzen Estimator) algorithm.

Any sampler class extending optuna.samplers.BaseSampler, such as Optuna's sampler implementing the CMA-ES algorithm, can also be used.
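
A hedged sketch of selecting a sampler by its key from the table: the function pykeen.hpo.hpo_pipeline and its sampler and n_trials keywords are recalled from the PyKEEN 1.x documentation and should be verified against the installed version. The helper name run_hpo and the trial budget are illustrative assumptions.

```python
# Settings for pykeen.hpo.hpo_pipeline.
hpo_config = dict(
    model='TransE',
    dataset='nations',
    sampler='random',  # one of the keys in the table above
    n_trials=10,       # assumption: a small illustrative trial budget
)


def run_hpo(config=hpo_config):
    """Run a hyper-parameter search with the settings above (hypothetical helper)."""
    # Imported lazily so the settings can be inspected without PyKEEN installed.
    from pykeen.hpo import hpo_pipeline
    return hpo_pipeline(**config)
```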

Experimentation

Reproduction

PyKEEN includes a set of curated experimental settings for reproducing past landmark experiments. They can be accessed and run like:

pykeen experiments reproduce tucker balazevic2019 fb15k

The three arguments are the model name, the reference, and the dataset. The output directory can optionally be set with -d.

Ablation

PyKEEN includes the ability to specify ablation studies using the hyper-parameter optimization module. They can be run like:

pykeen experiments ablation ~/path/to/config.json

Large-scale Reproducibility and Benchmarking Study

We used PyKEEN to perform a large-scale reproducibility and benchmarking study, which is described in our article:

@article{ali2020benchmarking,
  title={Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework},
  author={Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Galkin, Mikhail and Sharifzadeh, Sahand and Fischer, Asja and Tresp, Volker and Lehmann, Jens},
  journal={arXiv preprint arXiv:2006.13365},
  year={2020}
}

We have made all code, experimental configurations, results, and analyses that led to our interpretations available at https://github.com/pykeen/benchmarking.

Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.

Acknowledgements

Supporters

This project has been supported by several organizations (in alphabetical order):

Logo

The PyKEEN logo was designed by Carina Steinborn.

Citation

If you have found PyKEEN useful in your work, please consider citing our article:

@article{ali2020pykeen,
  title={PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings},
  author={Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Sharifzadeh, Sahand and Tresp, Volker and Lehmann, Jens},
  journal={arXiv preprint arXiv:2007.14175},
  year={2020}
}
