A package for training and evaluating multimodal knowledge graph embeddings

PyKEEN

PyKEEN (Python KnowlEdge EmbeddiNgs) is a Python package designed to train and evaluate knowledge graph embedding models (incorporating multi-modal information). It is part of the KEEN Universe.

Installation

The latest stable version of PyKEEN can be downloaded and installed from PyPI on Python 3.7+ with:

$ pip install pykeen

The development version of PyKEEN can be downloaded and installed from GitHub on Python 3.7+ with:

$ git clone https://github.com/pykeen/pykeen.git pykeen
$ cd pykeen
$ pip install -e .
$ # Install pre-commit
$ pip install pre-commit
$ pre-commit install

PyKEEN has several extras for installation that are defined in the [options.extras_require] section of the setup.cfg. They can be included with installation using the bracket notation like in pip install pykeen[docs] or pip install -e .[docs]. Several can be listed, comma-delimited like in pip install pykeen[docs,plotting].

Name Description
plotting Plotting with seaborn and generation of word clouds
mlflow Tracking of results with mlflow
docs Building of the documentation
templating Building of templated documentation, like the README

Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.

Quickstart

This example shows how to train a model on the training split of a data set and evaluate it on the held-out test split.

The fastest way to get up and running is to use the pipeline function. It provides a high-level entry into the extensible functionality of this package. The following example shows how to train and evaluate the TransE model on the Nations dataset. By default, the training loop uses the stochastic local closed world assumption (sLCWA) training approach and evaluates with rank-based evaluation.

from pykeen.pipeline import pipeline
result = pipeline(
    model='TransE',
    dataset='nations',
)

The results are returned in a dataclass that has attributes for the trained model, the training loop, and the evaluation.
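
For intuition about what the pipeline above optimizes, TransE (Bordes et al., 2013) scores a triple (h, r, t) by how close h + r lies to t. The following is a minimal sketch in plain Python, not PyKEEN's implementation: the function name `transe_score` is hypothetical and the vectors are hand-picked toy values rather than learned embeddings.

```python
def transe_score(head, relation, tail, p=2):
    """Return -||head + relation - tail||_p; a higher score means more plausible."""
    diffs = [h + r - t for h, r, t in zip(head, relation, tail)]
    norm = sum(abs(d) ** p for d in diffs) ** (1.0 / p)
    return -norm


# Toy 2-dimensional embeddings chosen by hand for illustration only.
h = [1.0, 0.0]
r = [0.0, 1.0]
t_good = [1.0, 1.0]  # exactly h + r, so the distance is zero
t_bad = [4.0, 5.0]   # far away from h + r

good = transe_score(h, r, t_good)
bad = transe_score(h, r, t_bad)
print(good > bad)
```

During training, the model adjusts the embeddings so that observed triples score higher than corrupted ones.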

PyKEEN is extensible such that:

  • Each model has the same API, so anything from pykeen.models can be dropped in
  • Each training loop has the same API, so pykeen.training.LCWATrainingLoop can be dropped in
  • Triples factories can be generated by the user with pykeen.triples.TriplesFactory
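
As a hedged sketch of what "dropping in" a different component could look like, the wrapper below forwards a model name and a training loop name to the pipeline. It assumes that pipeline accepts a training_loop keyword taking the names listed in the Training Loops table; check the signature of your installed version. The wrapper name `run_experiment` and the `CONFIGURATIONS` list are hypothetical.

```python
def run_experiment(model="TransE", dataset="nations", training_loop="slcwa"):
    """Train and evaluate one configuration; requires PyKEEN to be installed."""
    # Imported lazily so that this module can be loaded without PyKEEN.
    from pykeen.pipeline import pipeline

    # Assumption: `training_loop` is accepted as a keyword argument.
    return pipeline(
        model=model,
        dataset=dataset,
        training_loop=training_loop,
    )


# Hypothetical configurations to compare; names come from the tables in this README.
CONFIGURATIONS = [
    {"model": "TransE", "training_loop": "slcwa"},
    {"model": "DistMult", "training_loop": "lcwa"},
]
```

Calling run_experiment(**CONFIGURATIONS[1]) would then train DistMult with the LCWA training loop instead of the default.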

Implementation

Below are the models, data sets, training modes, evaluators, and metrics implemented in pykeen.

Datasets (13)

Name Reference Description
fb15k pykeen.datasets.FB15k The FB15k data set.
fb15k237 pykeen.datasets.FB15k237 The FB15k-237 data set.
hetionet pykeen.datasets.Hetionet The Hetionet dataset is a large biological network.
kinships pykeen.datasets.Kinships The Kinships data set.
nations pykeen.datasets.Nations The Nations data set.
openbiolink pykeen.datasets.OpenBioLink The OpenBioLink dataset.
openbiolinkf1 pykeen.datasets.OpenBioLinkF1 The PyKEEN First Filtered OpenBioLink 2020 Dataset.
openbiolinkf2 pykeen.datasets.OpenBioLinkF2 The PyKEEN Second Filtered OpenBioLink 2020 Dataset.
openbiolinklq pykeen.datasets.OpenBioLinkLQ The low-quality variant of the OpenBioLink dataset.
umls pykeen.datasets.UMLS The UMLS data set.
wn18 pykeen.datasets.WN18 The WN18 data set.
wn18rr pykeen.datasets.WN18RR The WN18-RR data set.
yago310 pykeen.datasets.YAGO310 The YAGO3-10 data set is a subset of YAGO3 that only contains entities with at least 10 relations.

Models (23)

Name Reference Citation
ComplEx pykeen.models.ComplEx Trouillon et al., 2016
ComplExLiteral pykeen.models.ComplExLiteral Agustinus et al., 2018
ConvE pykeen.models.ConvE Dettmers et al., 2018
ConvKB pykeen.models.ConvKB Nguyen et al., 2018
DistMult pykeen.models.DistMult Yang et al., 2014
DistMultLiteral pykeen.models.DistMultLiteral Agustinus et al., 2018
ERMLP pykeen.models.ERMLP Dong et al., 2014
ERMLPE pykeen.models.ERMLPE Sharifzadeh et al., 2019
HolE pykeen.models.HolE Nickel et al., 2016
KG2E pykeen.models.KG2E He et al., 2015
NTN pykeen.models.NTN Socher et al., 2013
ProjE pykeen.models.ProjE Shi et al., 2017
RESCAL pykeen.models.RESCAL Nickel et al., 2011
RGCN pykeen.models.RGCN Schlichtkrull et al., 2018
RotatE pykeen.models.RotatE Sun et al., 2019
SimplE pykeen.models.SimplE Kazemi et al., 2018
StructuredEmbedding pykeen.models.StructuredEmbedding Bordes et al., 2011
TransD pykeen.models.TransD Ji et al., 2015
TransE pykeen.models.TransE Bordes et al., 2013
TransH pykeen.models.TransH Wang et al., 2014
TransR pykeen.models.TransR Lin et al., 2015
TuckER pykeen.models.TuckER Balazevic et al., 2019
UnstructuredModel pykeen.models.UnstructuredModel Bordes et al., 2014

Losses (7)

Name Reference Description
bce pykeen.losses.BCELoss A wrapper around the PyTorch binary cross entropy loss.
bceaftersigmoid pykeen.losses.BCEAfterSigmoidLoss A loss function which uses the numerically unstable version of explicit Sigmoid + BCE.
crossentropy pykeen.losses.CrossEntropyLoss Evaluate cross entropy after softmax output.
marginranking pykeen.losses.MarginRankingLoss A wrapper around the PyTorch margin ranking loss.
mse pykeen.losses.MSELoss A wrapper around the PyTorch mean square error loss.
nssa pykeen.losses.NSSALoss An implementation of the self-adversarial negative sampling loss function proposed by Sun et al. (2019).
softplus pykeen.losses.SoftplusLoss A loss function for the softplus.
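
To make one of these losses concrete, the following plain-Python sketch mirrors the formula of the PyTorch margin ranking loss with a target of 1 and mean reduction: each positive score should exceed its paired negative score by at least the margin. The function name `margin_ranking_loss` and the scores are illustrative; PyKEEN's wrapper may use different defaults.

```python
def margin_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Mean of max(0, margin + negative - positive) over paired scores."""
    if len(pos_scores) != len(neg_scores):
        raise ValueError("need exactly one negative score per positive score")
    losses = [max(0.0, margin + n - p) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


# First pair satisfies the margin (loss 0); second pair violates it (loss 1.5).
loss = margin_ranking_loss([2.0, 0.5], [0.0, 1.0], margin=1.0)
print(loss)
```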

Regularizers (5)

Name Reference Description
combined pykeen.regularizers.CombinedRegularizer A convex combination of regularizers.
lp pykeen.regularizers.LpRegularizer A simple L_p norm based regularizer.
no pykeen.regularizers.NoRegularizer A regularizer which does not perform any regularization.
powersum pykeen.regularizers.PowerSumRegularizer A simple x^p based regularizer.
transh pykeen.regularizers.TransHRegularizer A regularizer for the soft constraints in TransH.

Optimizers (6)

Name Reference Description
adadelta torch.optim.Adadelta Implements Adadelta algorithm.
adagrad torch.optim.Adagrad Implements Adagrad algorithm.
adam torch.optim.Adam Implements Adam algorithm.
adamax torch.optim.Adamax Implements Adamax algorithm (a variant of Adam based on infinity norm).
adamw torch.optim.AdamW Implements AdamW algorithm.
sgd torch.optim.SGD Implements stochastic gradient descent (optionally with momentum).

Training Loops (2)

Name Reference Description
lcwa pykeen.training.LCWATrainingLoop A training loop that uses the local closed world assumption training approach.
slcwa pykeen.training.SLCWATrainingLoop A training loop that uses the stochastic local closed world assumption training approach.

Negative Samplers (2)

Name Reference Description
basic pykeen.sampling.BasicNegativeSampler A basic negative sampler.
bernoulli pykeen.sampling.BernoulliNegativeSampler An implementation of the Bernoulli negative sampling approach proposed by Wang et al. (2014).
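
The idea behind basic negative sampling can be sketched as follows: each positive triple is corrupted by replacing either its head or its tail with a uniformly drawn entity. This is an illustration in plain Python under simplifying assumptions, not the code of pykeen.sampling.BasicNegativeSampler; the function name `corrupt_triples` is hypothetical, and entities and relations are represented as integer IDs.

```python
import random


def corrupt_triples(triples, num_entities, rng):
    """Replace the head or the tail of each triple with a random entity ID."""
    negatives = []
    for head, relation, tail in triples:
        replacement = rng.randrange(num_entities)
        # Corrupt the head or the tail with equal probability. The result
        # may coincide with a true triple by chance; this sketch does not filter.
        if rng.random() < 0.5:
            negatives.append((replacement, relation, tail))
        else:
            negatives.append((head, relation, replacement))
    return negatives


positives = [(0, 0, 1), (2, 1, 3)]
negatives = corrupt_triples(positives, num_entities=5, rng=random.Random(0))
print(negatives)
```

The Bernoulli variant differs in that the probability of corrupting the head rather than the tail depends on the relation instead of being fixed at 0.5.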

Stoppers (2)

Name Reference Description
early pykeen.stoppers.EarlyStopper A harness for early stopping.
nop pykeen.stoppers.NopStopper A stopper that does nothing.
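
A patience-based rule is one common way to implement early stopping. The sketch below is not the API of pykeen.stoppers.EarlyStopper; the class `PatienceStopper` and its parameters are hypothetical, and the validation metric is assumed to be higher-is-better.

```python
class PatienceStopper:
    """Signal a stop after `patience` evaluations without improvement."""

    def __init__(self, patience=2, delta=0.0):
        self.patience = patience
        self.delta = delta
        self.best = None
        self.bad_evaluations = 0

    def should_stop(self, metric):
        """Record a higher-is-better metric and report whether to stop."""
        if self.best is None or metric > self.best + self.delta:
            self.best = metric
            self.bad_evaluations = 0
        else:
            self.bad_evaluations += 1
        return self.bad_evaluations >= self.patience


stopper = PatienceStopper(patience=2)
history = [0.30, 0.35, 0.34, 0.35, 0.40]  # made-up validation scores
stopped_at = next(
    (i for i, m in enumerate(history) if stopper.should_stop(m)), None
)
print(stopped_at)
```

With these made-up scores, training stops at the fourth evaluation, before the later improvement to 0.40 is seen; this is the usual trade-off when choosing the patience.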

Evaluators (2)

Name Reference Description
rankbased pykeen.evaluation.RankBasedEvaluator A rank-based evaluator for KGE models.
sklearn pykeen.evaluation.SklearnEvaluator An evaluator that uses a Scikit-learn metric.

Metrics (6)

Metric | Description | Evaluator | Reference
Adjusted Mean Rank | The mean over all chance-adjusted ranks: mean_i (2r_i / (num_entities+1)). Lower is better. | rankbased | pykeen.evaluation.RankBasedMetricResults
Average Precision Score | The area under the precision-recall curve, between [0.0, 1.0]. Higher is better. | sklearn | pykeen.evaluation.SklearnMetricResults
Hits At K | The hits at k for different values of k, i.e. the relative frequency of ranks not larger than k. Higher is better. | rankbased | pykeen.evaluation.RankBasedMetricResults
Mean Rank | The mean over all ranks: mean_i r_i. Lower is better. | rankbased | pykeen.evaluation.RankBasedMetricResults
Mean Reciprocal Rank | The mean over all reciprocal ranks: mean_i (1/r_i). Higher is better. | rankbased | pykeen.evaluation.RankBasedMetricResults
Roc Auc Score | The area under the ROC curve between [0.0, 1.0]. Higher is better. | sklearn | pykeen.evaluation.SklearnMetricResults
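
The rank-based formulas in the table can be checked by hand on a few ranks. The following plain-Python sketch applies them literally; the function name `rank_metrics`, the dictionary keys, and the example ranks are illustrative and are not PyKEEN identifiers or results.

```python
def rank_metrics(ranks, num_entities, ks=(1, 3, 10)):
    """Compute rank-based metrics from a list of 1-based ranks."""
    n = len(ranks)
    metrics = {
        # mean_i r_i
        "mean_rank": sum(ranks) / n,
        # mean_i (1 / r_i)
        "mean_reciprocal_rank": sum(1.0 / r for r in ranks) / n,
        # mean_i (2 r_i / (num_entities + 1))
        "adjusted_mean_rank": sum(2.0 * r / (num_entities + 1) for r in ranks) / n,
    }
    for k in ks:
        # Relative frequency of ranks not larger than k.
        metrics["hits_at_%d" % k] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Four made-up ranks in a graph with 16 entities.
metrics = rank_metrics([1, 2, 4, 10], num_entities=16)
print(metrics)
```

In this form, an adjusted mean rank of 1.0 corresponds to the expected mean rank of random scoring, (num_entities + 1) / 2, so values below 1.0 indicate better-than-chance ranking.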

Hyper-parameter Optimization

Samplers (2)

Name Reference Description
random optuna.samplers.RandomSampler Sampler using random sampling.
tpe optuna.samplers.TPESampler Sampler using TPE (Tree-structured Parzen Estimator) algorithm.

Experimentation

Reproduction

PyKEEN includes a set of curated experimental settings for reproducing past landmark experiments. They can be accessed and run like:

pykeen experiments reproduce tucker balazevic2019 fb15k

The three arguments are the model name, the reference, and the data set. The output directory can optionally be set with -d.

Ablation

PyKEEN includes the ability to specify ablation studies using the hyper-parameter optimization module. They can be run like:

pykeen experiments ablation ~/path/to/config.json

Acknowledgements

Supporters

This project has been supported by several organizations (in alphabetical order):

Logo

The PyKEEN logo was designed by Carina Steinborn.
