Representation¶
Embedding modules.
- class Embedding(num_embeddings, embedding_dim=None, shape=None, initializer=None, initializer_kwargs=None, normalizer=None, normalizer_kwargs=None, constrainer=None, constrainer_kwargs=None, regularizer=None, regularizer_kwargs=None, trainable=True, dtype=None, dropout=None)[source]¶
Trainable embeddings.
This class provides the same interface as
torch.nn.Embedding
and can be used throughout PyKEEN as a more fully featured drop-in replacement. It extends torch.nn.Embedding with additional options for normalizing, constraining, or applying dropout to the embeddings.
When a normalizer is selected, it is applied in every forward pass. It can be used, e.g., to ensure that the embedding vectors are of unit length. A constrainer can be used similarly, but it is applied after each parameter update (using the post_parameter_update hook), i.e., outside of the automatic gradient computation.
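The difference between a normalizer (applied to the selected vectors in every forward pass) and a constrainer (applied to the stored weights after each parameter update) can be sketched with a plain NumPy stand-in for the FloatTensor-based callables. The function below is a hypothetical illustration of a unit-length normalization, not PyKEEN's own "normalize" implementation:

```python
import numpy as np

def unit_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each embedding vector to unit L2 length (row-wise)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)  # guard against division by zero

# As a normalizer, this would run on the selected rows in every forward pass;
# as a constrainer, it would overwrite the full weight matrix after each update,
# outside of gradient tracking.
weights = np.array([[3.0, 4.0], [0.0, 2.0]])
unit = unit_normalize(weights)
```

Either way, every resulting vector has unit length; the two options differ only in *when* the function is applied.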
The optional dropout can also be used as a regularization technique. Moreover, it makes it possible to obtain uncertainty estimates via techniques such as Monte-Carlo dropout. The following simple example shows how to obtain different scores for a single triple from an (untrained) model. These scores can be considered samples from a distribution over the scores.
>>> from pykeen.datasets import Nations
>>> dataset = Nations()
>>> from pykeen.nn.emb import EmbeddingSpecification
>>> spec = EmbeddingSpecification(embedding_dim=3, dropout=0.1)
>>> from pykeen.models import ERModel
>>> model = ERModel(
...     triples_factory=dataset.training,
...     interaction='distmult',
...     entity_representations=spec,
...     relation_representations=spec,
... )
>>> import torch
>>> batch = torch.as_tensor(data=[[0, 1, 0]]).repeat(10, 1)
>>> scores = model.score_hrt(batch)
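Because dropout stays active in the forward pass, each call draws a fresh Bernoulli mask, so repeatedly scoring the same triple yields a sample of scores whose spread is a crude uncertainty estimate. The NumPy sketch below mimics that mechanism for a single DistMult-style triple score; the names and the inverted-dropout scaling are illustrative assumptions, not PyKEEN internals:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
head = rng.normal(size=5)  # stand-in for a head entity embedding
rel = rng.normal(size=5)   # relation embedding
tail = rng.normal(size=5)  # tail entity embedding

def mc_dropout_scores(p: float = 0.1, n_samples: int = 10) -> np.ndarray:
    """Score the same triple repeatedly with fresh dropout masks on the entity embeddings."""
    scores = []
    for _ in range(n_samples):
        # Inverted dropout: zero out entries with probability p, rescale the rest.
        mask_h = (rng.random(head.shape) >= p) / (1.0 - p)
        mask_t = (rng.random(tail.shape) >= p) / (1.0 - p)
        # DistMult-style score: sum of element-wise products.
        scores.append(float(np.sum(head * mask_h * rel * tail * mask_t)))
    return np.asarray(scores)

samples = mc_dropout_scores()
```

The mean of `samples` approximates the expected score, and the standard deviation reflects the model's dropout-induced uncertainty.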
Instantiate an embedding with extended functionality.
- Parameters
num_embeddings (int) – >0. The number of embeddings.
embedding_dim (Optional[int]) – >0. The embedding dimensionality.
initializer (Union[str, Callable[[FloatTensor], FloatTensor], None]) – An optional initializer, which takes an uninitialized (num_embeddings, embedding_dim) tensor as input, and returns an initialized tensor of the same shape and dtype (which may be the same tensor, i.e. the initialization may be in-place). Can be passed as a function, or as a string corresponding to a key in
pykeen.nn.emb.initializers
such as:
"xavier_uniform"
"xavier_uniform_norm"
"xavier_normal"
"xavier_normal_norm"
"normal"
"normal_norm"
"uniform"
"uniform_norm"
"init_phases"
initializer_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to the initializer.
normalizer (Union[str, Callable[[FloatTensor], FloatTensor], None]) – A normalization function, which is applied in every forward pass.
normalizer_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to the normalizer.
constrainer (Union[str, Callable[[FloatTensor], FloatTensor], None]) – A function which is applied to the weights after each parameter update, without tracking gradients. It may be used to enforce model constraints outside of gradient-based training. The function does not need to be in-place, but the weight tensor is modified in-place. Can be passed as a function, or as a string corresponding to a key in
pykeen.nn.emb.constrainers
such as:
'normalize'
'complex_normalize'
'clamp'
'clamp_norm'
constrainer_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to the constrainer.
regularizer (Union[str, Regularizer, None]) – A regularizer, which is applied to the selected embeddings in the forward pass.
regularizer_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to the regularizer.
dropout (Optional[float]) – A dropout value for the embeddings.
- forward(indices=None)[source]¶
Get representations for indices.
- Parameters
indices (Optional[LongTensor]) – shape: s. The indices, or None. If None, this is interpreted as torch.arange(self.max_id) (although implemented more efficiently).
- Return type
FloatTensor
- Returns
shape: (*s, *self.shape). The representations.
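The shape rule of forward — indices=None yields all representations of shape (max_id, *shape), while indices of shape s yield a result of shape (*s, *shape) — can be illustrated with NumPy fancy indexing as a stand-in for the actual method (a hypothetical sketch, not the real implementation):

```python
import numpy as np

max_id, shape = 10, (3,)
weights = np.zeros((max_id, *shape))  # stand-in for the embedding weight

def forward(indices=None):
    # indices=None -> all representations, shape (max_id, *shape);
    # indices of shape s -> result of shape (*s, *shape)
    return weights if indices is None else weights[indices]

all_reps = forward()                       # shape (10, 3)
batch = forward(np.arange(4))              # shape (4, 3)
grid = forward(np.arange(6).reshape(2, 3)) # shape (2, 3, 3)
```

Note how the indices' own shape always becomes the prefix of the result shape.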
- classmethod init_with_device(num_embeddings, embedding_dim, device, initializer=None, initializer_kwargs=None, normalizer=None, normalizer_kwargs=None, constrainer=None, constrainer_kwargs=None)[source]¶
Create an embedding object on the given device by wrapping
__init__()
. This method is a hotfix for not being able to pass a device during the initialization of
torch.nn.Embedding
: the weight is always initialized on CPU and has to be moved to the GPU afterwards.
- Return type
Embedding
- Returns
The embedding.
- property num_embeddings: int¶
The total number of representations (i.e. the maximum ID).
- Return type
int
- class EmbeddingSpecification(embedding_dim=None, shape=None, initializer=None, initializer_kwargs=None, normalizer=None, normalizer_kwargs=None, constrainer=None, constrainer_kwargs=None, regularizer=None, regularizer_kwargs=None, dtype=None, dropout=None)[source]¶
An embedding specification.
- class LiteralRepresentation(numeric_literals)[source]¶
Literal representations.
Instantiate an embedding with extended functionality.
- Parameters
num_embeddings – >0. The number of embeddings.
embedding_dim – >0. The embedding dimensionality.
initializer – An optional initializer, which takes an uninitialized (num_embeddings, embedding_dim) tensor as input, and returns an initialized tensor of the same shape and dtype (which may be the same tensor, i.e. the initialization may be in-place). Can be passed as a function, or as a string corresponding to a key in
pykeen.nn.emb.initializers
such as:
"xavier_uniform"
"xavier_uniform_norm"
"xavier_normal"
"xavier_normal_norm"
"normal"
"normal_norm"
"uniform"
"uniform_norm"
"init_phases"
initializer_kwargs – Additional keyword arguments passed to the initializer.
normalizer – A normalization function, which is applied in every forward pass.
normalizer_kwargs – Additional keyword arguments passed to the normalizer.
constrainer – A function which is applied to the weights after each parameter update, without tracking gradients. It may be used to enforce model constraints outside of gradient-based training. The function does not need to be in-place, but the weight tensor is modified in-place. Can be passed as a function, or as a string corresponding to a key in
pykeen.nn.emb.constrainers
such as:
'normalize'
'complex_normalize'
'clamp'
'clamp_norm'
constrainer_kwargs – Additional keyword arguments passed to the constrainer.
regularizer – A regularizer, which is applied to the selected embeddings in the forward pass.
regularizer_kwargs – Additional keyword arguments passed to the regularizer.
dropout – A dropout value for the embeddings.
- class RepresentationModule(max_id, shape)[source]¶
A base class for obtaining representations for entities/relations.
A representation module maps integer IDs to representations, which are tensors of floats.
max_id defines the exclusive upper bound of indices we are allowed to request. For simple embeddings this is equivalent to num_embeddings, but it is a more appropriate term for general non-embedding representations, where the representations could come from somewhere else, e.g. a GNN encoder.
shape describes the shape of a single representation. In the case of a vector embedding, this is just a single dimension. For others, e.g.
pykeen.models.RESCAL
, we have 2-d representations, and in general it can be any fixed shape.
We can look at all representations as a tensor of shape (max_id, *shape), and this is exactly the result of passing indices=None to the forward method.
We can also pass multi-dimensional indices to the forward method, in which case the indices' shape becomes the prefix of the result shape: (*indices.shape, *self.shape).
Initialize the representation module.
- Parameters
max_id – The maximum ID (exclusive upper bound of valid indices).
shape – The shape of an individual representation.
- property embedding_dim: int¶
Return the “embedding dimension”. Kept for backward compatibility.
- Return type
int
- abstract forward(indices=None)[source]¶
Get representations for indices.
- Parameters
indices (Optional[LongTensor]) – shape: s. The indices, or None. If None, this is interpreted as torch.arange(self.max_id) (although implemented more efficiently).
- Return type
FloatTensor
- Returns
shape: (*s, *self.shape). The representations.
- get_in_canonical_shape(indices=None)[source]¶
Get representations in canonical shape.
- Parameters
indices (Optional[LongTensor]) – shape: (b,) or (b, n), or None. The indices. If None, return all representations.
- Return type
FloatTensor
- Returns
shape: (b?, n?, d). If indices is None, b=1 and n=max_id. If indices is 1-dimensional, b=indices.shape[0] and n=1. If indices is 2-dimensional, b, n = indices.shape.
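The three cases of the (b?, n?, d) return shape can be traced with a small NumPy stand-in; this is an illustrative sketch of the shape arithmetic only, not PyKEEN's implementation:

```python
import numpy as np

weights = np.zeros((10, 5))  # (max_id, d)

def get_in_canonical_shape(indices=None):
    if indices is None:       # all representations: b=1, n=max_id
        return weights[np.newaxis]
    x = weights[indices]
    if indices.ndim == 1:     # 1-d indices: b=len(indices), n=1
        return x[:, np.newaxis]
    return x                  # 2-d indices: b, n = indices.shape

full = get_in_canonical_shape()                          # (1, 10, 5)
flat = get_in_canonical_shape(np.arange(4))              # (4, 1, 5)
grid = get_in_canonical_shape(np.arange(6).reshape(2, 3))  # (2, 3, 5)
```

Fixing the output to three dimensions lets downstream interaction functions rely on broadcasting over the b and n axes regardless of how indices were supplied.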
- get_in_more_canonical_shape(dim, indices=None)[source]¶
Get representations in canonical shape.
The canonical shape is given as (batch_size, d_1, d_2, d_3, *) and fulfills the following properties: Let i = dim. If indices is None, the return shape is (1, d_1, d_2, d_3) with d_i = num_representations, and d_j = 1 for j != i. If indices is not None, then batch_size = indices.shape[0], and d_i = 1 if indices.ndimension() == 1, else d_i = indices.shape[1].
The canonical shape is given by (batch_size, 1, *) if indices is not None, where batch_size = len(indices), or (1, num, *) if indices is None, with num equal to the total number of embeddings.
Examples:
>>> import torch
>>> emb = EmbeddingSpecification(shape=(20,)).make(num_embeddings=10)
>>> # Get head representations for given batch indices
>>> emb.get_in_more_canonical_shape(dim="h", indices=torch.arange(5)).shape
(5, 1, 1, 1, 20)
>>> # Get head representations for given 2D batch indices, as e.g. used by fast sLCWA scoring
>>> emb.get_in_more_canonical_shape(dim="h", indices=torch.arange(6).view(2, 3)).shape
(2, 3, 1, 1, 20)
>>> # Get head representations for 1:n scoring
>>> emb.get_in_more_canonical_shape(dim="h", indices=None).shape
(1, 10, 1, 1, 20)
- Parameters
dim – The canonical output dimension, e.g. "h" for head representations.
indices – The indices, or None to return all representations.
- Return type
FloatTensor
- Returns
shape: (batch_size, d_1, d_2, d_3, *self.shape). The representations.