TensorFlow implementation of the Graphical Attention Recurrent Neural Net Model
TensorFlow implementation of Graphical Attention Recurrent Neural Networks based on work by Cirstea et al., 2019.
Moreover, we offer stand-alone implementations of the Graph Attention Mechanism (Veličković et al., 2017) and Diffusional Graph Convolution (Li et al., 2017).
pip install garnn. Dependencies are
The core data structure is the graph signal. If we have N nodes in a graph, each having F observed features, then the graph signal is the tensor of shape (batch, N, F) holding the data produced by all nodes. Often we have sequences of graph signals forming a time series. We call these temporal graph signals and assume a shape of (batch, time steps, N, F). We also need to know the adjacency matrix E of the underlying graph, which has shape (N, N).
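To make these shapes concrete, here is a minimal NumPy sketch. The sizes and the random adjacency matrix are placeholders chosen for illustration and do not come from the package.

```python
import numpy as np

# Illustrative sizes only (assumptions, not values required by garnn).
batch, time_steps, N, F = 32, 12, 10, 5
rng = np.random.default_rng(0)

# Graph signal: one F-dimensional feature vector per node.
graph_signal = rng.normal(size=(batch, N, F))

# Temporal graph signal: a sequence of graph signals.
temporal_graph_signal = rng.normal(size=(batch, time_steps, N, F))

# Adjacency matrix E of the underlying graph (random 0/1 placeholder).
E = rng.integers(0, 2, size=(N, N))

# Picking a single time step gives back an ordinary graph signal.
one_step = temporal_graph_signal[:, 0]

print(graph_signal.shape, temporal_graph_signal.shape, E.shape, one_step.shape)
```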
Non-Temporal Data (batch, N, F)
All but the recurrent layers work with non-temporal data, i.e. the data points are individual graph signals and not sequences of graph signals.
The `AttentionMechanism` layer found in `garnn.components.attention` takes a graph signal and returns an attention matrix as described in Veličković et al., 2017.
The layer is initialised with the following parameters:

| Parameter | Description |
|:----------|:------------|
| first positional argument | Dimension of internal embedding. |
| `adjacency_matrix` | Adjacency matrix of the graph. |
| `num_heads` | Number of attention matrices that are averaged to return the output attention. |
| `use_reverse_diffusion` | Whether or not to calculate A_in and A_out as done by Cirstea et al., 2019. If E is symmetric then the value will be set to False. |
The output is of shape (batch, N, N). If `use_reverse_diffusion` is true then we obtain 2 attention matrices and thus the shape is (batch, 2, N, N).
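As a rough illustration of where these shapes come from, the following NumPy sketch builds an attention matrix in the style of Veličković et al., 2017. It is not the code inside `AttentionMechanism`. The helper names, the added self-loops, the LeakyReLU slope, and the choice of masking with E and its transpose for the two-matrix case are all assumptions made for this example.

```python
import numpy as np

def attention_head(X, E, W, a, slope=0.2):
    """X: (batch, N, F), E: (N, N), W: (F, D), a: (2 * D,) -> (batch, N, N)."""
    H = X @ W                                    # embed features: (batch, N, D)
    D = W.shape[1]
    # a^T [h_i || h_j] splits into a term for node i plus a term for node j.
    scores = (H @ a[:D])[:, :, None] + (H @ a[D:])[:, None, :]
    scores = np.where(scores > 0, scores, slope * scores)   # LeakyReLU
    # Attend only along edges; self-loops (an assumption) keep every row non-empty.
    mask = (E + np.eye(E.shape[0])) > 0
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)    # stable softmax
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

def attention(X, E, heads):
    """Average the attention matrices of all heads."""
    return np.mean([attention_head(X, E, W, a) for W, a in heads], axis=0)

rng = np.random.default_rng(0)
batch, N, F, D, num_heads = 4, 10, 5, 64, 5
X = rng.normal(size=(batch, N, F))
E = rng.integers(0, 2, size=(N, N))
heads = [(rng.normal(scale=0.1, size=(F, D)), rng.normal(scale=0.1, size=2 * D))
         for _ in range(num_heads)]

A = attention(X, E, heads)                       # (batch, N, N)

# Two matrices, here obtained by masking with E and with its transpose
# (an assumed reading of A_in and A_out), stacked to (batch, 2, N, N).
A_both = np.stack([attention(X, E, heads), attention(X, E.T, heads)], axis=1)
print(A.shape, A_both.shape)
```

Each row of the resulting matrices sums to one, so they can be used as transition matrices in a diffusion step.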
The `GraphDiffusionConvolution` layer in `garnn.layers.diffconv` offers diffusion graph convolution as described by Li et al., 2017. It operates on a tuple containing a graph signal X and a transition matrix A (usually an attention matrix returned by an attention mechanism) and is initialised with the following parameters:

| Parameter | Description |
|:----------|:------------|
| `features` | Number of output features. Q in the paper. |
| `num_diffusion_steps` | Number of hops done by the diffusion process. K in the paper. |
More specialised parameters, such as regularisers and initialisers, are documented in the docstring. The convolutional layer returns a diffused graph signal of shape (batch, N, Q).
Thus, if we have 10 nodes with 5 features each and would like to apply diffusion graph convolution with 20 output features, using a 5-head attention mechanism with an internal embedding of 64 units, we would run:
```python
import numpy as np
from tensorflow.keras.layers import Input

from garnn.components.attention import AttentionMechanism
from garnn.layers.diffconv import GraphDiffusionConvolution

# Adjacency matrix of the 10-node graph (random placeholder; use your own E).
E = np.random.randint(0, 2, size=(10, 10))

# Input: a graph signal with 10 nodes and 5 features each.
inputs = Input(shape=(10, 5))

# Initialise the attention mechanism with a 64-unit embedding and 5 heads.
Attn = AttentionMechanism(64, adjacency_matrix=E, num_heads=5)(inputs)

# Now the convolutional layer. Make sure you use the correct order in the
# input tuple: the graph signal always comes first!
output = GraphDiffusionConvolution(features=20, num_diffusion_steps=5)((inputs, Attn))
```
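To show what the diffusion step computes, here is a simplified NumPy sketch of a one-directional diffusion convolution, following the sum over k = 0, ..., K-1 of A^k X Θ_k from Li et al., 2017. The function name and the weight layout are assumptions for this example; bias terms, activations, and the exact way `GraphDiffusionConvolution` counts hops are not covered by it.

```python
import numpy as np

def diffusion_convolution(X, A, thetas):
    """X: (batch, N, F), A: (batch, N, N), thetas: (K, F, Q) -> (batch, N, Q)."""
    out = np.zeros(X.shape[:-1] + (thetas.shape[-1],))
    diffused = X                          # zero hops: A^0 X
    for theta in thetas:
        out = out + diffused @ theta      # add the term A^k X theta_k
        diffused = A @ diffused           # diffuse one more hop
    return out

rng = np.random.default_rng(0)
batch, N, F, Q, K = 4, 10, 5, 20, 5
X = rng.normal(size=(batch, N, F))

# Placeholder transition matrix with rows summing to one.
A = rng.random(size=(batch, N, N))
A = A / A.sum(axis=-1, keepdims=True)

thetas = rng.normal(size=(K, F, Q))       # one weight matrix per hop count
Y = diffusion_convolution(X, A, thetas)
print(Y.shape)
```

With an identity transition matrix nothing diffuses, and the result reduces to a plain feature map with the summed weights, which is a handy sanity check.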
Temporal Data (batch, time steps, N, F)
Both `AttentionMechanism` and `GraphDiffusionConvolution` naturally extend to temporal graph signals. The output then simply has an additional time-steps dimension.
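The extra dimension can be pictured with a small NumPy sketch. It applies one diffusion hop and a feature map to every time step at once and checks that this matches processing the time steps one by one. All sizes, the random transition matrices, and the single weight matrix are placeholders for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, T, N, F, Q = 2, 7, 10, 5, 20

X = rng.normal(size=(batch, T, N, F))      # temporal graph signal

# One placeholder transition matrix per time step, rows summing to one.
A = rng.random(size=(batch, T, N, N))
A = A / A.sum(axis=-1, keepdims=True)

theta = rng.normal(size=(F, Q))

# Matrix products broadcast over the leading (batch, time) dimensions.
Y = A @ X @ theta                           # (batch, T, N, Q)

# The same computation, one time step at a time.
Y_loop = np.stack([A[:, t] @ X[:, t] @ theta for t in range(T)], axis=1)

print(Y.shape, np.allclose(Y, Y_loop))
```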
The `garnn_gru` layer found in `garnn.components.garnn_gru` is the diffusional and attention-based GRU introduced by Cirstea et al., 2019. It operates on temporal graph signals and an attention mechanism, and is initialised with the following parameters:

| Parameter | Description |
|:----------|:------------|
| `num_hidden_features` | Number of features in the hidden state. Q in the paper. |
| `num_diffusion_steps` | Number of hops done by the diffusion process in all internal convolutions. K in the paper. |
| `return_sequences` | Whether or not the RNN should return the hidden state at each time step or only at the final time step. Set it to |
Hence, to rebuild a model similar to the one used by Cirstea et al., 2019, we would run:
```python
import numpy as np
from tensorflow.keras import Model
from tensorflow.keras.layers import Input

from garnn.components.attention import AttentionMechanism
from garnn.components.garnn_gru import garnn_gru

# We assume we have 207 nodes with 10 features each.
# For simplicity we create a random adjacency matrix E.
E = np.random.randint(0, 2, size=(207, 207))

# Input of the temporal graph signals. Note that "None" allows
# us to pass in variable-length time series.
X = Input(shape=(None, 207, 10))

# Create the attention mechanism with 3 heads and an
# embedding size of 16.
A = AttentionMechanism(16, adjacency_matrix=E, num_heads=3)(X)

# Pipe X and A into the 2 stacked GRU layers.
# The first layer stretches the features into 64 diffused features.
# We assume that we are using 6-hop diffusions.
gru_1 = garnn_gru(
    num_hidden_features=64, num_diffusion_steps=6, return_sequences=True
)((X, A))

# The 2nd GRU shrinks it back to 1 feature: the feature that we are
# predicting. We use the same attention for this layer, but note that
# we could also have introduced a new attention mechanism for this GRU.
output = garnn_gru(num_hidden_features=1, num_diffusion_steps=6)((gru_1, A))

garnn_model = Model(inputs=X, outputs=output)
```
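For intuition about what such a recurrent layer does at each time step, here is a hedged NumPy sketch of a GRU whose matrix products are replaced by diffusion convolutions. It is not the `garnn_gru` implementation: the gate convention, the missing bias terms, the zero initial state, and all helper names are assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def diff_conv(X, A, thetas):
    """Sum over hops k of A^k X theta_k; thetas has shape (K, F_in, F_out)."""
    out, diffused = 0.0, X
    for theta in thetas:
        out = out + diffused @ theta
        diffused = A @ diffused
    return out

def diffusion_gru(X, A, params, return_sequences=False):
    """X: (batch, T, N, F), A: (batch, T, N, N) -> (batch, [T,] N, Q)."""
    batch, T, N, _ = X.shape
    Q = params["c"].shape[-1]
    H = np.zeros((batch, N, Q))                    # assumed zero initial state
    states = []
    for t in range(T):
        x_t, a_t = X[:, t], A[:, t]
        xh = np.concatenate([x_t, H], axis=-1)     # (batch, N, F + Q)
        r = sigmoid(diff_conv(xh, a_t, params["r"]))   # reset gate
        u = sigmoid(diff_conv(xh, a_t, params["u"]))   # update gate
        xrh = np.concatenate([x_t, r * H], axis=-1)
        c = np.tanh(diff_conv(xrh, a_t, params["c"]))  # candidate state
        H = u * H + (1.0 - u) * c
        states.append(H)
    return np.stack(states, axis=1) if return_sequences else H

rng = np.random.default_rng(0)
batch, T, N, F, Q, K = 2, 4, 6, 3, 8, 2
X = rng.normal(size=(batch, T, N, F))
A = rng.random(size=(batch, T, N, N))
A = A / A.sum(axis=-1, keepdims=True)
params = {g: rng.normal(scale=0.1, size=(K, F + Q, Q)) for g in ("r", "u", "c")}

all_states = diffusion_gru(X, A, params, return_sequences=True)   # (batch, T, N, Q)
last_state = diffusion_gru(X, A, params)                          # (batch, N, Q)
print(all_states.shape, last_state.shape)
```

Returning all states is what allows a second recurrent layer to be stacked on top, since that layer again expects a temporal graph signal as input.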
Bug reports, fixes and additional features are always welcome! Make sure to run the tests with `python setup.py test` and write your own for new features. Thanks.