SNoRe: Scalable Unsupervised Learning of Symbolic Node Representations

Project description

This repository contains the implementation of the SNoRe algorithm from the following paper:

@ARTICLE{meznar2020snore,
         author={S. {Me\v{z}nar} and N. {Lavra\v{c}} and B. {\v{S}krlj}},
         journal={IEEE Access}, 
         title={SNoRe: Scalable Unsupervised Learning of Symbolic Node Representations},
         year={2020},
         volume={8},
         number={}, 
         pages={212568-212588},
         doi={10.1109/ACCESS.2020.3039541}}

An overview of the algorithm is presented in the image below.

algorithm overview

Installing SNoRe

python setup.py install

or

pip install snore-embedding
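
Either way, a quick sanity check confirms that the package (imported as snore, as the example below assumes) is available:

python -c "from snore import SNoRe"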

Using SNoRe

A simple use case is shown below. First, we import the necessary libraries and load the dataset and its labels.

from snore import SNoRe
from scipy.io import loadmat
from sklearn.utils import shuffle
from catboost import CatBoost
import pandas as pd
from sklearn.metrics import f1_score
import numpy as np

# Load adjacency matrix and labels
dataset = loadmat("data/cora.mat")
network_adj = dataset["network"]
labels = dataset["group"]

We then create the SNoRe model and embed the network. The default parameter values are shown in the code below.

# Create the model
model = SNoRe(dimension=256, num_walks=1024, max_walk_length=5,
              inclusion=0.005, fixed_dimension=False, metric="cosine",
              num_bins=256)

# Embed the network
embedding = model.embed(network_adj)
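
The returned embedding is a SciPy sparse matrix (the pandas sparse DataFrame conversion in the next step relies on this), so its shape and density can be checked directly:

# Rows correspond to nodes, columns to symbolic features
print("Embedding shape:", embedding.shape)
# Fraction of non-zero entries; SNoRe representations are typically sparse
print("Density:", embedding.nnz / (embedding.shape[0] * embedding.shape[1]))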

Finally, we train the classifier and test on the remaining data.

# Train the classifier
# 80/20 train/test split over shuffled node indices
nodes = shuffle(list(range(network_adj.shape[0])))
train_mask = nodes[:int(network_adj.shape[0] * 0.8)]
test_mask = nodes[int(network_adj.shape[0] * 0.8):]
classifier = CatBoost(params={'loss_function': 'MultiRMSE', 'iterations': 500})
df = pd.DataFrame.sparse.from_spmatrix(embedding)
classifier.fit(df.iloc[train_mask], labels[train_mask])

# Test prediction
predictions = classifier.predict(df.iloc[test_mask])
print("Micro score:",
      f1_score(np.argmax(labels[test_mask], axis=1),
               np.argmax(predictions, axis=1),
               average='micro'))
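
The macro-averaged F1 score can be computed the same way by changing the average argument:

print("Macro score:",
      f1_score(np.argmax(labels[test_mask], axis=1),
               np.argmax(predictions, axis=1),
               average='macro'))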

Further examples of evaluation and embedding explainability can be found in the examples folder.

Hyperparameter explanation

SNoRe uses the following hyperparameters and default values (an example configuration is sketched after the list):

  • dimension (default: 256): the number of features when a fixed number of features is used; otherwise, the number of features that make up a space equivalent to |N| * dimension.
  • num_walks (default: 1024): the number of random walks started from every node.
  • max_walk_length (default: 5): the length of the longest random walk.
  • inclusion (default: 0.005): the inclusion threshold; a node must be encountered with frequency at least inclusion to appear in the hash representation.
  • fixed_dimension (default: False): if True, a fixed number of features is used; otherwise, a space equivalent to |N| * dimension is used.
  • metric (default: 'cosine'): the metric used for similarity calculation. With fixed dimensions, 'cosine', 'HPI', 'HDI', 'euclidean', 'jaccard', 'seuclidean', and 'canberra' can be used; otherwise only 'cosine', 'HPI', and 'HDI'.
  • num_bins (default: 256): the number of bins used in SNoRe SDF to digitize the embedding and reduce its size. If None, the values are not digitized.
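
As an illustration, the sketch below requests a fixed-dimensional embedding with one of the alternative metrics; the parameter values here are illustrative, not recommendations:

# Fixed 128-dimensional embedding with the 'jaccard' metric;
# num_bins=None skips digitization and keeps the raw similarity values
fixed_model = SNoRe(dimension=128, fixed_dimension=True,
                    metric="jaccard", num_bins=None)
fixed_embedding = fixed_model.embed(network_adj)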

Results against other baselines

In the above-mentioned paper, we test SNoRe and its extension SNoRe SDF against NetMF (SCD), DeepWalk, node2vec, LINE, PPRS, VGAE, Label Propagation, and a random baseline. The results are shown in the image below.

micro f1 results

By aggregating these results, we get the scores presented in the table below.

micro f1 table

Embedding interpretability with SHAP

An advantage of SNoRe is the ability to interpret why instances were predicted the way they were. Such an interpretation for a single instance is shown in the image below.

SHAP explanation for a single instance

We can also see which features are the most important with the summary plot shown in the image below.

SHAP summary plot of feature importance

To try the interpretation yourself, use the code in examples/explainability_example.py.
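
That script is the authoritative version; as a minimal sketch, assuming the shap package and the CatBoost classifier trained above, a summary plot can be produced roughly like this:

import shap

# SHAP generally expects dense input, so densify the sparse frame first
X_test = df.iloc[test_mask].sparse.to_dense()

# TreeExplainer supports gradient-boosted tree models such as CatBoost
explainer = shap.TreeExplainer(classifier)
shap_values = explainer.shap_values(X_test)

# Global feature-importance view over the test instances
shap.summary_plot(shap_values, X_test)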

Latent clustering with UMAP

We can use tools such as UMAP to cluster the embeddings created with SNoRe and see whether nodes with similar labels cluster together. Such clusterings are shown in the image below.

UMAP clustering of SNoRe embeddings

To create such a clustering, start with the code in examples/umap_example.py.
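
As a starting point, here is a minimal sketch assuming the umap-learn and matplotlib packages (and the embedding and labels from the example above):

import umap
import matplotlib.pyplot as plt

# Project the sparse embedding to 2D; umap-learn accepts SciPy sparse input
coords = umap.UMAP(n_components=2).fit_transform(embedding)

# Colour each node by its class to see whether classes form clusters
node_classes = np.asarray(labels.argmax(axis=1)).ravel()
plt.scatter(coords[:, 0], coords[:, 1], c=node_classes, s=4, cmap="Spectral")
plt.show()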


Download files

Download the file for your platform.

Source Distribution

snore-embedding-0.3.3.tar.gz (21.3 kB)

Built Distribution

snore_embedding-0.3.3-py3-none-any.whl (20.1 kB)

File details

Details for the file snore-embedding-0.3.3.tar.gz.

File metadata

  • Download URL: snore-embedding-0.3.3.tar.gz
  • Upload date:
  • Size: 21.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for snore-embedding-0.3.3.tar.gz

  • SHA256: 5b622a1dea6496601101eef95b83796c4de5efb5b925e2af50cc79741a2934ba
  • MD5: 47d7e40b0ebc321838d48084ddb3ae00
  • BLAKE2b-256: 9c469556ce068023e139fe92264a519066c0687dda1a1d9a50c9c2e9886caf4f


File details

Details for the file snore_embedding-0.3.3-py3-none-any.whl.

File metadata

  • Download URL: snore_embedding-0.3.3-py3-none-any.whl
  • Upload date:
  • Size: 20.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for snore_embedding-0.3.3-py3-none-any.whl

  • SHA256: 9672654e57b99268a830af37da62dc2802f36beaf4c20776c830b7fc5797e3bc
  • MD5: 9a26817151157b11f5c6b5639b9a1064
  • BLAKE2b-256: c48993a91bed9920f9f5f40d0e36c56bfd63d1f939e84ae1f00e21e9ab82ddc5

