
Package to compute TP similarities between nodes in a network.

Project description

TP Similarity

TP Similarity is a Python package for computing Transition Probability (TP) similarities between nodes in a network. It offers several ways to obtain these similarities: exact computation, estimation via random walks, computation along shortest paths, and node2vec-based cosine similarity.


Table of Contents

  • Overview
  • Installation
  • Features
  • Getting Started
  • Example Usage
  • Parameters
  • Examples
  • Authors
  • License

Overview

TP similarity is a measure designed for papers and authors that simulates a literature-search procedure on citation networks. Inspired by information-retrieval concepts, the approach does not rely on curated classification systems, avoids the complexities of clustering, and yields a continuous measure of similarity between nodes. With it, researchers can approximate the research-interest similarity of individual scientists from publication-level information.

The package accompanies the paper:

Varga, Attila, Sadamori Kojaku, and Filipi N. Silva. "Measuring Research Interest Similarity with Transition Probabilities." arXiv preprint arXiv:2409.18240 (2024). Available at https://arxiv.org/abs/2409.18240

Installation

Install the package using pip:

pip install tpsimilarity

Features

  • Exact Transition Probabilities (TP): Computes the exact transition probabilities between nodes in a graph.
  • Estimated Transition Probabilities: Estimates transition probabilities using random walks.
  • Shortest Paths Transition Probabilities: Computes transition probabilities along the shortest paths.
  • Node2Vec Similarity: Computes cosine similarity between node embeddings generated by node2vec.

Getting Started

Prerequisites

  • Python: Version 3.6 or higher
  • Dependencies:
    • numpy
    • scipy
    • networkx
    • gensim
    • tqdm
    • joblib
    • (Optional) igraph for handling large graphs efficiently

Install the dependencies using:

pip install numpy scipy networkx gensim tqdm joblib igraph

Importing the Package

from tpsimilarity import similarity

Example Usage

1. Compute Exact Transition Probabilities (TP)

import igraph as ig
import networkx as nx
from tpsimilarity import similarity

# Create or load your graph
G = nx.karate_club_graph()

# Convert the NetworkX graph to an igraph graph
G = ig.Graph.from_networkx(G)

# Define sources and targets
sources = [0, 1, 2]  # Source nodes
targets = [3, 4, 5]  # Target nodes

# Compute exact TP similarities
tp_sim = similarity.TP(
    graph=G,
    sources=sources,
    targets=targets,
    window_length=5
)

# tp_sim contains the similarities as a matrix, list, or dict, depending on return_type
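
The values can then be inspected directly. The sketch below assumes that the default "matrix" return type yields an array-like result with one row per source and one column per target; adjust it if you request a different return_type.

import numpy as np

sim = np.asarray(tp_sim)  # assumed shape: (len(sources), len(targets))
for i, s in enumerate(sources):
    for j, t in enumerate(targets):
        print(f"TP({s} -> {t}) = {sim[i, j]:.4f}")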

2. Compute Estimated Transition Probabilities

# Estimate TP similarities using random walks
estimated_tp = similarity.estimatedTP(
    graph=G,
    sources=sources,
    targets=targets,
    window_length=5,
    walks_per_source=1000,
    batch_size=100,
    return_type="matrix",
    degreeNormalization=True,
    progressBar=True
)
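
Because estimatedTP samples random walks, its output fluctuates around the exact values; raising walks_per_source reduces that noise. As a rough, hedged sanity check (assuming both calls return comparable source-by-target matrices), you can compare it against the exact result from above:

import numpy as np

exact = np.asarray(tp_sim)         # exact TP from similarity.TP above
approx = np.asarray(estimated_tp)  # sampled estimate from similarity.estimatedTP
print("max absolute deviation:", np.abs(exact - approx).max())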

3. Compute Node2Vec Similarity

# Compute node2vec-based cosine similarities
node2vec_sim = similarity.node2vec(
    graph=G,
    sources=sources,
    targets=targets,
    dimensions=64,
    window_length=40,
    context_size=5,
    workers=4,
    batch_walks=100,
    return_type="matrix",
    progressBar=True
)
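
Node2vec similarities are cosine scores between stochastic embeddings, so they live on a different scale than transition probabilities and vary slightly between runs. Assuming the "matrix" return type is array-like, one way to rank targets per source is sketched below:

import numpy as np

sim = np.asarray(node2vec_sim)  # assumed shape: (len(sources), len(targets))
for i, s in enumerate(sources):
    order = np.argsort(-sim[i])  # most similar targets first
    print(f"source {s}:", [targets[j] for j in order])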

4. Compute Shortest Paths Transition Probabilities

# Compute TP similarities along shortest paths
sp_tp = similarity.shortestPathsTP(
    graph=G,
    sources=sources,
    targets=targets,
    window_length=5
)
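
Since the example graph was converted with ig.Graph.from_networkx, sources and targets refer to igraph vertex indices. For the karate club graph these coincide with the original node labels, but for graphs with arbitrary node names you can recover the labels from the _nx_name vertex attribute that python-igraph stores during conversion, as in this short sketch:

# Map igraph vertex indices back to the original NetworkX node labels.
labels = G.vs["_nx_name"]
print([labels[v] for v in sources])  # original labels of the source nodes
print([labels[v] for v in targets])  # original labels of the target nodes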

Parameters

  • graph (networkx.Graph or igraph.Graph): The graph on which to compute the similarities.
  • sources (list): List of source node indices.
  • targets (list): List of target node indices.
  • window_length (int): The length of the random walks.
  • return_type (str, optional): The format of the output ("list", "matrix", or "dict"). Default is "matrix". A sketch after this list shows one way to re-key a matrix result by node.
  • degreeNormalization (bool, optional): Whether to normalize by the degree of the target node. Default is True.
  • dimensions (int, optional): Number of dimensions for node embeddings in node2vec. Default is 64.
  • context_size (int, optional): Context size for the node2vec algorithm. Default is 10.
  • workers (int, optional): Number of parallel workers for node2vec. Default is 4.
  • batch_walks (int, optional): Number of walks per batch for node2vec. Default is 10000.
  • progressBar (bool or tqdm, optional): Whether to display a progress bar during computation. Default is True.
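
If you prefer results keyed by node rather than by position, a matrix result can be wrapped into a nested dictionary. This is only a sketch and assumes the rows and columns follow the order of sources and targets:

import numpy as np

sim = np.asarray(tp_sim)
sim_by_node = {
    s: {t: float(sim[i, j]) for j, t in enumerate(targets)}
    for i, s in enumerate(sources)
}
print(sim_by_node[0][3])  # similarity from source node 0 to target node 3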

Examples

You can find more examples and tutorials in the examples directory or in the Jupyter notebooks provided.

Authors

  • Attila Varga
  • Sadamori Kojaku
  • Filipi N. Silva

License

This project is licensed under the MIT License. See the LICENSE file for details.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tpsimilarity-0.6.0.tar.gz (8.9 kB)

Uploaded Source

Built Distribution

tpsimilarity-0.6.0-py3-none-any.whl (6.9 kB)

Uploaded Python 3

File details

Details for the file tpsimilarity-0.6.0.tar.gz.

File metadata

  • Download URL: tpsimilarity-0.6.0.tar.gz
  • Upload date:
  • Size: 8.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for tpsimilarity-0.6.0.tar.gz

  • SHA256: 3d340ff1c1f884fad19cc2a35a6e39583ecdb5f752ec13b28279cdd709630fb0
  • MD5: 1d622a7216d11865eb420613c3f00d7f
  • BLAKE2b-256: aec570ae50fac2e4940bc54e75a81e682572e25a1bb84015d7d99d4ee9829934


File details

Details for the file tpsimilarity-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: tpsimilarity-0.6.0-py3-none-any.whl
  • Upload date:
  • Size: 6.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for tpsimilarity-0.6.0-py3-none-any.whl

  • SHA256: f220f7a726ca87f9f04e7b7f5b2c175d07ec246d084ffabac277e66fbf63327a
  • MD5: aec0cc82efb4ebce95451eac5a78dfd9
  • BLAKE2b-256: 58cc005a891f4aefe57b406912d2a83167548fc00f33f5ccb5e1affb51cddffc

