
Package to compute TP similarities between nodes in a network.


TP Similarity

TP Similarity is a Python package for computing Transition Probability (TP) similarities between nodes in a network. It offers four methods: exact computation, estimation via random walks, a shortest-paths variant, and node2vec-based cosine similarity.

Overview

TP similarity is a measure designed for papers and authors, simulating a literature search procedure on citation networks. Inspired by information retrieval concepts, this approach does not rely on curated classification systems, avoids clustering complexities, and provides a continuous measure of similarity between nodes. By implementing the TP similarity measure, researchers can approximate the research interest similarity of individual scientists using publication-level information.

The package accompanies the paper:

Varga, Attila, Sadamori Kojaku, and Filipi N. Silva. "Measuring Research Interest Similarity with Transition Probabilities." arXiv preprint arXiv:2409.18240 (2024).

Installation

Install the package using pip:

pip install tpsimilarity

Features

  • Exact Transition Probabilities (TP): Computes the exact transition probabilities between nodes in a graph.
  • Estimated Transition Probabilities: Estimates transition probabilities using random walks.
  • Shortest Paths Transition Probabilities: Computes transition probabilities along the shortest paths.
  • Node2Vec Similarity: Computes cosine similarity between node embeddings generated by node2vec.
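
To make the first feature more concrete, the sketch below computes k-step transition probabilities of an unbiased random walk on a tiny undirected graph, using only the standard library. It illustrates the general idea rather than the package's implementation: the helper names (`transition_matrix`, `k_step_probabilities`) and the toy graph are assumptions made for this example, and the package's exact definition of TP may differ (see the accompanying paper).

```python
def transition_matrix(adjacency):
    """Row-normalize an adjacency matrix (list of lists) into a transition matrix."""
    matrix = []
    for row in adjacency:
        degree = sum(row)
        # An isolated node has no outgoing probability mass.
        matrix.append([value / degree if degree else 0.0 for value in row])
    return matrix


def mat_mul(a, b):
    """Multiply two dense matrices given as lists of lists."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def k_step_probabilities(adjacency, steps):
    """Probability of being at node j after `steps` steps when starting at node i."""
    step_matrix = transition_matrix(adjacency)
    size = len(step_matrix)
    # Start from the identity matrix (zero steps taken).
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(steps):
        result = mat_mul(result, step_matrix)
    return result


# Hypothetical toy graph: a path 0 - 1 - 2
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
two_step = k_step_probabilities(adjacency, steps=2)
print(two_step[0])  # distribution over nodes after two steps from node 0
```

Each row of the result is a probability distribution over end nodes, so it sums to 1 whenever the start node has at least one neighbor.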

Getting Started

Prerequisites

  • Python: Version 3.6 or higher
  • Dependencies:
    • numpy
    • scipy
    • gensim
    • tqdm
    • joblib
    • igraph
    • (optional) networkx

Install the dependencies, including the optional networkx (used in the examples below to build a sample graph), using:

pip install numpy scipy networkx gensim tqdm joblib igraph

Importing the Package

from tpsimilarity import similarity

Example Usage

1. Compute Exact Transition Probabilities (TP)

import networkx as nx
import igraph as ig
from tpsimilarity import similarity

# Create or load your graph (NetworkX is used here only to build a sample graph)
nx_graph = nx.karate_club_graph()

# Convert the NetworkX graph to an igraph graph
G = ig.Graph.from_networkx(nx_graph)

# Define sources and targets
sources = [0, 1, 2]  # Source nodes
targets = [3, 4, 5]  # Target nodes

# Compute exact TP similarities
tp_sim = similarity.TP(
    graph=G,
    sources=sources,
    targets=targets,
    window_length=5
)

# tp_sim holds the similarities in the format selected by return_type ("matrix" by default)

2. Compute Estimated Transition Probabilities

# Estimate TP similarities using random walks
# (reuses G, sources, and targets from example 1)
estimated_tp = similarity.estimatedTP(
    graph=G,
    sources=sources,
    targets=targets,
    window_length=5,
    walks_per_source=1000,
    batch_size=100,
    return_type="matrix",
    degreeNormalization=True,
    progressBar=True
)
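
As a rough illustration of what estimating transition probabilities with random walks involves, the following standard-library sketch runs many short walks from one source and counts where they end. It is not the package's code: the function `estimate_transition_probabilities`, its arguments, and the toy adjacency lists are assumptions for this example, with `steps` and `walks` only loosely mirroring `window_length` and `walks_per_source`.

```python
import random


def estimate_transition_probabilities(neighbors, source, steps, walks, seed=0):
    """Estimate the end-node distribution of `steps`-step walks from `source`."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    counts = {}
    for _ in range(walks):
        node = source
        for _ in range(steps):
            if not neighbors[node]:
                break  # dead end: the walk stays where it is
            node = rng.choice(neighbors[node])
        counts[node] = counts.get(node, 0) + 1
    # Convert end-node counts into relative frequencies.
    return {node: count / walks for node, count in counts.items()}


# Hypothetical toy graph: a path 0 - 1 - 2, stored as adjacency lists
neighbors = {0: [1], 1: [0, 2], 2: [1]}
estimate = estimate_transition_probabilities(neighbors, source=0, steps=2, walks=1000)
print(estimate)
```

On this toy graph the exact two-step probabilities from node 0 are 0.5 for node 0 and 0.5 for node 2, so the estimate should approach those values as the number of walks grows.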

3. Compute Node2Vec Similarity

# Compute node2vec-based cosine similarities
# (reuses G, sources, and targets from example 1)
node2vec_sim = similarity.node2vec(
    graph=G,
    sources=sources,
    targets=targets,
    dimensions=64,
    window_length=40,
    context_size=5,
    workers=4,
    batch_walks=100,
    return_type="matrix",
    progressBar=True
)
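
The cosine-similarity step can be sketched independently of how the embeddings are produced. The example below assumes the embeddings are already available as plain lists of floats keyed by node index; the helper names and the two-dimensional vectors are made up for illustration and are not part of the package.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings, sources, targets):
    """Rows follow `sources`, columns follow `targets`."""
    return [
        [cosine_similarity(embeddings[s], embeddings[t]) for t in targets]
        for s in sources
    ]


# Hypothetical 2-dimensional embeddings for three nodes
embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
matrix = similarity_matrix(embeddings, sources=[0, 1], targets=[2, 0])
print(matrix)
```

Unlike transition probabilities, cosine similarity is symmetric and ranges from -1 to 1, which is worth keeping in mind when comparing the two kinds of scores.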

4. Compute Shortest Paths Transition Probabilities

# Compute TP similarities along shortest paths
# (reuses G, sources, and targets from example 1)
sp_tp = similarity.shortestPathsTP(
    graph=G,
    sources=sources,
    targets=targets,
    window_length=5
)
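
This README does not spell out how the shortest-paths variant is defined, so the following is only one plausible reading, offered as a sketch: the probability that an unbiased random walk from a source reaches each node in the minimum possible number of steps, that is, along some shortest path. The function `shortest_path_walk_probability` and the toy graph are hypothetical, and the package may compute something different.

```python
from collections import deque


def shortest_path_walk_probability(neighbors, source):
    """Return (distance, probability) dictionaries for nodes reachable from `source`.

    probability[v] is the chance that an unbiased walk starting at `source`
    is at v after exactly distance[v] steps.
    """
    distance = {source: 0}
    probability = {source: 1.0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if not neighbors[node]:
            continue
        # The walker leaves `node` along each edge with equal probability.
        share = probability[node] / len(neighbors[node])
        for nxt in neighbors[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                probability[nxt] = 0.0
                queue.append(nxt)
            # Only edges that move one layer further from the source lie on shortest paths.
            if distance[nxt] == distance[node] + 1:
                probability[nxt] += share
    return distance, probability


# Hypothetical toy graph: a square 0 - 1 - 3 - 2 - 0
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
distance, probability = shortest_path_walk_probability(neighbors, source=0)
print(distance[3], probability[3])
```

Because breadth-first search finishes every node at distance d before touching any node at distance d + 1, each node's probability is complete by the time it is dequeued.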

Parameters

  • graph (networkx.Graph or igraph.Graph): The graph on which to compute the similarities.
  • sources (list): List of source node indices.
  • targets (list): List of target node indices.
  • window_length (int): The length of the random walks.
  • return_type (str, optional): The format of the output ("list", "matrix", or "dict"). Default is "matrix".
  • degreeNormalization (bool, optional): Whether to normalize by the degree of the target node. Default is True.
  • dimensions (int, optional): Number of dimensions for node embeddings in node2vec. Default is 64.
  • context_size (int, optional): Context size for the node2vec algorithm. Default is 10.
  • workers (int, optional): Number of parallel workers for node2vec. Default is 4.
  • batch_walks (int, optional): Number of walks per batch for node2vec. Default is 10000.
  • progressBar (bool or tqdm, optional): Whether to display a progress bar during computation. Default is True.
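
To illustrate what normalizing by the degree of the target node could mean in practice, the sketch below divides each column of a small score matrix by the degree of the corresponding target. This is an assumption about the operation, not the package's code: the function `normalize_by_target_degree`, the scores, and the degrees are invented for the example.

```python
def normalize_by_target_degree(matrix, targets, degree):
    """Divide each score by the degree of its target node (column-wise)."""
    normalized = []
    for row in matrix:
        normalized.append([
            value / degree[target] if degree[target] else 0.0
            for value, target in zip(row, targets)
        ])
    return normalized


# Hypothetical scores: rows are sources, columns follow `targets`
scores = [
    [0.2, 0.6],
    [0.4, 0.4],
]
targets = [3, 4]
degree = {3: 2, 4: 4}  # assumed node degrees
normalized_scores = normalize_by_target_degree(scores, targets, degree)
print(normalized_scores)
```

A correction of this kind reduces the advantage that high-degree targets have simply because random walks visit them more often.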

Examples

You can find more examples and tutorials in the examples directory or in the Jupyter notebooks provided.

Authors

  • Attila Varga
  • Sadamori Kojaku
  • Filipi N. Silva

License

This project is licensed under the MIT License. See the LICENSE file for details.
