Package to compute TP similarities between nodes in a network.
TP Similarity
TP Similarity is a Python package for computing Transition Probability (TP) similarities between nodes in a network. It offers various methods to estimate these similarities, including exact computation, estimation via random walks, shortest paths, and node2vec-based cosine similarity.
Overview
TP similarity is a measure designed for papers and authors, simulating a literature search procedure on citation networks. Inspired by information retrieval concepts, this approach does not rely on curated classification systems, avoids clustering complexities, and provides a continuous measure of similarity between nodes. By implementing the TP similarity measure, researchers can approximate the research interest similarity of individual scientists using publication-level information.
The package accompanies the paper:
Varga, Attila, Sadamori Kojaku, and Filipi N. Silva. "Measuring Research Interest Similarity with Transition Probabilities." arXiv preprint arXiv:2409.18240 (2024). Available on arXiv
Installation
Install the package using pip:
pip install tpsimilarity
Features
- Exact Transition Probabilities (TP): Computes the exact transition probabilities between nodes in a graph.
- Estimated Transition Probabilities: Estimates transition probabilities using random walks.
- Shortest Paths Transition Probabilities: Computes transition probabilities along the shortest paths.
- Node2Vec Similarity: Computes cosine similarity between node embeddings generated by node2vec.
Getting Started
Prerequisites
- Python: Version 3.6 or higher
- Dependencies: numpy, scipy, networkx, gensim, tqdm, joblib
- (Optional) igraph, for handling large graphs efficiently
Install the dependencies using:
pip install numpy scipy networkx gensim tqdm joblib igraph
Importing the Package
from tpsimilarity import similarity
Example Usage
1. Compute Exact Transition Probabilities (TP)
import igraph as ig
import networkx as nx
from tpsimilarity import similarity
# Create or load your graph
G = nx.karate_club_graph()
# Convert the NetworkX graph to an igraph graph
G = ig.Graph.from_networkx(G)
# Define sources and targets
sources = [0, 1, 2] # Source nodes
targets = [3, 4, 5] # Target nodes
# Compute exact TP similarities
tp_sim = similarity.TP(
graph=G,
sources=sources,
targets=targets,
window_length=5
)
# tp_sim contains the similarity matrix or list based on return_type
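As a rough illustration of the idea behind exact transition probabilities, the sketch below row-normalizes the adjacency matrix and raises it to a power, which gives the probability that a simple random walk moves from a source to a target in exactly that many steps. This is an assumption about the general concept, not the package's implementation. The helper name k_step_transition_probabilities is hypothetical, and similarity.TP may aggregate over the window or normalize differently.

```python
import numpy as np
import networkx as nx


def k_step_transition_probabilities(graph, sources, targets, steps):
    """Probability of moving from each source to each target in exactly `steps` steps.

    Hypothetical helper for illustration only; not part of tpsimilarity.
    """
    nodes = list(graph.nodes())
    index = {node: i for i, node in enumerate(nodes)}
    # weight=None treats every edge as weight 1 (unweighted walk).
    adjacency = nx.to_numpy_array(graph, nodelist=nodes, weight=None)
    degree = adjacency.sum(axis=1, keepdims=True)
    # Row-normalize; rows of isolated nodes stay all zero.
    transition = np.divide(
        adjacency, degree, out=np.zeros_like(adjacency), where=degree > 0
    )
    k_step = np.linalg.matrix_power(transition, steps)
    rows = [index[s] for s in sources]
    cols = [index[t] for t in targets]
    return k_step[np.ix_(rows, cols)]


if __name__ == "__main__":
    G_nx = nx.karate_club_graph()
    print(k_step_transition_probabilities(G_nx, [0, 1, 2], [3, 4, 5], steps=5))
```

The dense matrix power is only practical for small graphs. For large networks, sparse matrices or sampling are the usual alternatives.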
2. Compute Estimated Transition Probabilities
# Estimate TP similarities using random walks
# (G, sources, and targets are reused from example 1)
estimated_tp = similarity.estimatedTP(
graph=G,
sources=sources,
targets=targets,
window_length=5,
walks_per_source=1000,
batch_size=100,
return_type="matrix",
degreeNormalization=True,
progressBar=True
)
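To make the estimation idea concrete, the following sketch approximates step-wise transition probabilities by simulating many random walks from one source and counting where they end. It is a minimal Monte Carlo illustration under the assumption of a simple unweighted walk. The function name estimate_k_step_probabilities and its parameters are hypothetical and do not reflect how similarity.estimatedTP is implemented.

```python
import random
import networkx as nx


def estimate_k_step_probabilities(graph, source, steps, walks, seed=0):
    """Estimate where a random walk from `source` ends after `steps` steps.

    Hypothetical helper for illustration only; not part of tpsimilarity.
    Returns a dict mapping end node -> fraction of walks that ended there.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(walks):
        node = source
        for _ in range(steps):
            neighbors = list(graph.neighbors(node))
            if not neighbors:
                # Dead end: the walk stays where it is.
                break
            node = rng.choice(neighbors)
        counts[node] = counts.get(node, 0) + 1
    return {node: count / walks for node, count in counts.items()}


if __name__ == "__main__":
    G_nx = nx.karate_club_graph()
    estimate = estimate_k_step_probabilities(G_nx, source=0, steps=5, walks=1000)
    print({target: estimate.get(target, 0.0) for target in [3, 4, 5]})
```

More walks per source reduce the sampling noise at the cost of runtime, which is the trade-off a parameter such as walks_per_source controls.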
3. Compute Node2Vec Similarity
# Compute node2vec-based cosine similarities
# (G, sources, and targets are reused from example 1)
node2vec_sim = similarity.node2vec(
graph=G,
sources=sources,
targets=targets,
dimensions=64,
window_length=40,
context_size=5,
workers=4,
batch_walks=100,
return_type="matrix",
progressBar=True
)
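The cosine-similarity step itself is independent of how the embeddings are produced. The sketch below computes a source-by-target cosine similarity matrix with NumPy. The embedding vectors are random placeholders rather than node2vec output, and the helper name cosine_similarity_matrix is hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(source_vectors, target_vectors):
    """Cosine similarity between every source vector and every target vector.

    Hypothetical helper for illustration only; not part of tpsimilarity.
    Assumes no vector is all zeros.
    """
    s = np.asarray(source_vectors, dtype=float)
    t = np.asarray(target_vectors, dtype=float)
    # Scale each row to unit length so the dot product equals the cosine.
    s_unit = s / np.linalg.norm(s, axis=1, keepdims=True)
    t_unit = t / np.linalg.norm(t, axis=1, keepdims=True)
    return s_unit @ t_unit.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder embeddings: 3 sources and 3 targets, 64 dimensions each.
    source_embeddings = rng.normal(size=(3, 64))
    target_embeddings = rng.normal(size=(3, 64))
    print(cosine_similarity_matrix(source_embeddings, target_embeddings))
```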
4. Compute Shortest Paths Transition Probabilities
# Compute TP similarities along shortest paths
# (G, sources, and targets are reused from example 1)
sp_tp = similarity.shortestPathsTP(
graph=G,
sources=sources,
targets=targets,
window_length=5
)
Parameters
- graph (networkx.Graph or igraph.Graph): The graph on which to compute the similarities.
- sources (list): List of source node indices.
- targets (list): List of target node indices.
- window_length (int): The length of the random walks.
- return_type (str, optional): The format of the output ("list", "matrix", or "dict"). Default is "matrix".
- degreeNormalization (bool, optional): Whether to normalize by the degree of the target node. Default is True.
- dimensions (int, optional): Number of dimensions for node embeddings in node2vec. Default is 64.
- context_size (int, optional): Context size for the node2vec algorithm. Default is 10.
- workers (int, optional): Number of parallel workers for node2vec. Default is 4.
- batch_walks (int, optional): Number of walks per batch for node2vec. Default is 10000.
- progressBar (bool or tqdm, optional): Whether to display a progress bar during computation. Default is True.
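The degreeNormalization option is described as normalizing by the degree of the target node. One plausible reading, sketched below, divides each column of a source-by-target probability matrix by the corresponding target degree so that high-degree targets do not dominate. This is an assumption for illustration. The helper name normalize_by_target_degree is hypothetical, and the package may apply the normalization differently.

```python
import numpy as np


def normalize_by_target_degree(probabilities, target_degrees):
    """Divide each column (one per target) by that target's degree.

    Hypothetical helper for illustration only; not part of tpsimilarity.
    Columns whose target has degree zero are set to zero.
    """
    p = np.asarray(probabilities, dtype=float)
    d = np.asarray(target_degrees, dtype=float)[None, :]
    return np.divide(p, d, out=np.zeros_like(p), where=d > 0)


if __name__ == "__main__":
    # Two sources, three targets with degrees 1, 2, and 4.
    probabilities = [[0.2, 0.4, 0.4], [0.1, 0.1, 0.8]]
    print(normalize_by_target_degree(probabilities, [1, 2, 4]))
```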
Examples
You can find more examples and tutorials in the examples directory or in the Jupyter notebooks provided.
Authors
- Attila Varga
- Sadamori Kojaku
- Filipi N. Silva
License
This project is licensed under the MIT License. See the LICENSE file for details.