Force-directed layouts for Large Graphs with GPU acceleration
Project description
torch-graph-force [WIP]
A PyTorch-based library for embedding large graphs to low-dimensional space using force-directed layouts with GPU acceleration.
The aim of this project is to speed up the process of obtaining low-dimensional layouts for large graphs, especially with GPU acceleration.
Install
- Install PyTorch (follow official instructions)
- Install
torch-graph-force
:
pip install git+https://github.com/tintn/torch-graph-force.git
Usage
Create GraphDataset
for The Graph.
The dataset can be created from a dataframe, an edgelist or Networkx Graph using from_pandas_dataframe
, from_edgelist
, or from_networkx
respectively. from_pandas_dataframe
is the recommended way as it's more efficient compared to other methods.
If the node IDs are consecutive integers starting from 0, the dataset can be constructed with a dataframe for edges and the number of nodes:
import pandas as pd
import torch_graph_force
# The first argument is a dataframe of edges with at least two columns for source and target nodes.
# By default, column names "source", "target" and "weight" are taken as source nodes, target nodes and edge weights.
df = pd.DataFrame([[0, 1], [1, 2], [2, 3]], columns=['source', 'target'])
# Having a column for edge weights is optional. If the column for edge weights does not exist, 1.0 will be used for all edges.
# The second argument is the number of nodes in case the node IDs are consecutive integers starting from 0.
n_nodes = 4
# Create a GraphDataset for the graph
ds = torch_graph_force.from_pandas_dataframe(
df, n_nodes
)
If the node IDs are not consecutive integers, a list of node IDs must be provided:
import pandas as pd
import torch_graph_force
df = pd.DataFrame([["A", "B"], ["B", "C"], ["C", "D"]], columns=['source', 'target'])
# Order of the nodes in "nodes" is used to map the node IDs to node indices.
nodes = ["A", "B", "C", "D"]
ds = torch_graph_force.from_pandas_dataframe(
df, nodes
)
# the dataset's order follows the order of the provided list of nodes. In this example, calling ds[0] will return the data for node "A" and ds[1] for node "B"
# List of nodes can be access with ds.nodes
print(ds.nodes)
Compute Graph Layout
Once having the graph dataset ready, we can feed the dataset to spring_layout
to compute the graph layout.
pos = torch_graph_force.spring_layout(
ds
)
# pos is a numpy array of size (n_nodes, n_dim)
# each row represents the position of a node with corresponding index
print(pos)
# if node IDs are not consecutive integers, the nodes' positions can be obtained from the node list
node_pos = {nid: pos[idx] for idx, nid in enumerate(ds.nodes)}
Optional arguments for spring_layout
:
batch_size
: number of nodes to process in a batch. Larger batch size usually speeds up the processing, but it consumes more memory. (default: 64)iterations
: Maximum number of iterations taken. (default: 50)num_workers
: number of workers to fetch data from GraphDataset. If device is "cuda",num_workers
must be 0. (default: 0)device
: the device to store the graph and the layout model. If None, it's "cuda" if cuda is available otherwise "cpu". (default: None)iteration_progress
: monitor the progress of each iteration, it's useful for large graph. (default: False)layout_config
: additional config for the layout model. (default: {})
The layout model has some parameters with default values:
default_layout_config = {
# Tensor of shape (n_nodes, ndim) for initial positions
"pos": None,
# Optimal distance between nodes
"k": None,
# Dimension of the layout
"ndim": 2,
# Threshold for relative error in node position changes.
"threshold": 1e-4,
}
Use the layout_config
argument to change the parameters if needed. The example below provides intial positions for the layout model:
n_nodes = len(ds)
n_dim = 2
# Generate initial positions for the nodes
init_pos = np.random.rand(n_nodes, n_dim)
pos = torch_graph_force.spring_layout(
ds,
layout_config={"pos": init_pos}
)
Benchmarks
The implementation from torch-graph-force
without GPU acceleration is 1.5x faster than Networkx's implementation.
GPU accelerated torch-graph-force
can compute layouts of graphs with 100k nodes within minutes. The benchmark was conducted with Tesla P100.
Code for the benchmarks can be found here
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file torch_graph_force-0.1.1.tar.gz
.
File metadata
- Download URL: torch_graph_force-0.1.1.tar.gz
- Upload date:
- Size: 10.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 373ac784fc1b0a0956e94182532d894a14c34e020fa7e78976f3c6b30db11225 |
|
MD5 | bdf7d47fed2ea05c5998ae719e3d610a |
|
BLAKE2b-256 | 157cf9d2f1c8ad55311181de5b7fe46e4e45218618b3a1eab1bccef095193a2a |
File details
Details for the file torch_graph_force-0.1.1-py3-none-any.whl
.
File metadata
- Download URL: torch_graph_force-0.1.1-py3-none-any.whl
- Upload date:
- Size: 10.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 125f06f8e915f93ad77cf7fcc6463e17f4b70530baf36f465118a29f9e355ed5 |
|
MD5 | 07dd4374484daf3d8acef2e0a1d61851 |
|
BLAKE2b-256 | 022456f29ab8e1b896d9fdc09096ce4698a86001504fc2c31376a30c0907690b |