Skip to main content

Force-directed layouts for Large Graphs with GPU acceleration

Project description

torch-graph-force [WIP]

A PyTorch-based library for embedding large graphs to low-dimensional space using force-directed layouts with GPU acceleration.

The aim of this project is to speed up the process of obtaining low-dimensional layouts for large graphs, especially with GPU acceleration.

Install

pip install git+https://github.com/tintn/torch-graph-force.git

Usage

Create GraphDataset for The Graph.

The dataset can be created from a dataframe, an edgelist or Networkx Graph using from_pandas_dataframe, from_edgelist, or from_networkx respectively. from_pandas_dataframe is the recommended way as it's more efficient compared to other methods.

If the node IDs are consecutive integers starting from 0, the dataset can be constructed with a dataframe for edges and the number of nodes:

import pandas as pd
import torch_graph_force

# The first argument is a dataframe of edges with at least two columns for source and target nodes.
# By default, column names "source", "target" and "weight" are taken as source nodes, target nodes and edge weights.
df = pd.DataFrame([[0, 1], [1, 2], [2, 3]], columns=['source', 'target'])
# Having a column for edge weights is optional. If the column for edge weights does not exist, 1.0 will be used for all edges.
# The second argument is the number of nodes in case the node IDs are consecutive integers starting from 0.
n_nodes = 4
# Create a GraphDataset for the graph
ds = torch_graph_force.from_pandas_dataframe(
    df, n_nodes
)

If the node IDs are not consecutive integers, a list of node IDs must be provided:

import pandas as pd
import torch_graph_force

df = pd.DataFrame([["A", "B"], ["B", "C"], ["C", "D"]], columns=['source', 'target'])
# Order of the nodes in "nodes" is used to map the node IDs to node indices.
nodes = ["A", "B", "C", "D"]

ds = torch_graph_force.from_pandas_dataframe(
    df, nodes
)
# the dataset's order follows the order of the provided list of nodes. In this example, calling  ds[0] will return the data for node "A" and ds[1] for node "B"
# List of nodes can be access with ds.nodes
print(ds.nodes)

Compute Graph Layout

Once having the graph dataset ready, we can feed the dataset to spring_layout to compute the graph layout.

pos = torch_graph_force.spring_layout(
    ds
)
# pos is a numpy array of size (n_nodes, n_dim)
# each row represents the position of a node with corresponding index
print(pos)
# if node IDs are not consecutive integers, the nodes' positions can be obtained from the node list
node_pos = {nid: pos[idx] for idx, nid in enumerate(ds.nodes)}

Optional arguments for spring_layout:

  • batch_size: number of nodes to process in a batch. Larger batch size usually speeds up the processing, but it consumes more memory. (default: 64)
  • iterations: Maximum number of iterations taken. (default: 50)
  • num_workers: number of workers to fetch data from GraphDataset. If device is "cuda", num_workers must be 0. (default: 0)
  • device: the device to store the graph and the layout model. If None, it's "cuda" if cuda is available otherwise "cpu". (default: None)
  • iteration_progress: monitor the progress of each iteration, it's useful for large graph. (default: False)
  • layout_config: additional config for the layout model. (default: {})

The layout model has some parameters with default values:

default_layout_config = {
    # Tensor of shape (n_nodes, ndim) for initial positions
    "pos": None,
    # Optimal distance between nodes
    "k": None,
    # Dimension of the layout
    "ndim": 2,
    # Threshold for relative error in node position changes.
    "threshold": 1e-4,
}

Use the layout_config argument to change the parameters if needed. The example below provides intial positions for the layout model:

n_nodes = len(ds)
n_dim = 2
# Generate initial positions for the nodes
init_pos = np.random.rand(n_nodes, n_dim)
pos = torch_graph_force.spring_layout(
    ds,
    layout_config={"pos": init_pos}
)

Benchmarks

The implementation from torch-graph-force without GPU acceleration is 1.5x faster than Networkx's implementation.

CPU Benchmark

GPU accelerated torch-graph-force can compute layouts of graphs with 100k nodes within minutes. The benchmark was conducted with Tesla P100.

GPU Benchmark

Code for the benchmarks can be found here

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

torch_graph_force-0.1.1.tar.gz (10.4 kB view hashes)

Uploaded Source

Built Distribution

torch_graph_force-0.1.1-py3-none-any.whl (10.4 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page