Skip to main content

Force-directed layouts for Large Graphs with GPU acceleration

Project description

torch-graph-force [WIP]

A PyTorch-based library for embedding large graphs to low-dimensional space using force-directed layouts with GPU acceleration.

The aim of this project is to speed up the process of obtaining low-dimensional layouts for large graphs, especially with GPU acceleration.

Install

pip install git+https://github.com/tintn/torch-graph-force.git

Usage

Create GraphDataset for The Graph.

The dataset can be created from a dataframe, an edgelist or Networkx Graph using from_pandas_dataframe, from_edgelist, or from_networkx respectively. from_pandas_dataframe is the recommended way as it's more efficient compared to other methods.

If the node IDs are consecutive integers starting from 0:

import pandas as pd
import torch_graph_force

# The first argument is a dataframe of edges with at least two columns for source and target nodes.
# By default, column names "source", "target" and "weight" are taken as source nodes, target nodes and edge weights.
df = pd.DataFrame([[0, 1], [1, 2], [2, 3]], columns=['source', 'target'])
# Having a column for edge weights is optional. If the column for edge weights does not exist, 1.0 will be used for all edges.
# The second argument is the number of nodes in case the node IDs are consecutive integers starting from 0.
n_nodes = 4
# Create a GraphDataset for the graph
ds = torch_graph_force.from_pandas_dataframe(
    df, n_nodes
)

If the node IDs are not consecutive integers, a list of node IDs must be provided:

import pandas as pd
import torch_graph_force

df = pd.DataFrame([["A", "B"], ["B", "C"], ["C", "D"]], columns=['source', 'target'])
# Order of the nodes in "nodes" is used to map the node IDs to node indices.
nodes = ["A", "B", "C", "D"]

ds = torch_graph_force.from_pandas_dataframe(
    df, nodes
)
# the dataset's order follows the order of the provided list of nodes. In this example, calling  ds[0] will return the data for node "A" and ds[1] for node "B"
# List of nodes can be access with ds.nodes
print(ds.nodes)

Compute Graph Layout

Once having the graph dataset ready, we can feed the dataset to spring_layout to compute the graph layout.

pos = torch_graph_force.spring_layout(
    ds
)
# pos is a numpy array of size (n_nodes, n_dim)
# each row represents the position of a node with corresponding index
print(pos)
# if node IDs are not consecutive integers, the nodes' positions can be obtained from the node list
node_pos = {nid: pos[idx] for idx, nid in enumerate(ds.nodes)}

Optional arguments for spring_layout:

  • batch_size: number of nodes to process in a batch. Larger batch size usually speeds up the processing, but it consumes more memory. (default: 64)
  • iterations: Maximum number of iterations taken. (default: 50)
  • num_workers: number of workers to fetch data from GraphDataset. If device is "cuda", num_workers must be 0. (default: 0)
  • device: the device to store the graph and the layout model. If None, it's "cuda" if cuda is available otherwise "cpu". (default: None)
  • iteration_progress: monitor the progress of each iteration, it's useful for large graph. (default: False)
  • layout_config: additional config for the layout model. (default: {})

The layout model has some parameters with default values:

default_layout_config = {
    # Tensor of shape (n_nodes, ndim) for initial positions
    "pos": None,
    # Optimal distance between nodes
    "k": None,
    # Dimension of the layout
    "ndim": 2,
    # Threshold for relative error in node position changes.
    "threshold": 1e-4,
}

Use the layout_config argument to change the parameters if needed. The example below provides intial positions for the layout model:

n_nodes = len(ds)
n_dim = 2
# Generate initial positions for the nodes
init_pos = np.random.rand(n_nodes, n_dim)
pos = torch_graph_force.spring_layout(
    ds,
    layout_config={"pos": init_pos}
)

Benchmarks

The implementation from torch-graph-force without GPU acceleration is 1.5x faster than Networkx's implementation.

CPU Benchmark

GPU accelerated torch-graph-force can compute layouts of graphs with 100k nodes within minutes. The benchmark was conducted with Tesla P100.

GPU Benchmark

Code for the benchmarks can be found here

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

torch_graph_force-0.1.0.tar.gz (9.7 kB view details)

Uploaded Source

Built Distribution

torch_graph_force-0.1.0-py3-none-any.whl (9.5 kB view details)

Uploaded Python 3

File details

Details for the file torch_graph_force-0.1.0.tar.gz.

File metadata

  • Download URL: torch_graph_force-0.1.0.tar.gz
  • Upload date:
  • Size: 9.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.6

File hashes

Hashes for torch_graph_force-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f7c9447e40afc5b78b1f62f4dd4edc779c338b33f0b1572e44c79a79233ff5d4
MD5 7f52a953b1c3dfa124d0cfcdd2b1a305
BLAKE2b-256 8d89dfd1410b70ab784668f7662f078d94f8811e53c37069d4a188c7c89ceff0

See more details on using hashes here.

File details

Details for the file torch_graph_force-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for torch_graph_force-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6bb3942c559db78877fee9a1b0fc441c576ff223a0374d1373d6a129677c3570
MD5 bfa4109544490f420cb49053f73f37fb
BLAKE2b-256 a4671f5f717c3b7c3c0fa0109123bdd8874d50903463182b8206f39dfb68b1e9

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page