
Noisy Graph Alignment Benchmark (NGAB) - a library for benchmarking GNNs on graph alignment tasks and generating positional encodings.

Project description

ngab — Noisy Graph Alignment Benchmark

NGAB is a lightweight Python library for benchmarking graph neural networks (GNNs) on noisy graph alignment and matching tasks, and for pretraining GNNs with alignment-oriented objectives.

It provides:

  • A lightweight dataset format (safetensors) for graph pairs, and utilities to download prepared datasets from the Hugging Face Hub.
  • Utilities to create your own graph-alignment datasets from existing datasets.
  • A consistent torch-based training loop and metrics for graph alignment (loss, LAP accuracy, top-k accuracy), enabling fast benchmarking of new architectures and pretraining GNNs for high-quality positional encodings.
  • A suite of GNN backbones (GCN, GAT, GATv2, GatedGCN, GIN, TAGCN, SGC, GraphGPS, PNA, PAN).

This package accompanies the paper Graph Alignment for Benchmarking Graph Neural Networks and Learning Positional Encodings. The companion GitHub repository, Graph-Alignment-Benchmark, shows how we used ngab to run the experiments; it includes helper scripts for generating datasets and Hydra-parameterized experiments for full reproducibility and automation.

Install

Requirements: Python 3.12+, PyTorch 2.8+, and PyTorch Geometric 2.6+

You can install the package with your preferred tool:

pip install ngab
uv add ngab

This project uses additional PyG wheel links (see pyproject.toml) and depends on torch-scatter, torch-sparse, torch-cluster, and torch-spline-conv. Make sure the wheels you install for these packages match your PyTorch and CUDA versions.

Datasets

Prepared datasets are hosted on the Hugging Face Hub under alagesse/graph-alignment-benchmark-data.

  • Hugging Face dataset page: https://huggingface.co/datasets/alagesse/graph-alignment-benchmark-data

You can programmatically download a dataset by name using the package API:

from ngab import download_dataset

dataset_dir = download_dataset("zinc")  # returns a local path string
print(dataset_dir)

This fetches the required *.safetensors files for both the training and validation splits into a local directory (by default graph-alignment-benchmark-data/zinc). If the files already exist locally and are complete, no download occurs.
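To sanity-check a download, you can list the fetched files; this sketch only assumes the default directory layout mentioned above.

import pathlib

data_dir = pathlib.Path(dataset_dir)

# List the fetched .safetensors files and their sizes.
for path in sorted(data_dir.glob("*.safetensors")):
    print(path.name, f"{path.stat().st_size / 1e6:.1f} MB")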

Generate your own datasets with ngab.random

Use ngab.random to synthesize graph pairs for alignment tasks, or build new datasets from existing graphs.

Available helpers:

  • erdos_renyi(nb_graphs, order, p, directed=False, self_loops=False)
  • bernoulli_corruption(graphs, noise, directed=False, self_loops=False, type="add"|"add_remove")
  • uniform_sub_sampling(graph, n, num_nodes)
  • bfs_sub_sampling(graph, n, num_nodes, p=1.0)

Example: build equal-size graph pairs suitable for alignment training:

import torch
import ngab
from ngab.random import erdos_renyi, bernoulli_corruption

# 1) Generate base graphs (Erdős–Rényi)
base_graphs = erdos_renyi(nb_graphs=10_000, order=64, p=0.08, directed=False, self_loops=False)

# 2) Corrupt them (add/remove edges) to form paired graphs
corrupted_graphs = bernoulli_corruption(base_graphs, noise=0.10, directed=False, self_loops=False, type="add_remove")

# Now (base_graphs[i], corrupted_graphs[i]) form an aligned pair with the same number of nodes.
# You can serialize pairs into `safetensors` using your own pipeline to match the `GADataset` file layout.
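As a starting point, a minimal serialization sketch with the safetensors library could look like the following; the tensor names and single-file layout are purely illustrative and should be adapted to the actual `GADataset` format, and the sketch assumes the generated graphs are PyG Data objects exposing edge_index.

from safetensors.torch import save_file

# Illustrative only: collect each pair's edge indices under made-up tensor names.
tensors = {}
for i, (graph, noisy_graph) in enumerate(zip(base_graphs, corrupted_graphs)):
    tensors[f"pair_{i}_edge_index"] = graph.edge_index.contiguous()
    tensors[f"pair_{i}_noisy_edge_index"] = noisy_graph.edge_index.contiguous()

# Write a single .safetensors file; adapt naming and splits to the GADataset layout.
save_file(tensors, "my_alignment_train.safetensors")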

Example: derive many small graphs from a large graph:

from ngab.random import uniform_sub_sampling, bfs_sub_sampling

# Given a PyG Data object `big_graph`, create a batch of small induced subgraphs
uniform_batch = uniform_sub_sampling(big_graph, n=512, num_nodes=64)
bfs_batch = bfs_sub_sampling(big_graph, n=512, num_nodes=64, p=0.5)

Benchmarking and Metrics

During training, NGAB logs to Weights & Biases:

  • loss: alignment cross-entropy over the diagonal after masked softmax.
  • LAP: assignment accuracy from the Hungarian algorithm on softmax-normalized similarities.
  • top_k: mean recall at k for node assignments (top_1, top_3, top_5).
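To make these definitions concrete, here is an illustrative recomputation of LAP and top-k accuracy from a single pair's node-similarity matrix (with the identity as ground-truth alignment); this is a sketch of the metric definitions, not NGAB's internal implementation.

import torch
from scipy.optimize import linear_sum_assignment

# `sim` is an (N, N) node-similarity matrix for one graph pair; node i in the
# first graph is assumed to match node i in the second (identity alignment).
def lap_accuracy(sim: torch.Tensor) -> float:
    # Hungarian assignment on softmax-normalized similarities (maximize similarity).
    _, col = linear_sum_assignment(sim.softmax(dim=-1).numpy(), maximize=True)
    return float((torch.as_tensor(col) == torch.arange(sim.size(0))).float().mean())

def top_k_accuracy(sim: torch.Tensor, k: int) -> float:
    # Fraction of nodes whose true counterpart is among their k most similar candidates.
    topk = sim.topk(k, dim=-1).indices
    return float((topk == torch.arange(sim.size(0)).unsqueeze(-1)).any(dim=-1).float().mean())

sim = torch.randn(64, 64)
print(lap_accuracy(sim), top_k_accuracy(sim, k=3))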

Quick training with TrainConfig

Use the built-in training loop to quickly benchmark or pretrain a custom model on any downloaded dataset.

import pathlib
import torch
import ngab
from ngab import TrainConfig, train_loop

# 1) Download dataset locally (from Hugging Face)
dataset_dir = ngab.download_dataset("zinc")

# 2) Define your model (any class from ngab.models or your own torch.nn.Module)
model = ngab.models.GCN(in_features=1, features=128, out_features=64, layers=4)

# 3) Build the training configuration
config = TrainConfig()
config.model = model
config.dataset = pathlib.Path(dataset_dir)
config.experiment = "graph-alignment-benchmark-zinc"
config.run_name = "gcn-zinc"
config.epochs = 50
config.batch_size = 128
config.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
config.log_frequency = 5
config.profile = False
config.optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
config.scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(config.optimizer, T_max=config.epochs)
config.grad_clip = 1.0

# 4) Train
train_loop(config)
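Once trained, the model's node embeddings can serve as learned positional encodings. The sketch below assumes a standard PyG-style forward pass taking node features and an edge index; check the actual signature of the ngab.models class you use.

# Sketch: node embeddings from the pretrained GNN as positional encodings.
# Assumes a PyG-style forward(x, edge_index); verify against ngab.models.
from torch_geometric.data import Data

model.eval()
graph = Data(x=torch.ones(64, 1), edge_index=torch.randint(0, 64, (2, 256)))

with torch.no_grad():
    pos_enc = model(graph.x, graph.edge_index)  # (num_nodes, out_features)

print(pos_enc.shape)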

Available Models

Models are implemented with PyTorch Geometric and share a simple constructor interface; typical arguments include in_features, features, out_features, and layers. Available classes are exposed under ngab.models:

  • GCN, GAT, GATv2, GatedGCN, GIN, TAGCN, SGC, GraphGPS, PNA, PAN

Example:

import ngab
model = ngab.models.GCN(in_features=1, features=128, out_features=64, layers=4)

Links

  • GitHub repository: https://github.com/adrien-lagesse/graph-alignment-benchmark
  • Hugging Face dataset: https://huggingface.co/datasets/alagesse/graph-alignment-benchmark-data

License

MIT License — see LICENSE.
