
A large-scale database and benchmark for graph learning.

Project description

GraphNetz

Statistically rigorous GNN benchmarking



Why GraphNetz

Whether you are proposing a new GNN architecture, testing a model on a new graph domain, or comparing existing methods across graph types, GraphNetz turns the usual “train, evaluate, table of accuracies” workflow into a reproducible statistical report. Instead of reporting point estimates alone, it provides confidence intervals for each result, paired model comparisons with multiple-testing correction, and rank-based summaries across datasets using critical-difference diagrams. The goal is not just to crown a leaderboard winner, but to give researchers a principled way to quantify uncertainty, compare methods fairly, and produce the exact evidence reviewers often ask for in graph-learning papers.

Most GNN benchmarks report point-estimate accuracies on a handful of citation graphs and declare a winner without confidence intervals, multiple-comparison correction, or rank aggregation across datasets. GraphNetz's default output is a structured statistical report, not a raw accuracy table:

  • multi-seed Student's t confidence intervals per cell,
  • Holm–Bonferroni paired t-tests (or Wilcoxon signed-rank) within each task,
  • Demšar critical-difference diagrams from Friedman ranks with a Nemenyi post-hoc.
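The Holm step-down adjustment used in the pairwise tests is easy to state concretely. The sketch below is a generic illustration of the procedure (not GraphNetz's internal code): sort the m p-values, multiply the k-th smallest by (m − k + 1), and enforce monotonicity.

```python
import numpy as np

def holm_correction(pvals):
    """Holm step-down adjustment of m p-values (family-wise error control)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(order):
        # (m - rank) shrinks as we step down to larger p-values.
        running_max = max(running_max, min((m - rank) * p[idx], 1.0))
        adjusted[idx] = running_max
    return adjusted

# Three hypothetical paired-test p-values within one task:
print(holm_correction([0.003, 0.04, 0.20]))  # adjusted: 0.009, 0.08, 0.20
```

A result stays significant at level α only if its adjusted p-value is below α, which is what makes multi-model comparison tables honest.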

The catalogue is organised along a category × task taxonomy:

  • 63 dataset loaders across 10 scientific categories
  • 4 task kinds (node classification, graph classification, graph regression, link prediction)
  • 5 canonical architectures (GCN, GAT, GIN, GraphSAGE, Graph Transformer) that plug into every kind via a small set of task-kind adapters

Install

uv add graphnetz
# or, in an existing environment:
pip install graphnetz

For local development:

git clone https://github.com/quant-sci/graphnetz
cd graphnetz
uv sync --group dev

GraphNetz requires Python ≥ 3.10, torch ≥ 2.6, and torch-geometric ≥ 2.6.

Quick start

from graphnetz import GCN, train_node_classification, plot_history
from graphnetz.datasets.social import cora

ds = cora("data/cora")
model = GCN(ds.num_features, 64, ds.num_classes)
history = train_node_classification(model, ds[0], epochs=200)
fig, ax = plot_history(history, title="GCN on Cora")

For a full benchmark run with the default statistical report:

from graphnetz import GAT, GCN, GraphSAGE, GraphTransformer, run_benchmark

report = run_benchmark(
    "social",
    {"GCN": GCN, "GAT": GAT, "GraphSAGE": GraphSAGE, "GraphTransformer": GraphTransformer},
    seeds=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    kind="node_cls",          # restrict to one task family
)
print(report.summary())       # per-(task, model) mean ± t-CI
print(report.pairwise())      # Holm-corrected paired t-tests (or Wilcoxon)
fig, _ = report.plot_critical_difference(alpha=0.05)

Task kinds

Kind Symbol Metric Examples
Node classification node_cls test accuracy Cora, Roman-empire
Graph classification graph_cls val accuracy MUTAG, MNIST-superpixels
Graph regression graph_reg val MAE ZINC, QM9
Link prediction link_pred test AUC FB15k-237, Internet AS

Unlabelled graphs (Netzschleuder, synthetic combinatorial, Ising lattice) enter the benchmark through link prediction on a held-out edge split, so every cell carries a real test-time metric — there is no self-supervised pretext loss in the headline report.
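The held-out edge split works roughly as follows (an illustrative sketch, not the library's internal splitter; `split_edges` is a hypothetical name): hide a fraction of edges as test positives and sample an equal number of non-edges as test negatives, so AUC can be computed on genuinely unseen pairs.

```python
import numpy as np

def split_edges(edge_list, test_frac=0.1, seed=0):
    """Hold out a fraction of edges for link-prediction evaluation,
    and sample an equal number of non-edges as negatives."""
    rng = np.random.default_rng(seed)
    edges = np.asarray(edge_list)
    n_nodes = edges.max() + 1
    perm = rng.permutation(len(edges))
    n_test = max(1, int(test_frac * len(edges)))
    test_pos = edges[perm[:n_test]]
    train_pos = edges[perm[n_test:]]
    existing = {tuple(sorted(e)) for e in edges}
    negatives = []
    while len(negatives) < n_test:           # rejection-sample non-edges
        u, v = rng.integers(0, n_nodes, size=2)
        if u != v and tuple(sorted((u, v))) not in existing:
            negatives.append((u, v))
    return train_pos, test_pos, np.array(negatives)

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]
train_pos, test_pos, test_neg = split_edges(edges, test_frac=0.34)
print(len(train_pos), len(test_pos), len(test_neg))  # 4 2 2
```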

Dataset categories

Category  # loaders  Task kinds  Loaders
Combinatorial 6 LP random TSP, VRP, max-flow, bipartite matching, coloring, max-cut
Biology 12 GC, GR, LP MUTAG, PROTEINS, ENZYMES, Peptides-func/struct, PPI, C. elegans, Budapest connectome, hospital/high-school contacts, ogbg-molhiv†, ogbg-molpcba†
Social 16 NC, LP Cora, CiteSeer, PubMed, WikiCS, Roman-empire, Amazon-ratings, Minesweeper, Tolokers, Questions, MovieLens-100k, Karate, Facebook friends, DBLP coauthor, DNC emails, ogbn-arxiv†, ogbl-collab†
Knowledge 3 LP FB15k-237, WN18RR, WordNet (Netz)
Infrastructure 6 LP power grid, EuroRoad, US roads, EU airlines, London transport, urban streets
Finance 5 NC, LP Elliptic Bitcoin, product space, board of directors, US patents, ogbn-products†
Computing 4 LP Internet AS, Internet topology, AS-Skitter, route views
Vision 5 GC, NC MNIST/CIFAR-10 superpixels, ModelNet10/40, ShapeNet
Physics 3 GR, LP QM9, ZINC, Ising lattice
Security 3 GC, LP MalNet-Tiny, 9/11 terrorists, train terrorists

† Requires the optional ogb extra (pip install graphnetz[ogb]). The five OGB loaders are folded into their domain categories rather than exposed as a separate ogb category, so they appear in run_benchmark(category, ...) alongside the curated built-ins.

from graphnetz.datasets.social import cora, roman_empire
from graphnetz.datasets.biology import peptides_func
from graphnetz.datasets.computing import internet_as

# Optional OGB loaders live in their domain modules (require `pip install graphnetz[ogb]`):
from graphnetz.datasets.social import ogbn_arxiv     # node_cls
from graphnetz.datasets.biology import ogbg_molhiv   # graph_cls

ds_cora = cora("data/cora")
ds_rom  = roman_empire("data/roman_empire")        # heterophilic
ds_pep  = peptides_func("data/peptides_func")      # LRGB
ds_inet = internet_as("data/internet_as")          # Netzschleuder

For arbitrary Netzschleuder networks:

from graphnetz import Netz
ds = Netz(root="data", dataset_name="urban_streets", network_name="brasilia")

Models

Model Kinds Source
GCN all four Kipf & Welling, ICLR 2017
GAT all four Veličković et al., ICLR 2018
GIN graph_cls, graph_reg Xu et al., ICLR 2019
GraphSAGE all four Hamilton et al., NeurIPS 2017
GraphTransformer all four Shi et al., 2021
DGI (utility) Veličković et al., ICLR 2019

Node-level encoders enter every task kind through three small adapters: a graph-level pooling head, a dot-product link-prediction head, and a DGI self-supervised wrapper for optional unsupervised pre-training.
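The dot-product link head, for instance, amounts to scoring a candidate edge (u, v) as the sigmoid of the dot product of the two node embeddings. A minimal numpy sketch (illustrative only; `link_scores` and its argument names are not the package's exact adapter API):

```python
import numpy as np

def link_scores(z, edge_index):
    """Score candidate edges as sigmoid of the embedding dot product.

    z:          (num_nodes, dim) node embeddings from any encoder
    edge_index: (2, num_edges) source/target node indices
    """
    src, dst = edge_index
    logits = np.einsum("ij,ij->i", z[src], z[dst])  # per-edge dot product
    return 1.0 / (1.0 + np.exp(-logits))            # sigmoid -> edge probability

z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
edges = np.array([[0, 0], [1, 2]])   # candidate edges (0,1) and (0,2)
print(link_scores(z, edges))          # aligned pair scores high, orthogonal pair 0.5
```

Because the head is decoder-only, any node-level encoder can be evaluated on link prediction without architecture changes.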

Custom models

import torch

from graphnetz import register_model

# 1. Decorator
@register_model(kinds="node_cls")
class MyGNN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels): ...

# 2. Class attribute (no decorator)
class MyGNN(torch.nn.Module):
    task_kinds = {"node_cls", "graph_cls"}

# 3. Inline tuple at run-time
run_benchmark(
    "social",
    {"MyGNN": (MyGNN, "node_cls",
               lambda i, h, o: MyGNN(i, h, o, dropout=0.3))},
)

The statistical report

run_benchmark(...) returns a BenchmarkReport with the following methods:

Method Output
report.summary(ci=0.95) per-(task, model) mean ± t-CI half-width DataFrame
report.pairwise(alpha=0.05) Holm-corrected paired t-tests or Wilcoxon signed-rank tests within each task
report.plot_critical_difference() Demšar / Nemenyi CD diagram across tasks
report.plot_pairwise(layout=...) matrix or list view of pairwise significance
report.plot_forest() per-task forest plot of mean ± CI
report.plot_learning_curves() shared-y learning curves with t-CI bands
report.to_latex(path) publication-ready bold-best LaTeX table
report.pairwise_to_latex(path) Holm pairwise LaTeX table (parametric or non-parametric)
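The t-CI half-width in summary() is the standard multi-seed Student's t interval for the mean. Under the usual i.i.d.-seeds assumption it can be reproduced as in this generic scipy-based sketch (`t_ci_half_width` is an illustrative name, not a package function):

```python
import numpy as np
from scipy import stats

def t_ci_half_width(values, ci=0.95):
    """Half-width of the Student's t confidence interval for the mean
    of n i.i.d. seed-level scores."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    sem = x.std(ddof=1) / np.sqrt(n)              # standard error of the mean
    t_crit = stats.t.ppf(0.5 + ci / 2, df=n - 1)  # two-sided critical value
    return t_crit * sem

accs = [0.81, 0.79, 0.80, 0.82, 0.78]              # e.g. five seeds
print(f"{np.mean(accs):.3f} ± {t_ci_half_width(accs):.3f}")
```

With few seeds the t critical value is noticeably larger than the normal 1.96, which is exactly why multi-seed point estimates alone overstate certainty.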

Notebooks

Worked examples live under examples/:

  • 01_benchmark.ipynb — the cross-category dashboard (multi-seed report, bootstrap CIs, custom-model integration).
  • 02_knowledge.ipynb — relational link prediction on FB15k-237 / WN18RR using the DistMult decoder.

Reproducing the paper

PYTHONPATH=src uv run python paper/experiment.py   # train + cache + figures
latexmk -pdf paper/main.tex                        # compile PDF

The script trains 5 architectures × 10 seeds across the 10 surviving categories, caches the histories under paper/_cache_*.pkl, and writes every figure (paper/figures/) and LaTeX table (paper/tables/) referenced by paper/main.tex. Total runtime on a recent laptop CPU is under 30 minutes.

Contributing

Pull requests welcome. Read CONTRIBUTING.md first — the short version is: every benchmark cell must carry a real held-out metric, every change must thread through the multi-seed pipeline, and every PR must be ruff clean.

uv run pytest
uv run ruff check

License

MIT — see LICENCE.txt.

Download files

Download the file for your platform.

Source Distribution

graphnetz-0.1.1.tar.gz (49.1 kB)

Uploaded Source

Built Distribution


graphnetz-0.1.1-py3-none-any.whl (59.9 kB)

Uploaded Python 3

File details

Details for the file graphnetz-0.1.1.tar.gz.

File metadata

  • Download URL: graphnetz-0.1.1.tar.gz
  • Upload date:
  • Size: 49.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for graphnetz-0.1.1.tar.gz
Algorithm Hash digest
SHA256 aef68d97cece9e99581c29533b8f20f601e486cd9170f714cd85c544809e3ed7
MD5 cc9a464714705e14691f95b3c8b06ac4
BLAKE2b-256 9dd4a2d90254fcff00050d1946b509c0817b7948394452c5728491757068ae25


Provenance

The following attestation bundles were made for graphnetz-0.1.1.tar.gz:

Publisher: release.yaml on quant-sci/graphnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file graphnetz-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: graphnetz-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 59.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for graphnetz-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6439b83b8cc8b09e8326d7b9856fbb1f5d9cd17da8a9b6d74543baa93d98ef65
MD5 901597bd6d8752229f9fd691e8eda41e
BLAKE2b-256 4cc6fb65e2fa3e9a9a4121c3d807ecd043423caca632767223e663df2bb150ae


Provenance

The following attestation bundles were made for graphnetz-0.1.1-py3-none-any.whl:

Publisher: release.yaml on quant-sci/graphnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
