Evaluation benchmarks for graph generative models
Project description
PolyGraph is a Python library for evaluating graph generative models by providing standardized datasets and metrics (including PolyGraph Discrepancy).
PolyGraph Discrepancy (PGD) is a new metric we introduce, which provides the following advantages over maximum mean discrepancy (MMD):
| Property | MMD | PGD |
|---|---|---|
| Range | [0, ∞) | [0, 1] |
| Intrinsic Scale | ❌ | ✅ |
| Descriptor Comparison | ❌ | ✅ |
| Multi-Descriptor Aggregation | ❌ | ✅ |
| Single Ranking | ❌ | ✅ |
It also provides a number of other advantages over MMD which we discuss in our paper.
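To build intuition for why a classifier-based score has an intrinsic scale in [0, 1], here is a minimal, self-contained sketch of the general classifier two-sample idea. This is *not* PolyGraph's implementation; the actual PGD definition, descriptors, and classifiers are specified in the paper, and the function and data below are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def classifier_discrepancy(ref, gen, seed=0):
    """Toy classifier two-sample score in [0, 1]: near 0 when a classifier
    cannot tell the two samples apart, approaching 1 when it can."""
    X = np.vstack([ref, gen])
    y = np.concatenate([np.zeros(len(ref)), np.ones(len(gen))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, random_state=seed, stratify=y
    )
    acc = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)
    return max(0.0, 2.0 * acc - 1.0)  # rescale chance level (0.5) to 0

rng = np.random.default_rng(0)
# Same distribution -> score near 0; shifted distribution -> score near 1.
same = classifier_discrepancy(rng.normal(0, 1, (200, 4)), rng.normal(0, 1, (200, 4)))
diff = classifier_discrepancy(rng.normal(0, 1, (200, 4)), rng.normal(3, 1, (200, 4)))
```

Because the score is bounded and has a fixed chance level, scores obtained with different descriptors become directly comparable, which is what enables the descriptor comparison and aggregation rows in the table above.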
Installation
```shell
pip install polygraph-benchmark
```
No manual compilation of ORCA is required. For details on the interaction with `graph_tool`, see the more detailed installation instructions in the docs.
If you'd like to use SBM graph dataset validation with `graph_tool`, use a mamba or pixi environment. More information is available in the documentation.
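As a sketch, one possible environment setup, assuming `graph-tool` is pulled from the conda-forge channel (see the documentation for the authoritative instructions):

```shell
# graph-tool is not pip-installable; one option is a mamba environment
# with graph-tool from conda-forge, then polygraph-benchmark via pip.
mamba create -n polygraph -c conda-forge python graph-tool
mamba activate polygraph
pip install polygraph-benchmark
```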
At a glance
Here is an overview of the datasets and metrics this library provides:

- 🗂️ Datasets: ready-to-use splits for procedural and real-world graphs
  - Procedural: `PlanarLGraphDataset`, `SBMLGraphDataset`, `LobsterLGraphDataset`
  - Real-world: `QM9`, `MOSES`, `Guacamol`, `DobsonDoigGraphDataset`, `ModelNet10GraphDataset`
  - Also: `EgoGraphDataset`, `PointCloudGraphDataset`
- 📊 Metrics: unified, fit-once/compute-many interface with convenience wrappers, avoiding redundant computations
  - MMD²: `GaussianTVMMD2Benchmark`, `RBFMMD2Benchmark`; kernel hyperparameter optimization with `MaxDescriptorMMD2`
  - PolyGraphDiscrepancy: `StandardPGD`, `MolecularPGD` (for molecule descriptors)
  - Validity/Uniqueness/Novelty: `VUN`
  - Uncertainty quantification for benchmarking: `GaussianTVMMD2BenchmarkInterval`, `RBFMMD2BenchmarkInterval`, `StandardPGDInterval`
- 🧩 Extendable: users can instantiate custom metrics by specifying descriptors, kernels, or classifiers (`PolyGraphDiscrepancy`, `DescriptorMMD2`). PolyGraph defines all necessary interfaces but imposes no requirements on the data type of graph objects.
- ⚙️ Interoperability: works on Apple Silicon Macs and Linux
- ✅ Tested, type-checked, and documented
⚠️ Important - Dataset Usage Warning
To help reproduce previous results, we provide the following datasets:

- `PlanarGraphDataset`, `SBMGraphDataset`, `LobsterGraphDataset`

However, these should not be used for benchmarking, as they yield unreliable metric estimates (see our paper for more details). Use the larger datasets we provide instead:

- `PlanarLGraphDataset`, `SBMLGraphDataset`, `LobsterLGraphDataset`
Tutorial
Our demo script showcases some features of our library in action.
Datasets
Instantiate a benchmark dataset as follows:
```python
import networkx as nx
from polygraph.datasets import PlanarGraphDataset

reference = PlanarGraphDataset("test").to_nx()

# Let's also generate some graphs coming from another distribution.
generated = [nx.erdos_renyi_graph(64, 0.1) for _ in range(40)]
```
Metrics
Maximum Mean Discrepancy
To compute existing MMD2 formulations (e.g. based on the TV pseudokernel), one can use the following:
```python
from polygraph.metrics import GaussianTVMMD2Benchmark  # Can also be RBFMMD2Benchmark

gtv_benchmark = GaussianTVMMD2Benchmark(reference)
print(gtv_benchmark.compute(generated))  # {'orbit': ..., 'clustering': ..., 'degree': ..., 'spectral': ...}
```
PolyGraphDiscrepancy
Similarly, you can compute our proposed PolyGraphDiscrepancy, like so:
```python
from polygraph.metrics import StandardPGD

pgd = StandardPGD(reference)
print(pgd.compute(generated))  # {'pgd': ..., 'pgd_descriptor': ..., 'subscores': {'orbit': ..., }}
```
`pgd_descriptor` indicates the best descriptor, which is used to report the final score.
Validity, uniqueness and novelty
VUN values follow a similar interface:
```python
from polygraph.metrics import VUN

reference_ds = PlanarGraphDataset("test")
# Where applicable, validity functions are defined as a dataset attribute.
vun = VUN(reference, validity_fn=reference_ds.is_valid, confidence_level=0.95)
print(vun.compute(generated))  # {'valid': ..., 'valid_unique_novel': ..., 'valid_novel': ..., 'valid_unique': ...}
```
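For intuition, the valid/unique/novel fractions could be computed by brute force roughly as follows. This is an illustrative sketch only, using a hypothetical helper with pairwise isomorphism checks; the library's `VUN` class is the real interface and additionally provides confidence intervals:

```python
import networkx as nx

def vun_fractions(generated, reference, validity_fn):
    """Toy valid/unique/novel fractions via pairwise isomorphism checks."""
    valid = [g for g in generated if validity_fn(g)]
    unique = []  # one representative per isomorphism class among valid graphs
    for g in valid:
        if not any(nx.is_isomorphic(g, h) for h in unique):
            unique.append(g)
    novel = [g for g in unique if not any(nx.is_isomorphic(g, r) for r in reference)]
    n = len(generated)
    return {
        "valid": len(valid) / n,
        "valid_unique": len(unique) / n,
        "valid_unique_novel": len(novel) / n,
    }

# Tiny example: two identical paths and a triangle; the triangle also
# appears in the reference set, so it is unique but not novel.
generated_toy = [nx.path_graph(3), nx.path_graph(3), nx.cycle_graph(3)]
reference_toy = [nx.cycle_graph(3)]
scores = vun_fractions(generated_toy, reference_toy, validity_fn=lambda g: True)
```

Note that pairwise isomorphism checking scales quadratically; it is shown here only to make the definitions concrete.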
Metric uncertainty quantification
For MMD and PGD, uncertainty quantification is obtained through subsampling; for VUN, a confidence interval is obtained from a binomial test.
For VUN, the results are obtained by specifying a confidence level when instantiating the metric.
For the other metrics, classes with the `Interval` suffix implement subsampling.
```python
from polygraph.metrics import GaussianTVMMD2BenchmarkInterval, RBFMMD2BenchmarkInterval, StandardPGDInterval
from tqdm import tqdm

metrics = [
    # Specify the size of each subsample and the number of subsamples.
    GaussianTVMMD2BenchmarkInterval(reference, subsample_size=8, num_samples=10),
    RBFMMD2BenchmarkInterval(reference, subsample_size=8, num_samples=10),
    StandardPGDInterval(reference, subsample_size=8, num_samples=10),
]

for metric in tqdm(metrics):
    metric_results = metric.compute(generated)
```
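The subsampling idea behind the `Interval` classes can be sketched generically as follows. The function and names here are hypothetical, not the library's API; they only illustrate the recompute-on-subsamples pattern:

```python
import random
import statistics

def subsample_interval(metric_fn, samples, subsample_size, num_samples, seed=0):
    """Recompute a metric on random subsamples to estimate its spread."""
    rng = random.Random(seed)
    values = [
        metric_fn(rng.sample(samples, subsample_size))
        for _ in range(num_samples)
    ]
    return statistics.mean(values), statistics.stdev(values)

# Toy metric: mean "size" of a batch, here over plain integers standing in
# for graphs; a real metric_fn would take a list of graphs.
mean, std = subsample_interval(
    lambda batch: sum(batch) / len(batch),
    samples=list(range(100)), subsample_size=8, num_samples=10,
)
```

The reported mean and standard deviation then summarize how sensitive the metric is to which generated graphs happen to be in the sample.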
Example Benchmark
The following results mirror the tables from our paper. Bold indicates best, and underlined indicates second-best. Values are multiplied by 100 for legibility. Standard deviations are obtained with subsampling using `StandardPGDInterval` and `MoleculePGDInterval`. Specific parameters are discussed in the paper.
| Method | Planar-L | Lobster-L | SBM-L | Proteins | Guacamol | Moses |
|---|---|---|---|---|---|---|
| AutoGraph | 34.0 ± 1.8 | 18.0 ± 1.6 | 5.6 ± 1.5 | 67.7 ± 7.4 | 22.9 ± 0.5 | 29.6 ± 0.4 |
| AutoGraph* | — | — | — | — | 10.4 ± 1.2 | — |
| DiGress | 45.2 ± 1.8 | 3.2 ± 2.6 | 17.4 ± 2.3 | 88.1 ± 3.1 | 32.7 ± 0.5 | 33.4 ± 0.5 |
| GRAN | 99.7 ± 0.2 | 85.4 ± 0.5 | 69.1 ± 1.4 | 89.7 ± 2.7 | — | — |
| ESGG | 45.0 ± 1.4 | 69.9 ± 0.6 | 99.4 ± 0.2 | 79.2 ± 4.3 | — | — |
* AutoGraph* denotes a variant that leverages additional training heuristics as described in the paper.
Citing
To cite our paper:
```bibtex
@misc{krimmel2025polygraph,
  title={PolyGraph Discrepancy: a classifier-based metric for graph generation},
  author={Markus Krimmel and Philip Hartout and Karsten Borgwardt and Dexiong Chen},
  year={2025},
  eprint={2510.06122},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2510.06122},
}
```
File details
Details for the file polygraph_benchmark-1.0.2.tar.gz.
File metadata
- Download URL: polygraph_benchmark-1.0.2.tar.gz
- Upload date:
- Size: 80.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `9d3515ba69ab17586559273f35d8562bb4b0616cdb7108d006dcc8d81b8e5362` |
| MD5 | `e4a3f7ee9a833d48f3c1cc65ae487d5a` |
| BLAKE2b-256 | `31769372042d8f0ece78acf7b5c09f826275400439a0d0ba52cfe31442bef8fc` |
Provenance

The following attestation bundles were made for polygraph_benchmark-1.0.2.tar.gz:

Publisher: build-and-publish.yaml on BorgwardtLab/polygraph-benchmark

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: polygraph_benchmark-1.0.2.tar.gz
- Subject digest: `9d3515ba69ab17586559273f35d8562bb4b0616cdb7108d006dcc8d81b8e5362`
- Sigstore transparency entry: 600941351
- Sigstore integration time:
- Permalink: BorgwardtLab/polygraph-benchmark@677cc8f3fc0464b0217dc37cb1be93df7a9ce63c
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/BorgwardtLab
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build-and-publish.yaml@677cc8f3fc0464b0217dc37cb1be93df7a9ce63c
- Trigger Event: release
File details
Details for the file polygraph_benchmark-1.0.2-py3-none-any.whl.
File metadata
- Download URL: polygraph_benchmark-1.0.2-py3-none-any.whl
- Upload date:
- Size: 81.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `7a4d26dd951f6bc4853a8da0a2a8737d913e4c6413da9b860f6c0d13b63e762b` |
| MD5 | `06ed9c100df7f8215f278e0d79d8ab67` |
| BLAKE2b-256 | `90626e6a9ef60d4d8507681198d929de8104af270787a1dd19ecc885a632643d` |
Provenance

The following attestation bundles were made for polygraph_benchmark-1.0.2-py3-none-any.whl:

Publisher: build-and-publish.yaml on BorgwardtLab/polygraph-benchmark

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: polygraph_benchmark-1.0.2-py3-none-any.whl
- Subject digest: `7a4d26dd951f6bc4853a8da0a2a8737d913e4c6413da9b860f6c0d13b63e762b`
- Sigstore transparency entry: 600941352
- Sigstore integration time:
- Permalink: BorgwardtLab/polygraph-benchmark@677cc8f3fc0464b0217dc37cb1be93df7a9ce63c
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/BorgwardtLab
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build-and-publish.yaml@677cc8f3fc0464b0217dc37cb1be93df7a9ce63c
- Trigger Event: release