
Evaluation benchmarks for graph generative models



PolyGraph is a Python library for evaluating graph generative models by providing standardized datasets and metrics (including PolyGraph Discrepancy).

PolyGraph Discrepancy (PGD) is a new metric we introduce. It provides the following advantages over maximum mean discrepancy (MMD):

Property                       MMD      PGD
Range                          [0, ∞)   [0, 1]
Intrinsic Scale                ✗        ✓
Descriptor Comparison          ✗        ✓
Multi-Descriptor Aggregation   ✗        ✓
Single Ranking                 ✗        ✓

PGD offers several further advantages over MMD, which we discuss in our paper.

Installation

pip install polygraph-benchmark

No manual compilation of ORCA is required. For details on interaction with graph_tool, see the more detailed installation instructions in the docs.

If you'd like to use SBM graph dataset validation with graph-tool, use a mamba or pixi environment. More information is available in the documentation.

At a glance

Here is an overview of the datasets and metrics this library provides:

  • 🗂️ Datasets: ready-to-use splits for procedural and real-world graphs
    • Procedural datasets: PlanarLGraphDataset, SBMLGraphDataset, LobsterLGraphDataset
    • Real-world: QM9, MOSES, Guacamol, DobsonDoigGraphDataset, ModelNet10GraphDataset
    • Also: EgoGraphDataset, PointCloudGraphDataset
  • 📊 Metrics: unified, fit-once/compute-many interface with convenience wrappers, avoiding redundant computations.
    • MMD2: GaussianTVMMD2Benchmark, RBFMMD2Benchmark
    • Kernel hyperparameter optimization with MaxDescriptorMMD2.
    • PolyGraphDiscrepancy: StandardPGD, MolecularPGD (for molecule descriptors).
    • Validation/Uniqueness/Novelty: VUN.
    • Uncertainty quantification for benchmarking (GaussianTVMMD2BenchmarkInterval, RBFMMD2BenchmarkInterval, StandardPGDInterval)
  • 🧩 Extendable: users can instantiate custom metrics by specifying descriptors, kernels, or classifiers (PolyGraphDiscrepancy, DescriptorMMD2); see the sketch after this list. PolyGraph defines all necessary interfaces but imposes no requirements on the data type of graph objects.
  • ⚙️ Portability: works on Apple Silicon Macs and Linux.
  • Tested, type-checked, and documented
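
To illustrate the extensibility point above, here is a minimal sketch of a user-defined descriptor. The function degree_histogram is our own illustration, and the commented-out usage assumes (hypothetically) that DescriptorMMD2 accepts a descriptor callable of this form; consult the documentation for the exact interface.

import numpy as np

# Hypothetical descriptor: maps a list of networkx graphs to one feature
# vector per graph (here, a normalized degree histogram).
def degree_histogram(graphs, bins=16):
    features = []
    for g in graphs:
        degrees = [d for _, d in g.degree()]
        hist, _ = np.histogram(degrees, bins=bins, range=(0, bins), density=True)
        features.append(hist)
    return np.stack(features)

# Hypothetical usage; the real constructor signature may differ:
# metric = DescriptorMMD2(reference, descriptor=degree_histogram, kernel=...)
# print(metric.compute(generated))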
⚠️ Important - Dataset Usage Warning

To help reproduce previous results, we provide the following datasets:

  • PlanarGraphDataset
  • SBMGraphDataset
  • LobsterGraphDataset

However, these datasets should not be used for benchmarking, as they yield unreliable metric estimates (see our paper for more details).

We provide larger datasets that should be used instead:

  • PlanarLGraphDataset
  • SBMLGraphDataset
  • LobsterLGraphDataset

Tutorial

Our demo script showcases some features of our library in action.

Datasets

Instantiate a benchmark dataset as follows:

import networkx as nx
from polygraph.datasets import PlanarGraphDataset

reference = PlanarGraphDataset("test").to_nx()

# Let's also generate some graphs coming from another distribution.
generated = [nx.erdos_renyi_graph(64, 0.1) for _ in range(40)]
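
Both reference and generated are now plain Python lists of networkx graphs (assuming, as in the snippet above, that to_nx() returns one nx.Graph per split entry), so they can be inspected with standard networkx tooling:

print(len(reference), "reference graphs")
print(reference[0].number_of_nodes(), "nodes in the first reference graph")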

Metrics

Maximum Mean Discrepancy

To compute existing MMD2 formulations (e.g. based on the TV pseudokernel), one can use the following:

from polygraph.metrics import GaussianTVMMD2Benchmark # Can also be RBFMMD2Benchmark

gtv_benchmark = GaussianTVMMD2Benchmark(reference)

print(gtv_benchmark.compute(generated))  # {'orbit': ..., 'clustering': ..., 'degree': ..., 'spectral': ...}
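
Because the benchmark is fit once on the reference set at construction, compute can be called repeatedly on different generated sets without redoing the reference-side work, e.g. to compare two candidate models (variable names here are illustrative):

generated_a = [nx.erdos_renyi_graph(64, 0.1) for _ in range(40)]
generated_b = [nx.barabasi_albert_graph(64, 2) for _ in range(40)]

# The reference descriptors are reused across both calls.
scores_a = gtv_benchmark.compute(generated_a)
scores_b = gtv_benchmark.compute(generated_b)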

PolyGraphDiscrepancy

Similarly, you can compute our proposed PolyGraphDiscrepancy, like so:

from polygraph.metrics import StandardPGD

pgd = StandardPGD(reference)
print(pgd.compute(generated)) # {'pgd': ..., 'pgd_descriptor': ..., 'subscores': {'orbit': ..., }}

The pgd_descriptor entry names the best descriptor, whose subscore is reported as the final pgd value.
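
For example, to read off the winning descriptor and its subscore from the output dictionary shown above:

result = pgd.compute(generated)
best = result["pgd_descriptor"]  # name of the best-performing descriptor
print(best, result["subscores"][best])  # the subscore that determines the final 'pgd' value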

Validity, uniqueness and novelty

VUN values follow a similar interface:

from polygraph.metrics import VUN
reference_ds = PlanarGraphDataset("test")
# Where applicable, validity functions are defined as a dataset attribute.
vun = VUN(reference, validity_fn=reference_ds.is_valid, confidence_level=0.95)
print(vun.compute(generated))  # {'valid': ..., 'valid_unique_novel': ..., 'valid_novel': ..., 'valid_unique': ...}

Metric uncertainty quantification

For MMD and PGD, uncertainty quantification is obtained through subsampling. For VUN, a confidence interval is obtained with a binomial test.

For VUN, the interval is obtained by specifying a confidence level when instantiating the metric, as shown above.
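
To build intuition for what such a binomial interval computes, here is a standalone sketch (not PolyGraph's internal implementation) that derives an exact 95% Clopper-Pearson interval for a validity proportion with scipy:

from scipy.stats import binomtest

# Suppose 31 of 40 generated graphs are valid.
ci = binomtest(k=31, n=40).proportion_ci(confidence_level=0.95)  # exact Clopper-Pearson by default
print(ci.low, ci.high)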

For MMD and PGD, the Interval suffix denotes the classes that implement subsampling:

from polygraph.metrics import GaussianTVMMD2BenchmarkInterval, RBFMMD2BenchmarkInterval, StandardPGDInterval
from tqdm import tqdm

metrics = [
    # Specify the size of each subsample and the number of subsamples.
    GaussianTVMMD2BenchmarkInterval(reference, subsample_size=8, num_samples=10),
    RBFMMD2BenchmarkInterval(reference, subsample_size=8, num_samples=10),
    StandardPGDInterval(reference, subsample_size=8, num_samples=10),
]

for metric in tqdm(metrics):
    metric_results = metric.compute(generated)

Example Benchmark

The following results mirror the tables from our paper. Bold indicates best, and underlined indicates second-best. Values are multiplied by 100 for legibility. Standard deviations are obtained with subsampling using StandardPGDInterval and MolecularPGDInterval. Specific parameters are discussed in the paper.

Method       Planar-L     Lobster-L    SBM-L        Proteins     Guacamol     Moses
AutoGraph    34.0 ± 1.8   18.0 ± 1.6   5.6 ± 1.5    67.7 ± 7.4   22.9 ± 0.5   29.6 ± 0.4
AutoGraph*   10.4 ± 1.2   —            —            —            —            —
DiGress      45.2 ± 1.8   3.2 ± 2.6    17.4 ± 2.3   88.1 ± 3.1   32.7 ± 0.5   33.4 ± 0.5
GRAN         99.7 ± 0.2   85.4 ± 0.5   69.1 ± 1.4   89.7 ± 2.7   —            —
ESGG         45.0 ± 1.4   69.9 ± 0.6   99.4 ± 0.2   79.2 ± 4.3   —            —

* AutoGraph* denotes a variant that leverages additional training heuristics as described in the paper.
