Skip to main content

Standardized benchmarks for evaluating synthetic graph generation methods

Project description

Synthetic Graph Benchmarks

PyPI version Python 3.10+ License: MIT

A Python package implementing standardized benchmarks for evaluating synthetic graph generation methods, based on the evaluation frameworks introduced in:

This package provides a unified interface for benchmarking graph generation algorithms against established datasets and metrics used in the graph generation literature.

Features

  • Standardized Datasets: Access to benchmark datasets including Stochastic Block Model (SBM), Planar graphs, and Tree graphs
  • Comprehensive Metrics: Implementation of key evaluation metrics including:
    • Degree distribution comparison (MMD)
    • Clustering coefficient analysis
    • Orbit count statistics (using ORCA)
    • Spectral properties analysis
    • Wavelet coefficient comparison
  • Validation Metrics: Graph-type specific validation (planarity, tree properties, SBM likelihood)
  • Reproducible Evaluation: Consistent benchmarking across different graph generation methods
  • Easy Integration: Simple API for evaluating your own graph generation algorithms

Installation

From PyPI (recommended)

pip install synthetic-graph-benchmarks

From Source

git clone https://github.com/peteole/synthetic_graph_benchmarks.git
cd synthetic_graph_benchmarks
pip install -e .

Quick Start

import networkx as nx
from synthetic_graph_benchmarks import (
    benchmark_planar_results,
    benchmark_sbm_results, 
    benchmark_tree_results
)

# Generate some example graphs (replace with your graph generation method)
generated_graphs = [nx.erdos_renyi_graph(64, 0.1) for _ in range(20)]

# Benchmark against planar graph dataset
results = benchmark_planar_results(generated_graphs)
print(f"Planar accuracy: {results['planar_acc']:.3f}")
print(f"Average metric ratio: {results['average_ratio']:.3f}")

# Benchmark against SBM dataset  
sbm_results = benchmark_sbm_results(generated_graphs)
print(f"SBM accuracy: {sbm_results['sbm_acc']:.3f}")

# Benchmark against tree dataset
tree_results = benchmark_tree_results(generated_graphs)
print(f"Tree accuracy: {tree_results['planar_acc']:.3f}")

Datasets

The package provides access to three standard benchmark datasets:

Stochastic Block Model (SBM)

  • Size: 200 graphs
  • Properties: 2-5 communities, 20-40 nodes per community
  • Edge probabilities: 0.3 intra-community, 0.05 inter-community

Planar Graphs

  • Size: 200 graphs with 64 nodes each
  • Generation: Delaunay triangulation on random points in unit square
  • Properties: Guaranteed planarity

Tree Graphs

  • Size: 200 graphs with 64 nodes each
  • Properties: Connected acyclic graphs (trees)

Evaluation Metrics

Graph Statistics

  • Degree Distribution: Maximum Mean Discrepancy (MMD) between degree histograms
  • Clustering Coefficient: Local clustering coefficient comparison
  • Orbit Counts: 4-node orbit statistics using ORCA package
  • Spectral Properties: Laplacian eigenvalue distribution analysis
  • Wavelet Coefficients: Graph wavelet signature comparison

Validity Metrics

  • Planar Accuracy: Fraction of generated graphs that are planar
  • Tree Accuracy: Fraction of generated graphs that are trees (acyclic)
  • SBM Accuracy: Likelihood of graphs under fitted SBM parameters

Quality Scores

  • Uniqueness: Fraction of non-isomorphic graphs in generated set
  • Novelty: Fraction of generated graphs not isomorphic to training graphs
  • Validity-Uniqueness-Novelty (VUN): Combined score for overall quality

Advanced Usage

Custom Evaluation

from synthetic_graph_benchmarks.dataset import Dataset
from synthetic_graph_benchmarks.spectre_utils import PlanarSamplingMetrics

# Load dataset manually
dataset = Dataset.load_planar()
print(f"Training graphs: {len(dataset.train_graphs)}")
print(f"Validation graphs: {len(dataset.val_graphs)}")

# Use metrics directly
metrics = PlanarSamplingMetrics(dataset)
test_metrics = metrics.forward(dataset.train_graphs, test=True)
results = metrics.forward(generated_graphs, ref_metrics={"test": test_metrics}, test=True)

Accessing Individual Metrics

# Get detailed breakdown of all metrics
results = benchmark_planar_results(generated_graphs)

# Individual metric values
print(f"Degree MMD: {results['degree']:.6f}")
print(f"Clustering MMD: {results['clustering']:.6f}")  
print(f"Orbit MMD: {results['orbit']:.6f}")
print(f"Spectral MMD: {results['spectre']:.6f}")
print(f"Wavelet MMD: {results['wavelet']:.6f}")

# Ratios compared to training set
print(f"Degree ratio: {results['degree_ratio']:.3f}")
print(f"Average ratio: {results['average_ratio']:.3f}")

Citing

If you use this package in your research, please cite the original papers:

@inproceedings{martinkus2022spectre,
  title={SPECTRE: Spectral Conditioning Helps to Overcome the Expressivity Limits of One-shot Graph Generators},
  author={Martinkus, Karolis and Loukas, Andreas and Perraudin, Nathanaël and Wattenhofer, Roger},
  booktitle={International Conference on Machine Learning},
  pages={15159--15202},
  year={2022},
  organization={PMLR}
}

@article{bergmeister2023efficient,
  title={Efficient and Scalable Graph Generation through Iterative Local Expansion},
  author={Bergmeister, Andreas and Martinkus, Karolis and Perraudin, Nathanaël and Wattenhofer, Roger},
  journal={arXiv preprint arXiv:2312.11529},
  year={2023}
}

Dependencies

  • Python ≥ 3.10
  • NetworkX ≥ 3.4.2
  • NumPy ≥ 2.2.6
  • SciPy ≥ 1.15.3
  • PyGSP ≥ 0.5.1
  • scikit-learn ≥ 1.7.1
  • ORCA-graphlets ≥ 0.1.4
  • PyTorch ≥ 2.3.0

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This package is based on evaluation frameworks developed by:

  • Karolis Martinkus (SPECTRE paper)
  • Andreas Bergmeister (Iterative Local Expansion paper)
  • The original GRAN evaluation codebase
  • NetworkX and PyGSP communities

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

synthetic_graph_benchmarks-0.1.2.tar.gz (88.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

synthetic_graph_benchmarks-0.1.2-py3-none-any.whl (18.0 kB view details)

Uploaded Python 3

File details

Details for the file synthetic_graph_benchmarks-0.1.2.tar.gz.

File metadata

File hashes

Hashes for synthetic_graph_benchmarks-0.1.2.tar.gz
Algorithm Hash digest
SHA256 5b8565a3fa4fba82b4ef42f58623ff8a5b845388a79eeaf64d0a7715bd912aad
MD5 654a13d1ca28b4c49437d89c2813ea1a
BLAKE2b-256 03f874a3c3cd738f758c18fbb0eb95ece8f4edb2513e8ff5164280689d961941

See more details on using hashes here.

File details

Details for the file synthetic_graph_benchmarks-0.1.2-py3-none-any.whl.

File metadata

File hashes

Hashes for synthetic_graph_benchmarks-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 4a44ecaf9b07f25906d138bb317eecf51244f53c48be2bbd80227cb4a604d1c4
MD5 80447090434e7d4a2273166efa3f72f3
BLAKE2b-256 899e5596d5c336e440d534ed7ce039317e453e0f2cc96a266c888f0000e06469

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page