PyDGC: A Deep Graph Clustering Benchmark

PyDGC

PyDGC, a flexible and extensible Python library for deep graph clustering (DGC), is compatible with frameworks such as PyG and OGB. It supports the easy integration of new models and datasets, facilitating the rapid development, reproduction, and fair comparison of DGC methods.

News

  • 2025.05: Released the source code of PyDGC.

What is DGC?

Deep graph clustering, which aims to reveal the underlying graph structure and divide the nodes into different groups, has attracted intensive attention in recent years.

More details can be found in the survey paper, and a comprehensive archive of DGC papers is also available.

Timeline of representative models.

DGCBench

DGCBench encompasses 12 diverse datasets with different characteristics and 12 state-of-the-art methods from all major paradigms. By integrating them into a standardized pipeline, we ensure fair, reproducible, and comprehensive evaluations across multiple dimensions.

Features

  • Integration of multiple deep graph clustering models (see Supported Models).
  • Support for various graph datasets from PyG and OGB (see Supported Datasets).
  • Model evaluation and visualization capabilities.
  • Standardized Pipeline.

Overview of Pipeline

Installation

  • Install with Pip

    coming soon...

  • Installation for local development

    git clone https://github.com/Marigoldwu/PyDGC.git
    cd PyDGC
    pip install -e .
    

Examples

Reproduce built-in models

Take GAE as an example:

cd PyDGC/example/pipelines/gae
python run.py

You can also specify arguments on the command line:

python run.py --dataset_name CORA -eval_each

Other optional arguments:

--cfg_file_path YourPath  # path of corresponding configurations file
--flag FlagContent  # Descriptions
--drop_edge float  # probability of dropping edges
--drop_feature float  # probability of dropping features
--add_edge float  # probability of adding edges
--add_noise float  # standard deviation of Gaussian Noise
-pretrain  # only run the pretraining stage in the model
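
The perturbation options above can be illustrated with a small standard-library sketch. It is not PyDGC's implementation: the function names (`drop_edges`, `drop_features`, `add_gaussian_noise`) and the list-based graph representation are assumptions made for illustration, and `--add_edge` is left out because its exact semantics are not described here.

```python
import random


def drop_edges(edges, p, rng):
    """Keep each edge independently with probability 1 - p (hypothetical helper)."""
    return [e for e in edges if rng.random() >= p]


def drop_features(features, p, rng):
    """Zero out each feature entry independently with probability p (hypothetical helper)."""
    return [[0.0 if rng.random() < p else x for x in row] for row in features]


def add_gaussian_noise(features, std, rng):
    """Add zero-mean Gaussian noise with standard deviation std to every entry."""
    return [[x + rng.gauss(0.0, std) for x in row] for row in features]


rng = random.Random(0)  # fixed seed so the perturbation is reproducible
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
features = [[1.0, 2.0], [3.0, 4.0]]

kept = drop_edges(edges, p=0.5, rng=rng)
masked = drop_features(features, p=0.5, rng=rng)
noisy = add_gaussian_noise(features, std=0.1, rng=rng)
```

Passing an explicit random generator keeps the perturbed graph identical across runs, which matters when comparing methods under the same corruption level.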

Develop your own DGC model

from pydgc.models import DGCModel
from pydgc.metrics import DGCMetric  # assumed import path (not shown in the original example); adjust to your PyDGC version

class MyModel(DGCModel):
    def __init__(self, logger, cfg):
        super().__init__(logger, cfg)
        your_model = ...  # Your model
        
        self.loss_curve = []
        self.nmi_curve = []
        self.best_embedding = None
        self.best_predicted_labels = None
        self.best_results = {'ACC': -1}
    
    def forward(self, data):
        ...  # forward process
        return something
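    # If needed
    def loss(self, *args, **kwargs):
        ...  # compute and return the training loss

    # If needed
    def pretrain(self, data, cfg, flag):
        ...  # pretraining stage

    def train_model(self, data, cfg, flag):
        ...  # training loop

    def get_embedding(self, data):
        ...  # return the node embeddings used for clustering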
    
    def clustering(self, data):
        embedding = self.get_embedding(data)
        # clustering: compute labels_ and clustering_centers from the embedding
        return embedding, labels_, clustering_centers
    
    def evaluate(self, data):
        embedding, predicted_labels, clustering_centers = self.clustering(data)
        ground_truth = data.y.numpy()
        metric = DGCMetric(ground_truth, predicted_labels.numpy(), embedding, data.edge_index)
        results = metric.evaluate_one_epoch(self.logger, self.cfg.evaluate)
        return embedding, predicted_labels, results
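
The template tracks an `'ACC'` entry in `best_results`. As a minimal sketch of how clustering accuracy is commonly defined (the best accuracy over one-to-one mappings between predicted cluster ids and ground-truth labels), the following brute-force version may help. It is not PyDGC's `DGCMetric`; the function name `clustering_accuracy` is hypothetical, and enumerating permutations is only practical for a handful of clusters (libraries typically use the Hungarian algorithm instead).

```python
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    """Best accuracy over all one-to-one mappings of predicted ids to true labels."""
    true_ids = sorted(set(y_true))
    pred_ids = sorted(set(y_pred))
    # Pad with None so surplus predicted clusters can stay unmatched.
    targets = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for perm in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        correct = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, correct)
    return best / len(y_true)


# Cluster ids are arbitrary, so a pure relabelling is still a perfect result.
perfect = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 0, 0])
# One node assigned to the wrong cluster out of four.
partial = clustering_accuracy([0, 0, 1, 1], [1, 1, 1, 0])
```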

Develop your own DGC pipeline

from pydgc.pipelines import BasePipeline
from pydgc.utils import perturb_data
from my_model import MyModel  # import your own model (my_model is a hypothetical module name)

class MyPipeline(BasePipeline):
    def __init__(self, args):
        super().__init__(args)
    
    def augmentation(self):
        self.data = perturb_data(self.data, self.cfg.dataset.augmentation)
        # other augmentations if needed
        
    def build_model(self):
        model = MyModel(self.logger, self.cfg)
        self.logger.model_info(model)
        return model
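
Overriding `augmentation` and `build_model` follows a hook (template-method) pattern. The sketch below illustrates that pattern in isolation; `SketchBasePipeline`, its `run` method, and the order in which the hooks are called are all assumptions for illustration and do not describe the actual `BasePipeline` in PyDGC.

```python
class SketchBasePipeline:
    """Hypothetical stand-in for a base pipeline; not PyDGC's BasePipeline."""

    def __init__(self, data):
        self.data = data
        self.model = None
        self.steps = []  # records which hooks ran, in order

    def augmentation(self):
        """Optional hook: subclasses may perturb self.data."""

    def build_model(self):
        """Required hook: subclasses return a model object."""
        raise NotImplementedError

    def run(self):
        # The base class fixes the order; subclasses only fill in the hooks.
        self.augmentation()
        self.steps.append("augmentation")
        self.model = self.build_model()
        self.steps.append("build_model")
        return self.model


class SketchPipeline(SketchBasePipeline):
    def augmentation(self):
        self.data = [x * 2 for x in self.data]  # toy "perturbation"

    def build_model(self):
        return {"input_dim": len(self.data)}  # toy "model"


pipeline = SketchPipeline([1, 2, 3])
model = pipeline.run()
```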

Supported Models

| No. | Model | Paper | Source Code |
|---|---|---|---|
| 1 | GAE | Variational Graph Auto-Encoders | code |
| 2 | GAE_SSC | - | - |
| 3 | DAEGC | Attributed graph clustering: A deep attentional embedding approach | code |
| 4 | SDCN | Structural Deep Clustering Network | code |
| 5 | DFCN | Deep Fusion Clustering Network | code |
| 6 | DCRN | Deep Graph Clustering via Dual Correlation Reduction | code |
| 7 | AGC-DRR | Attributed Graph Clustering with Dual Redundancy Reduction | code |
| 8 | DGCluster | DGCLUSTER: A Neural Framework for Attributed Graph Clustering via Modularity Maximization | code |
| 9 | HSAN | Hard Sample Aware Network for Contrastive Deep Graph Clustering | code |
| 10 | CCGC | Cluster-guided Contrastive Graph Clustering Network | code |
| 11 | MAGI | Revisiting Modularity Maximization for Graph Clustering: A Contrastive Learning Perspective | code |
| 12 | NS4GC | Reliable Node Similarity Matrix Guided Contrastive Graph Clustering | code |

Supported Datasets

| No. | Dataset | #Samples | #Features | #Edges | #Classes | Homo. Ratio |
|---|---|---|---|---|---|---|
| 1 | Wiki | 2,405 | 4,973 | 17,981 | 17 | 0.71 |
| 2 | Cora | 2,708 | 1,433 | 5,429 | 7 | 0.81 |
| 3 | ACM | 3,025 | 1,870 | 13,128 | 3 | 0.82 |
| 4 | Citeseer | 3,327 | 3,703 | 9,104 | 6 | 0.74 |
| 5 | DBLP | 4,057 | 334 | 3,528 | 4 | 0.80 |
| 6 | PubMed | 19,717 | 500 | 88,648 | 3 | 0.80 |
| 7 | Ogbn-arXiv | 169,343 | 128 | 2,315,598 | 40 | 0.65 |
| 8 | USPS(3NN) | 9,298 | 256 | 27,894 | 10 | 0.98 |
| 9 | HHAR(3NN) | 10,299 | 561 | 30,897 | 6 | 0.95 |
| 10 | BlogCatalog | 5,196 | 8,189 | 343,486 | 6 | 0.40 |
| 11 | Flickr | 7,575 | 12,047 | 479,476 | 9 | 0.24 |
| 12 | Roman-empire | 22,662 | 300 | 65,854 | 18 | 0.05 |

More datasets will be added.
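
The Homo. Ratio column in the dataset table is not defined here. Assuming it refers to the common edge homophily ratio (the fraction of edges whose two endpoints share a class label), it could be computed as in the sketch below. The function name `edge_homophily` and the toy graph are illustrative only.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints have the same label (assumed definition)."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy 4-node cycle with two classes: edges (0, 1) and (2, 3) are intra-class.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = [0, 0, 1, 1]
ratio = edge_homophily(edges, labels)  # 2 of 4 edges are intra-class -> 0.5
```

Under this definition, a value near 1 (such as USPS(3NN)) means most edges connect nodes of the same class, while a value near 0 (such as Roman-empire) indicates a strongly heterophilous graph.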

Citation

Related Repositories

ADGC: Awesome-Deep-Graph-Clustering

Older version of this repository: A-Unified-Framework-for-Attribute-Graph-Clustering
