
Benchmark of Graph Clustering.

Project description

EAGLEGraphClustering


A benchmark of graph clustering from EAGLE-Lab, Zhejiang University.

Installation

  • python>=3.8
  • torch>=1.12
  • dgl>=1.1
$ python -m pip install egc

Usage

Pip Package

See egc for docs.

  • Import the package and use any graph clustering model supported as:
from torch import nn
from egc.model import DGL_GAEKmeans
from egc.utils import load_data
from egc.utils import get_default_args
from egc.utils import set_seed
from egc.utils import set_device

# set the random seed
set_seed(4096)

# set the gpu id
set_device('0')

# load graph
graph, label, n_clusters = load_data(
    dataset_name='Cora',
    directory='./data',
)
features = graph.ndata["feat"]
adj_csr = graph.adj_external(scipy_fmt="csr")

# get default args
args = get_default_args('gae_kmeans')

# init the model
model = DGL_GAEKmeans(
    epochs=args["epochs"],
    n_clusters=n_clusters,  # cluster count returned by load_data
    fead_dim=features.shape[1],
    n_nodes=features.shape[0],
    hidden_dim1=args["hidden1"],
    dropout=args["dropout"],
    lr=args["lr"],
    early_stop=args["early_stopping_epoch"],
    activation=args["activation"],
)

# fit the model
model.fit(adj_csr, features)

# get clustering results
res = model.get_memberships()
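
The memberships returned above can be compared with the ground-truth `label` from `load_data`. The README does not show an evaluation step, so the following is only a minimal, dependency-free sketch of normalized mutual information (NMI). The helper name `nmi` is hypothetical and not part of `egc`, and it assumes both inputs are flat sequences of integer ids (a tensor would need `.tolist()` first).

```python
import math
from collections import Counter


def nmi(labels_true, labels_pred):
    """Normalized mutual information with arithmetic-mean normalization."""
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_true, h_pred = entropy(count_true), entropy(count_pred)
    # I(T; P) = sum over co-occurring pairs of p(t, p) * log(p(t, p) / (p(t) p(p)))
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in count_joint.items()
    )
    denom = (h_true + h_pred) / 2
    # Both partitions are a single cluster: treat them as identical.
    return 1.0 if denom == 0 else mutual_info / denom


# Toy labels (not real results): cluster ids are arbitrary, so a pure
# relabeling of the ground truth should still score close to 1.
truth = [0, 0, 1, 1, 2, 2]
pred = [2, 2, 0, 0, 1, 1]
print(round(nmi(truth, pred), 6))
```

NMI is invariant to how cluster ids are numbered, which is why it is a common choice when predicted ids have no fixed correspondence to class labels.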

Command line

  • Clone the Repo
  • Install the env
# NOTE: python>=3.8 is needed
# Install cuda if necessary. Check Cuda version first.
$ cd EGC
# Leave out `bash .ci/install-dev.sh &&` if no dev env is needed.
$ bash .ci/install-dev.sh && bash .ci/install.sh
# run `source .env/bin/activate` to activate the virtual env
  • Run any supported model as:
$ python train.py ${OPTIONAL global args} ${POSITIONAL args (model)} ${optional model args}
  • OPTIONAL global args, which must come before ${model}
  • POSITIONAL args, i.e., the name of any supported model

E.g.,

# list the OPTIONAL global args and all supported models
$ python train.py -h

# check optional args of certain model
$ python train.py gae_kmeans -h

# run any model
$ python train.py --dataset=Cora gae_kmeans --lr 0.001
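
The ordering rule above (global args first, then the model name, then model-specific args) is the usual behaviour of a sub-command style CLI. The sketch below reproduces that layout with Python's standard `argparse`. It is an illustration only: whether `train.py` is built this way is an assumption, and the defaults shown for `--dataset` and `--lr` are hypothetical.

```python
import argparse


def build_parser():
    # Global options are registered on the top-level parser, so they are
    # only recognized before the sub-command (the model name).
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("--dataset", default="Cora")

    # Each supported model becomes one sub-command with its own options.
    subparsers = parser.add_subparsers(dest="model", required=True)
    gae = subparsers.add_parser("gae_kmeans")
    gae.add_argument("--lr", type=float, default=0.001)
    return parser


parser = build_parser()

# Same shape as: python train.py --dataset=Cora gae_kmeans --lr 0.001
args = parser.parse_args(["--dataset=Cora", "gae_kmeans", "--lr", "0.001"])
print(args.dataset, args.model, args.lr)

# A global option placed after the model name is not understood by the
# sub-command parser; parse_known_args reports it as left over.
_, leftover = parser.parse_known_args(["gae_kmeans", "--dataset=Cora"])
print(leftover)
```

With this layout, `parse_args` would reject the second invocation outright, which matches the requirement that global args precede the model name.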

Datasets

  • Cora, Citeseer, Pubmed come from DGL Lib
  • BlogCatalog, Flickr come from CoLA github
  • ACM comes from SDCN github
  • All above datasets are converted to undirected graphs.
Dataset      Nodes   Edges    Attributes  Classes
Cora         2,708   10,556   1,433       7
Citeseer     3,327   9,228    3,703       6
Pubmed       19,717  88,651   500         3
BlogCatalog  5,196   343,486  8,189       6
Flickr       7,575   479,476  12,047      9
ACM          3,025   26,256   1,870       3
CoraFull     19,793  126,842  8,710       70
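
Converting a graph to undirected form means that every edge is stored in both directions. The README does not say how the project performs this step, so the following is just a plain-Python sketch of the idea on an edge list; the function name `to_undirected` is hypothetical.

```python
def to_undirected(edges):
    """Return a sorted edge list that holds both directions of every edge."""
    symmetric = set()
    for src, dst in edges:
        symmetric.add((src, dst))
        symmetric.add((dst, src))  # add the reverse edge; duplicates collapse
    return sorted(symmetric)


# (0, 1) and (1, 0) are both present already and are not duplicated.
directed = [(0, 1), (1, 2), (2, 0), (1, 0)]
print(to_undirected(directed))
```

Using a set keeps the result free of duplicate edges when the input already contains some reverse edges.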

Implemented baseline methods

Disjoint

Unsupervised Graph Neural Networks + Kmeans

Method     Conf/Journal  Original Code  Supported
VGAE       16nips        TensorFlow
GraphSAGE
DGI        19iclr        PyTorch
GMI        20www         PyTorch
SENet      21nn
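
The methods in this group learn node embeddings without labels and then cluster those embeddings with k-means. As a rough illustration of that second stage only, here is a small pure-Python version of Lloyd's algorithm. It is a sketch under stated assumptions, not the code used by `egc`: the function name `kmeans` is hypothetical, the embeddings are toy values, and the centroids are seeded with the first `k` points for determinism, where practical implementations typically use k-means++ or random restarts.

```python
def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm on a list of equal-length float vectors."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Simplification: seed the centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy 2-d "embeddings" forming two well-separated groups.
embeddings = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
labels, centroids = kmeans(embeddings, k=2)
print(labels)
```

In the two-stage setting the embedding model and the clustering step are trained separately, in contrast to the end-to-end methods listed next.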

End-to-End Graph Clustering

Method  Conf/Journal  Original Code  Supported
SDCN    20www         PyTorch
DANMF   18cikm        code
M-NMF   17aaai        Matlab
DFCN    21aaai        PyTorch
VGAECD  18icdm        --
ComE    17cikm        code
DAEGC   19ijcai       code

Overlapping

Method        Conf/Journal  Original Code  Supported
CommunityGAN  19www         TensorFlow

Requirements

See dependencies, requirements-dev.txt and requirements.txt.

Contributing

See CONTRIBUTING.md.

Download files


Source Distribution

egc-0.3.0.tar.gz (3.6 MB)

Built Distribution

egc-0.3.0-py3-none-any.whl (208.4 kB)
