Benchmark of Graph Clustering.

EAGLEGraphClustering

A benchmark of graph clustering from EAGLE-Lab, Zhejiang University.

Installation

  • python>=3.8
  • torch>=1.12
  • dgl>=1.1
$ python -m pip install egc
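
Before installing, it can help to confirm that the environment meets the minimum versions listed above. The following is a minimal sketch using only the standard library. It assumes the installed distributions are named `torch` and `dgl`; the helper names `version_tuple`, `meets_minimum` and `check_environment` are hypothetical and not part of egc.

```python
import re
import sys
from importlib import metadata

# Minimum versions copied from the requirement list above.
# Assumption: the pip distribution names are "torch" and "dgl".
MINIMUMS = {"torch": "1.12", "dgl": "1.1"}


def version_tuple(text):
    """Turn '1.12.1+cu116' into (1, 12, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version string is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("python>=3.8 is required")
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks fine"]:
        print(line)
```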

Usage

Pip Package

See egc for docs.

  • Import the package and use any supported graph clustering model, for example:
from torch import nn
from egc.model import DGL_GAEKmeans
from egc.utils import load_data
from egc.utils import get_default_args
from egc.utils import set_seed
from egc.utils import set_device

# set the random seed
set_seed(4096)

# set the gpu id
set_device('0')

# load graph
graph, label, n_clusters = load_data(
    dataset_name='Cora',
    directory='./data',
)
features = graph.ndata["feat"]
adj_csr = graph.adj_external(scipy_fmt="csr")

# get default args
args = get_default_args('gae_kmeans')

# init the model
model = DGL_GAEKmeans(
    epochs=args["epochs"],
    n_clusters=n_clusters,  # use the cluster count returned by load_data
    fead_dim=features.shape[1],
    n_nodes=features.shape[0],
    hidden_dim1=args["hidden1"],
    dropout=args["dropout"],
    lr=args["lr"],
    early_stop=args["early_stopping_epoch"],
    activation=args["activation"],
)

# fit the model
model.fit(adj_csr, features)

# get clustering results
res = model.get_memberships()
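
The example above stops at the cluster assignments. One common way to judge them is normalized mutual information (NMI) against the ground-truth labels. The sketch below implements NMI with arithmetic-mean normalization in plain Python. It is not part of egc, and it assumes that `label` and `res` can each be converted to a flat list with one integer per node, which this README does not state; the two short lists here are stand-ins for those values.

```python
import math
from collections import Counter


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization; the result lies in [0, 1]."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    # I(T; P) = sum over observed pairs of p(t, p) * log(p(t, p) / (p(t) * p(p)))
    mutual_info = sum(
        c / n * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in count_joint.items()
    )
    h_true, h_pred = entropy(count_true), entropy(count_pred)
    if h_true == 0.0 and h_pred == 0.0:
        return 1.0  # both partitions consist of a single cluster
    return mutual_info / ((h_true + h_pred) / 2)


# Stand-ins for `label` and `res` (assumed: one integer id per node).
labels_true = [0, 0, 1, 1]
labels_pred = [1, 1, 0, 0]  # same partition, different cluster ids
score = normalized_mutual_info(labels_true, labels_pred)
print(f"NMI = {score:.3f}")
```

NMI is invariant to how the cluster ids are numbered, which is why the relabelled partition above still scores as a perfect match.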

Command line

  • Clone the Repo
  • Install the env
# NOTE: python>=3.8 is needed
# Install cuda if necessary. Check Cuda version first.
$ cd EGC
# Leave out `bash .ci/install-dev.sh &&` if no dev env is needed.
$ bash .ci/install-dev.sh && bash .ci/install.sh
# run `source .env/bin/activate` to activate the virtual env
  • Run any supported model as:
$ python train.py ${OPTIONAL global args} ${POSITIONAL args (model)} ${optional model args}
  • OPTIONAL global args must be given before ${model}.
  • POSITIONAL args select the model, i.e., the name of any supported model.

E.g.,

# show the OPTIONAL global args and the list of supported models
$ python train.py -h

# check optional args of certain model
$ python train.py gae_kmeans -h

# run any model
$ python train.py --dataset=Cora gae_kmeans --lr 0.001
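
The ordering rule (global args first, then the model name, then model args) is the usual behaviour of argparse subcommands. The sketch below reproduces that layout with a hypothetical parser. It is not the actual `train.py`; the option defaults are placeholders chosen only for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the command layout described above."""
    parser = argparse.ArgumentParser(prog="train.py")
    # Global options belong to the top-level parser,
    # so they have to appear before the model name.
    parser.add_argument("--dataset", default="Cora")
    subparsers = parser.add_subparsers(dest="model", required=True)
    # Each model is a subcommand with its own options.
    gae = subparsers.add_parser("gae_kmeans")
    gae.add_argument("--lr", type=float, default=0.001)
    return parser


args = build_parser().parse_args(["--dataset=Cora", "gae_kmeans", "--lr", "0.001"])
print(args.dataset, args.model, args.lr)  # Cora gae_kmeans 0.001
```

With this layout, placing `--dataset` after `gae_kmeans` makes argparse exit with an "unrecognized arguments" error, because the subcommand parser does not define that option.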

Datasets

  • Cora, Citeseer, and Pubmed come from the DGL library.
  • BlogCatalog and Flickr come from the CoLA GitHub repository.
  • ACM comes from the SDCN GitHub repository.
  • All of the above datasets are converted to undirected graphs.
Dataset      Nodes   Edges    Attributes  Classes
Cora         2,708   10,556   1,433       7
Citeseer     3,327   9,228    3,703       6
Pubmed       19,717  88,651   500         3
BlogCatalog  5,196   343,486  8,189       6
Flickr       7,575   479,476  12,047      9
ACM          3,025   26,256   1,870       3
CoraFull     19,793  126,842  8,710       70
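
A common way to make a graph undirected is to store every edge in both directions and drop duplicates. The sketch below shows that idea on a plain edge list. It is an illustration only: the conversion egc actually performs is not shown in this README, and the README does not state whether the edge counts in the table include both directions. The function name `to_undirected` is hypothetical.

```python
def to_undirected(edges):
    """Return sorted, de-duplicated directed pairs covering both directions."""
    both = set()
    for src, dst in edges:
        both.add((src, dst))
        both.add((dst, src))
    return sorted(both)


directed = [(0, 1), (1, 2), (2, 1)]
undirected = to_undirected(directed)
print(undirected)       # [(0, 1), (1, 0), (1, 2), (2, 1)]
print(len(undirected))  # 4
```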

Implemented baseline methods

Disjoint

Unsupervised Graph Neural Networks + Kmeans

Method     Conf/Journal  Original Code  Supported
VGAE       16nips        TensorFlow
GraphSAGE
DGI        19iclr        PyTorch
GMI        20www         PyTorch
SENet      21nn

End-to-End Graph Clustering

Method  Conf/Journal  Original Code  Supported
SDCN    20www         PyTorch
DANMF   18cikm        code
M-NMF   17aaai        Matlab
DFCN    21aaai        PyTorch
VGAECD  18icdm        --
ComE    17cikm        code
DAEGC   19ijcai       code

Overlapping

Method        Conf/Journal  Original Code  Supported
CommunityGAN  19www         TensorFlow

Requirements

See dependencies, requirements-dev.txt and requirements.txt.

Contributing

See CONTRIBUTING.md.
