Benchmark of Graph Clustering.
EAGLEGraphClustering
A benchmark of graph clustering from EAGLE-Lab, Zhejiang University.
Installation
- python>=3.8
- torch>=1.12
- dgl>=1.1
$ python -m pip install egc
Usage
Pip Package
See egc for docs.
- Import the package and use any graph clustering model supported as:
from torch import nn
from egc.model import DGL_GAEKmeans
from egc.utils import load_data
from egc.utils import get_default_args
from egc.utils import set_seed
from egc.utils import set_device
# set the random seed
set_seed(4096)
# set the gpu id
set_device('0')
# load graph
graph, label, n_clusters = load_data(
dataset_name='Cora',
directory='./data',
)
features = graph.ndata["feat"]
adj_csr = graph.adj_external(scipy_fmt="csr")
# get default args
args = get_default_args('gae_kmeans')
# init the model
model = DGL_GAEKmeans(
epochs=args["epochs"],
n_clusters=n_clusters,
fead_dim=features.shape[1],
n_nodes=features.shape[0],
hidden_dim1=args["hidden1"],
dropout=args["dropout"],
lr=args["lr"],
early_stop=args["early_stopping_epoch"],
activation=args["activation"],
)
# fit the model
model.fit(adj_csr, features)
# get clustering results
res = model.get_memberships()
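To judge the result, the predicted memberships can be compared with the ground-truth `label` returned by `load_data`. The following is a minimal sketch, not part of `egc`: the helper `nmi` is a hypothetical name, and it assumes both inputs are plain sequences of integer ids (if `res` or `label` is a tensor or array, converting with `.tolist()` first should work). It computes normalized mutual information with arithmetic-mean normalization.

```python
import math
from collections import Counter


def nmi(labels_true, labels_pred):
    """Normalized mutual information between two flat label sequences.

    Hypothetical helper for illustration; it is not an egc function.
    """
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label sequences must be non-empty and of equal length")

    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))

    # Mutual information from the joint and marginal counts.
    mi = 0.0
    for (t, p), c in joint.items():
        mi += (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))

    # Entropy of each labelling.
    h_true = -sum((c / n) * math.log(c / n) for c in count_true.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in count_pred.values())

    denom = (h_true + h_pred) / 2
    if denom == 0:
        # Both labellings put every node in a single cluster.
        return 1.0
    return mi / denom


# Toy check: the same partition under permuted cluster ids scores 1.
same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])
# Toy check: statistically independent partitions score 0.
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

With the snippet above, a call would look like `nmi(label.tolist(), res.tolist())`, assuming both objects expose `.tolist()`.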
Command line
- Clone the Repo
- Install the env
# NOTE: python>=3.8 is needed
# Install cuda if necessary. Check Cuda version first.
$ cd EGC
# Leave out `bash .ci/install-dev.sh &&` if no dev env is needed.
$ bash .ci/install-dev.sh && bash .ci/install.sh
# run `source .env/bin/activate` to activate the virtual env
- Run any supported model as:
$ python train.py ${OPTIONAL global args} ${POSITIONAL args (model)} ${optional model args}
- OPTIONAL global args, which must be given before `${model}`
- POSITIONAL args, i.e., all models supported
E.g.,
# check OPTIONAL global args, e.g., all models supported
$ python train.py -h
# check optional args of certain model
$ python train.py gae_kmeans -h
# run any model
$ python train.py --dataset=Cora gae_kmeans --lr 0.001
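The argument order described above (global args, then the model name, then model args) is the layout that `argparse` subcommands produce. The sketch below is only an illustration of that layout and is not the actual `train.py`; the defaults and the `build_parser` name are assumptions, while `--dataset`, `gae_kmeans` and `--lr` are taken from the example command.

```python
import argparse


def build_parser():
    """Illustrative parser: global options first, then one subcommand per model."""
    parser = argparse.ArgumentParser(prog="train.py")
    # Global option; it belongs to the top-level parser, so it must precede the model name.
    parser.add_argument("--dataset", default="Cora")

    # The positional model name selects a sub-parser that holds the model's own options.
    subparsers = parser.add_subparsers(dest="model", required=True)
    gae_kmeans = subparsers.add_parser("gae_kmeans")
    gae_kmeans.add_argument("--lr", type=float, default=0.001)
    return parser


# Same argument order as: python train.py --dataset=Cora gae_kmeans --lr 0.001
args = build_parser().parse_args(["--dataset=Cora", "gae_kmeans", "--lr", "0.001"])
```

In this sketch, placing `--dataset` after `gae_kmeans` hands it to the sub-parser, which does not know the option, so parsing is expected to fail. That is one plausible reason for the rule that global args come first.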
Datasets
- Cora, Citeseer, Pubmed come from DGL Lib
- BlogCatalog, Flickr come from CoLA github
- ACM comes from SDCN github
- All above datasets are converted to undirected graphs.
Dataset | Nodes | Edges | Attributes | Classes |
---|---|---|---|---|
Cora | 2,708 | 10,556 | 1,433 | 7 |
Citeseer | 3,327 | 9,228 | 3,703 | 6 |
Pubmed | 19,717 | 88,651 | 500 | 3 |
BlogCatalog | 5,196 | 343,486 | 8,189 | 6 |
Flickr | 7,575 | 479,476 | 12,047 | 9 |
ACM | 3,025 | 26,256 | 1,870 | 3 |
CoraFull | 19,793 | 126,842 | 8,710 | 70 |
Implemented baseline methods
Disjoint
Unsupervised Graph Neural Networks + Kmeans
method | Conf/Journal | Original Code | Supported |
---|---|---|---|
VGAE | 16nips | TensorFlow | ✅ |
GraphSAGE | | | |
DGI | 19iclr | Pytorch | ✅ |
GMI | 20www | Pytorch | ✅ |
SENet | 21nn | | ✅ |
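The methods in this group share a two-stage recipe: an unsupervised GNN first produces node embeddings, and k-means then partitions those embeddings. The second stage can be sketched as below. This is a plain-Python Lloyd's algorithm on toy 2-D vectors, written for illustration only; the `kmeans` function, its parameters and the toy data are assumptions, and the package itself may rely on a library implementation instead.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length float vectors.

    Returns one cluster index per input point. Illustrative helper, not an egc API.
    """
    rng = random.Random(seed)
    # Initialise the centres with k distinct input points.
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [-1] * len(points)

    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre (squared Euclidean distance).
        new_assign = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])),
            )
            for p in points
        ]
        if new_assign == assign:
            break  # assignments are stable, so the algorithm has converged
        assign = new_assign

        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Toy "embeddings": two well-separated groups standing in for GNN output.
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
clusters = kmeans(embeddings, k=2)
```

Cluster indices are arbitrary, so only the grouping matters: the first two points should share one index and the last two the other.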
End-to-End Graph Clustering
method | Conf/Journal | Original Code | Supported |
---|---|---|---|
SDCN | 20www | Pytorch | ✅ |
DANMF | 18cikm | code | ✅ |
M-NMF | 17aaai | Matlab | ✅ |
DFCN | 21aaai | Pytorch | ✅ |
VGAECD | 18icdm | -- | ✅ |
ComE | 17cikm | code | ✅ |
DAEGC | 19ijcai | code | ✅ |
Overlapping
method | Conf/Journal | Original Code | Supported |
---|---|---|---|
CommunityGAN | 19www | TensorFlow | ✅ |
Requirements
See dependencies, requirements-dev.txt and requirements.txt.
Contributing
See CONTRIBUTING.md.