A Python package providing hybrid-graph datasets and a train/eval framework for GNNs
Project description
This is a benchmark dataset for evaluating hybrid-graph (hypergraph and hierarchical graph) learning algorithms. It contains:
- 23 real-world higher-order graphs from the domains of biology, social media, and Wikipedia
- Built-in functionalities for preprocessing hybrid-graphs
- A framework to easily train and evaluate Graph Neural Networks
Installation
Requirements
First, install the required PyTorch packages. You will need to know which CUDA version you have installed and which PyTorch version you want to use. Replace ${TORCH} and ${CUDA} with these versions in the following commands:
# TORCH=2.0.1 if you use the newest stable torch version
# CUDA=cpu if CUDA is not available (otherwise a tag such as cu118 for CUDA 11.8)
python -m pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
python -m pip install torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
python -m pip install torch-geometric==2.2.0
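If PyTorch is already installed, the two placeholders can be derived from it rather than typed by hand. The following is a minimal sketch, not part of the package: the helper names `cuda_tag` and `wheel_index_url` are made up for illustration, and it assumes the wheel index follows the URL pattern shown in the commands above. Not every version combination necessarily has an index page, so check the printed URL before using it.

```python
from typing import Optional


def cuda_tag(cuda_version: Optional[str]) -> str:
    """Map a CUDA version such as '11.8' to a wheel tag such as 'cu118'."""
    if not cuda_version:
        return "cpu"  # no CUDA build available
    return "cu" + cuda_version.replace(".", "")


def wheel_index_url(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build the URL passed to `pip install -f` in the commands above."""
    base = torch_version.split("+")[0]  # drop a local suffix such as '+cu118'
    return f"https://data.pyg.org/whl/torch-{base}+{cuda_tag(cuda_version)}.html"


if __name__ == "__main__":
    try:
        import torch  # only needed to read the installed versions

        print(wheel_index_url(torch.__version__, torch.version.cuda))
    except ImportError:
        # Fall back to the values used in the comments above.
        print(wheel_index_url("2.0.1", None))
```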
Once these dependencies are installed, you can install this package with one of the following:
Pip install
pip install hybrid-graph
# or pip install git+https://github.com/Zehui127/hybrid-graph-benchmark.git
From source
git clone https://github.com/Zehui127/hybrid-graph-benchmark.git
cd hybrid-graph-benchmark
pip install -e .
Usage
Hybrid-graph provides both datasets and fast training/evaluation capabilities.
(1) Access the Dataset
We use torch_geometric.data.Data to wrap each graph, with an additional adjacency matrix (hyperedge_index) for the hyperedge representation.
from hg.datasets import Facebook, HypergraphSAINTNodeSampler
# download data to the path 'data/facebook'
data = Facebook('data/facebook')
print(data[0]) # e.g. Data(x=[22470, 128], edge_index=[2, 342004], y=[22470], hyperedge_index=[2, 2344151], num_hyperedges=236663)
# create a sampler that draws 1000 nodes from the graph, 5 times
sampler = HypergraphSAINTNodeSampler(data[0], batch_size=1000, num_steps=5)
batch = next(iter(sampler))
print(batch) # e.g. Data(num_nodes=918, edge_index=[2, 7964], hyperedge_index=[2, 957528], num_hyperedges=210718, x=[918, 128], y=[918])
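To make the `hyperedge_index` field more concrete, here is a small sketch in plain Python. It assumes the field follows the PyTorch Geometric hypergraph convention (row 0 holds node indices, row 1 holds the hyperedge each node belongs to); the package does not state this explicitly here, so treat it as an assumption. The helper name `group_hyperedges` and the toy data are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_hyperedges(hyperedge_index: Sequence[Sequence[int]]) -> Dict[int, List[int]]:
    """Collect the member nodes of every hyperedge.

    Assumes hyperedge_index has two rows: node ids and hyperedge ids.
    """
    node_row, edge_row = hyperedge_index
    members = defaultdict(list)
    for node, edge in zip(node_row, edge_row):
        members[edge].append(node)
    return dict(members)


# Toy hypergraph: hyperedge 0 = {0, 1, 2}, hyperedge 1 = {1, 2, 3}
toy_index = [[0, 1, 2, 1, 2, 3],
             [0, 0, 0, 1, 1, 1]]
members = group_hyperedges(toy_index)
print(members)  # {0: [0, 1, 2], 1: [1, 2, 3]}
```

With a real dataset, the nested list could be obtained from the tensor via `data[0].hyperedge_index.tolist()`, although for graphs of the size shown above a tensor-based approach would be much faster.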
Data loaders can also be obtained using hg.hybrid_graph.io.get_dataset:
from hg.hybrid_graph.io import get_dataset
name = 'musae_Facebook'
train_loader, valid_loader, test_loader, data_info = get_dataset(name)
(2) Train/Evaluate with hybrid-graph
This assumes that you installed the package with pip.
Training can be triggered with the following command; training a GCN takes only a few minutes, even on a CPU.
# -a selects the accelerator: gpu, cpu, or tpu
hybrid-graph train grand_Lung gcn -a=cpu
Evaluation can be triggered with
# load the saved checkpoint from the path 'lightning_logs/version_0/checkpoints/best.ckpt'
hybrid-graph eval grand_Lung gcn -load='lightning_logs/version_0/checkpoints/best.ckpt' -a=cpu
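When running several experiments, it can be convenient to drive the command line from a script. The sketch below only assembles the arguments shown above (`-a` and `-load`); the helper names `build_command` and `latest_checkpoint` are hypothetical, and the lookup assumes checkpoints are stored as `lightning_logs/version_N/checkpoints/best.ckpt`, as in the example path.

```python
import re
from pathlib import Path
from typing import List, Optional


def build_command(mode: str, dataset: str, model: str,
                  accelerator: str = "cpu",
                  checkpoint: Optional[str] = None) -> List[str]:
    """Assemble a `hybrid-graph` command as an argument list."""
    if mode not in ("train", "eval"):
        raise ValueError(f"mode must be 'train' or 'eval', got {mode!r}")
    cmd = ["hybrid-graph", mode, dataset, model]
    if checkpoint is not None:
        cmd.append(f"-load={checkpoint}")
    cmd.append(f"-a={accelerator}")
    return cmd


def latest_checkpoint(log_root: str = "lightning_logs",
                      name: str = "best.ckpt") -> Optional[Path]:
    """Return the checkpoint in the highest-numbered version_N directory."""
    root = Path(log_root)
    if not root.is_dir():
        return None
    best = None  # (version number, checkpoint path)
    for path in root.glob(f"version_*/checkpoints/{name}"):
        match = re.fullmatch(r"version_(\d+)", path.parent.parent.name)
        if match and (best is None or int(match.group(1)) > best[0]):
            best = (int(match.group(1)), path)
    return best[1] if best else None


if __name__ == "__main__":
    import subprocess

    # Train, then evaluate the most recent checkpoint (requires the package).
    subprocess.run(build_command("train", "grand_Lung", "gcn"), check=True)
    ckpt = latest_checkpoint()
    if ckpt is not None:
        subprocess.run(
            build_command("eval", "grand_Lung", "gcn", checkpoint=str(ckpt)),
            check=True,
        )
```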
Add New Models
To add new models, install the package from source.
cd hybrid-graph-benchmark/hg/hybrid_graph/models/gnn
touch customize_model.py
The model defined in customize_model.py should correctly handle the input feature size, prediction size, and task type.
Below is the definition of a vanilla Graph Convolutional Network (GCN):
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class CustomizeGNN(torch.nn.Module):
    def __init__(self, info, *args, **kwargs):
        super().__init__()
        dim = 32  # hidden dimension
        self.conv1 = GCNConv(info["num_node_features"], dim)
        self.is_regression = info["is_regression"]
        if info["is_regression"]:
            # regression: keep the hidden width, then project to one value per node
            self.conv2 = GCNConv(dim, dim)
            self.head = nn.Linear(dim, 1)
        else:
            # classification: the second layer outputs one score per class
            self.conv2 = GCNConv(dim, info["num_classes"])

    def forward(self, data, *args, **kwargs):
        x, edge_index = data.x, data.edge_index
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        if self.is_regression:
            x = self.head(x).squeeze(-1)  # [num_nodes, 1] -> [num_nodes]
        else:
            x = F.log_softmax(x, dim=1)
        return x
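The way the GCN above reads the `info` dictionary can be traced without PyTorch. The sketch below is only an illustration: the helper `layer_shapes` is hypothetical and simply mirrors the branching in `__init__`, using the three keys that appear in the model (`num_node_features`, `is_regression`, `num_classes`). The example values are placeholders; 128 matches the feature size printed for the Facebook dataset earlier, while the class count is made up.

```python
from typing import Dict, List, Tuple


def layer_shapes(info: Dict, hidden: int = 32) -> List[Tuple[str, int, int]]:
    """Return (layer name, input size, output size) for each layer."""
    missing = [k for k in ("num_node_features", "is_regression") if k not in info]
    if not info.get("is_regression", False) and "num_classes" not in info:
        missing.append("num_classes")  # only needed for classification
    if missing:
        raise KeyError(f"info is missing required keys: {missing}")

    shapes = [("conv1", info["num_node_features"], hidden)]
    if info["is_regression"]:
        # Regression: hidden-to-hidden convolution, then a linear head to 1 value.
        shapes += [("conv2", hidden, hidden), ("head", hidden, 1)]
    else:
        # Classification: the second convolution maps directly to class scores.
        shapes.append(("conv2", hidden, info["num_classes"]))
    return shapes


# Illustrative values only (num_classes here is a placeholder).
classification = layer_shapes(
    {"num_node_features": 128, "is_regression": False, "num_classes": 4}
)
regression = layer_shapes({"num_node_features": 128, "is_regression": True})
print(classification)  # [('conv1', 128, 32), ('conv2', 32, 4)]
print(regression)      # [('conv1', 128, 32), ('conv2', 32, 32), ('head', 32, 1)]
```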
Finally, register your model in hybrid-graph-benchmark/hg/hybrid_graph/models/__init__.py:
from .gnn.customize_model import CustomizeGNN
factory = {
    'sage': SAGENet,
    'gcn': CustomizeGNN,  # abbreviation: ClassName
}
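A registry like this is typically consumed by looking up the abbreviation given on the command line and instantiating the matching class. The package's actual lookup code is not shown here, so the following is a generic sketch of the pattern with placeholder classes; `build_model` is a hypothetical helper, not a function of the package.

```python
class SAGENet:
    """Placeholder standing in for the real model class."""

    def __init__(self, info):
        self.info = info


class CustomizeGNN:
    """Placeholder standing in for the custom model defined earlier."""

    def __init__(self, info):
        self.info = info


factory = {
    'sage': SAGENet,
    'gcn': CustomizeGNN,  # abbreviation: ClassName
}


def build_model(name, info):
    """Instantiate the model registered under `name` (hypothetical helper)."""
    try:
        model_cls = factory[name]
    except KeyError:
        raise ValueError(
            f"Unknown model {name!r}; available: {sorted(factory)}"
        ) from None
    return model_cls(info)


model = build_model('gcn', {"num_node_features": 128, "is_regression": True})
print(type(model).__name__)  # CustomizeGNN
```

Because the abbreviation is the dictionary key, registering a class under an existing key such as 'gcn' replaces the model that the key previously pointed to; a new key avoids that.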
Download files
File details
Details for the file hybrid-graph-0.5.tar.gz.
File metadata
- Download URL: hybrid-graph-0.5.tar.gz
- Upload date:
- Size: 26.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0062227372af010e9a53b7beb87ef2bd2199bf58007e31c1d87d8e8eaec50ad4
MD5 | fbb1ae13c1c2a86cf19f3e88a7f0125e
BLAKE2b-256 | 4e8fef25e93dd50843f1c975cfe4920f4c449f90e8145229d32dcc683face709
File details
Details for the file hybrid_graph-0.5-py3-none-any.whl.
File metadata
- Download URL: hybrid_graph-0.5-py3-none-any.whl
- Upload date:
- Size: 37.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
Algorithm | Hash digest
---|---
SHA256 | ac66015b750d7a5df8dc1104fa2205269013496c849b163de0dede0c92b8249c
MD5 | 3c15d371c77cc0db84a7857e48e64f98
BLAKE2b-256 | ac10fffc2ccb9ed24c8423766d3d0e33c2e2d52de183cd5b6c60f9849479fc01