
A Python package providing hybrid-graph datasets and a train/eval framework for GNNs


This is a benchmark dataset for evaluating hybrid-graph (hypergraph and hierarchical graph) learning algorithms. It contains:

  • 23 real-world higher-order graphs from the domains of biology, social media, and Wikipedia
  • Built-in functionalities for preprocessing hybrid-graphs
  • A framework to easily train and evaluate Graph Neural Networks

Installation

Requirements

First, install the required PyTorch packages. You will need to know the version of CUDA you have installed, as well as the version of PyTorch you want to use. Replace ${TORCH} and ${CUDA} with these versions in the following commands:

# e.g. TORCH=2.0.1 to use the newest stable torch version
# use CUDA=cpu if CUDA is not available
python -m pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
python -m pip install torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
python -m pip install torch-geometric==2.2.0
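
If you are unsure which values to substitute, the wheel index URL can be derived from an existing PyTorch installation. The following is a minimal sketch, not part of this package: the helper name `pyg_wheel_index` is hypothetical, and it assumes the `torch-${TORCH}+${CUDA}.html` URL pattern shown above together with CUDA tags of the form `cu118` for CUDA 11.8.

```python
from typing import Optional


def pyg_wheel_index(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build the wheel index URL passed to `pip install -f` above.

    torch_version: e.g. "2.0.1" or "2.0.1+cu118" (a local suffix is dropped).
    cuda_version:  e.g. "11.8", or None for a CPU-only build.
    """
    torch_tag = torch_version.split("+")[0]
    # Assumed tag convention: "11.8" -> "cu118", None -> "cpu".
    cuda_tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://data.pyg.org/whl/torch-{torch_tag}+{cuda_tag}.html"


if __name__ == "__main__":
    try:
        import torch  # only needed to read the installed versions

        print(pyg_wheel_index(torch.__version__, torch.version.cuda))
    except ImportError:
        # torch is not installed yet: fall back to the CPU example values
        print(pyg_wheel_index("2.0.1", None))
```

The printed URL can then be used as the `-f` argument in the `pip install` commands above.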

Once these dependencies are installed, you can install this package with one of the following:

Pip install

pip install hybrid-graph
# or pip install git+https://github.com/Zehui127/hybrid-graph-benchmark.git

From source

git clone https://github.com/Zehui127/hybrid-graph-benchmark.git
cd hybrid-graph-benchmark
pip install -e .

Usage

Hybrid-graph provides both datasets and fast training/evaluation capabilities.

(1) Access the Dataset

Graphs are wrapped in torch_geometric.data.Data objects, with an additional adjacency matrix (hyperedge_index) that represents the hyperedges.

from hg.datasets import Facebook, HypergraphSAINTNodeSampler
# download data to the path 'data/facebook'
data = Facebook('data/facebook')
print(data[0]) # e.g. Data(x=[22470, 128], edge_index=[2, 342004], y=[22470], hyperedge_index=[2, 2344151], num_hyperedges=236663)

# create a sampler that samples 1000 nodes from the graph, 5 times (num_steps=5)
sampler = HypergraphSAINTNodeSampler(data[0], batch_size=1000, num_steps=5)
batch = next(iter(sampler))
print(batch)  # e.g. Data(num_nodes=918, edge_index=[2, 7964], hyperedge_index=[2, 957528], num_hyperedges=210718, x=[918, 128], y=[918])
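
The printed shapes suggest that `hyperedge_index` stores one (node, hyperedge) incidence pair per column. The sketch below illustrates that layout on a toy example in plain Python. It assumes the PyTorch Geometric convention in which row 0 holds node indices and row 1 holds hyperedge indices, which this README does not state explicitly; the helper name `hyperedge_members` is hypothetical.

```python
from collections import defaultdict


def hyperedge_members(hyperedge_index):
    """Group node ids by hyperedge id from a 2 x E incidence list."""
    node_row, edge_row = hyperedge_index
    members = defaultdict(list)
    for node, edge in zip(node_row, edge_row):
        members[edge].append(node)
    return dict(members)


# Toy hypergraph: hyperedge 0 = {0, 1, 2}, hyperedge 1 = {2, 3}.
toy_index = [
    [0, 1, 2, 2, 3],  # node ids (assumed to be row 0)
    [0, 0, 0, 1, 1],  # hyperedge ids (assumed to be row 1)
]
members = hyperedge_members(toy_index)
print(members)  # {0: [0, 1, 2], 1: [2, 3]}
```

If the assumption holds, the same function could be applied to a real dataset via `data[0].hyperedge_index.tolist()`.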

Data loaders can also be obtained with hg.hybrid_graph.io.get_dataset:

from hg.hybrid_graph.io import get_dataset
name = 'musae_Facebook'
train_loader, valid_loader, test_loader, data_info = get_dataset(name)

(2) Train/Evaluate with hybrid-graph

This section assumes the package was installed with pip (see "Pip install" above).

Training can be started with the following command. Training a GCN takes only a few minutes, even on a CPU.

# -a selects the accelerator: gpu, cpu, or tpu
hybrid-graph train grand_Lung gcn -a=cpu

Evaluation can be started with the following command:

# load the saved checkpoint from the path 'lightning_logs/version_0/checkpoints/best.ckpt'
hybrid-graph eval grand_Lung gcn -load='lightning_logs/version_0/checkpoints/best.ckpt' -a=cpu
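
Lightning typically creates a new `version_N` directory for each run, so the checkpoint path changes between runs. The sketch below looks up the most recent checkpoint and assembles the evaluation command. It assumes the directory layout from the example above (`lightning_logs/version_N/checkpoints/best.ckpt`); `latest_checkpoint` and `eval_command` are hypothetical helpers, not part of the package.

```python
from pathlib import Path
from typing import List, Optional


def latest_checkpoint(log_root: str = "lightning_logs") -> Optional[Path]:
    """Return best.ckpt from the highest-numbered version_N directory, if any."""
    candidates = []
    for ckpt in Path(log_root).glob("version_*/checkpoints/best.ckpt"):
        # ckpt = <log_root>/version_N/checkpoints/best.ckpt
        suffix = ckpt.parent.parent.name.split("_", 1)[1]
        if suffix.isdigit():
            candidates.append((int(suffix), ckpt))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


def eval_command(dataset: str, model: str, ckpt: Path,
                 accelerator: str = "cpu") -> List[str]:
    """Assemble the `hybrid-graph eval` call shown above as an argument list."""
    return ["hybrid-graph", "eval", dataset, model,
            f"-load={ckpt}", f"-a={accelerator}"]


if __name__ == "__main__":
    ckpt = latest_checkpoint()
    if ckpt is None:
        print("no checkpoint found under lightning_logs/")
    else:
        print(" ".join(eval_command("grand_Lung", "gcn", ckpt)))
```

The returned list can be passed to `subprocess.run` to launch the evaluation from a script.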

Add New Models

To add new models, install the package from source (see "From source" above).

cd hybrid-graph-benchmark/hg/hybrid_graph/models/gnn
touch customize_model.py

The model in customize_model.py must correctly handle the input feature size, the prediction size, and the task type. Below is the definition of a vanilla Graph Convolutional Network (GCN):

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class CustomizeGNN(torch.nn.Module):
    def __init__(self, info, *args, **kwargs):
        super().__init__()
        dim = 32  # hidden dimension
        self.conv1 = GCNConv(info["num_node_features"], dim)
        self.is_regression = info["is_regression"]
        if self.is_regression:
            # regression: keep the hidden width, then map to one value per node
            self.conv2 = GCNConv(dim, dim)
            self.head = nn.Linear(dim, 1)
        else:
            # classification: the second layer outputs one score per class
            self.conv2 = GCNConv(dim, info["num_classes"])

    def forward(self, data, *args, **kwargs):
        x, edge_index = data.x, data.edge_index
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)

        if self.is_regression:
            x = self.head(x).squeeze(-1)  # [num_nodes, 1] -> [num_nodes]
        else:
            x = F.log_softmax(x, dim=1)  # per-node log-probabilities
        return x

Finally, register your model in hybrid-graph-benchmark/hg/hybrid_graph/models/__init__.py:

from .gnn.customize_model import CustomizeGNN

factory = {
    'sage': SAGENet,
    'gcn': CustomizeGNN,  # abbreviation: ClassName
}
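
The `factory` dictionary maps the model abbreviation given on the command line (for example `gcn`) to a model class. How the package performs this lookup internally is not shown here, so the following is only a self-contained sketch of the pattern using stand-in classes. `build_model`, `DummySAGE`, and `DummyGCN` are hypothetical; the `info` keys are taken from the model definition above, and their values are illustrative.

```python
class DummySAGE:
    """Stand-in for a registered model class (hypothetical)."""

    def __init__(self, info, *args, **kwargs):
        self.info = info


class DummyGCN:
    """Stand-in for a custom model such as CustomizeGNN (hypothetical)."""

    def __init__(self, info, *args, **kwargs):
        self.info = info


# abbreviation -> class, mirroring the registry layout shown above
factory = {
    'sage': DummySAGE,
    'gcn': DummyGCN,
}


def build_model(name, info):
    """Look up a model class by abbreviation and instantiate it with `info`."""
    try:
        model_cls = factory[name]
    except KeyError:
        known = ", ".join(sorted(factory))
        raise ValueError(f"unknown model '{name}'; registered models: {known}") from None
    return model_cls(info)


# Illustrative values only; real values depend on the dataset.
info = {"num_node_features": 128, "num_classes": 4, "is_regression": False}
model = build_model("gcn", info)
print(type(model).__name__)  # DummyGCN
```

Failing early with the list of registered abbreviations makes a missing or misspelled registry entry easy to spot.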
