
A Python package for accessing hybrid-graph datasets, with a train/eval framework for GNNs


This is a benchmark dataset for evaluating hybrid-graph (hypergraph and hierarchical graph) learning algorithms. It contains:

  • 23 real-world higher-order graphs from the domains of biology, social media, and Wikipedia
  • Built-in functionalities for preprocessing hybrid-graphs
  • A framework to easily train and evaluate Graph Neural Networks

Installation

Requirements

First, install the required PyTorch packages. You will need to know the version of CUDA you have installed, as well as the version of PyTorch you want to use. Replace ${TORCH} and ${CUDA} with these versions in the following commands:

# Use TORCH=2.0.1 for the newest stable torch version
# Use CUDA=cpu if CUDA is not available
python -m pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
python -m pip install torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
python -m pip install torch-geometric==2.2.0
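
The two placeholders can be read off an existing PyTorch installation. The helper below is a minimal sketch and not part of hybrid-graph: pyg_wheel_index is a hypothetical name, and the cu118-style tag (the CUDA version with the dot removed) is assumed from the file naming used on data.pyg.org.

```python
from typing import Optional


def pyg_wheel_index(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build the -f URL used above from a torch version and a CUDA version.

    torch_version: e.g. "2.0.1" or "2.0.1+cu118" (a local suffix is dropped).
    cuda_version:  e.g. "11.8", or None for a CPU-only build.
    """
    base = torch_version.split("+")[0]
    # Assumption: CUDA tags look like "cu118" for CUDA 11.8.
    tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://data.pyg.org/whl/torch-{base}+{tag}.html"


if __name__ == "__main__":
    try:
        import torch  # only needed to inspect the local installation

        # torch.version.cuda is None on CPU-only builds
        print(pyg_wheel_index(torch.__version__, torch.version.cuda))
    except ImportError:
        # torch is not installed yet: fall back to the values from the comments above
        print(pyg_wheel_index("2.0.1", None))
```

Pass the printed URL to the -f option of the two pip commands above.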

Once these dependencies are installed, you can install this package with one of the following:

Pip install

pip install hybrid-graph
# or pip install git+https://github.com/Zehui127/hybrid-graph-benchmark.git

From source

git clone https://github.com/Zehui127/hybrid-graph-benchmark.git
cd hybrid-graph-benchmark
pip install -e .

Usage

Hybrid-graph provides both datasets and quick training/evaluation capabilities.

(1) Access the Dataset

Each graph is wrapped in a torch_geometric.data.Data object, extended with an additional index tensor (hyperedge_index) that represents the hyperedges.

from hg.datasets import Facebook, HypergraphSAINTNodeSampler
# download data to the path 'data/facebook'
data = Facebook('data/facebook')
print(data[0]) # e.g. Data(x=[22470, 128], edge_index=[2, 342004], y=[22470], hyperedge_index=[2, 2344151], num_hyperedges=236663)

# create a sampler that samples 1000 nodes from the graph, 5 times
sampler = HypergraphSAINTNodeSampler(data[0], batch_size=1000, num_steps=5)
batch = next(iter(sampler))
print(batch)  # e.g. Data(num_nodes=918, edge_index=[2, 7964], hyperedge_index=[2, 957528], num_hyperedges=210718, x=[918, 128], y=[918])
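
To make the shape hyperedge_index=[2, N] concrete, the sketch below groups node ids by hyperedge. It assumes the layout used by PyTorch Geometric's HypergraphConv, where row 0 holds node ids and row 1 holds hyperedge ids. Whether hybrid-graph follows exactly this layout is an assumption, and nodes_per_hyperedge is a hypothetical helper written in plain Python so that it runs without PyTorch.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def nodes_per_hyperedge(hyperedge_index: Sequence[Sequence[int]]) -> Dict[int, List[int]]:
    """Group node ids by hyperedge id.

    Assumes row 0 holds node ids and row 1 holds hyperedge ids,
    so column k says "node row0[k] belongs to hyperedge row1[k]".
    """
    node_ids, edge_ids = hyperedge_index
    members: Dict[int, List[int]] = defaultdict(list)
    for node, edge in zip(node_ids, edge_ids):
        members[int(edge)].append(int(node))
    return dict(members)


# Toy example: hyperedge 0 = {0, 1, 2}, hyperedge 1 = {1, 2, 3}
toy = [[0, 1, 2, 1, 2, 3],
       [0, 0, 0, 1, 1, 1]]
print(nodes_per_hyperedge(toy))  # {0: [0, 1, 2], 1: [1, 2, 3]}
```

For a real dataset, the tensor can be converted first with data[0].hyperedge_index.tolist().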

Data loaders can also be obtained with hg.hybrid_graph.io.get_dataset:

from hg.hybrid_graph.io import get_dataset
name = 'musae_Facebook'
train_loader, valid_loader, test_loader, data_info = get_dataset(name)

(2) Train/Evaluate with hybrid-graph

This section assumes that you installed the package with pip.

Training can be triggered with the following command. Training a GCN takes only a few minutes, even on a CPU.

# -a selects the accelerator: gpu, cpu, or tpu
hybrid-graph train grand_Lung gcn -a=cpu

Evaluation can be triggered with:

# load the saved checkpoint from the path 'lightning_logs/version_0/checkpoints/best.ckpt'
hybrid-graph eval grand_Lung gcn -load='lightning_logs/version_0/checkpoints/best.ckpt' -a=cpu
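
The version_0 part of the checkpoint path changes from run to run. The sketch below looks up the best.ckpt of the highest-numbered run. It assumes only the lightning_logs/version_N/checkpoints/best.ckpt layout shown in the command above; latest_best_checkpoint is a hypothetical helper, not part of hybrid-graph.

```python
import re
from pathlib import Path
from typing import Optional


def latest_best_checkpoint(log_dir: str = "lightning_logs") -> Optional[Path]:
    """Return <log_dir>/version_N/checkpoints/best.ckpt for the largest N that has one."""
    candidates = []
    for ckpt in Path(log_dir).glob("version_*/checkpoints/best.ckpt"):
        # ckpt.parent is "checkpoints", ckpt.parent.parent is "version_N"
        match = re.fullmatch(r"version_(\d+)", ckpt.parent.parent.name)
        if match:
            candidates.append((int(match.group(1)), ckpt))
    if not candidates:
        return None
    # Compare run numbers as integers so that version_10 ranks above version_2
    return max(candidates, key=lambda item: item[0])[1]


if __name__ == "__main__":
    print(latest_best_checkpoint())
```

The returned path can then be passed to the -load option.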

Add New Models

To add new models, install the package from source (see "From source" above), then create a new model file:

cd hybrid-graph-benchmark/hg/hybrid_graph/models/gnn
touch customize_model.py

The model defined in customize_model.py must correctly handle the input feature size, the prediction size, and the task type. Below is the definition of a vanilla Graph Convolutional Network (GCN):

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class CustomizeGNN(torch.nn.Module):
    def __init__(self, info, *args, **kwargs):
        super().__init__()
        dim = 32  # hidden dimension
        self.conv1 = GCNConv(info["num_node_features"], dim)
        self.is_regression = info["is_regression"]
        if info["is_regression"]:
            # regression: keep the hidden width, then map to one scalar per node
            self.conv2 = GCNConv(dim, dim)
            self.head = nn.Linear(dim, 1)
        else:
            # classification: the second layer outputs one score per class
            self.conv2 = GCNConv(dim, info["num_classes"])

    def forward(self, data, *args, **kwargs):
        x, edge_index = data.x, data.edge_index
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)

        if self.is_regression:
            x = self.head(x).squeeze()
        else:
            x = F.log_softmax(x, dim=1)
        return x

Finally, register your model in hybrid-graph-benchmark/hg/hybrid_graph/models/__init__.py:

from .gnn.customize_model import CustomizeGNN

factory = {
    'sage': SAGENet,
    'gcn': CustomizeGNN,  # abbreviation: ClassName
}
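
The registry above maps a command-line abbreviation to a model class. How hybrid-graph resolves that mapping internally is not shown here, so the following is only a generic sketch of the lookup pattern: build_model, DummyModel, and toy_factory are hypothetical names, and the info values are made up for illustration.

```python
from typing import Any, Callable, Dict


def build_model(factory: Dict[str, Callable[..., Any]], name: str, info: Dict[str, Any]) -> Any:
    """Instantiate the class registered under `name`, passing the dataset info dict."""
    try:
        model_cls = factory[name]
    except KeyError:
        known = ", ".join(sorted(factory))
        raise ValueError(f"unknown model '{name}'; registered: {known}") from None
    return model_cls(info)


class DummyModel:
    """Stand-in for a GNN class such as CustomizeGNN; it only stores `info`."""

    def __init__(self, info, *args, **kwargs):
        self.info = info


# Hypothetical registry and info dict, mirroring the keys used by CustomizeGNN
toy_factory = {"gcn": DummyModel}
toy_info = {"num_node_features": 128, "num_classes": 4, "is_regression": False}
model = build_model(toy_factory, "gcn", toy_info)
```

With such a lookup, an abbreviation that is missing from the registry fails immediately with a message listing the registered names.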
