Skip to main content

A graph benchmark library for heterophilic and heterogeneous graphs

Project description


Overview

The Heterophilic and Heterogeneous Graph Benchmark (H²GB) is a collection of graph benchmark datasets, data loaders, modular graph transformer framework (UnifiedGT) and evaluators for graph learning. The H²GB encompasses 9 diverse real-world datasets across 5 domains. Its data loaders are fully compatible with popular graph deep learning framework PyTorch Geometric. They provide automatic dataset downloading, standardized dataset splits, and unified performance evaluation.

Environment Setup

You can create a conda environment to easily run the code. For example, we can create a virtual environment named H2GB:

conda create -n H2GB python=3.9 -y
conda activate H2GB

Install the required packages using the following commands:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
conda install pyg -c pyg
pip install -r requirements.txt

Run the UnifiedGT

To summarize and systematically compare the performance of existing GNNs on H2GB, we designed UnifiedGT. UnifiedGT is a modular graph transformer framework that designed to encompass many existing GTs and GNNs by leveraging unified components: (1) graph sampling, (2) graph encoding, (3) graph attention, (4) attention masking, and (5) feedforward networks (FFN). It is implemented as a Python library and is user-friendly. It includes a unified data loader and evaluator, making it easy for researchers to access datasets, evaluate methods, and compare performance.

We implement 9 existing GT baselines and 19 GNN models based on UnifiedGT and provide comprehensive experiment configurations available in ./configs. To run UnifiedGT, you will need to firstly specify the dataset and log location by editing the config file provided under ./configs/{dataset_name}/. An example configuration is

......
out_dir: ./results/{Model Name} # Put your log output path here
dataset:
  dir: ./data # Put your input data path here
......

Dataset download will be automatically initiated if dataset is not found under the specified location.

For convenience, a script file is created to run the experiment with specified configuration. For instance, you can edit and run the interactive_run.sh to start the experiment.

# Assuming you are located in the H2GB repo
chmox +x ./run/interactive_run.sh
./run/interactive_run.sh

You can also directly enter this command into your terminal:

python -m H2GB.main --cfg {Path to Your Configs} name_tag {Custom Name Tag}

For example, the following command is to run MLP model experiment for oag-cs dataset.

python -m H2GB.main --cfg configs/oag-cs/oag-cs-MLP.yaml name_tag MLP

Caclulate the Metapath-Induced Adjusted Heterophily Measurement

We provide a extended heterophily measurement from homogeneous grpah into the heterogeneous setting, which is called metapath-induced heterophily measrement. The calcualtion function is available in ./H2GB/calcHomophily.py. You can simply import it by using from H2GB.calcHomophily import calcHomophily and measure the heterophily of your data. For convenience, we also provide a script to reproduce the heterophily measurement on our developed datasets. Note that $$\text{Heterophily} = 1 - \text{Homophily}$$ So just do a simple transformation to obtain the heterophily.

chmox +x ./run/calcHomo.sh
./run/calcHomo.sh

Side Notes

Encoders

The Hetero_Raw encoder are supposed to be used for heterogeneous GNN or graph dataset that has different node encoding dimensions for different node type. Therefore, each node type can be transformed separately. To reproduce results of homogeneous GNN, consider using the Raw encoder, which apply the same transformation for each node type. Otherwise, using Hetero_Raw for homogeneous GNN will misleadingly increase the task performance.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

H2GB-0.1.0.tar.gz (224.8 kB view hashes)

Uploaded Source

Built Distribution

H2GB-0.1.0-py3-none-any.whl (306.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page