
A graph benchmark library for heterophilic and heterogeneous graphs



Overview

The Heterophilic and Heterogeneous Graph Benchmark (H²GB) is a collection of graph benchmark datasets, data loaders, a modular graph transformer framework (UnifiedGT), and evaluators for graph learning. H²GB encompasses 9 diverse real-world datasets across 5 domains. Its data loaders are fully compatible with the popular graph deep learning framework PyTorch Geometric, and they provide automatic dataset downloading, standardized dataset splits, and unified performance evaluation.

Environment Setup

You can create a conda environment to run the code. For example, the following commands create and activate a virtual environment named H2GB:

conda create -n H2GB python=3.9 -y
conda activate H2GB

Install the required packages using the following commands:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
conda install pyg -c pyg
pip install -r requirements.txt
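After installation, a quick import check can confirm that the conda packages above are visible from the active environment. This is a minimal sketch, not part of H2GB: it only checks the packages named in the commands above (`torch_geometric` is the import name of the `pyg` conda package) and does not know what `requirements.txt` contains.

```python
import importlib.util
import sys

# Import names of the packages installed by the conda commands above.
REQUIRED = ["torch", "torchvision", "torchaudio", "torch_geometric"]


def check_environment(packages=REQUIRED):
    """Return {package: True/False} depending on whether it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:16s} {'ok' if ok else 'MISSING'}")
    print("Python", sys.version.split()[0], "(the setup above pins 3.9)")
```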

Run the UnifiedGT

To summarize and systematically compare the performance of existing GNNs on H2GB, we built UnifiedGT, a modular graph transformer framework that encompasses many existing graph transformers (GTs) and GNNs through five unified components: (1) graph sampling, (2) graph encoding, (3) graph attention, (4) attention masking, and (5) feedforward networks (FFN). It is implemented as a user-friendly Python library and includes a unified data loader and evaluator, making it easy for researchers to access datasets, evaluate methods, and compare performance.
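To make the attention-masking component concrete, the sketch below shows generic single-head scaled dot-product attention in which each node may only attend to its graph neighbors (plus itself). It is an illustration of the technique under that assumption, written in plain Python; it is not UnifiedGT's implementation, and the function names are hypothetical.

```python
import math


def adjacency_mask(num_nodes, edges, self_loops=True):
    """Boolean mask: mask[i][j] is True if node i may attend to node j (undirected edges)."""
    mask = [[False] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        mask[u][v] = mask[v][u] = True
    if self_loops:
        for i in range(num_nodes):
            mask[i][i] = True
    return mask


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention restricted by `mask`.

    q, k, v are lists of per-node vectors; masked pairs get a score of -inf,
    so they receive zero weight after the softmax.
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        if not any(mask[i]):
            raise ValueError(f"node {i} cannot attend to any node")
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, kj in enumerate(k)
        ]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
        z = sum(weights)
        weights = [w / z for w in weights]
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    mask = adjacency_mask(3, [(0, 1)])  # node 2 is isolated, so it only sees itself
    values = [[1.0], [3.0], [5.0]]
    print(masked_attention([[1.0]] * 3, [[1.0]] * 3, values, mask))
```

Dropping the mask (all entries True) turns this into full global attention; restricting it to the adjacency makes it behave like a message-passing layer, which is the kind of design axis the masking component varies.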

We implement 9 existing GT baselines and 19 GNN models on top of UnifiedGT and provide comprehensive experiment configurations in ./configs. To run UnifiedGT, first specify the dataset and log locations by editing the config file under ./configs/{dataset_name}/. An example configuration is shown below:

......
out_dir: ./results/{Model Name} # Put your log output path here
dataset:
  dir: ./data # Put your input data path here
......

Dataset download starts automatically if the dataset is not found under the specified location.
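The download-if-missing behavior can be pictured with the small sketch below. It is an assumption-laden illustration, not H2GB's loader: `ensure_dataset` and the `download_fn` callback are hypothetical, and the real loader may check for specific raw or processed files rather than just a non-empty directory.

```python
import tempfile
from pathlib import Path


def ensure_dataset(root, name, download_fn):
    """Return root/name, calling download_fn(target) only if that directory is missing or empty."""
    target = Path(root) / name
    if not target.is_dir() or not any(target.iterdir()):
        target.mkdir(parents=True, exist_ok=True)
        # Hypothetical downloader: expected to write the dataset files into `target`.
        download_fn(target)
    return target


if __name__ == "__main__":
    def fake_download(target):
        (target / "README.txt").write_text("placeholder for downloaded files\n")
        print("downloading into", target)

    with tempfile.TemporaryDirectory() as tmp:
        ensure_dataset(tmp, "oag-cs", fake_download)  # directory missing: downloads
        ensure_dataset(tmp, "oag-cs", fake_download)  # directory populated: no download
```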

For convenience, we provide a script that runs an experiment with a specified configuration. For instance, you can edit and run ./run/interactive_run.sh to start an experiment.

# Assuming you are located in the H2GB repo
chmod +x ./run/interactive_run.sh
./run/interactive_run.sh

You can also directly enter this command into your terminal:

python -m H2GB.main --cfg {Path to Your Configs} name_tag {Custom Name Tag}

For example, the following command runs the MLP experiment on the oag-cs dataset:

python -m H2GB.main --cfg configs/oag-cs/oag-cs-MLP.yaml name_tag MLP
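To launch several runs without retyping the command, a small Python driver can assemble it. This sketch makes two assumptions that the README does not confirm: that every config follows the configs/{dataset}/{dataset}-{model}.yaml naming seen in the oag-cs MLP example, and that trailing pairs such as `name_tag MLP` are KEY VALUE overrides merged into the loaded config (as in yacs-style `merge_from_list`). `build_command`, `apply_overrides`, and `run_sweep` are hypothetical helpers, not part of H2GB.

```python
import shlex
import subprocess


def build_command(dataset, model, python="python"):
    """Assemble the launch command, assuming the configs/{d}/{d}-{model}.yaml naming."""
    cfg_path = f"configs/{dataset}/{dataset}-{model}.yaml"
    return [python, "-m", "H2GB.main", "--cfg", cfg_path, "name_tag", model]


def apply_overrides(cfg, opts):
    """Apply trailing KEY VALUE pairs (dotted keys allowed) to a nested config dict.

    Values are kept as strings here; a yacs-style loader would also parse literals.
    """
    if len(opts) % 2:
        raise ValueError(f"expected KEY VALUE pairs, got {opts!r}")
    for key, value in zip(opts[0::2], opts[1::2]):
        node = cfg
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return cfg


def run_sweep(dataset, models, dry_run=True):
    """Print (and optionally execute) one command per model, from the repo root."""
    for model in models:
        cmd = build_command(dataset, model)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_sweep("oag-cs", ["MLP"])  # prints the MLP command above; pass dry_run=False to launch
```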

Calculate the Metapath-Induced Adjusted Heterophily Measurement

We extend the heterophily measurement from homogeneous graphs to the heterogeneous setting; we call the result the metapath-induced heterophily measurement. The calculation function is available in ./H2GB/calcHomophily.py. You can import it with from H2GB.calcHomophily import calcHomophily and use it to measure the heterophily of your own data. For convenience, we also provide a script that reproduces the heterophily measurements on our datasets. As its name suggests, calcHomophily measures homophily, and $$\text{Heterophily} = 1 - \text{Homophily},$$ so subtract its result from 1 to obtain the heterophily.
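The idea of a metapath-induced measurement can be illustrated on a toy academic graph: the author-paper-author (A-P-A) metapath links two authors when they share a paper, and homophily is then measured on that induced author graph. The sketch below computes plain edge homophily and a class-adjusted variant, h_adj = (h_edge - Σ_k p_k²) / (1 - Σ_k p_k²), where p_k is the share of edge endpoints whose label is k. The metapath, the data, and all function names are hypothetical, and calcHomophily may define its adjustment differently; this is only a sketch of the general technique.

```python
from collections import Counter, defaultdict
from itertools import combinations


def metapath_edges(author_paper):
    """Undirected A-P-A edges: authors who share a paper are linked."""
    authors_of = defaultdict(set)
    for author, paper in author_paper:
        authors_of[paper].add(author)
    edges = set()
    for authors in authors_of.values():
        edges.update(combinations(sorted(authors), 2))  # each pair once, no self-loops
    return edges


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a label."""
    if not edges:
        raise ValueError("no edges")
    return sum(labels[u] == labels[v] for u, v in edges) / len(edges)


def adjusted_homophily(edges, labels):
    """Edge homophily corrected for the agreement expected from class degree shares."""
    h = edge_homophily(edges, labels)
    class_degree = Counter()
    for u, v in edges:
        class_degree[labels[u]] += 1
        class_degree[labels[v]] += 1
    total = 2 * len(edges)
    expected = sum((d / total) ** 2 for d in class_degree.values())
    if expected >= 1:
        raise ValueError("adjusted homophily is undefined with a single class")
    return (h - expected) / (1 - expected)


if __name__ == "__main__":
    ap = [(0, "p1"), (1, "p1"), (1, "p2"), (2, "p2"), (2, "p3"), (3, "p3")]
    labels = {0: "ml", 1: "ml", 2: "db", 3: "db"}
    edges = metapath_edges(ap)  # {(0, 1), (1, 2), (2, 3)}
    h_adj = adjusted_homophily(edges, labels)
    print(f"edge homophily {edge_homophily(edges, labels):.3f}, adjusted heterophily {1 - h_adj:.3f}")
```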

chmod +x ./run/calcHomo.sh
./run/calcHomo.sh

Side Notes

Encoders

The Hetero_Raw encoder is intended for heterogeneous GNNs, or for graph datasets whose node types have different feature dimensions, since it transforms each node type separately. To reproduce the results of homogeneous GNNs, use the Raw encoder instead, which applies the same transformation to every node type. Using Hetero_Raw with a homogeneous GNN will misleadingly inflate task performance.
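The structural difference between the two encoders can be sketched with toy linear projections in plain Python. This is an illustration under our own assumptions (random weights, no bias, hypothetical class names), not the H2GB encoder code. It shows that with per-type weights, identical feature vectors on different node types are mapped to different embeddings, so a homogeneous GNN fed by such an encoder effectively receives node-type information it would not otherwise have; this is presumably the source of the inflated performance noted above.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix with out_dim rows and in_dim columns."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weight, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


class RawEncoder:
    """One shared projection for every node type (all types must share an input dimension)."""

    def __init__(self, in_dim, hidden_dim, seed=0):
        self.weight = make_linear(in_dim, hidden_dim, random.Random(seed))

    def __call__(self, x_dict):
        return {t: [apply_linear(self.weight, x) for x in xs] for t, xs in x_dict.items()}


class HeteroRawEncoder:
    """A separate projection per node type, so input dimensions may differ by type."""

    def __init__(self, in_dims, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.weights = {t: make_linear(d, hidden_dim, rng) for t, d in in_dims.items()}

    def __call__(self, x_dict):
        return {t: [apply_linear(self.weights[t], x) for x in xs] for t, xs in x_dict.items()}


if __name__ == "__main__":
    same = {"paper": [[1.0, 2.0, 3.0]], "author": [[1.0, 2.0, 3.0]]}
    raw_out = RawEncoder(3, 4)(same)
    hetero_out = HeteroRawEncoder({"paper": 3, "author": 3}, 4)(same)
    print("Raw: identical embeddings?", raw_out["paper"] == raw_out["author"])
    print("Hetero_Raw: identical embeddings?", hetero_out["paper"] == hetero_out["author"])
```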
