An AI copilot for graph data and models (under active development).

pygfm is a unified Python toolkit for Graph Foundation Model (GFM) research, published on PyPI as python-gfm and imported as pygfm. It integrates 17 state-of-the-art baseline methods in a single pip-installable package with shared utilities, standardized interfaces, and reproducible experiment pipelines.

Developed by Beihang University · School of Computer Science and Engineering · ACT Lab · MAGIC GROUP.

Framework Overview

[Figure: PyGFM framework overview]

PyGFM is organized into four stacked layers — Graph Data Abstraction → Alignment & Fusion Bridge → Representation Backbones → Task Heads & Orchestration — with a unified CLI, model recipes, and an auto-experiment tracker sitting on top.

Highlights

  • One package, 17 baselines — prompt-based GFMs, structure-aware models, LLM-integrated approaches, and retrieval-augmented methods all available via a single pip install.
  • Reproducible pipelines — every baseline ships with YAML-driven experiment configs, training scripts, and evaluation helpers.
  • Shared backbone library — common GNN encoders, loss functions, and data utilities are factored out and reused across all baselines, reducing code duplication.
  • CLI-first design — launch pre-training, fine-tuning, and evaluation jobs directly from the command line without writing any boilerplate.
  • LLM-ready — first-class support for LLM-integrated GFMs (GraphGPT, GraphText, LLaGA, OneForAll) with HuggingFace-compatible YAML configs.

Installation

CUDA (recommended)

Default (fresh environment): install the torch and light extras together. The command combines the PyTorch wheel index, PyPI as an extra index, and the PyG find-links page:

pip install "python-gfm[torch,light]" --index-url https://download.pytorch.org/whl/cu128 --extra-index-url https://pypi.org/simple -f https://data.pyg.org/whl/torch-2.8.0+cu128.html

If CUDA builds of PyTorch and PyG are already installed in the environment, install only the [light] extra from PyPI:

pip install "python-gfm[light]"

For LLM-integrated GFMs, add the [llm] extra once [torch] and [light] are in place:

pip install "python-gfm[llm]"

CPU-only: use the same fresh-environment command, but with --index-url https://download.pytorch.org/whl/cpu and -f https://data.pyg.org/whl/torch-2.8.0+cpu.html.
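The CUDA and CPU commands differ only in the wheel variant embedded in the two URLs. The sketch below assembles the pip arguments for either variant; the helper name build_install_command is hypothetical and not part of pygfm, and the URLs are the ones quoted in this section.

```python
import shlex

TORCH_VERSION = "2.8.0"  # version that appears in the PyG find-links URLs above


def build_install_command(variant: str = "cu128", extras: str = "torch,light") -> list[str]:
    """Return the pip argv for a fresh-environment install.

    `variant` is "cu128" (CUDA 12.8 wheels) or "cpu" (CPU-only wheels).
    This helper is illustrative only; it is not shipped with pygfm.
    """
    if variant not in {"cu128", "cpu"}:
        raise ValueError(f"unsupported variant: {variant!r}")
    return [
        "pip", "install", f"python-gfm[{extras}]",
        "--index-url", f"https://download.pytorch.org/whl/{variant}",
        "--extra-index-url", "https://pypi.org/simple",
        "-f", f"https://data.pyg.org/whl/torch-{TORCH_VERSION}+{variant}.html",
    ]


if __name__ == "__main__":
    # Print a shell-ready command line for the CPU-only variant.
    print(shlex.join(build_install_command("cpu")))
```

Printing the command instead of running it makes it easy to review the index URLs before installing.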

Extras overview

Extra   Contents (short)
torch   PyTorch Geometric stack, graph libs, sklearn helpers
light   NumPy/Pandas stack, Transformers, Hydra, APIs, Gradio, W&B, SwanLab
llm     PEFT, bitsandbytes, datasets, fschat, Ray, Vertex, DeepSpeed
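To see which extras appear to be present in an environment, one option is to probe for a few representative modules without importing them. The module lists below are an illustrative subset chosen from the table above, not the package's actual dependency specification (see pyproject.toml for that), and the helper names are hypothetical.

```python
from importlib.util import find_spec

# Representative top-level import names per extra. This is an assumed,
# illustrative subset, not the real dependency list of python-gfm.
PROBES = {
    "torch": ["torch", "torch_geometric", "sklearn"],
    "light": ["numpy", "pandas", "transformers"],
    "llm": ["peft", "bitsandbytes", "datasets"],
}


def missing_modules(modules):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if find_spec(name) is None]


def report(probes=PROBES):
    """Map each extra to the probe modules that are missing for it."""
    return {extra: missing_modules(mods) for extra, mods in probes.items()}


if __name__ == "__main__":
    for extra, missing in report().items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"[{extra}] {status}")
```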

Optional dev extra

pip install "python-gfm[dev]" adds pytest and ruff for testing and linting.

Package layout (installed wheel)

pygfm/
├── baseline_models/   # GFM baseline implementations
├── public/            # Shared utilities, losses, backbone encoders
├── private/           # Core encoders and internal helpers
└── cli/               # Console entry points

Supported Baselines

Category                  Methods
Prompt-based GFM          MDGPT, SAMGPT, MDGFM, GraphPrompt, HGPrompt, MultiGPrompt, GCoT
Structure-aware GFM       SA2GFM, Bridge, GraphKeeper, GraphMore, Graver
LLM-integrated GFM        GraphGPT, GraphText, LLaGA, OneForAll
Retrieval-augmented GFM   RAG-GFM
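The table can also be written as a small lookup that reports a baseline's category and the extras it likely needs. This is a sketch: the names BASELINES, category_of, and extras_for are hypothetical, and it assumes that only the LLM-integrated category needs the [llm] extra, as the Installation section suggests. Individual baselines may differ, so check each baseline's documentation.

```python
# Categories and methods copied from the table above.
BASELINES = {
    "Prompt-based GFM": [
        "MDGPT", "SAMGPT", "MDGFM", "GraphPrompt",
        "HGPrompt", "MultiGPrompt", "GCoT",
    ],
    "Structure-aware GFM": ["SA2GFM", "Bridge", "GraphKeeper", "GraphMore", "Graver"],
    "LLM-integrated GFM": ["GraphGPT", "GraphText", "LLaGA", "OneForAll"],
    "Retrieval-augmented GFM": ["RAG-GFM"],
}


def category_of(name: str) -> str:
    """Return the category that lists `name`, or raise KeyError."""
    for category, methods in BASELINES.items():
        if name in methods:
            return category
    raise KeyError(f"unknown baseline: {name}")


def extras_for(name: str) -> list[str]:
    """Guess the pip extras for a baseline (assumption, see lead-in)."""
    extras = ["torch", "light"]
    if category_of(name) == "LLM-integrated GFM":
        extras.append("llm")
    return extras


if __name__ == "__main__":
    total = sum(len(methods) for methods in BASELINES.values())
    print(f"{total} baselines in {len(BASELINES)} categories")
    print("LLaGA needs extras:", ",".join(extras_for("LLaGA")))
```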

Reproducing baselines (config download)

Published YAML configs and toolbox assets live in a Hugging Face dataset. The downloader uses only the Python standard library, so once python-gfm is installed this step needs no extra dependencies. Run:

python -m pygfm.cli.download --repo aboutime233/gtb --path gfmtoolbox_docs

Outputs go under --outdir (default: downloads/). Command-line options for the downloader (repo, revision, path, output directory, etc.) are described in the official documentation on the project homepage.
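When the download is part of a larger script, the same command can be assembled and launched from Python. The sketch below uses only the flags named in this section (--repo, --path, --outdir); the wrapper functions themselves are hypothetical and not part of pygfm.

```python
import subprocess
import sys


def download_command(repo="aboutime233/gtb", path="gfmtoolbox_docs", outdir=None):
    """Build the argv for the downloader CLI shown above.

    `outdir` is passed through as --outdir only when given, so the CLI's
    own default (downloads/) applies otherwise.
    """
    cmd = [sys.executable, "-m", "pygfm.cli.download", "--repo", repo, "--path", path]
    if outdir is not None:
        cmd += ["--outdir", str(outdir)]
    return cmd


def download(**kwargs):
    """Run the downloader and raise CalledProcessError if it fails."""
    subprocess.run(download_command(**kwargs), check=True)


if __name__ == "__main__":
    # Show the command without executing it.
    print(" ".join(download_command(outdir="downloads")))
```

Using sys.executable keeps the call inside the interpreter (and virtual environment) that runs the script.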

Experiment workflow

Typical end-to-end flow (YAML names and paths are examples — point -c at the configs you downloaded or arranged for your baseline):

# Download config files, or manually fetch them from the Hugging Face dataset:
# https://huggingface.co/datasets/aboutime233/gtb
python -m pygfm.cli.download

# Configure datasets and other settings following each baseline’s official documentation on the project site.

# Step 1: Generate few-shot downstream splits
python -m pygfm.cli.run_yaml -c configs/mdgpt/01_split_cora_1shot.yaml
# -> downstream_data/mdgpt/splits.pt

# Step 2: Leave-one-domain pre-training
python -m pygfm.cli.run_yaml -c configs/mdgpt/02_pretrain_cora.yaml
# -> ckpts/mdgpt/preprompt.pth

# Step 3: Downstream fine-tuning & evaluation
python -m pygfm.cli.run_yaml -c configs/mdgpt/03_finetune_cora_1shot.yaml
# -> Cora 1-shot node classification accuracy (and other logged outputs)

The same YAML driver is also installed as the console commands pygfm and gfm (see Console Commands), for example: pygfm -c configs/mdgpt/02_pretrain_cora.yaml.
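The three stages above depend on each other's outputs, so a driver script should stop at the first failure. The following is a minimal sketch of such a driver; the config paths are the example names from the workflow, and run_pipeline and stage_command are hypothetical helpers, not part of pygfm.

```python
import subprocess
import sys
from pathlib import Path

# Example config paths from the workflow above; replace with your own.
STAGES = [
    "configs/mdgpt/01_split_cora_1shot.yaml",
    "configs/mdgpt/02_pretrain_cora.yaml",
    "configs/mdgpt/03_finetune_cora_1shot.yaml",
]


def stage_command(config):
    """Build the argv that runs one YAML stage."""
    return [sys.executable, "-m", "pygfm.cli.run_yaml", "-c", str(config)]


def run_pipeline(configs, runner=subprocess.run):
    """Run each stage in order.

    Stops at the first missing config (FileNotFoundError) or, with the
    default runner, at the first non-zero exit (CalledProcessError).
    """
    for config in configs:
        if not Path(config).is_file():
            raise FileNotFoundError(f"config not found: {config}")
        runner(stage_command(config), check=True)


if __name__ == "__main__":
    run_pipeline(STAGES)
```

The runner parameter exists so the sequencing logic can be exercised with a stub instead of launching real training jobs.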

Console Commands

Command                        Description
python -m pygfm.cli.download   Fetch baseline / toolbox YAML and assets from Hugging Face (see Reproducing baselines)
python -m pygfm.cli.run_yaml   Run a stage from YAML (-c /path/to/config.yaml); same as pygfm / gfm (see Experiment workflow)

Configuration

After downloading configs, drive stages with pygfm / gfm or python -m pygfm.cli.run_yaml and -c (see Experiment workflow). For each baseline, read the official documentation on the project homepage (hyperparameters, data roots, optional API keys, etc.); do not commit secrets.
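One common way to keep secrets out of committed YAML is to read them from environment variables at launch time. The sketch below shows that pattern in isolation; the config key api_key, the variable name GFM_EXAMPLE_API_KEY, and both helper functions are hypothetical and do not describe how pygfm itself loads keys.

```python
import os


def resolve_secret(env_var: str) -> str:
    """Read a secret from the environment, failing loudly if it is unset."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"set the {env_var} environment variable")
    return value


def with_secrets(config: dict, mapping: dict) -> dict:
    """Return a copy of `config` with secret keys filled from the environment.

    `mapping` maps a config key to the environment variable that holds it.
    The input dict is left unchanged, so it stays safe to log or save.
    """
    merged = dict(config)
    for key, env_var in mapping.items():
        merged[key] = resolve_secret(env_var)
    return merged


if __name__ == "__main__":
    os.environ.setdefault("GFM_EXAMPLE_API_KEY", "dummy-value-for-demo")
    cfg = with_secrets({"model": "demo"}, {"api_key": "GFM_EXAMPLE_API_KEY"})
    print(sorted(cfg))  # print key names only, never the secret itself
```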

Baseline Documentation

Each baseline’s setup, data layout, and evaluation notes are published in the official documentation on the project homepage. Index of per-method guides:

Baseline       Docs
MDGPT          MDGPT README
SA2GFM         SA2GFM README
SAMGPT         SAMGPT README
MDGFM          MDGFM README
GraphPrompt    GraphPrompt README
HGPrompt       HGPrompt README
MultiGPrompt   MultiGPrompt README
GCoT           GCoT README
Graver         Graver README
GraphMore      GraphMore README
Bridge         Bridge README
GraphKeeper    GraphKeeper README
GraphGPT       GraphGPT README
GraphText      GraphText README
LLaGA          LLaGA README
OneForAll      OneForAll README
RAG-GFM        RAG-GFM README

Requirements

Dependency          Version
Python              ≥ 3.12
PyTorch             2.8.0 (CUDA 12.8 recommended)
PyTorch Geometric   ≥ 2.3.0
Transformers        ≥ 4.36.0
Accelerate          ≥ 0.26.0

See pyproject.toml on GitHub for the full dependency specification.
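A quick environment check against the table above can be sketched with importlib.metadata. Two simplifications are assumed here: the PyTorch pin is treated as a minimum, and versions are compared on their leading numeric components only, so local suffixes such as +cu128 and pre-release tags are ignored. The function names are hypothetical.

```python
import re
import sys
from importlib import metadata

# Distribution names on PyPI mapped to the versions from the table above.
MINIMUMS = {
    "torch": "2.8.0",
    "torch-geometric": "2.3.0",
    "transformers": "4.36.0",
    "accelerate": "0.26.0",
}


def parse_version(text):
    """Parse leading dotted integers, e.g. '2.8.0+cu128' -> (2, 8, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets(installed, minimum):
    """True if `installed` >= `minimum` after zero-padding to equal length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_environment(minimums=MINIMUMS):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if sys.version_info < (3, 12):
        problems.append("Python >= 3.12 is required")
    for dist, minimum in minimums.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if not meets(installed, minimum):
            problems.append(f"{dist} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks compatible"]:
        print(line)
```

For strict comparisons that cover the full version grammar, the packaging library is the more robust choice.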

License

This project is licensed under the Apache License 2.0.

Team

MAGIC GROUP — Beihang University, School of Computer Science and Engineering, ACT Lab.


If you find this toolkit useful in your research, please consider starring the repository.

Download files

Source Distribution

python_gfm-0.1.17.tar.gz (18.9 MB)

Uploaded Source

Built Distribution

python_gfm-0.1.17-py3-none-any.whl (19.1 MB)

Uploaded Python 3

File details

Details for the file python_gfm-0.1.17.tar.gz.

File metadata

  • Download URL: python_gfm-0.1.17.tar.gz
  • Upload date:
  • Size: 18.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for python_gfm-0.1.17.tar.gz
Algorithm     Hash digest
SHA256        cce58f107f837456bc46d5e9f5cea0850961cf970557af958fd4bdd3df98b103
MD5           ae45e5479825b86e6af3f55462c97d9f
BLAKE2b-256   f391aa76a15a1939a7a4eacf54e2a92533d1887db17ea3bfd2ec5ed6b81714c6

File details

Details for the file python_gfm-0.1.17-py3-none-any.whl.

File metadata

  • Download URL: python_gfm-0.1.17-py3-none-any.whl
  • Upload date:
  • Size: 19.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for python_gfm-0.1.17-py3-none-any.whl
Algorithm     Hash digest
SHA256        057352b2025080bce2156ec0209520d7a59cc454e363b873ace63a4013538cac
MD5           1cf1fbae282b09e6a4965907fa06728e
BLAKE2b-256   1c44cf4b9d489505e60e3ed2dd55e1f70945abb367d0a572fb8ad0e46510c787
