
An AI copilot for graph data and models (under active development).

PyGFM (PyPI package python-gfm, import name pygfm) is a unified Python toolkit for Graph Foundation Model (GFM) research. It integrates 17 state-of-the-art baseline methods into a single pip-installable package with shared utilities, standardized interfaces, and YAML-driven, reproducible experiment pipelines.

Developed by Beihang University · School of Computer Science and Engineering · ACT Lab · MAGIC GROUP.

Framework Overview

[Figure: PyGFM framework overview]

PyGFM is organized into four stacked layers — Graph Data Abstraction → Alignment & Fusion Bridge → Representation Backbones → Task Heads & Orchestration — with a unified CLI, model recipes, and an auto-experiment tracker sitting on top.

Highlights

  • One package, 17 baselines — prompt-based GFMs, structure-aware models, LLM-integrated approaches, and retrieval-augmented methods all available via a single pip install.
  • Reproducible pipelines — every baseline ships with YAML-driven experiment configs, training scripts, and evaluation helpers.
  • Shared backbone library — common GNN encoders, loss functions, and data utilities are factored out and reused across all baselines, reducing code duplication.
  • CLI-first design — launch pre-training, fine-tuning, and evaluation jobs directly from the command line without writing any boilerplate.
  • LLM-ready — first-class support for LLM-integrated GFMs (GraphGPT, GraphText, LLaGA, OneForAll) with HuggingFace-compatible YAML configs.

Installation

CUDA (recommended)

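Fresh environment (default): install the torch and light extras together. The command pulls PyTorch from the PyTorch wheel index, the remaining packages from PyPI, and the PyG wheels from the PyG find-links page: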

pip install "python-gfm[torch,light]" --index-url https://download.pytorch.org/whl/cu128 --extra-index-url https://pypi.org/simple -f https://data.pyg.org/whl/torch-2.8.0+cu128.html

If a CUDA build of PyTorch and PyG is already installed in the environment, install only the light extra from PyPI:

pip install "python-gfm[light]"

For LLM-integrated GFMs, add the llm extra once the torch and light extras are in place:

pip install "python-gfm[llm]"

CPU only: run the same command as for a fresh environment, but with --index-url https://download.pytorch.org/whl/cpu and -f https://data.pyg.org/whl/torch-2.8.0+cpu.html.
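
The CUDA and CPU variants differ only in the platform tag embedded in the two URLs. The sketch below assembles the pip arguments from that tag to make the pattern explicit. The helper name build_install_command and its parameters are illustrative and are not part of python-gfm.

```python
import shlex

# Assumption: these templates mirror the URLs shown above; only the platform
# tag ("cu128" or "cpu") changes between the CUDA and CPU variants.
TORCH_INDEX = "https://download.pytorch.org/whl/{tag}"
PYG_LINKS = "https://data.pyg.org/whl/torch-{torch}+{tag}.html"


def build_install_command(extras, tag="cu128", torch="2.8.0"):
    """Return the pip argument list for python-gfm with the given extras."""
    requirement = "python-gfm[{}]".format(",".join(extras))
    cmd = ["pip", "install", requirement]
    # The extra index and find-links page are only added when the torch extra
    # is requested; the other extras are installed from PyPI alone.
    if "torch" in extras:
        cmd += [
            "--index-url", TORCH_INDEX.format(tag=tag),
            "--extra-index-url", "https://pypi.org/simple",
            "-f", PYG_LINKS.format(torch=torch, tag=tag),
        ]
    return cmd


if __name__ == "__main__":
    # Print copy-pasteable commands for the three cases described above.
    print(shlex.join(build_install_command(["torch", "light"])))
    print(shlex.join(build_install_command(["torch", "light"], tag="cpu")))
    print(shlex.join(build_install_command(["light"])))
```

Printing the command instead of running it keeps the choice of environment and pip executable with the user.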

Extras overview

Extra   Contents (short)
torch   PyTorch Geometric stack, graph libs, sklearn helpers
light   NumPy/Pandas stack, Transformers, Hydra, APIs, Gradio, W&B, SwanLab
llm     PEFT, bitsandbytes, datasets, fschat, Ray, Vertex, DeepSpeed

Optional dev extra

pip install "python-gfm[dev]" adds pytest and ruff for testing and linting.

Package layout (installed wheel)

pygfm/
├── baseline_models/   # GFM baseline implementations
├── public/            # Shared utilities, losses, backbone encoders
├── private/           # Core encoders and internal helpers
└── cli/               # Console entry points
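
A quick way to confirm that an installed wheel matches this layout is to ask the import system whether each subpackage resolves. This is a minimal sketch using only the standard library; check_layout is a hypothetical helper, not something pygfm provides. Note that importlib.util.find_spec imports the parent package as a side effect.

```python
import importlib.util

# Subpackage names taken from the layout shown above.
SUBPACKAGES = ("baseline_models", "public", "private", "cli")


def check_layout(package="pygfm"):
    """Map each expected subpackage to True if the import system can locate it."""
    found = {}
    for name in SUBPACKAGES:
        try:
            spec = importlib.util.find_spec(f"{package}.{name}")
        except ImportError:
            # Raised when the parent package is missing or fails to import.
            spec = None
        found[name] = spec is not None
    return found


if __name__ == "__main__":
    for name, present in check_layout().items():
        print(f"pygfm.{name}: {'found' if present else 'missing'}")
```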

Supported Baselines

Category                  Methods
Prompt-based GFM          MDGPT, SAMGPT, MDGFM, GraphPrompt, HGPrompt, MultiGPrompt, GCoT
Structure-aware GFM       SA2GFM, Bridge, GraphKeeper, GraphMore, Graver
LLM-integrated GFM        GraphGPT, GraphText, LLaGA, OneForAll
Retrieval-augmented GFM   RAG-GFM
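
When scripting over several baselines it can help to have the table above as data. The mapping below is a hand-written copy of that table, not a registry exported by pygfm, and the helper names are hypothetical. needs_llm_extra encodes only the statement from the installation section that LLM-integrated GFMs need the llm extra; whether other baselines need it is not covered here.

```python
# Hand-copied from the Supported Baselines table; not an API of pygfm.
BASELINES = {
    "Prompt-based GFM": [
        "MDGPT", "SAMGPT", "MDGFM", "GraphPrompt",
        "HGPrompt", "MultiGPrompt", "GCoT",
    ],
    "Structure-aware GFM": ["SA2GFM", "Bridge", "GraphKeeper", "GraphMore", "Graver"],
    "LLM-integrated GFM": ["GraphGPT", "GraphText", "LLaGA", "OneForAll"],
    "Retrieval-augmented GFM": ["RAG-GFM"],
}


def category_of(method):
    """Return the category of a baseline, matching the name case-insensitively."""
    wanted = method.lower()
    for category, methods in BASELINES.items():
        if wanted in (m.lower() for m in methods):
            return category
    raise KeyError(f"unknown baseline: {method}")


def needs_llm_extra(method):
    """True for baselines listed in the LLM-integrated category."""
    return category_of(method) == "LLM-integrated GFM"


if __name__ == "__main__":
    total = sum(len(methods) for methods in BASELINES.values())
    print(f"{total} baselines in {len(BASELINES)} categories")
```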

Reproducing baselines (config download)

Published YAML configs and toolbox assets are hosted in a Hugging Face dataset. Once python-gfm is installed, the downloader needs only the Python standard library (no extras are required for this step):

python -m pygfm.cli.download --repo aboutime233/gtb --path gfmtoolbox_docs

Files are written to the directory given by --outdir (default: downloads/). The downloader's command-line options (repo, revision, path, output directory, etc.) are described in the official documentation on the project homepage.
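
For scripted setups, the same download can be launched from Python. The sketch below only wraps the command shown above with subprocess; it uses the --repo, --path, and --outdir options named in this section and assumes nothing else about the downloader. The function names are illustrative.

```python
import subprocess
import sys


def download_command(repo="aboutime233/gtb", path="gfmtoolbox_docs", outdir=None):
    """Build the argument list for the pygfm downloader."""
    # sys.executable ensures the interpreter that has python-gfm installed is used.
    cmd = [sys.executable, "-m", "pygfm.cli.download", "--repo", repo, "--path", path]
    if outdir is not None:
        # Omitted by default so the downloader falls back to its own default.
        cmd += ["--outdir", outdir]
    return cmd


def download(**kwargs):
    """Run the downloader; raises CalledProcessError if it exits non-zero."""
    subprocess.run(download_command(**kwargs), check=True)


if __name__ == "__main__":
    # Show the command without executing it.
    print(" ".join(download_command(outdir="downloads")))
```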

Experiment workflow

Typical end-to-end flow (YAML names and paths are examples — point -c at the configs you downloaded or arranged for your baseline):

# Download config files, or manually fetch them from the Hugging Face dataset:
# https://huggingface.co/datasets/aboutime233/gtb
python -m pygfm.cli.download

# Configure datasets and other settings following each baseline’s official documentation on the project site.

# Step 1: Generate few-shot downstream splits
python -m pygfm.cli.run_yaml -c configs/mdgpt/01_split_cora_1shot.yaml
# -> downstream_data/mdgpt/splits.pt

# Step 2: Leave-one-domain pre-training
python -m pygfm.cli.run_yaml -c configs/mdgpt/02_pretrain_cora.yaml
# -> ckpts/mdgpt/preprompt.pth

# Step 3: Downstream fine-tuning & evaluation
python -m pygfm.cli.run_yaml -c configs/mdgpt/03_finetune_cora_1shot.yaml
# -> Cora 1-shot node classification accuracy (and other logged outputs)

The same YAML driver is also installed as the console commands pygfm and gfm (see Console Commands), for example: pygfm -c configs/mdgpt/02_pretrain_cora.yaml.
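
The three stages above depend on each other's outputs, so a small driver that runs them in order and stops at the first failure can be convenient. This is a sketch under the assumption that the example config paths exist locally; it only shells out to python -m pygfm.cli.run_yaml -c as documented, and run_pipeline is a hypothetical helper.

```python
import subprocess
import sys
from pathlib import Path

# Example config paths from the workflow above; adjust to your own layout.
STAGES = [
    "configs/mdgpt/01_split_cora_1shot.yaml",    # few-shot splits
    "configs/mdgpt/02_pretrain_cora.yaml",       # leave-one-domain pre-training
    "configs/mdgpt/03_finetune_cora_1shot.yaml", # fine-tuning and evaluation
]


def stage_command(config):
    """Build the run_yaml invocation for one config file."""
    return [sys.executable, "-m", "pygfm.cli.run_yaml", "-c", str(config)]


def run_pipeline(configs=STAGES, dry_run=False):
    """Run each stage in order, aborting on the first missing file or failure."""
    for config in configs:
        cmd = stage_command(config)
        print("stage:", " ".join(cmd))
        if dry_run:
            continue
        if not Path(config).is_file():
            raise FileNotFoundError(config)
        # check=True raises CalledProcessError if the stage exits non-zero,
        # so later stages never run on top of a failed one.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```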

Console Commands

Command                        Description
python -m pygfm.cli.download   Fetch baseline / toolbox YAML and assets from Hugging Face (above)
python -m pygfm.cli.run_yaml   Same as pygfm / gfm: run a stage from YAML (-c /path/to/config.yaml); see Experiment workflow

Configuration

After downloading configs, drive stages with pygfm / gfm or python -m pygfm.cli.run_yaml and -c (see Experiment workflow). For each baseline, read the official documentation on the project homepage (hyperparameters, data roots, optional API keys, etc.); do not commit secrets.
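
One common way to keep secrets out of committed files is to read them from the environment at run time. The sketch below illustrates that pattern only: EXAMPLE_LLM_API_KEY is a made-up variable name, and whether a given baseline's config reads keys from the environment is an assumption to check against that baseline's documentation.

```python
import os


def require_secret(name):
    """Return the value of an environment variable, failing early if it is unset."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Fail before any training starts rather than midway through a run.
        raise RuntimeError(f"Set the {name} environment variable before running this stage.")
    return value
```

A wrapper script would call require_secret("EXAMPLE_LLM_API_KEY") (hypothetical name) before launching a stage, so the key never has to appear in a YAML file under version control.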

Baseline Documentation

Each baseline’s setup, data layout, and evaluation notes are published in the official documentation on the project homepage. Index of per-method guides:

Baseline       Docs
MDGPT          MDGPT README
SA2GFM         SA2GFM README
SAMGPT         SAMGPT README
MDGFM          MDGFM README
GraphPrompt    GraphPrompt README
HGPrompt       HGPrompt README
MultiGPrompt   MultiGPrompt README
GCoT           GCoT README
Graver         Graver README
GraphMore      GraphMore README
Bridge         Bridge README
GraphKeeper    GraphKeeper README
GraphGPT       GraphGPT README
GraphText      GraphText README
LLaGA          LLaGA README
OneForAll      OneForAll README
RAG-GFM        RAG-GFM README

Requirements

Dependency          Version
Python              ≥ 3.12
PyTorch             2.8.0 (CUDA 12.8 recommended)
PyTorch Geometric   ≥ 2.3.0
Transformers        ≥ 4.36.0
Accelerate          ≥ 0.26.0

See pyproject.toml on GitHub for the full dependency specification.
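
The table above can be checked against the current environment with the standard library alone. This is a rough sketch: the distribution names are the usual PyPI names and are an assumption here, PyTorch 2.8.0 is treated as a lower bound for simplicity, and the version parser ignores suffixes such as +cu128. pyproject.toml remains the authoritative specification.

```python
import re
import sys
from importlib import metadata

# Minimums copied from the requirements table; distribution names assumed.
MINIMUMS = {
    "torch": (2, 8, 0),
    "torch-geometric": (2, 3, 0),
    "transformers": (4, 36, 0),
    "accelerate": (0, 26, 0),
}


def parse_version(text):
    """Turn '2.8.0+cu128' into (2, 8, 0); non-numeric suffixes are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    # Pad short versions such as '2.3' so tuple comparison behaves as expected.
    return parts + (0,) * (3 - len(parts))


def check_requirements():
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info < (3, 12):
        problems.append("Python 3.12 or newer is required")
    for dist, minimum in MINIMUMS.items():
        try:
            installed = parse_version(metadata.version(dist))
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if installed < minimum:
            wanted = ".".join(map(str, minimum))
            problems.append(f"{dist} is older than {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_requirements() or ["all listed requirements satisfied"]:
        print(line)
```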

License

This project is licensed under the Apache License 2.0.

Team

MAGIC GROUP — Beihang University, School of Computer Science and Engineering, ACT Lab.


If you find this toolkit useful in your research, please consider starring the repository.

