An AI copilot for graph data and models (Under active development).


PyGFM is a unified Python toolkit for Graph Foundation Model (GFM) research, installed from PyPI as python-gfm and imported as pygfm. It integrates 19 state-of-the-art baseline methods in a single pip-installable package with shared utilities, standardized interfaces, and reproducible experiment pipelines.

Developed by Beihang University · School of Computer Science and Engineering · ACT Lab · MAGIC GROUP.

Framework Overview

PyGFM is organized into four stacked layers — Graph Data Abstraction → Alignment & Fusion Bridge → Representation Backbones → Task Heads & Orchestration — with a unified CLI, model recipes, and an auto-experiment tracker sitting on top.

Highlights

  • One package, 19 baselines — prompt-based GFMs, structure-aware models, LLM-integrated approaches, and retrieval-augmented methods all available via a single pip install.
  • Reproducible pipelines — every baseline ships with YAML-driven experiment configs, training scripts, and evaluation helpers.
  • Shared backbone library — common GNN encoders, loss functions, and data utilities are factored out and reused across all baselines, reducing code duplication.
  • CLI-first design — launch pre-training, fine-tuning, and evaluation jobs directly from the command line without writing any boilerplate.
  • LLM-ready — first-class support for LLM-integrated GFMs (GraphGPT, GraphText, LLaGA, OneForAll) with HuggingFace-compatible YAML configs.

Installation

Minimal install (utilities only)

pip install python-gfm

With PyTorch + PyG (recommended for running experiments)

# 1. Install PyTorch with CUDA 12.8 support
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128

# 2. Install pygfm with the full ML stack (PyG extensions are resolved automatically)
pip install "python-gfm[torch]" -f https://data.pyg.org/whl/torch-2.8.0+cu128.html

CPU-only machines: replace the CUDA index URLs with https://download.pytorch.org/whl/cpu and https://data.pyg.org/whl/torch-2.8.0+cpu.html respectively.
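
After either install path, a quick check that the expected modules can be located may save a failed training run later. The following is a minimal sketch using only the standard library; the module list is an assumption about what the [torch] extra provides (import names, not PyPI names), so adjust it to the extras you installed.

```python
import importlib.util

# Import names the full install is expected to provide (assumed, not exhaustive).
STACK = ("torch", "torch_geometric", "pygfm")


def check_stack(modules=STACK):
    """Return {module_name: bool} telling whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


# Print a short report without importing the (potentially heavy) packages.
for name, found in check_stack().items():
    print(f"{name:16s} {'ok' if found else 'MISSING'}")
```

If torch is reported as present, torch.cuda.is_available() then tells you whether the CUDA build can actually see a GPU.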

Optional stacks: torch, light, llm

Dependencies are grouped into three extras (combine them with commas inside the brackets):

Extra | Purpose
torch | CUDA PyTorch + PyG (torch-scatter / torch-sparse / torch-cluster / torch-spline-conv) plus graph packages (ogb, geoopt, deeprobust) and loose pins used by GNN code paths (transformers, accelerate, etc.). Requires the PyTorch wheel index and PyG -f link below.
light | Lightweight “everything except GPU LLM megastack”: pinned numpy/scipy/pandas, sklearn, Hydra/OmegaConf, HTTP clients, OpenAI/Anthropic, FastAPI/Uvicorn, RAG helpers (nano-vectordb), etc. No torch wheel; install from PyPI only.
llm | Full LLM and serving stack: pinned Transformers, PEFT, Accelerate, bitsandbytes, datasets, sentence-transformers, fschat, Gradio, Ray, Weights & Biases, SwanLab, Google Cloud Vertex AI, DeepSpeed. Use together with torch (+ light) when you run GraphGPT / LLaGA / GraphText–style pipelines.

What to install

You want to… | Install
Run structure / prompt GNN baselines (MDGPT, SA2GFM, Bridge, GraphMore, …) | python-gfm[torch,light] + PyTorch index + PyG -f
Use Hydra configs, APIs, RAG corpus tools without the CUDA stack in this env | python-gfm[light]
Run LLM-integrated baselines (GraphGPT, LLaGA, GraphText, GCoT serving, LoRA, Ray eval, loggers, Vertex) | python-gfm[torch,light,llm] + same CUDA flags as torch

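Index hygiene: keep PyPI as the default index and add the PyTorch index with --extra-index-url rather than --index-url, which would replace PyPI as the default index. Whenever [torch] is included, also pass -f with the PyG wheel page, using the same URLs as in the "With PyTorch + PyG" section above.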

python -m pip install -U pip setuptools wheel
export PIP_DEFAULT_TIMEOUT=120   # optional: reduce ReadTimeout errors

# GNN / structure baselines
pip install "python-gfm[torch,light]" \
  --extra-index-url https://download.pytorch.org/whl/cu128 \
  -f https://data.pyg.org/whl/torch-2.8.0+cu128.html
# LLM-integrated baselines (full stack)
pip install "python-gfm[torch,light,llm]" \
  --extra-index-url https://download.pytorch.org/whl/cu128 \
  -f https://data.pyg.org/whl/torch-2.8.0+cu128.html
# Lightweight tools only (no CUDA torch)
pip install "python-gfm[light]"

Mirrors: you may set the default PyPI index to a mirror (e.g. -i https://pypi.tuna.tsinghua.edu.cn/simple); keep --extra-index-url and -f for PyTorch and PyG when using [torch].

In Windows PowerShell, replace the line-ending backslashes with backticks (`); in cmd.exe, use carets (^). Alternatively, put the command on one line.
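
The install rules above (extras in brackets, PyPI as the default index, the PyTorch index as an extra index, and -f for PyG whenever [torch] is present) can be sketched as a small helper that assembles the pip arguments. The function name and the variant parameter are hypothetical and only illustrate the rules; the URLs are the ones given in this document.

```python
import shlex

# URL templates taken from the installation commands in this README.
PYTORCH_INDEX = "https://download.pytorch.org/whl/{variant}"
PYG_WHEELS = "https://data.pyg.org/whl/torch-2.8.0+{variant}.html"
KNOWN_EXTRAS = {"torch", "light", "llm", "dev"}


def pip_install_args(extras, variant="cu128"):
    """Build the pip argument list for python-gfm with the given extras.

    variant is "cu128" for CUDA 12.8 or "cpu" for CPU-only machines.
    """
    unknown = set(extras) - KNOWN_EXTRAS
    if unknown:
        raise ValueError(f"unknown extras: {sorted(unknown)}")
    spec = "python-gfm" + (f"[{','.join(extras)}]" if extras else "")
    args = ["pip", "install", spec]
    if "torch" in extras:
        # PyPI stays the default index; the PyTorch index is only added,
        # and PyG extension wheels are located through -f.
        args += ["--extra-index-url", PYTORCH_INDEX.format(variant=variant)]
        args += ["-f", PYG_WHEELS.format(variant=variant)]
    return args


# Show the command instead of running it, so it can be reviewed first.
print(shlex.join(pip_install_args(["torch", "light"])))
```

Printing the command rather than executing it keeps the helper safe to run anywhere; pass variant="cpu" on machines without a GPU.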

Development install (full checkout with experiment scripts)

git clone <repo-url> && cd pygfm
pip install -e ".[torch,dev]"
# e.g. local full dev stack: pip install -e ".[torch,light,llm,dev]"  (same --extra-index-url / -f as above when [torch] is included)

The dev extra adds pytest and ruff for testing and linting.

Quick Start

import pygfm

print(pygfm.__version__)

Run a pre-training job from the CLI:

# SA2GFM contrastive pre-training
gfm-sa2gfm-pretrain -c scripts/sa2gfm/configs/pretrain.yaml

# SA2GFM downstream fine-tuning
gfm-sa2gfm-downstream -c scripts/sa2gfm/configs/downstream.yaml

Package Structure

pygfm/
├── src/pygfm/
│   ├── baseline_models/   # 19 GFM baseline implementations
│   ├── public/            # Shared utilities, losses, and backbone encoders
│   │   ├── backbone_models/
│   │   ├── utils/
│   │   └── cli/
│   ├── private/           # Core encoders and internal data generation
│   └── cli/               # Console entry points
└── scripts/               # Per-baseline experiment scripts and configs
    ├── <baseline>/
    │   ├── README.md
    │   ├── configs/
    │   ├── pretrain.py / downstream.py / ...
    │   └── eval_script/
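
Given the layout above, a development checkout can be inspected programmatically. The sketch below assumes only what the tree shows (one directory per baseline under scripts/, each with an optional configs/ folder of YAML files); the function name is hypothetical and not part of pygfm.

```python
from pathlib import Path


def list_baselines(repo_root="."):
    """Return {baseline_name: [yaml config file names]} for each directory under scripts/."""
    scripts = Path(repo_root) / "scripts"
    found = {}
    if not scripts.is_dir():
        return found  # not a repository checkout
    for entry in sorted(scripts.iterdir()):
        if not entry.is_dir():
            continue
        configs = entry / "configs"
        if configs.is_dir():
            found[entry.name] = sorted(p.name for p in configs.glob("*.yaml"))
        else:
            found[entry.name] = []
    return found
```

Calling list_baselines(".") from the repository root gives a quick overview of which baselines ship configs in your checkout.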

Supported Baselines

Category | Methods
Prompt-based GFM | MDGPT, SAMGPT, MDGFM, GraphPrompt, HGPrompt, MultiGPrompt, GCoT
Structure-aware GFM | SA2GFM, Bridge, GraphKeeper, GraphMore, Graver, BIM-GFM
LLM-integrated GFM | GraphGPT, GraphText, LLaGA, OneForAll
Retrieval-augmented GFM | RAG-GFM
Classic Baseline | Classic GNN
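
For scripting over the baselines (for example, to loop over one category), the table above can be transcribed into a plain dictionary. This is only a transcription of the table, not an API exposed by pygfm, and the lookup helper is a hypothetical convenience.

```python
# Categories and methods exactly as listed in the table above.
BASELINES = {
    "Prompt-based GFM": ["MDGPT", "SAMGPT", "MDGFM", "GraphPrompt",
                         "HGPrompt", "MultiGPrompt", "GCoT"],
    "Structure-aware GFM": ["SA2GFM", "Bridge", "GraphKeeper",
                            "GraphMore", "Graver", "BIM-GFM"],
    "LLM-integrated GFM": ["GraphGPT", "GraphText", "LLaGA", "OneForAll"],
    "Retrieval-augmented GFM": ["RAG-GFM"],
    "Classic Baseline": ["Classic GNN"],
}


def category_of(method):
    """Return the category a method belongs to (case-insensitive), or None."""
    wanted = method.lower()
    for category, methods in BASELINES.items():
        if wanted in (m.lower() for m in methods):
            return category
    return None


total = sum(len(methods) for methods in BASELINES.values())
print(total)  # 19, matching the count stated in the project description
```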

Running Experiments

All scripts are under scripts/<baseline>/ and should be run from the repository root.

# Prompt-based: MDGPT pre-training
python scripts/mdgpt/pretrain.py

# Structure-aware: SA2GFM downstream fine-tuning
python scripts/sa2gfm/downstream.py

# Prompt-based: GCoT full pipeline
python scripts/gcot/pretrain.py
python scripts/gcot/finetune.py
python scripts/gcot/finetune_graph.py

# LLM-integrated: GraphGPT (YAML-driven HuggingFace-style training)
python scripts/graphgpt/run_with_config.py -c scripts/graphgpt/configs/train_mem_template.yaml
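
Multi-step pipelines such as the GCoT sequence above can be chained from Python so that a failing step stops the run. This is a minimal sketch using subprocess; the function name and the dry_run default are assumptions for illustration, and the script paths are the ones listed above.

```python
import subprocess
import sys
from pathlib import Path

# GCoT steps in the order given above.
GCOT_STEPS = [
    "scripts/gcot/pretrain.py",
    "scripts/gcot/finetune.py",
    "scripts/gcot/finetune_graph.py",
]


def run_steps(steps, repo_root=".", dry_run=True):
    """Run each script in order from repo_root, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    root = Path(repo_root).resolve()
    commands = [[sys.executable, step] for step in steps]
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True raises CalledProcessError if the script exits non-zero.
            subprocess.run(cmd, cwd=root, check=True)
    return commands
```

Using sys.executable keeps every step in the same virtual environment as the caller, and cwd=root honours the requirement that scripts run from the repository root.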

Console Commands

After installation the following CLI entry points are registered:

Command | Description
pygfm / gfm | Generic YAML-driven runner (-c <config.yaml>)
gfm-sa2gfm-pretrain | SA2GFM contrastive pre-training
gfm-sa2gfm-downstream | SA2GFM MoE downstream fine-tuning
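
If a command is reported as not found after installation, the usual cause is that the environment's scripts directory is not on PATH. The sketch below checks the four command names listed above with shutil.which; it makes no assumption about how pygfm registers them.

```python
import shutil

# Command names as listed in the table above.
COMMANDS = ("pygfm", "gfm", "gfm-sa2gfm-pretrain", "gfm-sa2gfm-downstream")


def find_commands(names=COMMANDS):
    """Return {command: absolute path or None} for each command name."""
    return {name: shutil.which(name) for name in names}


for name, path in find_commands().items():
    print(f"{name:24s} {path or 'not on PATH'}")
```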

Configuration

All experiment hyperparameters are stored as YAML files under scripts/<baseline>/configs/. Pass configs via the -c flag:

python scripts/<baseline>/pretrain.py -c scripts/<baseline>/configs/default.yaml
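
For readers writing their own experiment script in the same style, the -c convention can be handled with argparse as sketched below. This illustrates the pattern only and is not pygfm's own parser; the long option --config and the extension check are assumptions.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse a -c <config.yaml> argument in the style used by the scripts."""
    parser = argparse.ArgumentParser(description="Illustrative -c handling")
    parser.add_argument("-c", "--config", type=Path, required=True,
                        help="path to a YAML experiment config")
    return parser.parse_args(argv)


def resolve_config(argv=None):
    """Return the config path after a basic sanity check on its extension."""
    args = parse_args(argv)
    if args.config.suffix not in {".yaml", ".yml"}:
        raise ValueError(f"expected a YAML file, got {args.config}")
    return args.config
```

For example, resolve_config(["-c", "scripts/sa2gfm/configs/pretrain.yaml"]) returns the path unchanged, while a non-YAML path raises ValueError before any training starts.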

API keys: baselines that call external LLM APIs (e.g., GraphText) read credentials from a local env file. Never commit API keys to the repository. Copy the example template and fill in your keys:

cp scripts/graphtext/config/user/env.yaml.example scripts/graphtext/config/user/env.yaml
# Then edit env.yaml and add your API key
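
The copy step above can also be made idempotent, so that re-running a setup script never overwrites a file that already holds your keys. A minimal sketch, assuming only the two file names shown in the command; the function name is hypothetical.

```python
import shutil
from pathlib import Path


def ensure_env_file(user_dir="scripts/graphtext/config/user"):
    """Create env.yaml from env.yaml.example unless env.yaml already exists.

    Returns True if the file was created, False if it was left untouched.
    """
    user_dir = Path(user_dir)
    example = user_dir / "env.yaml.example"
    target = user_dir / "env.yaml"
    if target.exists():
        return False  # never overwrite a file that may already hold keys
    if not example.is_file():
        raise FileNotFoundError(example)
    shutil.copyfile(example, target)
    return True
```

Adding the resulting env.yaml to .gitignore helps keep it out of commits.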

Baseline Documentation

Each baseline ships a dedicated README with setup instructions, data preparation steps, and evaluation notes:

Baseline | Docs
MDGPT | scripts/mdgpt/README.md
SA2GFM | scripts/sa2gfm/README.md
SAMGPT | scripts/samgpt/README.md
MDGFM | scripts/mdgfm/README.md
GraphPrompt | scripts/graphprompt/README.md
HGPrompt | scripts/hgprompt/README.md
MultiGPrompt | scripts/multigprompt/README.md
GCoT | scripts/gcot/README.md
Graver | scripts/graver/README.md
GraphMore | scripts/graphmore/README.md
Bridge | scripts/bridge/README.md
GraphKeeper | scripts/graphkeeper/README.md
GraphGPT | scripts/graphgpt/README.md
GraphText | scripts/graphtext/README.md
LLaGA | scripts/llaga/README.md
OneForAll | scripts/oneforall/README.md
RAG-GFM | scripts/rag_gfm/README.md

Requirements

Dependency | Version
Python | ≥ 3.12
PyTorch | 2.8.0 (CUDA 12.8 recommended)
PyTorch Geometric | ≥ 2.3.0
Transformers | ≥ 4.36.0
Accelerate | ≥ 0.26.0

See pyproject.toml for the full dependency specification.
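
The minimum versions in the table can be checked against the current environment with the standard library alone. This is a rough sketch: the version parser handles only the leading numeric part of a version string, and the distribution names (torch-geometric, transformers, accelerate) are the PyPI names assumed to correspond to the table rows; pyproject.toml remains the authoritative specification.

```python
import sys
from importlib import metadata

# Minimum versions from the requirements table (PyPI distribution names assumed).
MINIMUMS = {
    "torch-geometric": (2, 3, 0),
    "transformers": (4, 36, 0),
    "accelerate": (0, 26, 0),
}


def version_tuple(text, width=3):
    """Parse the leading numeric part of a version, e.g. '2.8.0+cu128' -> (2, 8, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(digits))
    parts = parts[:width]
    return tuple(parts + [0] * (width - len(parts)))


def check_requirements(minimums=MINIMUMS):
    """Return {name: True/False, or None if the package is not installed}."""
    report = {"python": sys.version_info[:2] >= (3, 12)}
    for dist, minimum in minimums.items():
        try:
            report[dist] = version_tuple(metadata.version(dist)) >= minimum
        except metadata.PackageNotFoundError:
            report[dist] = None
    return report
```

PyTorch is pinned to an exact release rather than a minimum, so it is better compared directly, for example version_tuple(metadata.version("torch")) == (2, 8, 0).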

License

This project is licensed under the Apache License 2.0.

Team

MAGIC GROUP — Beihang University, School of Computer Science and Engineering, ACT Lab.


If you find this toolkit useful in your research, please consider starring the repository ⭐
