Relational fine-mapping of causal GWAS variants on a multi-omics knowledge graph

GraphGWAS

License: MIT | Python 3.11+ | Citation

GraphGWAS is a graph-native fine-mapping platform built on Neo4j. It carries multi-omics biological structure (genes, tissue-specific eQTLs, pathways, protein–protein interactions) through the fine-mapping inference as a typed factor graph, rather than collapsing it to flat per-variant annotation priors as existing Bayesian fine-mappers do. Under strong signal, this relational prior matches the accuracy of SuSiE, FINEMAP, SuSiE-inf, FINEMAP-inf and SBayesRC while running 6–60× faster. Under weak signal with tissue-specific eQTL priors, it wins 27–2 head-to-head against SuSiE.

Key features

  • Two new fine-mapping algorithms with theoretical guarantees
    • HBP — hierarchical belief propagation on a variant→gene→pathway factor graph with PPI coupling; proved Banach contraction (Theorem 2); 0.02–0.08 s per locus
    • GAFM (Graph-Augmented Fine-Mapping) — LD-deconvolved evidence combined with a graph functional score via adaptive α; proved causal-variant ranking under mild LD-decay assumptions (Theorem 3)
  • Six head-to-head baselines integrated into a common interface — SuSiE, FINEMAP, SuSiE-inf, FINEMAP-inf, PolyFun-proxy, SBayesRC
  • Calibrated PIPs with 0% null false-positive rate across 100 simulations
  • Multi-omics graph — 70.7 M variants, 20,092 GENCODE genes, 43.2 M GTEx v8 tissue eQTLs, 230,850 STRING interactions (combined score ≥ 700), 370,000 ENCODE cCREs
  • Biobank-scale — sumstats-only entry path consumes Pan-UK Biobank summary statistics directly via tabix over HTTPS; demonstrated on 4 ancestries (EUR N = 420,531; CSA, AFR, EAS)
  • Cross-species — same codebase applies to yeast, human, Arabidopsis
  • Unified package with 52-command CLI, 37-endpoint FastAPI server, and 16-tool MCP server for AI-agent access
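The Banach contraction property cited for HBP matters in practice because a contraction has a unique fixed point, and plain iteration converges to it geometrically from any starting messages. The sketch below is not the HBP update rule. It uses a hypothetical three-node map x ↦ tanh(Wx + b) whose weight rows sum to less than 1 in absolute value, which makes it a contraction in the max norm, purely to illustrate that behaviour. All names and numbers are illustrative assumptions.

```python
import math

# Hypothetical coupling weights: each row's absolute sum is at most 0.6,
# so x -> tanh(Wx + b) is a contraction in the max norm (factor <= 0.6).
WEIGHTS = [
    [0.0, 0.3, 0.2],
    [0.3, 0.0, 0.3],
    [0.2, 0.3, 0.0],
]
BIAS = [0.5, -0.2, 0.1]


def update(x):
    """Apply one synchronous update of the toy map to every node."""
    return [
        math.tanh(sum(w * xj for w, xj in zip(row, x)) + b)
        for row, b in zip(WEIGHTS, BIAS)
    ]


def iterate_to_fixed_point(x0, tol=1e-10, max_iter=1000):
    """Iterate until the largest per-node change falls below tol."""
    x = list(x0)
    for n_iter in range(1, max_iter + 1):
        new_x = update(x)
        delta = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if delta < tol:
            return x, n_iter
    raise RuntimeError("no convergence within max_iter")


# Two very different starting points reach the same fixed point.
fp_a, iters_a = iterate_to_fixed_point([0.0, 0.0, 0.0])
fp_b, iters_b = iterate_to_fixed_point([1.0, -1.0, 1.0])
gap = max(abs(a - b) for a, b in zip(fp_a, fp_b))
print(f"iterations: {iters_a}, {iters_b}; max gap between fixed points: {gap:.1e}")
```

Independence from initialisation is what makes per-locus results reproducible, and the geometric rate is one plausible reason an iterative scheme can finish in a bounded number of sweeps.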

Quick start

# Install
git clone https://github.com/jfmao/GraphGWAS.git
cd GraphGWAS/src/python && pip install -e '.[all]'

# Fetch Pan-UKB summary statistics for fine-mapping (no Neo4j required)
python -c "
from graphgwas.panukb import fetch_sumstats_locus
# Sumstats-only HBP entry point (imported to confirm it is installed; not called here)
from graphgwas.finemapping_v2 import hbp_finemap_from_sumstats
# Fetch BMI sumstats near FTO (GRCh37)
sumstats = fetch_sumstats_locus(
    phenocode='21001', chr='16',
    start=53720000, end=53920000,
    trait_type='continuous', modifier='irnt',
    ancestries=['EUR', 'CSA', 'AFR', 'EAS'],
)
print({anc: len(s.variants) for anc, s in sumstats.items()})
"

# Full pipeline with Neo4j + multi-omics graph:
# (1) Start Neo4j with the pre-built human dump (17 GB, from Zenodo)
# (2) Run GAFM fine-mapping on a lead variant
graphgwas finemap --chr 16 --pos 53820527 --window 100000 \
    --phenotype BMI --method l1 -o credible_set.tsv
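For readers new to the term, the credible set written by the command above is, in the usual fine-mapping sense, the smallest group of variants whose posterior inclusion probabilities (PIPs) add up to a target coverage. The sketch below shows that standard construction for a single-signal locus. It does not read credible_set.tsv, whose column layout is not described here, and the variant IDs and PIP values are hypothetical.

```python
def credible_set(pips, coverage=0.95):
    """Return the smallest set of variants whose normalised PIPs reach `coverage`.

    `pips` maps variant ID -> posterior inclusion probability. Values are
    normalised to sum to 1, which assumes a single causal signal at the locus.
    """
    total = sum(pips.values())
    if total <= 0:
        raise ValueError("PIPs must have a positive sum")
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen, mass = [], 0.0
    for variant_id, pip in ranked:
        chosen.append(variant_id)
        mass += pip / total
        if mass >= coverage:
            break
    return chosen


# Hypothetical PIPs for five variants at one locus.
example_pips = {"var_a": 0.62, "var_b": 0.25, "var_c": 0.09, "var_d": 0.03, "var_e": 0.01}
cs95 = credible_set(example_pips)
print(cs95)
```

With these illustrative numbers the cumulative mass is 0.62, then 0.87, then 0.96, so the 95% set stops after the third variant.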

The graph schema

 Variant ──HAS_CONSEQUENCE──> Gene ──IN_PATHWAY──> Pathway
    │                           │
    ├── (af, qual, gt_packed)   ├── INTERACTS_WITH (STRING PPI ≥ 700)
    ├── eQTL ─────────────> Gene (tissue-specific, GTEx v8)
    ├── IN_REGULATORY ─────> RegulatoryElement  (ENCODE cCRE)
    └── FOR_VARIANT <─── AssociationResult ──IN_STUDY──> GWASStudy

The credible-set output is itself a graph object: each reported variant is co-queryable with its gene, tissue and pathway neighbours in a single Cypher traversal, eliminating the post-hoc enrichment step that flat-prior pipelines require.
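As a rough illustration of such a single traversal, the sketch below builds a Cypher query from the node labels and relationship types in the schema diagram above. The property names (variant_id, symbol, tissue, name), the variant ID format and the helper function itself are assumptions made for illustration and may not match the actual graph.

```python
def neighbourhood_query(variant_ids, tissue=None):
    """Build (query, params) returning gene, tissue and pathway context per variant.

    Labels and relationship types follow the schema diagram; the property
    names are hypothetical.
    """
    eqtl_filter = " WHERE e.tissue = $tissue" if tissue is not None else ""
    query = (
        "MATCH (v:Variant)-[:HAS_CONSEQUENCE]->(g:Gene)\n"
        "WHERE v.variant_id IN $variant_ids\n"
        f"OPTIONAL MATCH (v)-[e:eQTL]->(g){eqtl_filter}\n"
        "OPTIONAL MATCH (g)-[:IN_PATHWAY]->(p:Pathway)\n"
        "RETURN v.variant_id AS variant, g.symbol AS gene,\n"
        "       collect(DISTINCT e.tissue) AS tissues,\n"
        "       collect(DISTINCT p.name) AS pathways"
    )
    params = {"variant_ids": list(variant_ids)}
    if tissue is not None:
        params["tissue"] = tissue
    return query, params


# Hypothetical variant ID and GTEx-style tissue label.
query, params = neighbourhood_query(["16:53820527"], tissue="Adipose_Subcutaneous")
print(query)
print(params)
```

With the official Neo4j Python driver, the pair can be passed to `session.run(query, params)`. Using OPTIONAL MATCH keeps variants that have no eQTL or pathway neighbour in the result instead of silently dropping them.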

Three interfaces

| Interface | Use case | Entry point |
|---|---|---|
| CLI (52 commands, 15 groups) | interactive analysis, scripted pipelines | graphgwas ... |
| REST API (FastAPI, 37 endpoints) | web integration, programmatic access | graphgwas api serve |
| MCP server (FastMCP, 16 tools) | AI-agent access via any MCP-compatible client | graphgwas mcp |

Full documentation is in docs/manual/; an end-to-end walkthrough is in vignettes/fine-mapping-quickstart.md.

Fine-mapping methods at a glance

| Method | Complexity | Typical runtime per locus | Advantage relative to SuSiE |
|---|---|---|---|
| HBP (three-layer factor graph + Banach contraction) | O(E × T) | 0.02–0.08 s | accuracy parity; 6–60× faster |
| GAFM (LD-deconvolved + adaptive α + graph prior) | O(n²) | 0.07 s | 27–2 at weak signal + tissue-specific eQTL priors |
| CLGF (cross-locus EM) | O(L × T) | locus-dependent | multi-locus shared-pathway evidence |
| L4 (MDS embedding) | O(n² + n d) | 0.1 s | multi-signal detection |

Platform scope beyond fine-mapping

GraphGWAS is a platform of which fine-mapping is the first method class to be rigorously benchmarked (see the accompanying manuscript submitted to Nature Genetics). The codebase additionally implements:

  • Epistasis (M1 LD-pruned, M2 motif-filtered, M3 differential-subgraph, M4 dark-matter pairs) — companion manuscript in preparation
  • Heritability (6 estimators including spectral, GRM-REML, conductance)
  • Multivariate cross-trait analysis (r_G, G-matrix, coherence, pleiotropy)
  • Polygenic risk scores (classical + pathway-weighted)
  • Mendelian randomisation (IVW, Egger, weighted median)
  • Gene–environment interactions (multi-environment trials)
  • Heterogeneous GNN (PyTorch Geometric) and LangGraph AI-agent interface

An honest benchmark-status table is given in Supplementary Note S3 of the manuscript.

Data

Pre-built Neo4j graph databases on Zenodo (DOIs assigned on acceptance):

| Dataset | Size | Contents |
|---|---|---|
| Human 1KG + multi-omics | 17 GB | 70.7 M variants, 3,202 samples, 20,092 genes, 43.2 M GTEx eQTLs, 230 K STRING PPIs, 370 K ENCODE cCREs |
| Yeast 1011 Genomes | 0.5 GB | 1.92 M variants, 1,011 strains, SGD gene annotations, 35 growth-trait phenotypes |

Pan-UKB summary statistics are streamed on demand via tabix over HTTPS from the public Amazon S3 bucket pan-ukb-us-east-1; no authentication or bulk download required.
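To make the streaming step concrete, the sketch below shows one way to pull a single region with pysam's `TabixFile`, outside GraphGWAS. It is not the package's own implementation. The object-key layout (`sumstats_flat_files/` for data, `sumstats_flat_files_tabix/` for indexes, and the `trait_type-phenocode-pheno_sex-modifier` file stem) is an assumption about the public bucket and should be checked against the Pan-UKB manifest. Remote access also requires a pysam/htslib build with HTTPS support.

```python
BUCKET_URL = "https://pan-ukb-us-east-1.s3.amazonaws.com"


def panukb_urls(trait_type, phenocode, modifier="", pheno_sex="both_sexes"):
    """Return (data_url, index_url) under the ASSUMED Pan-UKB key layout."""
    stem = "-".join(part for part in (trait_type, phenocode, pheno_sex, modifier) if part)
    name = f"{stem}.tsv.bgz"
    data_url = f"{BUCKET_URL}/sumstats_flat_files/{name}"
    index_url = f"{BUCKET_URL}/sumstats_flat_files_tabix/{name}.tbi"
    return data_url, index_url


def to_zero_based_half_open(start, end):
    """Convert a 1-based inclusive interval to pysam's 0-based half-open form."""
    return start - 1, end


def fetch_region(data_url, index_url, chrom, start, end):
    """Stream only the rows overlapping chrom:start-end (1-based, inclusive)."""
    import pysam  # third-party; imported lazily so the URL helpers work without it

    zero_start, zero_end = to_zero_based_half_open(start, end)
    tbx = pysam.TabixFile(data_url, index=index_url)
    try:
        return [row.split("\t") for row in tbx.fetch(chrom, zero_start, zero_end)]
    finally:
        tbx.close()


# BMI (phenocode 21001, irnt) near FTO on GRCh37, as in the quick start.
data_url, index_url = panukb_urls("continuous", "21001", modifier="irnt")
print(data_url)
print(index_url)
# rows = fetch_region(data_url, index_url, "16", 53720000, 53920000)  # needs network
```

Because tabix reads only the index and the compressed blocks overlapping the region, a 200 kb window transfers a small fraction of the full per-phenotype file.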

Citation

If you use GraphGWAS, please cite the accompanying Nature Genetics manuscript (Relational biological structure improves fine-mapping of causal GWAS variants under weak signal, submitted 2026) and the Zenodo-versioned software release. See CITATION.cff.

License

MIT — see LICENSE.
