Relational fine-mapping of causal GWAS variants on a multi-omics knowledge graph

GraphGWAS

License: MIT. Requires Python 3.11+.

GraphGWAS is a graph-native fine-mapping platform built on Neo4j. It carries multi-omics biological structure (genes, tissue-specific eQTLs, pathways, protein–protein interactions) through the fine-mapping inference as a typed factor graph, rather than collapsing it to flat per-variant annotation priors as existing Bayesian fine-mappers do. Under strong signal, this relational prior matches the accuracy of SuSiE, FINEMAP, SuSiE-inf, FINEMAP-inf and SBayesRC while running 6–60× faster; under weak signal with tissue-specific eQTL priors, it beats SuSiE 27–2 in head-to-head comparisons.

Key features

  • Two new fine-mapping algorithms with theoretical guarantees
    • HBP — hierarchical belief propagation on a variant→gene→pathway factor graph with PPI coupling; proved Banach contraction (Theorem 2); 0.02–0.08 s per locus
    • GAFM (Graph-Augmented Fine-Mapping) — LD-deconvolved evidence combined with a graph functional score via adaptive α; proved causal-variant ranking under mild LD-decay assumptions (Theorem 3)
  • Six head-to-head baselines integrated into a common interface — SuSiE, FINEMAP, SuSiE-inf, FINEMAP-inf, PolyFun-proxy, SBayesRC
  • Calibrated posterior inclusion probabilities (PIPs), with a 0% false-positive rate on null loci across 100 simulations
  • Multi-omics graph — 70.7 M variants, 20,092 GENCODE genes, 43.2 M GTEx v8 tissue eQTLs, 230,850 STRING interactions (combined score ≥ 700), 370,000 ENCODE cCREs
  • Biobank-scale — sumstats-only entry path consumes Pan-UK Biobank summary statistics directly via tabix over HTTPS; demonstrated on 4 ancestries (EUR N = 420,531; CSA, AFR, EAS)
  • Cross-species — same codebase applies to yeast, human, Arabidopsis
  • Unified package with 52-command CLI, 37-endpoint FastAPI server, and 16-tool MCP server for AI-agent access

Quick start

# Install
git clone https://github.com/jfmao/GraphGWAS.git
cd GraphGWAS/src/python && pip install -e '.[all]'

# Fetch Pan-UKB summary statistics for fine-mapping (no Neo4j required)
python -c "
from graphgwas.panukb import fetch_sumstats_locus
# Sumstats-only HBP entry point; imported here but not called in this snippet
from graphgwas.finemapping_v2 import hbp_finemap_from_sumstats
# Fetch BMI sumstats near FTO (GRCh37) for four ancestries
sumstats = fetch_sumstats_locus(
    phenocode='21001', chr='16',
    start=53720000, end=53920000,
    trait_type='continuous', modifier='irnt',
    ancestries=['EUR', 'CSA', 'AFR', 'EAS'],
)
# Report the number of variants retrieved per ancestry
print({anc: len(s.variants) for anc, s in sumstats.items()})
"

# Full pipeline with Neo4j + multi-omics graph:
# (1) Start Neo4j with the pre-built human dump (17 GB, from Zenodo)
# (2) Run L1 fine-mapping on a lead variant
graphgwas finemap --chr 16 --pos 53820527 --window 100000 \
    --phenotype BMI --method l1 -o credible_set.tsv
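
The command above writes a credible set to credible_set.tsv. As a rough illustration of what a credible set is, the sketch below builds a 95% credible set from per-variant PIPs with the usual greedy rule: sort by PIP and accumulate until the target coverage is reached. It assumes a single causal variant per locus, so the PIPs sum to 1. The function name, variant labels and PIP values are invented for illustration; this is not GraphGWAS's own implementation, and the actual column layout of credible_set.tsv is not shown here.

```python
def credible_set(pips, coverage=0.95):
    """Return the smallest top-ranked set of variants whose PIPs sum to >= coverage."""
    # Rank variants from highest to lowest posterior inclusion probability.
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for variant, pip in ranked:
        chosen.append(variant)
        total += pip
        if total >= coverage:
            break
    return chosen, total


# Toy PIPs for five placeholder variants (illustrative values only).
toy_pips = {"var_a": 0.62, "var_b": 0.25, "var_c": 0.09, "var_d": 0.03, "var_e": 0.01}
members, mass = credible_set(toy_pips)
print(members, round(mass, 2))
```

With these toy values the set closes after three variants, because the first two carry only 0.87 of the posterior mass.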

The graph schema

 Variant ──HAS_CONSEQUENCE──> Gene ──IN_PATHWAY──> Pathway
    │                           │
    ├── (af, qual, gt_packed)   ├── INTERACTS_WITH (STRING PPI ≥ 700)
    ├── eQTL ─────────────> Gene (tissue-specific, GTEx v8)
    ├── IN_REGULATORY ─────> RegulatoryElement  (ENCODE cCRE)
    └── FOR_VARIANT <─── AssociationResult ──IN_STUDY──> GWASStudy

The credible-set output is itself a graph object: each reported variant is co-queryable with its gene, tissue and pathway neighbours in a single Cypher traversal, eliminating the post-hoc enrichment step that flat-prior pipelines require.
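
A single traversal of this kind might look like the sketch below, which only assembles a Cypher query and its parameters. The node labels and relationship types (Variant, Gene, Pathway, HAS_CONSEQUENCE, IN_PATHWAY, eQTL) come from the schema diagram above. The property names (`v.id`, `g.name`, `p.name`, `e.tissue`), the function name and the example tissue label are assumptions, and may not match the real database.

```python
def credible_set_context_query(variant_ids, tissue=None):
    """Return (cypher, params) linking variants to genes, pathways and eQTL tissues."""
    cypher = "\n".join([
        # Labels and relationship types follow the README schema diagram.
        "MATCH (v:Variant)-[:HAS_CONSEQUENCE]->(g:Gene)",
        "WHERE v.id IN $variant_ids",           # 'id' is an assumed property name
        "OPTIONAL MATCH (g)-[:IN_PATHWAY]->(p:Pathway)",
        "OPTIONAL MATCH (v)-[e:eQTL]->(g)",
        "WHERE $tissue IS NULL OR e.tissue = $tissue",  # 'tissue' is assumed
        "RETURN v.id AS variant, g.name AS gene,",
        "       collect(DISTINCT p.name) AS pathways,",
        "       collect(DISTINCT e.tissue) AS eqtl_tissues",
    ])
    params = {"variant_ids": list(variant_ids), "tissue": tissue}
    return cypher, params


# Placeholder variant identifiers and a hypothetical tissue label.
cypher, params = credible_set_context_query(["var_a", "var_b"], tissue="Adipose_Subcutaneous")
print(cypher)
print(params)
```

With the official neo4j Python driver, the pair could be passed to `session.run(cypher, params)` against a running database. Keeping query construction separate from execution makes the query easy to inspect and test without a Neo4j instance.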

Three interfaces

| Interface | Use case | Entry point |
|---|---|---|
| CLI (52 commands, 15 groups) | interactive analysis, scripted pipelines | graphgwas ... |
| REST API (FastAPI, 37 endpoints) | web integration, programmatic access | graphgwas api serve |
| MCP server (FastMCP, 16 tools) | AI-agent access via Claude Desktop / Claude Code | graphgwas mcp |

Full documentation in docs/manual/; end-to-end walkthrough in vignettes/fine-mapping-quickstart.md.

Fine-mapping methods at a glance

| Method | Complexity | Typical runtime / locus | Strength relative to SuSiE |
|---|---|---|---|
| HBP (three-layer factor graph + Banach contraction) | O(E × T) | 0.02–0.08 s | accuracy parity; 6–60× faster |
| L1 (LD-deconvolved + adaptive α + graph prior) | O(n²) | 0.07 s | 27–2 at weak signal + tissue-specific eQTL priors |
| CLGF (cross-locus EM) | O(L × T) | locus-dependent | multi-locus shared-pathway evidence |
| L4 (MDS embedding) | O(n² + n d) | 0.1 s | multi-signal detection |
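
HBP's convergence claim rests on its update being a Banach contraction. As a toy illustration of what that property guarantees (a unique fixed point reached from any starting value), the sketch below iterates a simple scalar contraction. The update rule, tolerance and function names are invented for this example; this is not HBP's message-passing update.

```python
def fixed_point(update, x0, tol=1e-10, max_iter=1000):
    """Iterate x <- update(x) until successive values differ by less than tol."""
    x = x0
    for iteration in range(1, max_iter + 1):
        x_new = update(x)
        if abs(x_new - x) < tol:
            return x_new, iteration
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")


def toy_update(x):
    """A contraction with factor 0.5: |f(x) - f(y)| = 0.5 * |x - y|."""
    return 0.5 * x + 1.0


# Different starting points all converge to the same fixed point, x = 2.
results = [fixed_point(toy_update, start) for start in (-100.0, 0.0, 100.0)]
for value, steps in results:
    print(round(value, 6), steps)
```

Because the distance to the fixed point halves at every step, the number of iterations grows only logarithmically with the initial error.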

Platform scope beyond fine-mapping

GraphGWAS is a platform, and fine-mapping is the first of its method classes to be rigorously benchmarked (see the accompanying manuscript submitted to Nature Genetics). The codebase additionally implements:

  • Epistasis (M1 LD-pruned, M2 motif-filtered, M3 differential-subgraph, M4 dark-matter pairs) — companion manuscript in preparation
  • Heritability (6 estimators including spectral, GRM-REML, conductance)
  • Multivariate cross-trait analysis (r_G, G-matrix, coherence, pleiotropy)
  • Polygenic risk scores (classical + pathway-weighted)
  • Mendelian randomisation (IVW, Egger, weighted median)
  • Gene–environment interactions (multi-environment trials)
  • Heterogeneous GNN (PyTorch Geometric) and LangGraph AI-agent interface

The benchmark status of each of these modules is tabulated in Supplementary Note S3 of the manuscript.
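
Of the methods listed above, the inverse-variance-weighted (IVW) Mendelian randomisation estimator has a compact closed form. The sketch below implements the textbook fixed-effect version from per-variant exposure and outcome effects. The function name and toy numbers are invented for illustration, and GraphGWAS's own implementation may differ.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error."""
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    # Weighted regression of outcome effects on exposure effects through the
    # origin, with weights 1 / se_outcome**2.
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Toy instruments whose outcome effects are exactly half the exposure effects.
bx = [0.1, 0.2, 0.3]
by = [0.05, 0.10, 0.15]
se = [0.01, 0.02, 0.01]
estimate, std_error = ivw_estimate(bx, by, se)
print(round(estimate, 6), round(std_error, 6))
```

Because every toy instrument has an outcome-to-exposure ratio of 0.5, the estimate recovers 0.5 whatever the weights are.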

Data

Pre-built Neo4j graph databases on Zenodo (DOIs assigned on acceptance):

| Dataset | Size | Contents |
|---|---|---|
| Human 1KG + multi-omics | 17 GB | 70.7 M variants, 3,202 samples, 20,092 genes, 43.2 M GTEx eQTLs, 230 K STRING PPIs, 370 K ENCODE cCREs |
| Yeast 1011 Genomes | 0.5 GB | 1.92 M variants, 1,011 strains, SGD gene annotations, 35 growth-trait phenotypes |

Pan-UKB summary statistics are streamed on demand via tabix over HTTPS from the public Amazon S3 bucket pan-ukb-us-east-1; no authentication or bulk download required.
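
For readers who want to see what region streaming over HTTPS involves outside GraphGWAS, the sketch below uses pysam's `TabixFile`, which can open a remote bgzipped, tabix-indexed file and fetch a single region. This assumes pysam is installed and that the index sits next to the data file. The helper names are hypothetical, and the object path inside the bucket is not given here, so the `url` argument is left for the reader to supply. This is not necessarily how `fetch_sumstats_locus` works internally.

```python
def tabix_region(chrom, start, end):
    """Format a 1-based, inclusive tabix region string."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return f"{chrom}:{start}-{end}"


def stream_rows(url, chrom, start, end):
    """Yield tab-split rows for one region of a remote tabix-indexed file."""
    import pysam  # imported lazily so tabix_region works without pysam

    tbx = pysam.TabixFile(url)  # only the index and needed blocks are read
    try:
        for line in tbx.fetch(region=tabix_region(chrom, start, end)):
            yield line.split("\t")
    finally:
        tbx.close()


# The FTO window from the quick start, expressed as a region string.
print(tabix_region("16", 53720000, 53920000))
```

Only the index and the compressed blocks overlapping the requested region are transferred, which is why no bulk download is needed.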

Citation

If you use GraphGWAS, please cite the accompanying Nature Genetics manuscript (Relational biological structure improves fine-mapping of causal GWAS variants under weak signal, submitted 2026) and the Zenodo-versioned software release. See CITATION.cff.

License

MIT — see LICENSE.
