Federated genome-wide association study pipeline built with Flower and PLINK
Federated GWAS Pipeline
This repository implements a federated pipeline for Genome-Wide Association Studies (GWAS) using Flower, PLINK, and custom privacy-preserving protocols. The pipeline supports multi-stage, multi-client GWAS with reproducible outputs and structured logging.
For release verification steps, see RELEASE.md. For implementation details and change history, see CURRENT_VERSION.md.
Environment Setup
Option 1: UV (recommended)
Install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
Sync dependencies (Python 3.11+):
uv sync --python 3.11
Optional dev dependencies:
uv sync --dev
Option 2: Conda
conda create -n fedgwas python=3.11 -y
conda activate fedgwas
pip install -e .
pip install -U "flwr[simulation]"
PLINK
- Requires PLINK 1.9+.
- Download the binary for your OS and ensure plink is on your PATH, or set the path in each client config.yaml (plink.path, if configured).
- Toy reference files are under plink/; production runs use experiment data under experiments/.
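A quick preflight check before a run can catch a missing binary early. The sketch below is written in Python like the pipeline itself. resolve_plink is a hypothetical helper, not part of the package; it assumes the configured value is a plain file path and does not parse config.yaml itself.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def resolve_plink(configured_path: Optional[str] = None) -> Optional[str]:
    """Return a usable PLINK executable, or None if none can be found.

    A configured path (for example a plink.path value read from a client
    config.yaml) takes precedence; otherwise search PATH for "plink".
    """
    if configured_path:
        candidate = Path(configured_path).expanduser()
        # Accept the configured path only if it is an executable file.
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return str(candidate)
        return None
    # shutil.which returns the full path of "plink" on PATH, or None.
    return shutil.which("plink")


if __name__ == "__main__":
    found = resolve_plink()
    print(found if found else "plink not found on PATH; set plink.path in config.yaml")
```

A configured path that is wrong returns None rather than silently falling back to PATH, so a typo in the config surfaces instead of running a different PLINK build.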
Quick Start (Recommended: tiny_even)
The default Flower config in pyproject.toml points to experiments/correctness/tiny_even/configs (2 clients, tiny synthetic data).
Repository layout (experiments)
experiments/correctness/tiny_even/
├── config.yaml
├── configs/
│ ├── server/config.yaml
│ ├── center_1/config.yaml
│ └── center_2/config.yaml
├── data/tiny/
│ ├── center_1/ # PLINK .bed/.bim/.fam per client
│ ├── center_2/
│ └── centralized_baseline/ # after generate_baseline
└── results_2/ # gitignored; current shipped config output
Config templates: configs/config_template.yaml.
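The configs/ directory follows a simple convention: one subdirectory per role, each holding a config.yaml. A minimal sketch of discovering those files from a config_path, assuming exactly the layout shown above (discover_configs and client_configs are hypothetical helpers, not the pipeline's loader):

```python
import tempfile
from pathlib import Path
from typing import Dict


def discover_configs(config_path: str) -> Dict[str, Path]:
    """Map each role directory under config_path to its config.yaml.

    Assumes <config_path>/<role>/config.yaml, where role is "server" or
    "center_<n>". Directories without a config.yaml are skipped.
    """
    found: Dict[str, Path] = {}
    for entry in sorted(Path(config_path).iterdir()):
        cfg = entry / "config.yaml"
        if entry.is_dir() and cfg.is_file():
            found[entry.name] = cfg
    return found


def client_configs(configs: Dict[str, Path]) -> Dict[str, Path]:
    """Keep only the per-center (client) configs."""
    return {role: p for role, p in configs.items() if role.startswith("center_")}


if __name__ == "__main__":
    # Build a throwaway copy of the tiny_even configs/ layout and inspect it.
    with tempfile.TemporaryDirectory() as tmp:
        for role in ("server", "center_1", "center_2"):
            (Path(tmp) / role).mkdir()
            (Path(tmp) / role / "config.yaml").write_text("{}\n")
        configs = discover_configs(tmp)
        print(sorted(configs))
        print(sorted(client_configs(configs)))
```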
1. Generate synthetic data (if not present)
python pipeline/simulation/simulated_data/generate_synthetic_data.py \
--scale tiny \
--partition-strategy even \
--seed 42 \
--output-dir experiments/correctness/tiny_even/data
2. Generate centralized baseline
python experiments/tools/generate_baseline.py \
experiments/correctness/tiny_even/config.yaml
3. Run federated pipeline (simulation)
flwr run . local-simulation --stream
Override rounds or config path:
flwr run . local-simulation --stream --run-config \
'simulation=true num-server-rounds=100 config_path="experiments/correctness/tiny_even/configs"'
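Flower parses the --run-config string itself. The sketch below only illustrates how the example override breaks down into space-separated key=value pairs with TOML-style values (booleans, integers, quoted strings). parse_run_config is a hypothetical helper, not part of Flower or this package.

```python
import re
from typing import Dict, Union

Value = Union[bool, int, float, str]

# key=value pairs separated by whitespace; a value is "double-quoted" or a bare word.
_PAIR = re.compile(r'([A-Za-z0-9_.-]+)=("[^"]*"|\S+)')


def _coerce(raw: str) -> Value:
    """Turn a raw value into bool/int/float/str, keeping quoted values as strings."""
    if raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]
    if raw in ("true", "false"):
        return raw == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_run_config(spec: str) -> Dict[str, Value]:
    """Split an override string such as the one passed to --run-config."""
    return {m.group(1): _coerce(m.group(2)) for m in _PAIR.finditer(spec)}


if __name__ == "__main__":
    spec = 'simulation=true num-server-rounds=100 config_path="experiments/correctness/tiny_even/configs"'
    for key, value in parse_run_config(spec).items():
        print(f"{key} = {value!r} ({type(value).__name__})")
```

The quotes around config_path matter: they keep the value a string, while bare true and 100 become a boolean and an integer.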
Results are written under each client's logs/ and intermediate/ directories (paths set in per-center config.yaml). The shipped tiny configs currently write under experiments/correctness/tiny_even/results_2/; use the paths in the active center and server config files as the source of truth.
4. Retention (optional, automatic)
Experiment config.yaml may set retention.tier (minimal | standard | research). When auto_apply_on_complete is true, the server prunes non-essential artifacts after the run. To apply retention manually (use --dry-run to preview):
python experiments/tools/apply_run_retention.py \
experiments/correctness/tiny_even/results \
--config-path experiments/correctness/tiny_even/configs \
--dry-run
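Pass the results directory that your active configs write to (the shipped tiny configs use results_2/). See RELEASE.md for tier definitions.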
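To make the dry-run idea concrete, here is a minimal sketch of tier-based pruning. The keep-lists in HYPOTHETICAL_TIERS are invented for illustration (the real tier definitions are in RELEASE.md), and apply_retention is not the pipeline's apply_run_retention.py.

```python
import tempfile
from fnmatch import fnmatch
from pathlib import Path
from typing import Dict, Iterable, List

# HYPOTHETICAL keep-lists for illustration only; see RELEASE.md for the real tiers.
HYPOTHETICAL_TIERS: Dict[str, List[str]] = {
    "minimal": ["*.log"],
    "standard": ["*.log", "*.assoc.logistic", "*.frq", "*.imiss"],
    "research": ["*"],  # keep everything
}


def plan_prune(results_dir: str, keep_patterns: Iterable[str]) -> List[Path]:
    """List files under results_dir whose names match none of the keep patterns."""
    patterns = list(keep_patterns)
    return sorted(
        p for p in Path(results_dir).rglob("*")
        if p.is_file() and not any(fnmatch(p.name, pat) for pat in patterns)
    )


def apply_retention(results_dir: str, tier: str, dry_run: bool = True) -> List[Path]:
    """Delete (or, with dry_run, only report) files the tier does not keep."""
    doomed = plan_prune(results_dir, HYPOTHETICAL_TIERS[tier])
    for path in doomed:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
    return doomed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        logs = Path(tmp) / "center_1" / "logs"
        logs.mkdir(parents=True)
        (logs / "run.log").write_text("ok\n")
        (logs / "chunk_0.tmp").write_text("scratch\n")
        apply_retention(tmp, "minimal", dry_run=True)
```

Computing the full deletion plan before touching any file means the dry run and the real run report exactly the same set.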
5. Evaluate against baseline
python experiments/tools/evaluation/evaluate_all.py \
experiments/correctness/tiny_even/results_2 \
--baseline experiments/correctness/tiny_even/data/tiny/centralized_baseline \
--king
See experiments/correctness/tiny_even/README.md for expected metrics and success criteria. If you changed the output paths in the active configs, pass that results directory instead.
Documentation Site
The Docusaurus site is isolated under website/ and reads Markdown from the repository-level docs/ directory.
cd website
npm install
npm run start
npm run build
Three-Node Cluster Deployment
For Matpool or any 3-node layout (1 SuperLink + 2 SuperNodes), use the bundled scripts and guide:
bash cluster_deployment/scripts/setup-cluster-node.sh # each node
bash cluster_deployment/scripts/cluster-verify-data.sh --scale tiny --client-id 1 # each client
bash cluster_deployment/scripts/cluster-run-app.sh \
--server-ip <SERVER_IP> --scale tiny --rounds 20
Performance scales (small/medium): experiments/performance/scales.yaml and per-scale READMEs under small_even/, medium_even/.
Local Deployment Mode
Start a SuperLink and two SuperNodes (each in its own terminal), then launch the app with flwr run:
flower-superlink --insecure
flower-supernode --insecure --superlink 127.0.0.1:9092 --clientappio-api-address 127.0.0.1:9094 \
--node-config 'partition-id=0 num-partitions=2 config-file="experiments/correctness/tiny_even/configs/center_1/config.yaml"'
flower-supernode --insecure --superlink 127.0.0.1:9092 --clientappio-api-address 127.0.0.1:9095 \
--node-config 'partition-id=1 num-partitions=2 config-file="experiments/correctness/tiny_even/configs/center_2/config.yaml"'
flwr run . local-deployment --stream
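The two SuperNode commands differ only in partition-id, the ClientAppIo port, and the center config. A minimal sketch that generates them from that pattern (partition i uses center_<i+1>/config.yaml and port 9094 + i). supernode_command is a hypothetical helper, and extending the port pattern beyond the two shipped centers is an assumption.

```python
import shlex
from typing import List


def supernode_command(
    partition_id: int,
    num_partitions: int,
    config_root: str = "experiments/correctness/tiny_even/configs",
    superlink: str = "127.0.0.1:9092",
    base_clientappio_port: int = 9094,
) -> List[str]:
    """Build the flower-supernode argv for one partition."""
    config_file = f"{config_root}/center_{partition_id + 1}/config.yaml"
    node_config = (
        f"partition-id={partition_id} num-partitions={num_partitions} "
        f'config-file="{config_file}"'
    )
    return [
        "flower-supernode", "--insecure",
        "--superlink", superlink,
        "--clientappio-api-address", f"127.0.0.1:{base_clientappio_port + partition_id}",
        "--node-config", node_config,
    ]


if __name__ == "__main__":
    for pid in range(2):
        # shlex.join quotes arguments so each line can be pasted into a shell.
        print(shlex.join(supernode_command(pid, 2)))
```

Each node needs its own ClientAppIo address because both SuperNodes run on the same host.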
Advanced: Real-World Experiments
Larger studies (e.g. 1000 Genomes subset) live under experiments/real_world/1000genomes/. These require downloading and preparing data, take longer to run, and need config_path overridden:
flwr run . local-simulation --stream --run-config \
'config_path="experiments/real_world/1000genomes/configs"'
Manuscript figures and prior run outputs under experiments/real_world/1000genomes/manuscript/ are research artifacts and are not required for the default release path.
Output and Logs
- Per-client intermediate_dir and log_dir are defined in each center config.yaml.
- Directories are cleared at the start of each client run to avoid stale artifacts.
- Stage progress and errors go to per-client log files under each configured output.log_dir.
- Inspect PLINK outputs (.assoc.logistic, .imiss, .frq, KING kinship files) directly under each client's logs/.
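To inspect one of these outputs programmatically, the sketch below reads a PLINK 1.9 .frq file, whose whitespace-separated columns are CHR, SNP, A1, A2, MAF, NCHROBS. It assumes "NA" marks an undefined MAF; the helper names, the 0.01 threshold, and the sample rows are made up for illustration.

```python
from pathlib import Path
from typing import Dict, List, Optional


def parse_frq(text: str) -> Dict[str, Optional[float]]:
    """Map SNP ID -> minor allele frequency from PLINK 1.9 .frq text."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    # Locate columns by name rather than position.
    snp_col, maf_col = header.index("SNP"), header.index("MAF")
    freqs: Dict[str, Optional[float]] = {}
    for line in lines[1:]:
        fields = line.split()
        maf = fields[maf_col]
        freqs[fields[snp_col]] = None if maf == "NA" else float(maf)
    return freqs


def read_frq(path: str) -> Dict[str, Optional[float]]:
    """Read and parse a .frq file from disk."""
    return parse_frq(Path(path).read_text())


def rare_snps(freqs: Dict[str, Optional[float]], threshold: float = 0.01) -> List[str]:
    """SNPs with a known MAF below threshold (undefined MAFs are skipped)."""
    return sorted(s for s, maf in freqs.items() if maf is not None and maf < threshold)


if __name__ == "__main__":
    sample = """ CHR   SNP   A1   A2      MAF  NCHROBS
   1   rs1    A    G   0.25      200
   1   rs2    C    T  0.005      200
"""
    print(rare_snps(parse_frq(sample)))
```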
Federated Protocol (Summary)
- Key exchange — ECC public keys via server relay
- Sync — Encrypted seed broadcast (server cannot decrypt)
- Local / global QC — Encrypted QC shares; exclusion list computed client-side
- Iterative KING — Chunked kinship with cross-client anonymized IDs
- Local LR + filtering — Tokenized insignificant SNPs
- Iterative LR — Chunked association on filtered data
Full stage contracts and privacy model: CURRENT_VERSION.md.
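The summary does not spell out the cryptography, so the sketch below uses two standard stand-ins to show the shape of two stages: additive secret sharing for summing QC counts without revealing any one client's count, and HMAC-SHA256 keyed with the shared seed for SNP tokens the server cannot invert. Both are assumptions for illustration, not the pipeline's actual scheme (see CURRENT_VERSION.md); the ECC-encrypted relay is omitted and all names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Iterable, List

PRIME = 2**61 - 1  # illustrative modulus; totals must stay below it


def split_into_shares(value: int, n_parties: int) -> List[int]:
    """Split value into n additive shares mod PRIME; fewer than n shares reveal nothing."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(local_counts: List[int]) -> int:
    """Sum per-client counts so that no single party sees another's count."""
    n = len(local_counts)
    # shares[i][j] is what client i would send (encrypted, via the relay) to client j.
    shares = [split_into_shares(c, n) for c in local_counts]
    # Each client j adds up the shares it received; only these partial sums are combined.
    partials = [sum(shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    return sum(partials) % PRIME


def tokenize(snp_ids: Iterable[str], shared_seed: bytes) -> Dict[str, str]:
    """Map HMAC-SHA256(seed, SNP ID) tokens back to SNP IDs.

    Clients holding the same seed produce identical tokens for identical SNPs;
    a server without the seed cannot recover the IDs from the tokens.
    """
    return {hmac.new(shared_seed, s.encode(), hashlib.sha256).hexdigest(): s for s in snp_ids}


if __name__ == "__main__":
    # e.g. per-client missing-genotype counts for one SNP during global QC
    print(secure_sum([3, 5, 10]))
    seed = secrets.token_bytes(32)  # stands in for the seed from the Sync stage
    a = tokenize(["rs1", "rs2", "rs3"], seed)
    b = tokenize(["rs2", "rs3", "rs4"], seed)
    # The server intersects token sets; clients translate tokens back to SNP IDs.
    print(sorted(a[t] for t in set(a) & set(b)))
```

Because the HMAC key is the seed the server never sees, the server can match tokens across clients but cannot brute-force the small space of SNP IDs.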
Troubleshooting
- PLINK not found: Install PLINK 1.9+ and verify that plink is on PATH or configured in config.yaml.
- Wrong config: Check config_path in pyproject.toml or pass --run-config.
- Empty results: Ensure data and baseline exist under experiments/correctness/tiny_even/data/.
- Reproducibility: Use fixed seeds in data generation and a consistent config_path across runs.
Contributing
Open issues or pull requests for bug fixes, improvements, or new features.
Acknowledgments
Download files
File details
Details for the file fedgwas-0.3.1.tar.gz.
File metadata
- Download URL: fedgwas-0.3.1.tar.gz
- Upload date:
- Size: 32.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a124160dabf861692024d0fad8437fcbd1b7b2b2912d586afce08bec1357bb21 |
| MD5 | 2077320db46e68198613bd7cdabb7228 |
| BLAKE2b-256 | c3eb204d6480abe6c55e9d623484ccf3a7bbc3cbd105425f98ce6a88c8247444 |
Provenance
The following attestation bundles were made for fedgwas-0.3.1.tar.gz:
Publisher: publish-pypi.yml on sitaomin1994/FedGWAS_pipeline
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: fedgwas-0.3.1.tar.gz
  - Subject digest: a124160dabf861692024d0fad8437fcbd1b7b2b2912d586afce08bec1357bb21
- Sigstore transparency entry: 1649803476
- Sigstore integration time:
- Permalink: sitaomin1994/FedGWAS_pipeline@35800113f80879938625328a9df2051e758c8195
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/sitaomin1994
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@35800113f80879938625328a9df2051e758c8195
- Trigger Event: release
File details
Details for the file fedgwas-0.3.1-py3-none-any.whl.
File metadata
- Download URL: fedgwas-0.3.1-py3-none-any.whl
- Upload date:
- Size: 95.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f5c3da2b2967ba1b9423a1442fa3225e0bd29b069689577be39205861e59b4af |
| MD5 | 4efe33eee6ecdb6c1ddce30ea99cbef4 |
| BLAKE2b-256 | c411ea0b9010f912a433c85e500d6db00e48c0076df44b01097e318c8426be49 |
Provenance
The following attestation bundles were made for fedgwas-0.3.1-py3-none-any.whl:
Publisher: publish-pypi.yml on sitaomin1994/FedGWAS_pipeline
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: fedgwas-0.3.1-py3-none-any.whl
  - Subject digest: f5c3da2b2967ba1b9423a1442fa3225e0bd29b069689577be39205861e59b4af
- Sigstore transparency entry: 1649803571
- Sigstore integration time:
- Permalink: sitaomin1994/FedGWAS_pipeline@35800113f80879938625328a9df2051e758c8195
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/sitaomin1994
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@35800113f80879938625328a9df2051e758c8195
- Trigger Event: release