Pure-Python port of the R package coloc — Bayesian colocalisation of two genetic traits via Approximate Bayes Factors and SuSiE.
py-coloc
A pure-Python port of the R/CRAN package coloc — Bayesian colocalisation of two genetic traits.
pycoloc tests whether two genetic traits (for example a GWAS trait and
a molecular QTL) share one or more causal variants in a genomic region,
using Approximate Bayes Factors. It is a faithful re-implementation of
coloc 5.2.3: the posterior probabilities agree with the R package to better
than 1e-4 absolute difference, and typically to machine precision (< 1e-15).
- Methods: Giambartolomei et al., PLoS Genet. 2014 (single causal variant); Wallace, PLoS Genet. 2020 / 2021 (SuSiE & conditioning); Wakefield, Genet. Epidemiol. 2009 (Approximate Bayes Factors).
- Dependencies: numpy, scipy, pandas and matplotlib only. No R, no rpy2.
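The Approximate Bayes Factor cited above needs only a SNP's effect estimate and its variance. The following is a minimal sketch of Wakefield's formula, not pycoloc's own code: the function name `log_abf` is hypothetical, and the default prior standard deviation of 0.15 is an assumption chosen for a quantitative trait with sdY = 1.

```python
import math

def log_abf(beta, varbeta, prior_sd=0.15):
    """Natural-log Approximate Bayes Factor for one SNP (hypothetical helper)."""
    z = beta / math.sqrt(varbeta)                  # Wald z-score
    r = prior_sd**2 / (prior_sd**2 + varbeta)      # shrinkage factor W / (W + V)
    return 0.5 * (math.log(1.0 - r) + r * z * z)

# Same variance as the Quick start data; beta values are illustrative.
null_snp = log_abf(beta=0.0, varbeta=0.0025)
causal_snp = log_abf(beta=0.4, varbeta=0.0025)
print(null_snp, causal_snp)
```

With this variance a SNP with no effect gets a negative log ABF (evidence for the null), while the SNP with beta = 0.4 gets a strongly positive one.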
Installation
pip install pycoloc
The five posterior hypotheses
coloc_abf returns five posterior probabilities for a region:
| Hypothesis | Meaning |
|---|---|
| PP.H0 | no causal variant for either trait |
| PP.H1 | causal variant for trait 1 only |
| PP.H2 | causal variant for trait 2 only |
| PP.H3 | distinct causal variants for the two traits |
| PP.H4 | a single, shared causal variant (colocalisation) |
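As a rough illustration of how these five probabilities arise, the sketch below combines per-SNP log ABFs for two traits under the single-causal-variant model of Giambartolomei et al. It is a standalone approximation for reading purposes, not pycoloc's implementation; the function names are hypothetical and the default priors are assumed to be the values used later in this README.

```python
import math

def logsum(xs):
    """log(sum(exp(x))) computed without overflow."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def logdiff(a, b):
    """log(exp(a) - exp(b)) for a > b, computed without overflow."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) - math.exp(b - m))

def combine(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0..H4] from per-SNP log ABFs of two traits."""
    both = [a + b for a, b in zip(l1, l2)]         # same SNP causal for both
    log_h = [
        0.0,                                       # H0: no causal variant
        math.log(p1) + logsum(l1),                 # H1: trait 1 only
        math.log(p2) + logsum(l2),                 # H2: trait 2 only
        math.log(p1) + math.log(p2)                # H3: two distinct variants,
        + logdiff(logsum(l1) + logsum(l2), logsum(both)),  # all pairs minus same-SNP
        math.log(p12) + logsum(both),              # H4: one shared variant
    ]
    denom = logsum(log_h)
    return [math.exp(x - denom) for x in log_h]

# Toy log ABFs for three SNPs (illustrative values only).
shared = combine([0.0, 12.0, 0.0], [0.0, 12.0, 0.0])    # same SNP peaks in both
distinct = combine([0.0, 12.0, 0.0], [12.0, 0.0, 0.0])  # different SNPs peak
print(shared, distinct)
```

When both traits peak at the same SNP, nearly all posterior mass lands on H4; when they peak at different SNPs, H3 receives the most.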
Quick start
A dataset is a plain dict of GWAS summary statistics for one trait.
import numpy as np
import pycoloc as cl
# two traits, each with one (shared) causal SNP
rng = np.random.default_rng(0)
n = 500
def make(causal):
    beta = rng.normal(0, 0.02, n)
    beta[causal] = 0.4
    return dict(
        snp=[f"s{i}" for i in range(n)],
        position=np.arange(n) * 1000,
        beta=beta, varbeta=np.full(n, 0.0025),
        MAF=rng.uniform(0.05, 0.5, n),
        type="quant", N=4000, sdY=1.0,
    )
D1, D2 = make(250), make(250)
res = cl.coloc_abf(D1, D2)
print(res["summary"]) # nsnps, PP.H0.abf ... PP.H4.abf
print(res["results"].head()) # per-SNP lABF and SNP.PP.H4
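To turn the summary into a single call, one option is to pick the hypothesis with the largest posterior. The helper below is a hypothetical convenience, not part of pycoloc; it assumes only that the summary can be indexed by the `PP.H0.abf` ... `PP.H4.abf` labels mentioned in the comment above, and the numbers are made up for illustration.

```python
def best_hypothesis(summary):
    """Return (label, probability) of the most probable hypothesis."""
    keys = [f"PP.H{i}.abf" for i in range(5)]
    best = max(keys, key=lambda k: float(summary[k]))
    return best, float(summary[best])

# Made-up values, not the output of a real run.
example_summary = {
    "nsnps": 500,
    "PP.H0.abf": 0.00, "PP.H1.abf": 0.01, "PP.H2.abf": 0.01,
    "PP.H3.abf": 0.03, "PP.H4.abf": 0.95,
}
label, prob = best_hypothesis(example_summary)
print(label, prob)
```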
Single-causal-variant colocalisation
res = cl.coloc_abf(D1, D2, p1=1e-4, p2=1e-4, p12=1e-5)
fm = cl.finemap_abf(D1) # single-trait ABF fine-mapping
det = cl.coloc_detail(D1, D2) # also returns the all-pairs H3 grid
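The per-SNP priors p1, p2 and p12 imply different hypothesis-level priors depending on how many SNPs are in the region. The sketch below shows the usual counting argument (n single SNPs for H1, H2 and H4, and n(n-1) ordered pairs for H3); the function name is hypothetical, and whether pycoloc's prior_snp2hyp returns exactly this layout is an assumption.

```python
def hypothesis_priors(nsnps, p1=1e-4, p2=1e-4, p12=1e-5):
    """Prior probability of H0..H4 implied by per-SNP priors (illustrative)."""
    h1 = nsnps * p1                      # any one SNP causal for trait 1 only
    h2 = nsnps * p2                      # any one SNP causal for trait 2 only
    h3 = nsnps * (nsnps - 1) * p1 * p2   # two different SNPs, one per trait
    h4 = nsnps * p12                     # one SNP causal for both traits
    h0 = 1.0 - (h1 + h2 + h3 + h4)       # remainder: no causal variant
    return {"H0": h0, "H1": h1, "H2": h2, "H3": h3, "H4": h4}

priors = hypothesis_priors(500)
print(priors)
```

For a 500-SNP region with these per-SNP priors, H4 starts at 0.005 and H3 at roughly 0.0025, so the prior odds of H4 against H3 shrink as regions get larger.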
Multiple causal variants (SuSiE)
coloc_susie consumes the per-single-effect log-Bayes-factor matrix
lbf_variable (shape L x p) produced by susieR::runsusie — it does
not run SuSiE itself, so there is no SuSiE dependency.
# lbf1, lbf2 : (L x p) arrays from runsusie(); cs_index : credible sets
res = cl.coloc_susie(lbf1, lbf2,
                     snps1=snp_names, snps2=snp_names,
                     cs_index1=[1], cs_index2=[1])
print(res["summary"]) # one row per credible-set pair, PP.H0..H4
You may also pass susie-like dicts carrying lbf_variable and sets.
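When the two SuSiE fits cover different SNP sets, the two L x p matrices have to refer to the same columns before rows can be compared pairwise. The following sketch shows one way to do that with plain lists; both helpers are hypothetical, and whether coloc_susie performs this alignment internally is not stated here.

```python
from itertools import product

def align_lbf(lbf1, snps1, lbf2, snps2):
    """Restrict two (L x p) log-BF matrices to shared SNPs, in snps1 order."""
    pos2 = {s: j for j, s in enumerate(snps2)}
    shared = [s for s in snps1 if s in pos2]
    cols1 = [i for i, s in enumerate(snps1) if s in pos2]
    cols2 = [pos2[s] for s in shared]
    a1 = [[row[i] for i in cols1] for row in lbf1]
    a2 = [[row[j] for j in cols2] for row in lbf2]
    return shared, a1, a2

def signal_pairs(a1, a2):
    """Yield (i, j, row_i, row_j) for every pair of single-effect rows (0-based)."""
    for i, j in product(range(len(a1)), range(len(a2))):
        yield i, j, a1[i], a2[j]

# Toy inputs: one signal for trait 1, two for trait 2 (illustrative values).
snps1, snps2 = ["s1", "s2", "s3"], ["s3", "s1", "s4"]
lbf1 = [[0.1, 5.0, 0.2]]
lbf2 = [[4.0, 0.3, 0.0], [0.0, 6.0, 0.1]]

shared, a1, a2 = align_lbf(lbf1, snps1, lbf2, snps2)
pairs = list(signal_pairs(a1, a2))
print(shared, a1, a2, len(pairs))
```

Each yielded pair of rows could then be scored under the five hypotheses, giving one row of posteriors per pair of signals.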
Multiple-signal colocalisation by conditioning / masking
# requires beta, varbeta, MAF, N and an LD matrix in each dataset
res = cl.coloc_signals(D1, D2, method="cond", p12=1e-5) # conditioning
res = cl.coloc_signals(D1, D2, method="mask", p12=1e-5) # masking
Prior-sensitivity analysis
res = cl.coloc_abf(D1, D2)
sens = cl.sensitivity(res, "H4 > 0.5", dataset1=D1, dataset2=D2)
sensitivity sweeps the p12 prior, reports the range over which the
rule holds, and (with doplot=True) draws the four-panel coloc
sensitivity figure.
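The idea behind such a sweep can be sketched without the package: hold the data-derived log Bayes factor sums fixed, vary p12, and recompute PP.H4 each time. The code below is an illustrative approximation only. The function names, the grid running from p1*p2 up to min(p1, p2), and the example log-BF sums are all assumptions rather than pycoloc behaviour.

```python
import math

def posterior(lbf_sums, p1, p2, p12):
    """PP for H0..H4 given log-BF sums for H1..H4 and per-SNP priors."""
    logs = [
        0.0,
        math.log(p1) + lbf_sums["H1"],
        math.log(p2) + lbf_sums["H2"],
        math.log(p1) + math.log(p2) + lbf_sums["H3"],
        math.log(p12) + lbf_sums["H4"],
    ]
    m = max(logs)
    w = [math.exp(x - m) for x in logs]
    total = sum(w)
    return [x / total for x in w]

def sweep_p12(lbf_sums, p1=1e-4, p2=1e-4, steps=50, threshold=0.5):
    """Return the (p12, PP.H4) curve and the p12 values where PP.H4 > threshold."""
    lo, hi = p1 * p2, min(p1, p2)          # assumed grid end points
    grid = [lo * (hi / lo) ** (k / (steps - 1)) for k in range(steps)]
    curve = [(p, posterior(lbf_sums, p1, p2, p)[4]) for p in grid]
    passing = [p for p, pp in curve if pp > threshold]
    return curve, passing

# Made-up log-BF sums for one region.
example = {"H1": 12.0, "H2": 12.0, "H3": 22.0, "H4": 20.0}
curve, passing = sweep_p12(example)
print(min(passing), max(passing))
```

Because only the H4 term depends on p12, PP.H4 rises monotonically along the grid, so the rule "H4 > 0.5" holds on a single contiguous range of p12 values.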
Plots
cl.plot_coloc_abf(res) # per-SNP |z| coloured by SNP.PP.H4, per trait
cl.plot_dataset(D1) # Manhattan plot of one dataset
cl.manhattan(res) # per-SNP posterior Manhattan
Public API
coloc_abf, coloc_detail, finemap_abf, combine_abf, ColocABF,
coloc_susie, coloc_bf_bf, finemap_susie, finemap_bf,
logbf_to_pp, coloc_signals, finemap_signals, coloc_process,
est_cond, est_all_cond, find_best_signal, map_cond, map_mask,
bin2lin, sensitivity, plot_coloc_abf, plot_dataset, manhattan,
process_dataset, check_dataset, check_alignment, check_ld,
subset_dataset, approx_bf_estimates, approx_bf_p, sdY_est,
logsum, logdiff, Var_data, Var_data_cc, adjust_prior,
prior_snp2hyp, prior_adjust.
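Of the helpers listed, the one for estimating sdY is perhaps the least self-explanatory. For a quantitative trait, var(beta) is approximately sdY^2 / (2 N f (1 - f)), so sdY can be recovered by regressing 2 N f (1 - f) on 1 / var(beta) through the origin. The sketch below illustrates that relationship under this assumption; it is not pycoloc's sdY_est, and the function name and inputs are hypothetical.

```python
import math

def sdy_estimate(varbeta, maf, n):
    """Estimate the trait standard deviation from var(beta), MAF and N."""
    x = [1.0 / v for v in varbeta]                 # 1 / var(beta)
    y = [2.0 * n * f * (1.0 - f) for f in maf]     # N * var(genotype)
    # Least-squares slope through the origin estimates sdY^2.
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return math.sqrt(slope)

# Synthetic check: build variances from a known sdY and recover it.
true_sd, n = 2.0, 4000
maf = [0.1, 0.25, 0.4]
varbeta = [true_sd**2 / (2.0 * n * f * (1.0 - f)) for f in maf]
estimate = sdy_estimate(varbeta, maf, n)
print(estimate)
```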
R parity
tests/r_reference_driver.R runs coloc 5.2.3 on the bundled
coloc_test_data and exports both the raw inputs and the reference
results; tests/test_r_parity.py then runs pycoloc on the identical
inputs. On coloc_test_data (500 SNPs):
| Function | Agreement vs coloc 5.2.3 |
|---|---|
| coloc_abf | PP.H0..H4 max abs diff < 3e-16; SNP.PP.H4 r = 1 |
| finemap_abf | SNP.PP Pearson r = 1 (quant and case/control) |
| coloc_susie | per-pair PP.H0..H4 max abs diff < 4e-16 |
| coloc_signals | PP.H0..H4 max abs diff < 4e-16 (cond and mask) |
| sensitivity | swept posterior grid max abs diff < 1e-6 |
Run the test suite, including the parity tests, with:
python -m pytest tests/ -q
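The two agreement measures used in the table, maximum absolute difference and Pearson correlation, are simple to compute. The sketch below shows both on made-up vectors; it is not taken from tests/test_r_parity.py, and the function names are hypothetical.

```python
import math

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Made-up posteriors standing in for an R reference and a Python result.
reference = [0.01, 0.02, 0.07, 0.90]
ported = [0.01, 0.02 + 1e-16, 0.07, 0.90 - 1e-16]
diff = max_abs_diff(reference, ported)
r = pearson_r(reference, ported)
print(diff, r)
```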
License
GPL-3, matching the upstream coloc package
(License: GPL in its DESCRIPTION).
Citing
If you use pycoloc, please cite the original coloc methods papers:
- Giambartolomei C et al. (2014) Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10(5):e1004383.
- Wallace C (2020) Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses. PLoS Genet. 16(4):e1008720.
- Wallace C (2021) A more accurate method for colocalisation analysis allowing for multiple causal variants. PLoS Genet. 17(9):e1009440.