Modern-Python reimplementation of LDSC (LD Score Regression) -- SNP-heritability, genetic correlation, partitioned/stratified h2 and LDSC-SEG cell-type enrichment.
py-ldsc
py-ldsc (pyldsc) is a faithful, cleanly rewritten modern-Python
reimplementation of LDSC (LD Score Regression).
It ports the full LDSC toolkit:
- LD Score Regression — Bulik-Sullivan et al., Nature Genetics 2015, "LD Score regression distinguishes confounding from polygenicity in genome-wide association studies".
- Partitioned heritability / stratified LDSC — Finucane et al., Nature Genetics 2015.
- LDSC-SEG (cell-type / tissue-specific enrichment) — Finucane et al., Nature Genetics 2018.
The original ldsc is a Python-2-era command-line tool. pyldsc is a pure,
importable, modern-Python package (numpy / scipy / pandas / bitarray — no
rpy2) that matches the original numerically. The block jackknife is exact
arithmetic and the weighted regressions are deterministic, so estimates agree
with ldsc to floating-point tolerance.
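To illustrate why a block jackknife yields the same numbers on every run, here is a minimal sketch of a delete-one-block jackknife around an ordinary least-squares fit. It is not pyldsc's `LstsqJackknifeFast`; the function name `block_jackknife_lstsq`, the contiguous equal-size blocks, and the toy data are assumptions made for this example.

```python
import numpy as np

def block_jackknife_lstsq(x, y, n_blocks=200):
    """Delete-one-block jackknife for an ordinary least-squares fit.

    x has shape (n, p), y has shape (n,). Returns (estimate, standard_error),
    each of shape (p,). Hypothetical helper, not part of pyldsc.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    # Contiguous blocks of (nearly) equal size; requires n >= n_blocks.
    bounds = np.linspace(0, n, n_blocks + 1).astype(int)
    spans = list(zip(bounds[:-1], bounds[1:]))

    # Per-block cross-products; their sums give the whole-data fit.
    xty = np.array([x[a:b].T @ y[a:b] for a, b in spans])
    xtx = np.array([x[a:b].T @ x[a:b] for a, b in spans])
    xty_tot, xtx_tot = xty.sum(axis=0), xtx.sum(axis=0)
    est = np.linalg.solve(xtx_tot, xty_tot)

    # Leave-one-block-out fits, obtained by subtracting one block's
    # cross-products rather than refitting from scratch.
    delete = np.array([
        np.linalg.solve(xtx_tot - xtx[k], xty_tot - xty[k])
        for k in range(n_blocks)
    ])

    # Pseudovalues: their mean is the jackknife estimate and their
    # sample variance divided by n_blocks is the squared standard error.
    pseudo = n_blocks * est - (n_blocks - 1) * delete
    jknife_est = pseudo.mean(axis=0)
    jknife_se = np.sqrt(np.var(pseudo, axis=0, ddof=1) / n_blocks)
    return jknife_est, jknife_se

# Toy data: slope 2, intercept 1, Gaussian noise.
rng = np.random.default_rng(0)
x = np.column_stack([rng.normal(size=2000), np.ones(2000)])
y = 2.0 * x[:, 0] + 1.0 + rng.normal(scale=0.5, size=2000)
est, se = block_jackknife_lstsq(x, y, n_blocks=200)
print(est, se)
```

Nothing in the procedure draws random numbers: the blocks are fixed by SNP order, so two implementations fed the same inputs can differ only by floating-point rounding.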
Installation
pip install pyldsc
From source:
git clone https://github.com/omicverse/py-ldsc
cd py-ldsc
pip install -e .
Requires Python ≥ 3.9 and numpy, scipy, pandas, bitarray.
What it covers
| Capability | Function / class | Original LDSC flag |
|---|---|---|
| LD Score estimation | estimate_ldscore | --l2 |
| SNP-heritability | estimate_h2, Hsq | --h2 |
| Partitioned / stratified h2 | partitioned_h2 | --h2 --overlap-annot |
| Genetic correlation | estimate_rg, RG, Gencov | --rg |
| LDSC-SEG cell-type enrichment | ldsc_seg | --h2-cts |
| Munge raw GWAS sumstats | munge_sumstats | munge_sumstats.py |
| Block jackknife | LstsqJackknifeFast, RatioJackknife | n/a |
| IRWLS | IRWLS | n/a |
| PLINK .bed reader | PlinkBEDFile | n/a |
All the LDSC machinery is reproduced: windowed bias-corrected LD Scores
(--ld-wind-cm/-kb/-snps), the regression weights, the two-step estimator,
chi-square filtering (--chisq-max), block-jackknife standard errors, the
intercept / ratio, constrained intercepts (--intercept-h2,
--intercept-gencov, --no-intercept), the --overlap-annot correction and
per-category enrichment / coefficient z-scores.
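As a rough illustration of what "windowed bias-corrected LD Scores" means, the sketch below computes LD Scores for a toy genotype matrix with a kb window. It uses the approximately unbiased r² estimator from Bulik-Sullivan et al. (2015), r² − (1 − r²)/(N − 2). The function name `ld_scores_kb` and the dense all-pairs computation are assumptions for this example; a real reference panel needs a streaming, windowed pass over the .bed file instead.

```python
import numpy as np

def ld_scores_kb(genotypes, bp, window_kb=1000.0):
    """Toy LD Scores for one chromosome (hypothetical helper, not pyldsc).

    genotypes: (n_individuals, n_snps) allele counts, no monomorphic SNPs.
    bp: base-pair position of each SNP.
    """
    g = np.asarray(genotypes, dtype=float)
    bp = np.asarray(bp, dtype=float)
    n = g.shape[0]
    # Standardise each SNP to mean 0 and variance 1.
    z = (g - g.mean(axis=0)) / g.std(axis=0)
    # Squared sample correlation between every pair of SNPs.
    r2 = (z.T @ z / n) ** 2
    # Bias correction: the raw sample r^2 is biased upwards.
    r2_adj = r2 - (1.0 - r2) / (n - 2)
    # Sum only over SNP pairs at most window_kb apart (self term included).
    in_window = np.abs(bp[:, None] - bp[None, :]) <= window_kb * 1000.0
    return (r2_adj * in_window).sum(axis=1)

# Toy panel: 500 individuals, 5 independent SNPs spaced 100 kb apart.
rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.3, size=(500, 5))
positions = np.arange(5) * 100_000
print(ld_scores_kb(genotypes, positions, window_kb=250.0))
```

For independent SNPs each score stays close to 1 (the SNP's correlation with itself), which is the behaviour the bias correction is meant to preserve.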
Python API
import pyldsc

# 1. Estimate LD Scores from a PLINK reference panel
res = pyldsc.estimate_ldscore("reference", ld_wind_cm=1.0, out="reference")
res["ldscore"]            # DataFrame: CHR, SNP, BP, L2
res["M"], res["M_5_50"]

# 2. SNP-heritability from GWAS summary statistics
h2 = pyldsc.estimate_h2("trait.sumstats", ref_ld="baseline", w_ld="weights")
h2.tot, h2.tot_se         # heritability +/- jackknife SE
h2.intercept, h2.ratio    # LD Score regression intercept and ratio
print(h2.summary())

# 3. Partitioned / stratified heritability (--overlap-annot)
part = pyldsc.partitioned_h2("trait.sumstats", ref_ld="baseline.",
                             w_ld="weights.", frqfile="frq.")
part.overlap_results      # per-category enrichment / coefficient table

# 4. Genetic correlation between traits
rg = pyldsc.estimate_rg(["trait1.sumstats", "trait2.sumstats"],
                        ref_ld="baseline", w_ld="weights")
rg[0].rg_ratio, rg[0].rg_se, rg[0].p

# 5. LDSC-SEG cell-type / tissue enrichment
seg = pyldsc.ldsc_seg("trait.sumstats", ref_ld_cts="celltypes.ldcts",
                      ref_ld="baseline.", w_ld="weights.")
seg                       # DataFrame: Name, Coefficient, Coefficient_std_error, P_value

# 6. Harmonise raw GWAS summary statistics
ss = pyldsc.munge_sumstats("raw_gwas.txt", out="trait", N=100000,
                           merge_alleles="w_hm3.snplist")
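The heritability, intercept and ratio returned in step 2 all come from the LD Score regression model of Bulik-Sullivan et al. (2015), E[χ²_j] = N·h2·ℓ_j / M + intercept, where ℓ_j is the LD Score of SNP j, N the sample size and M the number of SNPs. The sketch below fits that line by plain unweighted least squares on simulated data. This is a deliberate simplification: pyldsc, like ldsc, uses weighted regression with a two-step estimator. The function name `ldsc_ols` and all simulation parameters are made up for the example.

```python
import numpy as np

def ldsc_ols(chisq, ld, n_samples, m_snps):
    """Unweighted fit of chi^2 on LD Score; returns (h2, intercept, ratio).

    Hypothetical helper for illustration, not the pyldsc estimator.
    """
    chisq = np.asarray(chisq, dtype=float)
    ld = np.asarray(ld, dtype=float)
    design = np.column_stack([ld, np.ones_like(ld)])
    (slope, intercept), *_ = np.linalg.lstsq(design, chisq, rcond=None)
    # The slope equals N * h2 / M, so rescale it to get h2.
    h2 = slope * m_snps / n_samples
    # Ratio: share of the mean chi^2 inflation attributed to the intercept.
    ratio = (intercept - 1.0) / (chisq.mean() - 1.0)
    return h2, intercept, ratio

# Toy data with made-up parameters (not real GWAS results).
rng = np.random.default_rng(1)
n_samples, m_snps, true_h2, true_intercept = 50_000, 10_000, 0.4, 1.05
ld = rng.uniform(1.0, 200.0, size=m_snps)
expected = n_samples * true_h2 * ld / m_snps + true_intercept
chisq = expected + rng.normal(scale=1.0, size=m_snps)
h2_hat, intercept_hat, ratio_hat = ldsc_ols(chisq, ld, n_samples, m_snps)
print(h2_hat, intercept_hat, ratio_hat)
```

An intercept near 1 with a small ratio indicates that most of the inflation in test statistics tracks LD Score, which is the signature of polygenicity rather than confounding.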
Command-line interface
A thin pyldsc CLI mirrors the original ldsc.py flags:
pyldsc l2 --bfile reference --ld-wind-cm 1 --out reference
pyldsc h2 --h2 trait.sumstats --ref-ld baseline --w-ld weights --out trait
pyldsc rg --rg t1.sumstats,t2.sumstats --ref-ld baseline --w-ld weights
pyldsc h2-cts --h2-cts trait.sumstats --ref-ld-chr-cts celltypes.ldcts \
    --ref-ld baseline. --w-ld weights.
pyldsc munge --sumstats raw_gwas.txt --N 100000 --out trait
Numerical parity
tests/test_parity.py drives the test data bundled with the upstream ldsc
repository through pyldsc and asserts numerical agreement:
- deterministic h2 / intercept / rg / jackknife-SE values on the simulate_test data reproduce to rtol = 1e-6;
- the exact-equality identities from the original suite hold (test_twostep_h2, test_h2_M, test_rg_M, test_read_annot, PLINK .bed parsing);
- the statistical-property checks of ldsc/test/test_sumstats.py (Test_H2_Statistical) hold: averaged over simulated GWAS the estimator is unbiased, with mean h2 ≈ 0.9, mean per-category h2 ≈ (0.3, 0.6) and intercept ≈ 1.
pytest tests/ -q
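For readers unfamiliar with relative-tolerance checks, the sketch below shows what an rtol = 1e-6 comparison looks like with `numpy.testing.assert_allclose`. It is not taken from tests/test_parity.py; the helper name `assert_parity` and the numbers are placeholders invented for this example.

```python
import numpy as np

def assert_parity(actual, reference, rtol=1e-6):
    """Raise AssertionError if any value differs from its reference by more than rtol."""
    for name, ref_value in reference.items():
        np.testing.assert_allclose(actual[name], ref_value, rtol=rtol, err_msg=name)

# Placeholder numbers for illustration only; not real LDSC output.
reference = {"h2": 0.25, "intercept": 1.02}
actual = {"h2": 0.25 * (1 + 1e-9), "intercept": 1.02}
assert_parity(actual, reference)  # passes: relative difference is below 1e-6
```

A relative tolerance scales with the magnitude of each reference value, so the same threshold works for heritabilities near 0 and intercepts near 1.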
Examples
- examples/benchmark.py: timing of the LDSC workflows on the bundled data.
- examples/compare_reference.ipynb: side-by-side comparison of pyldsc against the original LDSC reference values, with a regression-fit plot and an LDSC-SEG cell-type enrichment bar plot.
License
GPL-3.0, matching the upstream ldsc. The original LDSC was written by
Brendan Bulik-Sullivan and Hilary Finucane; pyldsc is an independent
modern-Python port for the omicverse project.