
Modern-Python reimplementation of LDSC (LD Score Regression) -- SNP-heritability, genetic correlation, partitioned/stratified h2 and LDSC-SEG cell-type enrichment.


py-ldsc

py-ldsc (pyldsc) is a faithful, cleanly-rewritten modern-Python reimplementation of LDSC — LD Score Regression.

It ports the full LDSC toolkit:

  • LD Score Regression — Bulik-Sullivan et al., Nature Genetics 2015, "LD Score regression distinguishes confounding from polygenicity in genome-wide association studies".
  • Partitioned heritability / stratified LDSC — Finucane et al., Nature Genetics 2015.
  • LDSC-SEG (cell-type / tissue-specific enrichment) — Finucane et al., Nature Genetics 2018.

The original ldsc is a Python 2 era command-line tool. pyldsc is a pure-Python, importable package built on numpy, scipy, pandas and bitarray (no rpy2) that matches the original numerically. The block jackknife and the weighted regressions are both deterministic (no random resampling is involved), so estimates agree with ldsc to floating-point tolerance; the parity tests assert rtol = 1e-6.
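
To make the determinism point concrete, here is a minimal sketch of a leave-one-block-out jackknife for least squares, written with plain numpy. The function name block_jackknife_ols and the simulated data are illustrative assumptions, not part of the pyldsc API, and pyldsc's own LstsqJackknifeFast may differ in detail.

```python
import numpy as np

def block_jackknife_ols(x, y, n_blocks=200):
    """Least-squares fit with block-jackknife standard errors (illustrative)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, p = x.shape
    # Contiguous blocks of (nearly) equal size; no randomness is involved.
    bounds = np.floor(np.linspace(0, n, n_blocks + 1)).astype(int)
    xtx = np.zeros((n_blocks, p, p))
    xty = np.zeros((n_blocks, p))
    for i in range(n_blocks):
        xb = x[bounds[i]:bounds[i + 1]]
        yb = y[bounds[i]:bounds[i + 1]]
        xtx[i] = xb.T @ xb
        xty[i] = xb.T @ yb
    xtx_tot = xtx.sum(axis=0)
    xty_tot = xty.sum(axis=0)
    full_est = np.linalg.solve(xtx_tot, xty_tot)
    # Delete-one-block estimates reuse the per-block cross-products.
    delete = np.array([
        np.linalg.solve(xtx_tot - xtx[i], xty_tot - xty[i])
        for i in range(n_blocks)
    ])
    # Pseudovalues give the jackknife estimate and its covariance.
    pseudo = n_blocks * full_est - (n_blocks - 1) * delete
    jk_est = pseudo.mean(axis=0)
    jk_cov = np.atleast_2d(np.cov(pseudo.T, ddof=1)) / n_blocks
    return jk_est, np.sqrt(np.diag(jk_cov))

# Simulated example: slope 0.01 and intercept 1 (made-up values).
rng = np.random.default_rng(0)
ld = rng.gamma(shape=2.0, scale=50.0, size=2000)
design = np.column_stack([ld, np.ones_like(ld)])
response = 0.01 * ld + 1.0 + rng.normal(0.0, 1.0, size=ld.size)
jk_est, jk_se = block_jackknife_ols(design, response, n_blocks=200)
```

Because the blocks are fixed contiguous slices, rerunning the function on the same input returns the same estimates and standard errors.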

Installation

pip install pyldsc

From source:

git clone https://github.com/omicverse/py-ldsc
cd py-ldsc
pip install -e .

Requires Python ≥ 3.9 and numpy, scipy, pandas, bitarray.

What it covers

Capability                      Function / class                     Original LDSC flag
LD Score estimation             estimate_ldscore                     --l2
SNP-heritability                estimate_h2, Hsq                     --h2
Partitioned / stratified h2     partitioned_h2                       --h2 --overlap-annot
Genetic correlation             estimate_rg, RG, Gencov              --rg
LDSC-SEG cell-type enrichment   ldsc_seg                             --h2-cts
Munge raw GWAS sumstats         munge_sumstats                       munge_sumstats.py
Block jackknife                 LstsqJackknifeFast, RatioJackknife   -
IRWLS                           IRWLS                                -
PLINK .bed reader               PlinkBEDFile                         -
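
For readers unfamiliar with the PLINK .bed layout that a reader such as PlinkBEDFile has to parse, the sketch below decodes a SNP-major .bed byte string with numpy. It assumes the standard PLINK 1 binary format (three magic bytes, then two bits per genotype, low-order bits first); the function decode_bed is hypothetical and is not pyldsc's implementation.

```python
import numpy as np

BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 header, SNP-major mode
# 2-bit code -> number of A1 alleles; code 0b01 means missing (-1 here).
CODE_TO_A1_COUNT = np.array([2, -1, 1, 0], dtype=np.int8)

def decode_bed(raw, n_samples, n_snps):
    """Return an (n_snps, n_samples) int8 matrix of A1 allele counts."""
    if raw[:3] != BED_MAGIC:
        raise ValueError("not a SNP-major PLINK .bed byte string")
    bytes_per_snp = (n_samples + 3) // 4  # four genotypes per byte
    body = np.frombuffer(raw, dtype=np.uint8, offset=3)
    if body.size != bytes_per_snp * n_snps:
        raise ValueError("unexpected .bed length for the given dimensions")
    body = body.reshape(n_snps, bytes_per_snp)
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    # Split every byte into four 2-bit codes, lowest bits first.
    codes = (body[:, :, None] >> shifts) & 3
    codes = codes.reshape(n_snps, -1)[:, :n_samples]  # drop padding
    return CODE_TO_A1_COUNT[codes]

# One SNP, five samples with codes 00, 01, 10, 11, 00 (made-up example).
example = BED_MAGIC + bytes([0b11100100, 0b00000000])
genotypes = decode_bed(example, n_samples=5, n_snps=1)
```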

All the LDSC machinery is reproduced: windowed bias-corrected LD Scores (--ld-wind-cm/-kb/-snps), the regression weights, the two-step estimator, chi-square filtering (--chisq-max), block-jackknife standard errors, the intercept / ratio, constrained intercepts (--intercept-h2, --intercept-gencov, --no-intercept), the --overlap-annot correction and per-category enrichment / coefficient z-scores.
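
As a rough illustration of the regression at the core of this machinery, the sketch below fits E[chi2_j] = intercept + (N h2 / M) l_j by iteratively re-weighted least squares on simulated data. It is a simplified stand-in under stated assumptions: the function ldsc_irwls is hypothetical, the weights follow the general form 1 / (l_j * 2 * fitted_j^2) used by LD Score regression, and the two-step estimator, chi-square filtering and jackknife are omitted.

```python
import numpy as np

def ldsc_irwls(chisq, ld, n, m, n_iter=2):
    """Estimate (h2, intercept) from chi-square statistics and LD Scores."""
    chisq = np.asarray(chisq, dtype=float)
    ld = np.maximum(np.asarray(ld, dtype=float), 1.0)  # floor LD Scores at 1
    x = np.column_stack([ld, np.ones_like(ld)])
    # Starting values: intercept 1, slope from the mean chi-square.
    intercept = 1.0
    slope = max((chisq.mean() - 1.0) / ld.mean(), 0.0)
    for _ in range(n_iter + 1):
        # Weights: inverse of LD Score times the conditional variance.
        fitted = np.maximum(intercept + slope * ld, 1e-8)
        w = 1.0 / (ld * 2.0 * fitted ** 2)
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(x * sw[:, None], chisq * sw, rcond=None)
        slope, intercept = coef
    return slope * m / n, intercept

# Simulated GWAS with no confounding (all values are made up).
rng = np.random.default_rng(0)
m, n, h2_true = 50_000, 10_000, 0.3
ld = 1.0 + rng.gamma(shape=2.0, scale=25.0, size=m)
z = rng.normal(0.0, np.sqrt(1.0 + n * h2_true * ld / m))
h2_hat, intercept_hat = ldsc_irwls(z ** 2, ld, n, m)
```

With no confounding in the simulation, the fitted intercept should land near 1 and h2_hat near the simulated value of 0.3.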

Python API

import pyldsc

# 1. Estimate LD Scores from a PLINK reference panel
res = pyldsc.estimate_ldscore("reference", ld_wind_cm=1.0, out="reference")
res["ldscore"]   # DataFrame: CHR, SNP, BP, L2
res["M"], res["M_5_50"]

# 2. SNP-heritability from GWAS summary statistics
h2 = pyldsc.estimate_h2("trait.sumstats", ref_ld="baseline", w_ld="weights")
h2.tot, h2.tot_se          # heritability +/- jackknife SE
h2.intercept, h2.ratio     # LD Score regression intercept and ratio
print(h2.summary())

# 3. Partitioned / stratified heritability (--overlap-annot)
part = pyldsc.partitioned_h2("trait.sumstats", ref_ld="baseline.",
                             w_ld="weights.", frqfile="frq.")
part.overlap_results       # per-category enrichment / coefficient table

# 4. Genetic correlation between traits
rg = pyldsc.estimate_rg(["trait1.sumstats", "trait2.sumstats"],
                        ref_ld="baseline", w_ld="weights")
rg[0].rg_ratio, rg[0].rg_se, rg[0].p

# 5. LDSC-SEG cell-type / tissue enrichment
seg = pyldsc.ldsc_seg("trait.sumstats", ref_ld_cts="celltypes.ldcts",
                      ref_ld="baseline.", w_ld="weights.")
seg     # DataFrame: Name, Coefficient, Coefficient_std_error, P_value

# 6. Harmonise raw GWAS summary statistics
ss = pyldsc.munge_sumstats("raw_gwas.txt", out="trait", N=100000,
                           merge_alleles="w_hm3.snplist")
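
To show what harmonisation conceptually involves, here is a small pandas sketch that converts p-values and effect signs into signed Z-scores and drops strand-ambiguous SNPs. The input column names (SNP, A1, A2, P, BETA, N) and the helper to_signed_z are assumptions for illustration; this is not the implementation of pyldsc.munge_sumstats, which handles many more cases.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2

# Allele pairs whose strand cannot be resolved from the alleles alone.
AMBIGUOUS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}

def to_signed_z(raw):
    """Return SNP, A1, A2, Z, N with invalid and ambiguous rows removed."""
    df = raw.copy()
    df["A1"] = df["A1"].str.upper()
    df["A2"] = df["A2"].str.upper()
    valid_p = ((df["P"] > 0) & (df["P"] <= 1)).to_numpy()
    ambiguous = np.array(
        [(a, b) in AMBIGUOUS for a, b in zip(df["A1"], df["A2"])],
        dtype=bool,
    )
    df = df[valid_p & ~ambiguous].copy()
    # |Z| from the 1-df chi-square quantile, sign from the effect estimate.
    df["Z"] = np.sign(df["BETA"]) * np.sqrt(chi2.isf(df["P"], 1))
    return df[["SNP", "A1", "A2", "Z", "N"]].reset_index(drop=True)

# Made-up input: rs2 is strand-ambiguous (A/T), rs4 has an invalid P of 0.
raw = pd.DataFrame({
    "SNP": ["rs1", "rs2", "rs3", "rs4"],
    "A1": ["a", "A", "C", "G"],
    "A2": ["G", "T", "T", "A"],
    "P": [0.05, 0.01, 0.05, 0.0],
    "BETA": [0.1, 0.3, -0.2, 0.4],
    "N": [100000, 100000, 100000, 100000],
})
clean = to_signed_z(raw)
```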

Command-line interface

A thin pyldsc CLI mirrors the original ldsc.py flags:

pyldsc l2     --bfile reference --ld-wind-cm 1 --out reference
pyldsc h2     --h2 trait.sumstats --ref-ld baseline --w-ld weights --out trait
pyldsc rg     --rg t1.sumstats,t2.sumstats --ref-ld baseline --w-ld weights
pyldsc h2-cts --h2-cts trait.sumstats --ref-ld-chr-cts celltypes.ldcts \
              --ref-ld baseline. --w-ld weights.
pyldsc munge  --sumstats raw_gwas.txt --N 100000 --out trait

Numerical parity

tests/test_parity.py drives the test data bundled with the upstream ldsc repository through pyldsc and asserts numerical agreement:

  • deterministic h2 / intercept / rg / jackknife-SE values on the simulate_test data reproduce to rtol = 1e-6;
  • the exact-equality identities from the original suite hold (test_twostep_h2, test_h2_M, test_rg_M, test_read_annot, PLINK .bed parsing);
  • the statistical-property checks of ldsc/test/test_sumstats.py (Test_H2_Statistical) hold — averaged over simulated GWAS the estimator is unbiased: mean h2 ≈ 0.9, mean per-category h2 ≈ (0.3, 0.6), intercept ≈ 1.

Run the test suite with:

pytest tests/ -q
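
As an illustration of what an rtol = 1e-6 assertion means in practice, the sketch below uses numpy.testing.assert_allclose. The helper check_parity and the reference value are hypothetical and are not taken from tests/test_parity.py.

```python
import numpy as np

def check_parity(name, got, expected, rtol=1e-6):
    """Raise AssertionError unless got matches expected to relative tolerance."""
    np.testing.assert_allclose(
        got, expected, rtol=rtol, atol=0.0,
        err_msg=f"{name} differs from the reference value",
    )

reference_h2 = 0.123456789  # hypothetical reference value

# A relative difference of 5e-7 is inside the tolerance and passes.
check_parity("h2", reference_h2 * (1 + 5e-7), reference_h2)

# A relative difference of 5e-6 is outside the tolerance and is rejected.
try:
    check_parity("h2", reference_h2 * (1 + 5e-6), reference_h2)
    drift_detected = False
except AssertionError:
    drift_detected = True
```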

Examples

  • examples/benchmark.py — timing of the LDSC workflows on the bundled data.
  • examples/compare_reference.ipynb — side-by-side comparison of pyldsc against the original LDSC reference values, with a regression-fit plot and an LDSC-SEG cell-type enrichment bar plot.

License

GPL-3.0, matching the upstream ldsc. The original LDSC was written by Brendan Bulik-Sullivan and Hilary Finucane; pyldsc is an independent modern-Python port for the omicverse project.
