Clean pure-Python reimplementation of scDRS: single-cell disease-relevance scoring from GWAS gene sets.

py-scdrs

pyscdrs is a clean, pure-Python reimplementation of scDRS (single-cell disease-relevance score) for the omicverse project.

scDRS (Zhang*, Hou*, et al. & Price, Nature Genetics 2022, 54:1572-1580) scores individual cells in scRNA-seq data for their relevance to a disease or complex trait, using a polygenic gene set derived from GWAS summary statistics (typically via MAGMA). The score is a covariate-corrected, technical-variance-weighted average of disease-gene expression, calibrated against Monte-Carlo control gene sets matched on the gene-level mean-variance relationship.
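
To make the calibration idea concrete, here is a minimal sketch using only the standard library. It is not the pyscdrs implementation: it computes a weighted-average raw score for a single toy cell and a one-sided Monte-Carlo p-value against control gene sets. For brevity the control sets are drawn at random from all non-disease genes rather than matched on the mean-variance relationship, and covariate correction and score normalization are omitted. The names `weighted_score` and `mc_pvalue` and the toy data are illustrative assumptions.

```python
import random

def weighted_score(expr, genes, weights):
    """Weighted average of one cell's expression over a gene set."""
    total_w = sum(weights)
    return sum(expr[g] * w for g, w in zip(genes, weights)) / total_w

def mc_pvalue(raw, ctrl_scores):
    """One-sided Monte-Carlo p-value with +1 smoothing in numerator and denominator."""
    n_ge = sum(1 for s in ctrl_scores if s >= raw)
    return (1 + n_ge) / (1 + len(ctrl_scores))

rng = random.Random(0)
all_genes = [f"g{i}" for i in range(200)]
disease_genes = all_genes[:10]
weights = [1.0] * len(disease_genes)

# Toy cell: background expression near 1.0, disease genes set to 3.0.
expr = {g: rng.gauss(1.0, 0.5) for g in all_genes}
for g in disease_genes:
    expr[g] = 3.0

raw = weighted_score(expr, disease_genes, weights)

# Control sets: same size as the disease set, drawn from the remaining genes.
background = all_genes[10:]
ctrl_scores = []
for _ in range(1000):
    ctrl = rng.sample(background, len(disease_genes))
    ctrl_scores.append(weighted_score(expr, ctrl, weights))

pval = mc_pvalue(raw, ctrl_scores)
print(raw, pval)
```

With 1000 control sets the smallest p-value this estimator can return is 1/1001, which is why the number of control sets bounds the resolution of a per-cell Monte-Carlo test.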

This package is a faithful rewrite: given the same random_seed, it reproduces the original scdrs package's control gene sets, scores, and p-values bit-for-bit on the bundled toy data.

Install

pip install pyscdrs

Pure Python: depends only on numpy, scipy, pandas, anndata, scanpy, scikit-misc, statsmodels and tqdm. No R or rpy2 is required.

Quick start

import anndata, pandas as pd
import pyscdrs as scdrs

# Load size-factor-normalized, log1p-transformed single-cell data
adata  = anndata.read_h5ad("toydata_mouse.h5ad")
df_cov = pd.read_csv("toydata_mouse.cov", sep="\t", index_col=0)

# 1. Preprocess: covariate correction + gene/cell stats + mean-var bins
scdrs.preprocess(adata, cov=df_cov)

# 2. Load a MAGMA-style .gs gene set and score cells
dict_gs = scdrs.load_gs("toydata_mouse.gs")
genes, weights = dict_gs["toydata_gs_mouse"]
df_res = scdrs.score_cell(
    adata, genes, gene_weight=weights, n_ctrl=1000, random_seed=0,
    return_ctrl_norm_score=True,
)
print(df_res[["raw_score", "norm_score", "pval", "zscore"]].head())

# 3. Downstream: cell-group association + heterogeneity
import scanpy as sc
# Build the k-nearest-neighbor graph used by the heterogeneity test
sc.pp.neighbors(adata, n_neighbors=15, n_pcs=20)
dict_group = scdrs.downstream_group_analysis(
    adata, df_res, group_cols=["cell_type"]
)
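
For readers who have not seen a .gs file, the sketch below parses the layout used by upstream scDRS: a tab-separated file with a `TRAIT` and a `GENESET` column, where `GENESET` is a comma-separated list of `gene:weight` entries (or bare gene names). That pyscdrs reads exactly this layout is an assumption based on the `load_gs` usage above, and `parse_gs_text` is a hypothetical helper, not part of the package.

```python
def parse_gs_text(text):
    """Parse .gs-style text into {trait: (genes, weights)}.

    Assumed layout (taken from upstream scDRS, not verified against pyscdrs):
    a header line "TRAIT<TAB>GENESET", then one line per trait whose
    GENESET field is a comma-separated list of "gene:weight" or "gene".
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    result = {}
    for line in lines[1:]:  # skip the header row
        trait, geneset = line.split("\t")
        genes, weights = [], []
        for entry in geneset.split(","):
            gene, _, weight = entry.partition(":")
            genes.append(gene)
            # In this sketch, entries without a weight default to 1.0.
            weights.append(float(weight) if weight else 1.0)
        result[trait] = (genes, weights)
    return result

example = "TRAIT\tGENESET\ntoy_trait\tGeneA:2.5,GeneB:1.0,GeneC\n"
parsed = parse_gs_text(example)
genes, weights = parsed["toy_trait"]
print(genes, weights)
```

The returned `(genes, weights)` pair mirrors how the quick start unpacks `dict_gs["toydata_gs_mouse"]`.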

Public API

Stage            Functions
Preprocessing    preprocess, compute_stats, reg_out, category2dummy
Scoring          score_cell
Downstream       downstream_group_analysis, downstream_corr_analysis, downstream_gene_analysis, test_gearysc, gearys_c
Gene-set / I/O   load_gs, save_gs, munge_gs, load_h5ad, load_homolog_mapping, convert_species_name, zsc2pval, pval2zsc
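
The names zsc2pval and pval2zsc suggest conversions between z-scores and p-values. The following is a minimal standard-library sketch of such a conversion, assuming a one-sided upper-tail convention; the helper names `z_to_pval` and `pval_to_z` are hypothetical, and the package's own functions may differ in details such as clipping of extreme values.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_to_pval(z):
    """Upper-tail p-value P(Z >= z) for a standard normal Z."""
    return 1.0 - _STD_NORMAL.cdf(z)

def pval_to_z(p):
    """Inverse of z_to_pval: the z whose upper-tail probability is p."""
    return -_STD_NORMAL.inv_cdf(p)

p = z_to_pval(1.96)
z = pval_to_z(p)
print(round(p, 4), round(z, 2))
```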

score_cell options

  • weight_opt — raw-score weighting: uniform, vs (1/sqrt(technical variance), default), inv_std (1/std), od (overdispersion score).
  • ctrl_match_key — gene statistic for matching control genes (default mean_var).
  • n_ctrl — number of Monte-Carlo control gene sets (default 1000).
  • random_seed — governs the control gene sets; the same seed gives the same results as the original scdrs.
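
The interplay of ctrl_match_key, n_ctrl and random_seed can be sketched as follows. This is an illustration under stated assumptions, not the package's sampler: each control set takes as many genes from every bin as the disease set has in that bin, disease genes are excluded from the pool (a design choice of this sketch), and Python's `random` module stands in for whatever generator pyscdrs uses. Reproducing upstream results bit-for-bit would require the same generator and the same order of draws. `sample_matched_controls` and the toy bins are hypothetical.

```python
import random
from collections import Counter, defaultdict

def sample_matched_controls(gene_bin, disease_genes, n_ctrl, seed):
    """Draw n_ctrl control gene sets with the same per-bin counts as disease_genes.

    gene_bin maps gene -> bin label (for example, a mean-variance bin).
    """
    rng = random.Random(seed)
    disease = set(disease_genes)
    pool = defaultdict(list)
    for gene, b in gene_bin.items():
        if gene not in disease:
            pool[b].append(gene)
    need = Counter(gene_bin[g] for g in disease_genes)
    ctrl_sets = []
    for _ in range(n_ctrl):
        ctrl = []
        for b in sorted(need):  # fixed bin order keeps the draws reproducible
            ctrl.extend(rng.sample(pool[b], need[b]))
        ctrl_sets.append(ctrl)
    return ctrl_sets

gene_bin = {f"g{i}": i % 4 for i in range(40)}  # 4 toy bins, 10 genes each
disease_genes = ["g0", "g1", "g4", "g5"]        # two genes in bin 0, two in bin 1
a = sample_matched_controls(gene_bin, disease_genes, n_ctrl=5, seed=0)
b = sample_matched_controls(gene_bin, disease_genes, n_ctrl=5, seed=0)
print(a == b)  # True: the same seed yields the same control sets
```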

The output is a per-cell DataFrame with raw_score, norm_score, mc_pval (per-cell Monte-Carlo p-value), pval (pooled empirical p-value), nlog10_pval and zscore.
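
The difference between mc_pval and pval can be illustrated with a small sketch. It assumes the upstream convention that the per-cell value compares a cell only with its own control scores, while the pooled value compares it with normalized control scores from all cells; the function names and numbers are hypothetical.

```python
def mc_pval(score, ctrl_scores):
    """Per-cell p-value from that cell's own control scores."""
    return (1 + sum(c >= score for c in ctrl_scores)) / (1 + len(ctrl_scores))

def pooled_pval(score, ctrl_by_cell):
    """P-value against normalized control scores pooled over all cells."""
    pooled = [c for ctrl in ctrl_by_cell for c in ctrl]
    return (1 + sum(c >= score for c in pooled)) / (1 + len(pooled))

# Three toy cells, four normalized control scores each.
ctrl_by_cell = [
    [-0.5, 0.1, 0.4, 1.2],
    [-1.0, -0.2, 0.3, 0.9],
    [-0.7, 0.0, 0.6, 1.5],
]
norm_score = 1.3  # normalized disease score of the first cell

p_cell = mc_pval(norm_score, ctrl_by_cell[0])
p_pool = pooled_pval(norm_score, ctrl_by_cell)
print(p_cell, p_pool)
```

Pooling enlarges the null sample from n_ctrl to n_cell × n_ctrl values, so the pooled p-value can resolve much smaller probabilities than the per-cell one.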

Command line

pyscdrs compute-score --h5ad-file data.h5ad --gs-file trait.gs \
    --cov-file cov.tsv --out-folder out/ --n-ctrl 1000 --flag-full-score

pyscdrs perform-downstream --h5ad-file data.h5ad \
    --full-score-file out/trait.full_score.gz --out-folder out/ \
    --group-analysis cell_type --gene-analysis

Parity with the original scDRS

tests/test_parity.py runs both pyscdrs and the upstream scdrs package on scDRS's own bundled toy data with identical random_seed and asserts agreement of preprocess (gene stats and mean-variance bins), score_cell (raw_score, norm_score, pval, mc_pval, zscore) and the downstream group / gene / correlation statistics. On the toy data the agreement is bit-exact.
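
Bit-exact agreement is a stricter criterion than agreement within a tolerance. The sketch below shows the distinction with the standard library only; how tests/test_parity.py performs its comparison is not shown here, and `bits` and `assert_bit_exact` are hypothetical helpers.

```python
import math
import struct

def bits(x):
    """Raw IEEE-754 double-precision bytes of a float."""
    return struct.pack("<d", x)

def assert_bit_exact(a, b):
    """Raise AssertionError unless both sequences match bit-for-bit."""
    assert len(a) == len(b), "length mismatch"
    for i, (x, y) in enumerate(zip(a, b)):
        assert bits(x) == bits(y), f"index {i}: {x!r} != {y!r}"

ref = [0.1 + 0.2, 1.0]
same = [0.1 + 0.2, 1.0]
close = [0.3, 1.0]

assert_bit_exact(ref, same)            # identical arithmetic, identical bits
print(math.isclose(ref[0], close[0]))  # True: equal within tolerance
print(bits(ref[0]) == bits(close[0]))  # False: not bit-exact
```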

License

MIT, same as the original scDRS. See LICENSE.

Download files

Source Distribution

pyscdrs-0.1.0.tar.gz (147.6 kB)

Uploaded Source

Built Distribution

pyscdrs-0.1.0-py3-none-any.whl (143.7 kB)

Uploaded Python 3

File details

Details for the file pyscdrs-0.1.0.tar.gz.

File metadata

  • Download URL: pyscdrs-0.1.0.tar.gz
  • Upload date:
  • Size: 147.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pyscdrs-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        48d92796a25c72d7555f49663de3d124905b60c66fed65a038c7a55a7a19cc44
MD5           e8511d9b835ed0568f240de2b772ee3b
BLAKE2b-256   6f00252deb54f491b370c47abc3862b4971783301ef24140b36ca9cb8a089dc8

Provenance

The following attestation bundles were made for pyscdrs-0.1.0.tar.gz:

Publisher: publish.yml on omicverse/py-scdrs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pyscdrs-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pyscdrs-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 143.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pyscdrs-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        e8418b8a99953ae8e4a0313a64b54d17a3e0ba676838c1208b737d98a990c00e
MD5           ff3e80016a98582ecc6c5551d2188191
BLAKE2b-256   018e0663c28228c81ab7a0a4538e9cbd439eb8568c06c8dc98a235679bdf0039

Provenance

The following attestation bundles were made for pyscdrs-0.1.0-py3-none-any.whl:

Publisher: publish.yml on omicverse/py-scdrs
