Single-cell disease-relevance score
scDRS (single-cell disease-relevance score) is a method for associating individual cells in single-cell RNA-seq data with disease GWASs, built on top of AnnData and Scanpy.
Read the documentation: installation, usage, command-line interface (CLI), file formats, etc.
Check out instructions for making customized gene sets using MAGMA.
Reference
Zhang*, Hou*, et al. "Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data", Nature Genetics, 2022.
Versions
- v1.0.2: bug fixes in `scdrs.util.plot_group_stats`; input checks in `scdrs munge-gs` and `scdrs.util.load_h5ad`.
- v1.0.1: stable version used in publication. Identical to `v1.0.0` except for documentation.
- v1.0.0: stable version used in revision 1. Results are identical to `v0.1` for binary gene sets. Changes with respect to `v0.1`:
  - scDRS command-line interface (CLI) instead of `.py` scripts for calling scDRS in bash, including `scdrs munge-gs`, `scdrs compute-score`, and `scdrs perform-downstream`.
  - More efficient memory use due to the use of sparse matrices throughout the computation.
  - Allow the use of quantitative weights.
  - New feature `--adj-prop` for adjusting for cell type proportions.
- v0.1: stable version used in the initial submission.
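Since v1.0.0, gene sets may carry quantitative weights. To illustrate what a weighted gene set file looks like when read, the sketch below assumes the `.gs` layout described in the scDRS file-format documentation: a tab-separated file with a `TRAIT` and a `GENESET` column, where `GENESET` is a comma-separated gene list and each gene may optionally be written as `gene:weight`. `parse_gs` is a hypothetical helper, not part of the scDRS API, and giving unweighted genes a weight of 1.0 is an assumption made here for illustration.

```python
import csv
import io
from typing import Dict, List, Tuple


def parse_gs(text: str) -> Dict[str, Tuple[List[str], List[float]]]:
    """Parse .gs content into {trait: (genes, weights)} (hypothetical helper).

    Assumes a TRAIT/GENESET header. Genes written without ':weight' are
    given weight 1.0, i.e. treated as a binary gene set (an assumption).
    """
    gene_sets = {}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        genes, weights = [], []
        for entry in row["GENESET"].split(","):
            entry = entry.strip()
            if not entry:
                continue  # tolerate trailing commas
            if ":" in entry:
                gene, weight = entry.split(":", 1)
                genes.append(gene)
                weights.append(float(weight))
            else:
                genes.append(entry)
                weights.append(1.0)
        gene_sets[row["TRAIT"]] = (genes, weights)
    return gene_sets


# Illustrative content only: one binary and one weighted gene set.
example = (
    "TRAIT\tGENESET\n"
    "trait_binary\tFOXO3,GCK,HK1\n"
    "trait_weighted\tGCK:7.1,HK1:6.8\n"
)
print(parse_gs(example))
```

Keeping genes and weights as parallel lists mirrors how a weighted score would consume them: each gene's contribution is scaled by its weight.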
Code and data to reproduce results of the paper
See scDRS_paper for more details (experiments folder is deprecated). Data are at figshare.
- Download GWAS gene sets (.gs files) for 74 diseases and complex traits.
- Download scDRS results (.score.gz and .full_score.gz files) for TMS FACS + 74 diseases/traits.
Older versions
- Initial submission: GWAS gene sets and scDRS results.
Explore scDRS results via cellxgene
- Demo for 3 TMS FACS cell types and 3 diseases/traits.
- Results for 110,096 TMS FACS cells and 74 diseases/traits.
- Download h5ad files for cellxgene.
[Figures: 110,096 cells from 120 cell types in TMS FACS; IBD-associated cells]
scDRS scripts (deprecated)
NOTE: scDRS scripts are still maintained but deprecated. Consider using the scDRS command-line interface instead.
scDRS script for score calculation
Input: scRNA-seq data (.h5ad file) and gene set file (.gs file)
Output: scDRS score file ({trait}.score.gz file) and full score file ({trait}.full_score.gz file) for each trait in the .gs file
h5ad_file=your_scrnaseq_data
cov_file=your_covariate_file
gs_file=your_gene_set_file
out_dir=your_output_folder
python compute_score.py \
    --h5ad_file ${h5ad_file}.h5ad \
    --h5ad_species mouse \
    --cov_file ${cov_file}.cov \
    --gs_file ${gs_file}.gs \
    --gs_species human \
    --flag_filter True \
    --flag_raw_count True \
    --n_ctrl 1000 \
    --flag_return_ctrl_raw_score False \
    --flag_return_ctrl_norm_score True \
--out_folder ${out_dir}
- `--h5ad_file` (.h5ad file): scRNA-seq data
- `--h5ad_species` ("hsapiens"/"human"/"mmusculus"/"mouse"): species of the scRNA-seq data samples
- `--cov_file` (.cov file): covariate file (optional; a .tsv file, see file format)
- `--gs_file` (.gs file): gene set file (see file format)
- `--gs_species` ("hsapiens"/"human"/"mmusculus"/"mouse"): species of the genes in the gene set file
- `--flag_filter` ("True"/"False"): whether to perform minimum filtering of cells and genes
- `--flag_raw_count` ("True"/"False"): whether to perform normalization (size-factor + log1p)
- `--n_ctrl` (int): number of control gene sets (default 1,000)
- `--flag_return_ctrl_raw_score` ("True"/"False"): whether to return raw control scores
- `--flag_return_ctrl_norm_score` ("True"/"False"): whether to return normalized control scores
- `--out_folder`: output folder. Score files are saved as `{out_folder}/{trait}.score.gz` (see file format)
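Setting `--flag_raw_count True` asks the script to normalize raw counts (size-factor + log1p). The plain-Python sketch below shows that transformation on a tiny dense matrix. It assumes each cell is rescaled to a common total of 10,000 counts (the convention of Scanpy's `normalize_total(target_sum=1e4)`) before `log1p`; the exact target sum scDRS uses is an assumption here, and `size_factor_log1p` is a hypothetical name.

```python
import math
from typing import List


def size_factor_log1p(counts: List[List[float]], target_sum: float = 1e4) -> List[List[float]]:
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x).

    target_sum=1e4 is an assumed default, not taken from scDRS.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        # Cells with zero total counts stay at zero instead of dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(x * scale) for x in cell])
    return normalized


# Two cells x three genes of raw counts (toy values).
raw = [[1, 3, 0], [10, 0, 10]]
for row in size_factor_log1p(raw):
    print([round(v, 3) for v in row])
```

Dividing by each cell's total removes differences in sequencing depth, and `log1p` compresses the dynamic range while keeping zeros at zero. Data that is already log-normalized should not be transformed again, which is why the flag exists.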
scDRS script for downstream applications
Input: scRNA-seq data (.h5ad file), gene set file (.gs file), and scDRS full score files (.full_score.gz files)
Output: {trait}.scdrs_ct.{cell_type} file (same as the new {trait}.scdrs_group.{cell_type} file) for cell type-level analyses (association and heterogeneity); {trait}.scdrs_var file (same as the new {trait}.scdrs_cell_corr file) for cell variable-disease association; {trait}.scdrs_gene file for disease gene prioritization.
h5ad_file=your_scrnaseq_data
out_dir=your_output_folder
# flag_raw_count is set to False because the toy data is already log-normalized; set it to True if your data is not log-normalized
python compute_downstream.py \
    --h5ad_file ${h5ad_file}.h5ad \
    --score_file @.full_score.gz \
    --cell_type cell_type \
    --cell_variable causal_variable,non_causal_variable,covariate \
    --flag_gene True \
    --flag_filter False \
    --flag_raw_count False \
    --out_folder ${out_dir}
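The `--score_file @.full_score.gz` argument in the command above uses `@` as a placeholder that is matched against file names. As a rough illustration only, the hypothetical helper `expand_at_pattern` below assumes that `@` stands for exactly one arbitrary, non-empty substring (the trait name), much like a shell `*`, and maps each captured trait name to its file. It is not part of scDRS, and the script's actual matching rules may differ.

```python
import re
from typing import Dict, Iterable


def expand_at_pattern(pattern: str, filenames: Iterable[str]) -> Dict[str, str]:
    """Map each trait name captured by '@' to its file name (hypothetical helper)."""
    if pattern.count("@") != 1:
        raise ValueError("this sketch assumes exactly one '@' in the pattern")
    # Escape the literal parts and turn '@' into a capturing group.
    prefix, suffix = pattern.split("@")
    regex = re.compile("^" + re.escape(prefix) + "(.+)" + re.escape(suffix) + "$")
    matches = {}
    for name in filenames:
        m = regex.match(name)
        if m:
            matches[m.group(1)] = name
    return matches


files = ["trait_a.full_score.gz", "trait_a.score.gz", "trait_b.full_score.gz"]
print(expand_at_pattern("@.full_score.gz", files))
```

Anchoring the regular expression at both ends keeps `.score.gz` files and other similarly named files from being picked up by a `.full_score.gz` pattern.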
- `--h5ad_file` (.h5ad file): scRNA-seq data
- `--score_file` (.full_score.gz files): scDRS full score files; supports the use of "@" to match strings
- `--cell_type` (str): cell type column (multiple columns may be given, separated by commas); must be present in `adata.obs.columns`; used for cell type-disease association analyses (5% quantile as test statistic) and for detecting association heterogeneity within a cell type (Geary's C as test statistic)
- `--cell_variable` (str): cell-level variable columns (multiple columns may be given, separated by commas); must be present in `adata.obs.columns`; used for cell variable-disease association analyses (Pearson's correlation as test statistic)
- `--flag_gene` ("True"/"False"): whether to correlate scDRS disease scores with gene expression
- `--flag_filter` ("True"/"False"): whether to perform minimum filtering of cells and genes
- `--flag_raw_count` ("True"/"False"): whether to perform normalization (size-factor + log1p)
- `--out_folder`: output folder. Output files are saved as `{out_folder}/{trait}.scdrs_ct.{cell_type}` for cell type-level analyses (association and heterogeneity), `{out_folder}/{trait}.scdrs_var` for cell variable-disease association, and `{out_folder}/{trait}.scdrs_gene` for disease gene prioritization (see file format)
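The cell type-level tests listed above compare an observed statistic against the same statistic computed on control-gene-set scores. The sketch below illustrates that Monte Carlo comparison in plain Python under stated assumptions: the association statistic is read as the upper 5% quantile (95th percentile) of a cell type's normalized scores, heterogeneity uses Geary's C over a given cell-cell neighbor weight matrix (smaller C treated as more extreme), and the empirical p-value is (1 + number of controls at least as extreme) / (1 + n_ctrl). The function names and toy data are hypothetical; this is not the code in `compute_downstream.py`, whose exact quantile and p-value conventions may differ.

```python
import random
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile (the same convention as NumPy's default)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def gearys_c(x: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    """Geary's C of per-cell values x over a cell-cell weight matrix w.

    C = (N - 1) * sum_ij w_ij (x_i - x_j)^2 / (2 * W * sum_i (x_i - mean)^2),
    where W is the sum of all weights. Values below 1 mean neighboring cells
    have similar scores (positive spatial autocorrelation).
    """
    n = len(x)
    mean = sum(x) / n
    w_total = sum(sum(row) for row in w)
    num = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((xi - mean) ** 2 for xi in x)
    return (n - 1) * num / (2 * w_total * den)


def mc_pvalue(observed: float, controls: Sequence[float], greater: bool = True) -> float:
    """Monte Carlo p-value: (1 + #controls at least as extreme) / (1 + n_ctrl)."""
    if greater:
        n_extreme = sum(c >= observed for c in controls)
    else:
        n_extreme = sum(c <= observed for c in controls)
    return (1 + n_extreme) / (1 + len(controls))


if __name__ == "__main__":
    random.seed(0)
    n_cells, n_ctrl = 50, 200
    # Toy normalized scores for one cell type: the disease score is shifted
    # upward relative to the control-gene-set scores.
    disease = [random.gauss(0.5, 1.0) for _ in range(n_cells)]
    controls = [[random.gauss(0.0, 1.0) for _ in range(n_cells)] for _ in range(n_ctrl)]

    # Association: upper 5% quantile of the scores; larger is more extreme.
    obs_stat = quantile(disease, 0.95)
    ctrl_stats = [quantile(c, 0.95) for c in controls]
    print("association p:", mc_pvalue(obs_stat, ctrl_stats, greater=True))

    # Heterogeneity: toy chain graph linking each cell to its neighbors;
    # a smaller Geary's C is treated as more extreme.
    w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n_cells)] for i in range(n_cells)]
    obs_c = gearys_c(disease, w)
    ctrl_c = [gearys_c(c, w) for c in controls]
    print("heterogeneity p:", mc_pvalue(obs_c, ctrl_c, greater=False))
```

Using the control gene sets as the null keeps the test calibrated to the same cells and preprocessing as the disease score. The "+1" in numerator and denominator prevents a p-value of exactly zero with a finite number of controls.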
Download files
File details
Details for the file scdrs-1.0.2.tar.gz.
File metadata
- Download URL: scdrs-1.0.2.tar.gz
- Upload date:
- Size: 743.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.7.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0b84523636e1545ab9cf4a0deaf728b0dce23e78f60a2d431fb5338b1a3124bc
MD5 | d2222f82568f0b31adff378b73cf511b
BLAKE2b-256 | 301181d7b1708ee3fe2509796ed988a96a3a8b0f42435293150db32d0d3518fd
File details
Details for the file scdrs-1.0.2-py3-none-any.whl.
File metadata
- Download URL: scdrs-1.0.2-py3-none-any.whl
- Upload date:
- Size: 751.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.7.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | a2df81f3ea177f099e9d1b7735a11c7b53243f9a15983f6dc4eb486b0226f752
MD5 | da7d7c495c9ddbb1b5b9e2a560985ed4
BLAKE2b-256 | 0fd7042bf52a0def96fe42e741751bce6ec5d9940442b41ebfade1f2ba44477f
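To check a downloaded distribution against the SHA256 digests listed above, a short sketch using Python's standard `hashlib` module follows. It assumes the files were saved under their original names in the current directory; adjust the paths otherwise.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digests copied from the file details above.
EXPECTED = {
    "scdrs-1.0.2.tar.gz": "0b84523636e1545ab9cf4a0deaf728b0dce23e78f60a2d431fb5338b1a3124bc",
    "scdrs-1.0.2-py3-none-any.whl": "a2df81f3ea177f099e9d1b7735a11c7b53243f9a15983f6dc4eb486b0226f752",
}

if __name__ == "__main__":
    for name, expected in EXPECTED.items():
        path = Path(name)  # assumes the file sits in the current directory
        if path.exists():
            status = "OK" if sha256_of(path) == expected else "MISMATCH"
            print(f"{name}: {status}")
        else:
            print(f"{name}: not found")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size.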