Single-cell RNA Binding Protein Regulon Inference
scRBP — Single-cell RNA Binding Protein regulon inference
scRBP is a command-line toolkit for comprehensive analysis of RNA-binding proteins (RBPs) in single-cell RNA-seq data. It provides a systematic, scalable, and integrative framework to infer RBP-mediated gene and isoform regulatory networks (“regulons”) from single-cell transcriptomes and to prioritize the networks underlying complex genetic traits and disorders. scRBP comprises six main modules: (i) building a comprehensive compendium of RBPs and their associated motif clusters from diverse public resources; (ii) systematic, motif-guided, transcriptome-wide inference of RBP targets at both gene- and isoform-level resolution; (iii) construction of RBP-gene and/or RBP-isoform co-expression networks from short- or long-read single-cell transcriptomic data, respectively; (iv) defining high-fidelity regulons by integrating RBP-target interactions and quantifying cell type-specific regulon activity scores (RAS); (v) integrating GWAS results to compute regulon-level genetic association scores (RGS); and (vi) constructing a unified trait-relevance score (TRS) by combining RAS and RGS for each regulon in a given cellular context, with statistical significance assessed by Monte Carlo (MC) sampling.
What scRBP Does
RBPs are key post-transcriptional regulators that control mRNA splicing, stability, and translation. scRBP enables you to:
- Infer which RBPs regulate which genes or isoforms in your single-cell data
- Prune raw RBP–gene associations using motif-binding evidence to obtain high-confidence regulons
- Compute a regulon activity score (RAS) for each cell or cell type using the AUCell algorithm
- Link RBP regulons to human disease through GWAS genetic enrichment (RGS via MAGMA)
- Integrate RAS and RGS into a unified Trait Relevance Score (TRS) that ranks disease-relevant RBPs
Pipeline at a Glance
Raw single-cell data (.h5ad / .feather)
│
▼
[Step 1] scRBP getSketch ── Stratified GeoSketch cell downsampling
│
▼
[Step 2] scRBP getGRN ── GRNBoost2/GENIE3 RBP→Gene/Isoform inference
│ (run N seeds for robustness, default 30 times)
▼
[Step 3] scRBP getMerge_GRN ── Merge N-seed GRNs → consensus network
│
▼
[Step 4] scRBP getModule ── Extract regulon candidates (Top-N / percentile)
│
▼
[Step 5] scRBP getPrune ── Motif-enrichment pruning via ctxcore
│
▼
[Step 6] scRBP getRegulon ── Export pruned regulons to GMT format
│
▼
[Step 7] scRBP mergeRegulons ── Merge region-specific GMT files
│ (3'UTR / 5'UTR / CDS / Introns)
▼
[Step 8] scRBP ras ── Regulon Activity Score (AUCell) per cell / cell type
│
▼
[Step 9] scRBP rgs ── Regulon Genetic association Score (MAGMA GWAS enrichment)
│
▼
[Step 10] scRBP trs ── Trait Relevance Score (integration of RAS and RGS)
Installation
Requirements
- Python 3.9, 3.10, or 3.11 (Python 3.12+ not yet supported by pyscenic/arboreto)
- MAGMA binary (external; required only for Step 9, scRBP rgs)
Option 1 — Install from PyPI (recommended)
pip install scRBP
This installs scRBP together with all Python dependencies in one step.
Option 2 — Install from source (development)
git clone https://github.com/mayunlong89/scRBP.git
cd scRBP/scRBP_package
pip install -e .
Option 3 — Install via conda (recommended for HPC / cluster)
git clone https://github.com/mayunlong89/scRBP.git
cd scRBP/scRBP_package
conda env create -f environment.yml
conda activate scrbp
pip install -e .
Install MAGMA (for Step 9 only)
MAGMA is a standalone binary not available on PyPI. Download from https://cncr.nl/research/magma and make it executable:
# Linux example
wget https://cncr.nl/research/magma/software/magma_v1.10_static_linux.zip
unzip magma_v1.10_static_linux.zip -d ~/tools/magma/
chmod +x ~/tools/magma/magma
Verify installation
scRBP --help
scRBP getGRN --help
Quick Start
Step 1 — Downsample cells with GeoSketch
Large single-cell datasets (>500K cells) should be downsampled before GRN inference. scRBP uses GeoSketch to retain transcriptional diversity while reducing cell count.
scRBP getSketch \
--input PBMC_full.h5ad \
--output PBMC_sketch_15K.feather \
--n_cells 15000 \
--celltype_col celltype \
--min_cells_per_type 500 \
--n_pca 100 \
--seed 42
Step 2 — Infer gene regulatory networks (GRN)
Run GRNBoost2 with multiple random seeds. Each seed produces an independent GRN, and the per-seed GRNs are later merged into a consensus network (Step 3).
Running getGRN with 30 different seeds, as in the loop below, is the suggested setting for robustness.
Gene mode (RBP → Gene):
for SEED in $(seq 1 30); do
scRBP getGRN \
--matrix PBMC_sketch_15K.feather \
--rbp_list human_RBP_list.txt \
--output grn_seed${SEED} \
--mode gene \
--method grnboost2 \
--n_workers 20 \
--correlation True \
--seed ${SEED}
done
# Output: grn_seed1_scRBP_gene_GRNs.tsv, grn_seed2_scRBP_gene_GRNs.tsv, ...
Isoform mode (RBP → Isoform, requires isoform annotation):
scRBP getGRN \
--matrix PBMC_isoform.feather \
--rbp_list human_RBP_list.txt \
--output iso_grn_seed1 \
--mode isoform \
--isoform_annotation gencode_v44_isoform_gene_map.tsv \
--rbp_agg_method sum \
--remove_self_targets True \
--min_target_cells_expressed 10 \
--min_target_mean_expr 0.01 \
--method grnboost2 \
--n_workers 20 \
--seed 1
# Output: iso_grn_seed1_scRBP_isoform_GRNs.tsv (+ 4 auxiliary files)
Step 3 — Merge GRN seeds into a consensus network
scRBP getMerge_GRN \
--pattern "grn_seed*_scRBP_gene_GRNs.tsv" \
--output grn_consensus.tsv \
--n_present 15 \
--present_rate 0.5
Edges appearing in fewer than 50% of seeds are discarded, yielding a stable consensus network.
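The presence-rate filter described above can be illustrated with a minimal, standard-library sketch. It is not scRBP's implementation: the function name `consensus_edges`, the in-memory edge dictionaries, and the choice to average importance over the seeds where an edge is present are all assumptions made for illustration.

```python
from collections import defaultdict


def consensus_edges(seed_grns, present_rate=0.5):
    """Keep edges found in at least `present_rate` of the seed GRNs.

    seed_grns: list of dicts mapping (rbp, target) -> importance, one per seed.
    Returns a dict mapping (rbp, target) -> mean importance over the seeds
    in which the edge was present (the averaging rule is an assumption).
    """
    n_seeds = len(seed_grns)
    scores = defaultdict(list)
    for grn in seed_grns:
        for edge, importance in grn.items():
            scores[edge].append(importance)
    return {
        edge: sum(vals) / len(vals)
        for edge, vals in scores.items()
        if len(vals) / n_seeds >= present_rate
    }


# Toy example with three seeds (hypothetical RBP/target pairs).
seeds = [
    {("ELAVL1", "MYC"): 0.8, ("ELAVL1", "FOS"): 0.2},
    {("ELAVL1", "MYC"): 0.6},
    {("ELAVL1", "MYC"): 0.7, ("HNRNPA1", "PKM"): 0.5},
]
consensus = consensus_edges(seeds, present_rate=0.5)
print(sorted(consensus))  # only the edge present in all three seeds survives
```

In this toy run, the two edges seen in a single seed (1 of 3, below the 50% rate) are dropped, which is the stabilizing effect the merge step is meant to provide.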
Step 4 — Extract regulon candidate modules
scRBP getModule \
--input grn_consensus.tsv \
--output_merged modules.tsv \
--importance_threshold 0.005 \
--top_n_list "5,10,50" \
--target_top_n "50" \
--percentile "0.75,0.9"
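As a rough illustration of Top-N and percentile-based module extraction, the sketch below shows two generic selection rules on a toy edge list. How scRBP maps `--top_n_list`, `--target_top_n`, and `--percentile` onto such rules is not stated here, so the helper names and the nearest-rank quantile are assumptions, not the tool's actual behavior.

```python
def top_n_targets(edges, n):
    """For each RBP, keep its n highest-importance targets."""
    by_rbp = {}
    for rbp, target, importance in edges:
        by_rbp.setdefault(rbp, []).append((importance, target))
    return {
        rbp: [target for _, target in sorted(pairs, reverse=True)[:n]]
        for rbp, pairs in by_rbp.items()
    }


def percentile_cutoff(edges, q):
    """Keep edges whose importance is at or above the q-quantile (0 <= q <= 1)."""
    values = sorted(importance for _, _, importance in edges)
    # Nearest-rank quantile; real tools may interpolate differently.
    idx = min(len(values) - 1, int(q * len(values)))
    threshold = values[idx]
    return [edge for edge in edges if edge[2] >= threshold]


# Toy consensus edges: (RBP, target, importance); names are hypothetical.
edges = [
    ("ELAVL1", "MYC", 0.9),
    ("ELAVL1", "FOS", 0.4),
    ("ELAVL1", "JUN", 0.1),
    ("HNRNPA1", "PKM", 0.6),
]
modules_top2 = top_n_targets(edges, 2)
modules_q50 = percentile_cutoff(edges, 0.5)
print(modules_top2)
print(modules_q50)
```

Running several cutoffs side by side, as the command above does with comma-separated lists, yields multiple candidate modules per RBP that the pruning step can then evaluate.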
Step 5 — Prune with motif-binding evidence
scRBP getPrune \
--rbp_targets modules.tsv \
--motif_rbp_links motif2rbp.csv \
--motif_target_ranks rankings.feather \
--save_dir ./pruned/ \
--rank_threshold 1500
Step 6 — Export regulons to GMT format
scRBP getRegulon \
--input pruned/ctx_scores.csv \
--out-symbol regulons_symbol.gmt \
--out-entrez regulons_entrez.gmt \
--map-custom NCBI38.gene.loc \
--min_genes 5
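GMT is a simple tab-separated format: each line holds a gene-set name, a description field, and then the member genes. The sketch below reads and writes that layout and applies a minimum-size filter similar in spirit to `--min_genes`. The regulon names and the `"na"` description are hypothetical; the exact naming scheme scRBP writes is not shown in this README.

```python
import io


def write_gmt(regulons, handle, description="na"):
    """Write {name: [genes]} as GMT lines: name, description, gene1, gene2, ..."""
    for name, genes in regulons.items():
        handle.write("\t".join([name, description, *genes]) + "\n")


def read_gmt(handle):
    """Parse GMT lines into {name: [genes]}, ignoring the description field."""
    regulons = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        regulons[fields[0]] = fields[2:]
    return regulons


def filter_min_genes(regulons, min_genes):
    """Drop regulons with fewer than min_genes members."""
    return {name: genes for name, genes in regulons.items() if len(genes) >= min_genes}


# Round trip through an in-memory buffer (regulon names are hypothetical).
example = {
    "ELAVL1_regulon": ["MYC", "FOS", "JUN"],
    "HNRNPA1_regulon": ["PKM"],
}
buffer = io.StringIO()
write_gmt(example, buffer)
buffer.seek(0)
parsed = read_gmt(buffer)
kept = filter_min_genes(parsed, min_genes=2)
print(kept)
```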
Step 7 — Merge region-specific GMT files
scRBP mergeRegulons \
--base_dir ./analysis/ \
--input regulons_symbol.gmt \
--output regulons_combined.gmt \
--recursive
Step 8 — Compute Regulon Activity Scores (RAS)
Uses the AUCell algorithm to score each cell or cell type for regulon activity. Also computes the Jensen–Shannon divergence-based Regulon Specificity Score (RSS).
scRBP ras \
--mode ct \
--matrix PBMC_sketch_15K.feather \
--regulons regulons_symbol.gmt \
--out ras_output/ \
--celltypes-csv cell_to_celltype.csv
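To make the Jensen–Shannon-based specificity score concrete, here is a minimal sketch of the RSS as it is commonly defined in the SCENIC literature: one minus the square root of the Jensen–Shannon divergence between a regulon's normalized activity across cells and a cell type's normalized membership indicator. Whether scRBP uses exactly this definition (including the log base) is an assumption, and the function names are illustrative only.

```python
import math


def _normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def _kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def rss(activity, in_celltype):
    """Regulon specificity: 1 - sqrt(JSD(activity distribution, cell-type indicator))."""
    p = _normalize(activity)
    q = _normalize([1.0 if flag else 0.0 for flag in in_celltype])
    return 1.0 - math.sqrt(max(0.0, js_divergence(p, q)))


# Four cells; the first two belong to the cell type of interest.
membership = [True, True, False, False]
specific = rss([0.5, 0.5, 0.0, 0.0], membership)   # active only in the cell type
uniform = rss([1.0, 1.0, 1.0, 1.0], membership)    # equally active everywhere
absent = rss([0.0, 0.0, 1.0, 1.0], membership)     # active only outside it
print(specific, uniform, absent)
```

A regulon active only inside the cell type scores 1, one active only outside it scores 0, and uniform activity falls in between.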
Step 9 — Regulon genetic association score (RGS)
Links each regulon to GWAS traits using MAGMA gene-set analysis with a 4D null distribution for empirical p-values.
scRBP rgs \
--mode ct \
--magma ~/tools/magma/magma \
--genes-raw gwas.genes.raw \
--sets regulons_entrez.gmt \
--id-type entrez \
--out rgs_output/rgs
Step 10 — Compute Trait Relevance Score (TRS)
Integrates RAS and RGS into a unified score:
TRS = norm(RAS) + norm(RGS) − λ × |norm(RAS) − norm(RGS)|
RBPs with a high TRS are both highly active in the cell type and genetically linked to the trait.
scRBP trs \
--mode ct \
--ras ras_output/aucell_ct.csv \
--rgs_csv rgs_output/rgs_real.csv \
--out_prefix trs_output/trs
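The TRS formula above can be worked through on toy numbers. The README does not specify what norm() is or the default λ, so this sketch assumes min-max scaling to [0, 1] and λ = 1; both are illustrative choices, and the function names are hypothetical.

```python
def min_max(values):
    """Min-max scale to [0, 1]; a constant vector maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def trait_relevance(ras, rgs, lam=1.0):
    """TRS = norm(RAS) + norm(RGS) - lam * |norm(RAS) - norm(RGS)| per regulon."""
    ras_n, rgs_n = min_max(ras), min_max(rgs)
    return [a + g - lam * abs(a - g) for a, g in zip(ras_n, rgs_n)]


# Three regulons in one cell type (toy values).
ras = [1.0, 2.0, 3.0]   # regulon activity scores
rgs = [2.0, 0.0, 4.0]   # regulon genetic association scores
scores = trait_relevance(ras, rgs, lam=1.0)
print(scores)  # [0.0, 0.0, 2.0]
```

With λ = 1 the expression reduces to 2 × min(norm(RAS), norm(RGS)), so a regulon only scores well when both components are high; smaller λ values relax the penalty for disagreement between the two.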
Command Reference
| Step | Command | Key Inputs | Key Output |
|---|---|---|---|
| 1 | scRBP getSketch | .h5ad / .feather | Downsampled cells |
| 2 | scRBP getGRN | Expression matrix, RBP list | *_scRBP_gene_GRNs.tsv or *_scRBP_isoform_GRNs.tsv |
| 3 | scRBP getMerge_GRN | Multiple GRN TSV files (glob) | Consensus GRN TSV |
| 4 | scRBP getModule | Consensus GRN TSV | Modules TSV |
| 5 | scRBP getPrune | Modules TSV, motif files | Pruned scores (CSV) |
| 6 | scRBP getRegulon | Pruned scores | Regulons GMT (symbol + Entrez) |
| 7 | scRBP mergeRegulons | Multiple GMT files | Merged GMT |
| 8 | scRBP ras | Expression matrix, GMT | AUCell scores, RSS matrix |
| 9 | scRBP rgs | MAGMA .genes.raw, GMT | RGS scores CSV |
| 10 | scRBP trs | RAS CSV, RGS CSV | TRS scores CSV |
Use scRBP <command> --help to see all parameters for any step.
Dependencies
| Category | Packages |
|---|---|
| Core numerics | numpy, pandas, scipy, scikit-learn |
| Single-cell I/O | anndata, scanpy, loompy |
| Fast I/O | polars, pyarrow |
| Cell downsampling | geosketch |
| GRN inference | arboreto (GRNBoost2 / GENIE3) |
| Motif enrichment | ctxcore, pyscenic |
| Progress display | tqdm |
| GWAS enrichment | MAGMA binary (external, user-provided) |
Citation
If you use scRBP in your research, please cite:
Ma Y. et al. Decoding disease-associated RNA-binding protein-mediated regulatory networks through polygenic enrichment across diverse cellular contexts. (2026)
License
MIT License. See LICENSE for details.
Links
- GitHub: https://github.com/mayunlong89/scRBP
- Issues: https://github.com/mayunlong89/scRBP/issues
- Full documentation: see scRBP_readme.md in the repository