Single-cell RNA Binding Protein Regulon Inference

Project description

scRBP — Single-cell RNA Binding Protein regulon inference

scRBP is a command-line toolkit for comprehensive analysis of RNA-binding proteins (RBPs) in single-cell RNA-seq data. It provides a systematic, scalable, and integrative framework to infer RBP-mediated gene and isoform regulatory networks (“regulons”) from single-cell transcriptomes and to prioritize the networks underlying complex genetic traits and disorders. scRBP comprises six main modules:

  • Building a comprehensive compendium of RBPs and their associated motif clusters from diverse public resources
  • Systematic, motif-guided, transcriptome-wide inference of RBP targets at both gene- and isoform-level resolution
  • Constructing RBP-gene and/or RBP-isoform co-expression networks from short- or long-read single-cell transcriptomic data, respectively
  • Defining high-fidelity regulons by integrating RBP-target interactions, and quantifying cell type-specific regulon activity scores (RAS)
  • Integrating GWAS results to compute regulon-level genetic association scores (RGS)
  • Constructing a unified trait-relevance score (TRS) by combining RAS and RGS for each regulon in a given cellular context, with statistical significance assessed using Monte Carlo (MC) sampling


What scRBP Does

RBPs are key post-transcriptional regulators that control mRNA splicing, stability, and translation. scRBP enables you to:

  • Infer which RBPs regulate which genes or isoforms in your single-cell data
  • Prune raw RBP–gene associations using motif-binding evidence to obtain high-confidence regulons
  • Compute a regulon activity score (RAS) for each cell or cell type using the AUCell algorithm
  • Link RBP regulons to human disease through GWAS genetic enrichment (RGS via MAGMA)
  • Integrate RAS and RGS into a unified Trait Relevance Score (TRS) that ranks disease-relevant RBPs

Pipeline at a Glance

Raw single-cell data (.h5ad / .feather)
          │
          ▼
[Step 1]  scRBP getSketch        ── Stratified GeoSketch cell downsampling
          │
          ▼
[Step 2]  scRBP getGRN           ── GRNBoost2/GENIE3 RBP→Gene/Isoform inference
          │                          (run with N random seeds for robustness; 30 by default)
          ▼
[Step 3]  scRBP getMerge_GRN     ── Merge N-seed GRNs → consensus network
          │
          ▼
[Step 4]  scRBP getModule        ── Extract regulon candidates (Top-N / percentile)
          │
          ▼
[Step 5]  scRBP getPrune         ── Motif-enrichment pruning via ctxcore
          │
          ▼
[Step 6]  scRBP getRegulon       ── Export pruned regulons to GMT format
          │
          ▼
[Step 7]  scRBP mergeRegulons    ── Merge region-specific GMT files
          │                          (3'UTR / 5'UTR / CDS / Introns)
          ▼
[Step 8]  scRBP ras              ── Regulon Activity Score (AUCell) per cell / cell type
          │
          ▼
[Step 9]  scRBP rgs              ── Regulon Genetic association Score (MAGMA gene-set analysis)
          │
          ▼
[Step 10] scRBP trs              ── Trait Relevance Score (integration of RAS and RGS)

Installation

Requirements

  • Python 3.9, 3.10, or 3.11 (Python 3.12+ not yet supported by pyscenic/arboreto)
  • MAGMA binary (external, required only for Step 9 — scRBP rgs)

Option 1 — Install from PyPI (recommended)

pip install scRBP

This installs scRBP together with all Python dependencies in one step.

Option 2 — Install from source (development)

git clone https://github.com/mayunlong89/scRBP.git
cd scRBP/scRBP_package
pip install -e .

Option 3 — Install via conda (recommended for HPC / cluster)

git clone https://github.com/mayunlong89/scRBP.git
cd scRBP/scRBP_package

conda env create -f environment.yml
conda activate scrbp

pip install -e .

Install MAGMA (for Step 9 only)

MAGMA is a standalone binary that is not available on PyPI. Download it from https://cncr.nl/research/magma and make it executable:

# Linux example
mkdir -p ~/tools/magma   # make sure the target directory exists
wget https://cncr.nl/research/magma/software/magma_v1.10_static_linux.zip
unzip magma_v1.10_static_linux.zip -d ~/tools/magma/
chmod +x ~/tools/magma/magma

Verify installation

scRBP --help
scRBP getGRN --help

Quick Start

Step 1 — Downsample cells with GeoSketch

Large single-cell datasets (>500K cells) should be downsampled before GRN inference. scRBP uses GeoSketch to retain transcriptional diversity while reducing cell count.

scRBP getSketch \
    --input  PBMC_full.h5ad \
    --output PBMC_sketch_15K.feather \
    --n_cells 15000 \
    --celltype_col celltype \
    --min_cells_per_type 500 \
    --n_pca 100 \
    --seed 42
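
To make the interplay of --n_cells and --min_cells_per_type more concrete, here is a minimal sketch of how per-cell-type quotas could be allocated before sketching each stratum. This is an illustrative assumption, not scRBP's actual implementation: the function name allocate_quota and the proportional-with-floor rule are hypothetical, and the GeoSketch sampling inside each cell type is not shown.

```python
from collections import Counter


def allocate_quota(labels, n_cells, min_cells_per_type):
    """Hypothetical quota rule: proportional share with a per-type floor.

    labels: one cell-type label per cell.
    Returns a dict mapping cell type -> number of cells to keep.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    quota = {}
    for cell_type, n_available in counts.items():
        # Proportional share of the requested sketch size.
        proportional = round(n_cells * n_available / total)
        # Raise small types to the floor, but never above what exists.
        quota[cell_type] = min(n_available, max(min_cells_per_type, proportional))
    return quota


labels = ["B"] * 100 + ["T"] * 900 + ["NK"] * 20
print(allocate_quota(labels, n_cells=200, min_cells_per_type=50))
```

Under this rule rare cell types are protected by the floor, so the total can slightly exceed the requested size; a real implementation may rebalance differently.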

Step 2 — Infer gene regulatory networks (GRN)

Run GRNBoost2 with multiple random seeds. Each seed produces an independent GRN, and these are later merged into a consensus network. We recommend running getGRN with 30 random seeds for robustness.

Gene mode (RBP → Gene):

for SEED in $(seq 1 30); do
  scRBP getGRN \
      --matrix    PBMC_sketch_15K.feather \
      --rbp_list  human_RBP_list.txt \
      --output    grn_seed${SEED} \
      --mode      gene \
      --method    grnboost2 \
      --n_workers 20 \
      --correlation True \
      --seed      ${SEED}
done
# Output: grn_seed1_scRBP_gene_GRNs.tsv, grn_seed2_scRBP_gene_GRNs.tsv, ...

Isoform mode (RBP → Isoform, requires isoform annotation):

scRBP getGRN \
    --matrix                     PBMC_isoform.feather \
    --rbp_list                   human_RBP_list.txt \
    --output                     iso_grn_seed1 \
    --mode                       isoform \
    --isoform_annotation         gencode_v44_isoform_gene_map.tsv \
    --rbp_agg_method             sum \
    --remove_self_targets        True \
    --min_target_cells_expressed 10 \
    --min_target_mean_expr       0.01 \
    --method                     grnboost2 \
    --n_workers                  20 \
    --seed                       1
# Output: iso_grn_seed1_scRBP_isoform_GRNs.tsv (+ 4 auxiliary files)

Step 3 — Merge GRN seeds into a consensus network

scRBP getMerge_GRN \
    --pattern "grn_seed*_scRBP_gene_GRNs.tsv" \
    --output  grn_consensus.tsv \
    --n_present 15 \
    --present_rate 0.5

Edges appearing in fewer than 50% of seeds (here, fewer than 15 of 30) are discarded, yielding a stable consensus network.
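
The presence filter described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the getMerge_GRN implementation: the function name consensus_edges is hypothetical, each seed's network is represented as a plain dict, and averaging the importance over the seeds where an edge appears is an assumed aggregation rule.

```python
from collections import defaultdict


def consensus_edges(seed_networks, present_rate=0.5):
    """Keep edges found in at least `present_rate` of the seed networks.

    seed_networks: list of dicts mapping (rbp, target) -> importance.
    Returns a dict mapping (rbp, target) -> mean importance (assumed rule).
    """
    n_seeds = len(seed_networks)
    observed = defaultdict(list)
    for network in seed_networks:
        for edge, importance in network.items():
            observed[edge].append(importance)

    consensus = {}
    for edge, importances in observed.items():
        # Fraction of seeds in which this edge was recovered.
        if len(importances) / n_seeds >= present_rate:
            consensus[edge] = sum(importances) / len(importances)
    return consensus


seeds = [
    {("ELAVL1", "MYC"): 2.0, ("PTBP1", "ACTB"): 1.0},
    {("ELAVL1", "MYC"): 4.0},
    {("ELAVL1", "MYC"): 3.0, ("QKI", "GAPDH"): 0.5},
    {("PTBP1", "ACTB"): 3.0},
]
print(consensus_edges(seeds, present_rate=0.5))
```

In this toy input the QKI edge appears in only one of four seeds and is dropped, while the other two edges pass the 50% threshold.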

Step 4 — Extract regulon candidate modules

scRBP getModule \
    --input              grn_consensus.tsv \
    --output_merged      modules.tsv \
    --importance_threshold 0.005 \
    --top_n_list         "5,10,50" \
    --target_top_n       "50" \
    --percentile         "0.75,0.9"
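
As a rough illustration of what module extraction does, the sketch below applies an importance threshold and then keeps the top-N targets per RBP. It is a hedged approximation only: the function names are hypothetical, the exact semantics of --top_n_list, --target_top_n, and --percentile in scRBP may differ, and the percentile-based variant is not shown.

```python
def filter_by_importance(edges, threshold):
    """Drop edges whose importance is below `threshold`.

    edges: list of (rbp, target, importance) tuples.
    """
    return [edge for edge in edges if edge[2] >= threshold]


def top_n_targets(edges, n):
    """For each RBP, keep its n highest-importance targets (best first)."""
    by_rbp = {}
    for rbp, target, importance in edges:
        by_rbp.setdefault(rbp, []).append((importance, target))
    return {
        rbp: [target for _, target in sorted(pairs, reverse=True)[:n]]
        for rbp, pairs in by_rbp.items()
    }


edges = [
    ("A", "g1", 0.9), ("A", "g2", 0.5), ("A", "g3", 0.001),
    ("B", "g1", 0.2), ("B", "g4", 0.7),
]
kept = filter_by_importance(edges, threshold=0.005)
print(top_n_targets(kept, n=2))
```

Running several N values (as in "5,10,50") would simply produce one candidate module per RBP and per N, which the pruning step can then evaluate.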

Step 5 — Prune with motif-binding evidence

scRBP getPrune \
    --rbp_targets        modules.tsv \
    --motif_rbp_links    motif2rbp.csv \
    --motif_target_ranks rankings.feather \
    --save_dir           ./pruned/ \
    --rank_threshold     1500

Step 6 — Export regulons to GMT format

scRBP getRegulon \
    --input       pruned/ctx_scores.csv \
    --out-symbol  regulons_symbol.gmt \
    --out-entrez  regulons_entrez.gmt \
    --map-custom  NCBI38.gene.loc \
    --min_genes   5
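
For readers unfamiliar with GMT, each line holds a set name, a description, and the member genes, all separated by tabs. The sketch below shows how regulons could be serialized this way with a minimum-size filter. The function name to_gmt_lines and the description string are hypothetical, and the actual getRegulon output may name or order entries differently.

```python
def to_gmt_lines(regulons, min_genes=5, description="scRBP_regulon"):
    """Serialize regulons as GMT lines: name<TAB>description<TAB>gene1<TAB>...

    regulons: dict mapping regulon name -> iterable of gene identifiers.
    Regulons with fewer than `min_genes` unique genes are skipped.
    """
    lines = []
    for name, genes in sorted(regulons.items()):
        unique_genes = sorted(set(genes))  # de-duplicate and fix the order
        if len(unique_genes) < min_genes:
            continue
        lines.append("\t".join([name, description, *unique_genes]))
    return lines


regulons = {
    "ELAVL1": ["MYC", "FOS", "JUN", "TP53", "ACTB", "MYC"],
    "QKI": ["GAPDH", "MBP"],
}
for line in to_gmt_lines(regulons, min_genes=5):
    print(line)
```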

Step 7 — Merge region-specific GMT files

scRBP mergeRegulons \
    --base_dir ./analysis/ \
    --input    regulons_symbol.gmt \
    --output   regulons_combined.gmt \
    --recursive

Step 8 — Compute Regulon Activity Scores (RAS)

This step uses the AUCell algorithm to score each cell or cell type for regulon activity. It also computes the Regulon Specificity Score (RSS), which is based on Jensen–Shannon divergence.

scRBP ras \
    --mode         ct \
    --matrix       PBMC_sketch_15K.feather \
    --regulons     regulons_symbol.gmt \
    --out          ras_output/ \
    --celltypes-csv cell_to_celltype.csv
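
The idea behind AUCell-style scoring is that a regulon is active in a cell when its targets sit near the top of that cell's expression ranking. The sketch below is a simplified, pure-Python illustration of that recovery-curve idea, not the AUCell or scRBP implementation; the function name aucell_like_score, the top_fraction parameter, and the normalization are assumptions made for this example.

```python
def aucell_like_score(expression, gene_set, top_fraction=0.05):
    """Normalized area under the recovery curve of `gene_set`.

    expression: dict mapping gene -> expression value in one cell.
    Only the top `top_fraction` of the ranking is considered.
    Returns a value in [0, 1]; 1 means all set genes are ranked first.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    cutoff = max(1, int(len(ranked) * top_fraction))
    members = set(gene_set) & set(ranked)
    if not members:
        return 0.0

    hits = 0
    area = 0
    for gene in ranked[:cutoff]:
        if gene in members:
            hits += 1
        area += hits  # height of the recovery curve at this rank

    # Best case: all members (up to the cutoff) occupy the top ranks.
    k = min(len(members), cutoff)
    max_area = sum(min(rank, k) for rank in range(1, cutoff + 1))
    return area / max_area


cell = {"g1": 10, "g2": 8, "g3": 6, "g4": 4, "g5": 2, "g6": 0}
print(aucell_like_score(cell, {"g1", "g2"}, top_fraction=0.5))
print(aucell_like_score(cell, {"g5", "g6"}, top_fraction=0.5))
```

A regulon whose targets are the most highly expressed genes gets the maximum score, while one whose targets fall outside the top of the ranking scores zero.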

Step 9 — Compute Regulon Genetic Association Scores (RGS)

This step links each regulon to GWAS traits using MAGMA gene-set analysis, with empirical p-values derived from a 4D null distribution.

scRBP rgs \
    --mode      ct \
    --magma     ~/tools/magma/magma \
    --genes-raw gwas.genes.raw \
    --sets      regulons_entrez.gmt \
    --id-type   entrez \
    --out       rgs_output/rgs
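
An empirical p-value compares an observed statistic with the same statistic computed on null (for example, randomly sampled) gene sets. The sketch below shows only the generic calculation with the common add-one correction; how scRBP constructs its 4D null distribution is not described here, so the null scores in this example are made-up inputs and the function name empirical_p is hypothetical.

```python
def empirical_p(observed, null_scores):
    """One-sided empirical p-value with add-one correction.

    Counts how many null scores are at least as large as `observed`.
    The +1 terms keep the estimate strictly above zero.
    """
    exceed = sum(1 for score in null_scores if score >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


null_scores = [0, 1, 2, 3, 4, 5, 6, 7, 8]  # illustrative null statistics
print(empirical_p(7, null_scores))
```

With N null draws the smallest attainable p-value is 1 / (N + 1), so the number of null samples bounds the achievable resolution.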

Step 10 — Compute Trait Relevance Score (TRS)

Integrates RAS and RGS into a unified score:

TRS = norm(RAS) + norm(RGS) − λ × |norm(RAS) − norm(RGS)|

The λ term penalizes disagreement between the two normalized scores, so RBPs with high TRS are both highly active in the cell type and genetically linked to the trait.

scRBP trs \
    --mode       ct \
    --ras        ras_output/aucell_ct.csv \
    --rgs_csv    rgs_output/rgs_real.csv \
    --out_prefix trs_output/trs
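
The TRS formula in this step can be worked through on a toy example. The sketch below is an illustration under stated assumptions: norm(·) is taken to be min-max scaling and λ is set to 0.5, neither of which is confirmed by this document, and the function names are hypothetical.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant vector maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def trait_relevance_scores(ras, rgs, lam=0.5):
    """TRS = norm(RAS) + norm(RGS) - lam * |norm(RAS) - norm(RGS)|.

    ras, rgs: per-regulon scores in the same order.
    lam: penalty weight on disagreement (assumed value, for illustration).
    """
    ras_n, rgs_n = min_max(ras), min_max(rgs)
    return [a + g - lam * abs(a - g) for a, g in zip(ras_n, rgs_n)]


# Three regulons: low/low, high/high, and high activity but no genetic signal.
ras = [0, 10, 10]
rgs = [0, 10, 0]
print(trait_relevance_scores(ras, rgs))
```

The second regulon, which is high on both axes, receives the top score, while the third is penalized for the mismatch between activity and genetic association.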

Command Reference

| Step | Command | Key Inputs | Key Output |
|------|---------|------------|------------|
| 1 | scRBP getSketch | .h5ad / .feather | Downsampled cells |
| 2 | scRBP getGRN | Expression matrix, RBP list | *_scRBP_gene_GRNs.tsv or *_scRBP_isoform_GRNs.tsv |
| 3 | scRBP getMerge_GRN | Multiple GRN TSV files (glob) | Consensus GRN TSV |
| 4 | scRBP getModule | Consensus GRN TSV | Modules TSV |
| 5 | scRBP getPrune | Modules TSV, motif files | Pruned scores (CSV) |
| 6 | scRBP getRegulon | Pruned scores | Regulons GMT (symbol + Entrez) |
| 7 | scRBP mergeRegulons | Multiple GMT files | Merged GMT |
| 8 | scRBP ras | Expression matrix, GMT | AUCell scores, RSS matrix |
| 9 | scRBP rgs | MAGMA .genes.raw, GMT | RGS scores CSV |
| 10 | scRBP trs | RAS CSV, RGS CSV | TRS scores CSV |

Use scRBP <command> --help to see all parameters for any step.


Dependencies

| Category | Packages |
|----------|----------|
| Core numerics | numpy, pandas, scipy, scikit-learn |
| Single-cell I/O | anndata, scanpy, loompy |
| Fast I/O | polars, pyarrow |
| Cell downsampling | geosketch |
| GRN inference | arboreto (GRNBoost2 / GENIE3) |
| Motif enrichment | ctxcore, pyscenic |
| Progress display | tqdm |
| GWAS enrichment | MAGMA binary (external, user-provided) |

Citation

If you use scRBP in your research, please cite:

Ma Y. et al. Decoding disease-associated RNA-binding protein-mediated regulatory networks through polygenic enrichment across diverse cellular contexts. (2026)


License

MIT License. See LICENSE for details.


Download files

Source Distribution

scrbp-0.1.1.tar.gz (62.5 kB)

Uploaded Source

Built Distribution

scrbp-0.1.1-py3-none-any.whl (71.3 kB)

Uploaded Python 3

File details

Details for the file scrbp-0.1.1.tar.gz.

File metadata

  • Download URL: scrbp-0.1.1.tar.gz
  • Upload date:
  • Size: 62.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for scrbp-0.1.1.tar.gz
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 9e02031c831801f3af2f2e4d4fbde47a84ea9348bd87b4fd717e28ac37c6bb69 |
| MD5 | 3188cc909489fe9b323727fe2d4f12dc |
| BLAKE2b-256 | e180f5dd339e2b3fb07736e9e3f91a852d1f57cfef22ac1631f2ff3efa9a7d79 |

File details

Details for the file scrbp-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: scrbp-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 71.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for scrbp-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | e89cf8f2019cc20a771cbd9a438e90fa68e1d7b0aad795910a8a50b82ad0a321 |
| MD5 | f33a38ec856f7066ef789ce288ca2cc1 |
| BLAKE2b-256 | db6b61519d427e22ecf38d5247121cda77143f2e120fe6ef41cfea195d797f06 |
