scOPE: single-cell Oncological Prediction Explorer — transfer-learning from bulk RNA-seq to predict per-cell mutation probabilities in scRNA-seq.
scOPE — single-cell Oncological Prediction Explorer
[Figure: scOPE workflow (transfer learning from bulk → single-cell), panels a–f]
**scOPE is a transfer-learning framework that uses bulk RNA-seq cohorts with known mutation status to learn a compact, biologically meaningful latent space, then projects single-cell RNA-seq data into that same space to predict the probability that specific cancer-associated gene mutations are present in individual cells.** The resulting per-cell predictions reveal mutation-informed subclonal structure that can complement copy-number variation (CNV) methods and increase subclonal granularity.
Overview
scOPE proceeds in two phases:
1) Learn latent factors from bulk RNA-seq and train mutation classifiers (Panels a–c)
- i. Bulk expression matrix
  Construct a bulk cohort expression matrix A_bulk (rows = patient samples, columns = genes).
  The bulk matrix is normalized, centered, and scaled so that gene-wise signals are comparable.
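- ii. Latent feature mapping via SVD
  Decompose the normalized bulk matrix A′_bulk using SVD, truncated to the leading k latent factors:

  $$A'_{\mathrm{bulk}} \approx U_{\mathrm{bulk}} \, \Sigma_{\mathrm{bulk}} \, V^{\top}$$

  - U_bulk: sample scores (rows = patients, columns = latent factors)
  - Σ_bulk: diagonal matrix of singular values
  - V: gene loadings (rows = genes, columns = latent factors)

  Define the bulk latent representation (patient-by-factor embedding):

  $$Z_{\mathrm{bulk}} = U_{\mathrm{bulk}} \, \Sigma_{\mathrm{bulk}} = A'_{\mathrm{bulk}} \, V$$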
- iii. Train mutation-prediction models in latent space
  For each mutation (gene of interest), train a supervised ML model to predict mutation presence Y from Z_bulk.
  This yields one or more mutation-specific classifiers that operate on the learned latent factors.
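
The three bulk steps above can be sketched end to end with NumPy on toy data. This is a minimal illustration under stated assumptions, not scOPE's implementation: normalization is assumed to be log-CPM followed by gene-wise z-scoring, the latent representation is taken as Z_bulk = U_bulk Σ_bulk (equal to A′_bulk V), and a plain L2-regularized logistic regression fitted by gradient descent stands in for the package's classifiers. The helpers `log_cpm` and `fit_logistic`, the toy labels, and the choice k = 5 are hypothetical.

```python
import numpy as np


def log_cpm(counts):
    """Counts per million followed by log1p (assumed normalization)."""
    counts = np.asarray(counts, dtype=float)
    return np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e6)


def fit_logistic(Z, y, l2=1e-2, lr=0.1, n_iter=2000):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # current P(mutated) per patient
        w -= lr * (Z.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


rng = np.random.default_rng(0)
n_patients, n_genes, k = 200, 30, 5

# i. Toy bulk counts A_bulk (patients x genes) -> normalized, centered, scaled A'_bulk
A_bulk = rng.poisson(50, size=(n_patients, n_genes))
X = log_cpm(A_bulk)
mu_bulk, sd_bulk = X.mean(axis=0), X.std(axis=0)  # gene-wise parameters, reused for cells
A_bulk_prime = (X - mu_bulk) / sd_bulk

# ii. SVD truncated to k factors; V holds gene loadings (genes x factors)
U, s, Vt = np.linalg.svd(A_bulk_prime, full_matrices=False)
V = Vt[:k].T
Z_bulk = U[:, :k] * s[:k]  # Z_bulk = U_bulk Sigma_bulk, equal to A_bulk_prime @ V up to rounding

# iii. One classifier per mutation on the latent factors (toy labels)
labels = {
    "KRAS": (Z_bulk[:, 0] > 0).astype(float),
    "TP53": (Z_bulk[:, 1] - Z_bulk[:, 2] > 0).astype(float),
}
models = {gene: fit_logistic(Z_bulk, y) for gene, y in labels.items()}
```

Keeping `mu_bulk`, `sd_bulk`, and `V` from this phase is what allows single cells to be mapped into the same space later.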
2) Project scRNA-seq into the bulk-derived latent space and predict mutations per cell (Panels d–f)
- i. Single-cell expression matrix
  Construct a single-cell expression matrix A_sc (rows = single cells, columns = genes), restricted to the same genes, in the same order, as the bulk matrix.
- ii. Normalize scRNA-seq using bulk-derived parameters
  Apply the same gene-wise normalization, centering, and scaling learned from the bulk cohort to obtain A′_sc.
  This alignment step puts bulk and single-cell profiles on a common gene-wise scale, so that cell projections are comparable to the bulk embedding.
- iii. Project cells into latent space and infer mutation probabilities
  Use the bulk-derived gene loadings V to compute the single-cell latent representation:

  $$Z_{\mathrm{sc}} = A'_{\mathrm{sc}} \, V$$

  Then apply the trained bulk models to Z_sc to predict per-cell mutation probabilities, producing mutation-informed cellular maps that can be analyzed alongside expression programs, clusters, and CNV signals.
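
The single-cell steps can be sketched the same way. As with any toy example here, it assumes log-CPM plus gene-wise z-scoring; the bulk statistics and loadings are recomputed on synthetic data so the block runs on its own, and the logistic weights `w`, `b` are random placeholders standing in for a trained mutation model (all names are hypothetical, not scOPE's API).

```python
import numpy as np


def log_cpm(counts):
    """Counts per million followed by log1p (assumed normalization)."""
    counts = np.asarray(counts, dtype=float)
    return np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e6)


rng = np.random.default_rng(1)
n_genes, k = 30, 5

# Bulk-side outputs of phase 1, recomputed here on toy data
X_bulk = log_cpm(rng.poisson(50, size=(200, n_genes)))
mu_bulk, sd_bulk = X_bulk.mean(axis=0), X_bulk.std(axis=0)
_, _, Vt = np.linalg.svd((X_bulk - mu_bulk) / sd_bulk, full_matrices=False)
V = Vt[:k].T  # gene loadings (genes x factors)
w, b = rng.normal(size=k), 0.0  # placeholder weights standing in for a trained model

# i. Toy single-cell counts A_sc over the same genes, in the same order
A_sc = rng.poisson(3, size=(500, n_genes)) + 1

# ii. Normalize with the bulk mean/std, not the cells' own statistics
A_sc_prime = (log_cpm(A_sc) - mu_bulk) / sd_bulk

# iii. Project into the bulk latent space (Z_sc = A'_sc V) and score each cell
Z_sc = A_sc_prime @ V
mutation_prob = 1.0 / (1.0 + np.exp(-(Z_sc @ w + b)))  # one probability per cell
```

Because cells are scaled with bulk statistics, sparse, low-count cells can land far from the bulk distribution and push probabilities toward 0 or 1, which is one motivation for alternative alignment strategies such as those listed under `alignment_method` in the Quick start.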
Installation
From PyPI
```bash
pip install scope-bio
```
With optional dependencies (UMAP, XGBoost, LightGBM, SHAP)
```bash
pip install "scope-bio[full]"
```
From conda-forge
```bash
conda install -c conda-forge scope-bio
```
Development install
```bash
git clone https://github.com/Ashford-A/scOPE.git
cd scOPE
conda env create -f environments/scope-dev.yml
conda activate scope-dev
pip install -e ".[dev]"
```
Quick start
```python
import anndata as ad
from scope import BulkPipeline, SingleCellPipeline
from scope.io import load_mutation_labels

# --- Phase 1: Bulk --------------------------------------------------------
adata_bulk = ad.read_h5ad("bulk_cohort.h5ad")
mutation_labels = load_mutation_labels("mutations.csv", sample_col="sample_id")

bulk_pipe = BulkPipeline(
    norm_method="cpm",
    decomposition="svd",  # "svd" | "nmf" | "ica" | "pca"
    n_components=50,
    classifier="logistic",  # "logistic" | "random_forest" | "gbm" | "xgboost" | "lightgbm" | "svm" | "mlp"
)
bulk_pipe.fit(adata_bulk, mutation_labels, cv=5)
bulk_pipe.save("models/bulk_pipeline.pkl")

# --- Phase 2: Single cell --------------------------------------------------
adata_sc = ad.read_h5ad("sc_tumor.h5ad")

# Preprocess the bulk cohort with the fitted bulk preprocessor; this serves as
# the reference distribution for bulk-to-single-cell alignment
adata_bulk_pp = bulk_pipe.preprocessor_.transform(adata_bulk)

sc_pipe = SingleCellPipeline(
    bulk_pipeline=bulk_pipe,
    alignment_method="z_score_bulk",  # "z_score_bulk" | "moment_matching" | "quantile" | "none"
)
sc_pipe.fit(adata_bulk_pp, adata_sc)
adata_sc = sc_pipe.transform(adata_sc)
# adata_sc.obs now contains columns: mutation_prob_KRAS, mutation_prob_TP53, ...

# --- Visualise -------------------------------------------------------------
from scope.visualization import compute_umap, plot_mutation_probabilities

adata_sc = compute_umap(adata_sc, obsm_key="X_svd")
fig = plot_mutation_probabilities(adata_sc, mutations=["KRAS", "TP53"])
fig.savefig("mutation_probs.pdf", bbox_inches="tight")
```
API reference
Preprocessing

| Class | Description |
|---|---|
| BulkNormalizer | CPM / TPM / median-ratio / TMM normalisation |
| BulkScaler | Gene-wise centering and scaling |
| BulkPreprocessor | Combined normalise + scale |
| SingleCellPreprocessor | QC filter + normalise + optional scale |
| BulkSCAligner | z-score / moment-matching / quantile alignment |

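
BulkSCAligner lists moment-matching and quantile alignment alongside z-scoring. The sketch below shows one common way to implement each, gene by gene, in NumPy. It assumes both inputs are already normalized (rows = cells or samples, columns = the same genes in the same order); `moment_match` and `quantile_align` are hypothetical names, not BulkSCAligner's actual methods.

```python
import numpy as np


def moment_match(X_sc, X_bulk, eps=1e-8):
    """Per gene, rescale single-cell values to the bulk mean and standard deviation."""
    mu_sc, sd_sc = X_sc.mean(axis=0), X_sc.std(axis=0)
    mu_b, sd_b = X_bulk.mean(axis=0), X_bulk.std(axis=0)
    return (X_sc - mu_sc) / np.maximum(sd_sc, eps) * sd_b + mu_b


def quantile_align(X_sc, X_bulk):
    """Per gene, replace each cell's value with the bulk value at the same quantile."""
    n_cells = X_sc.shape[0]
    out = np.empty(X_sc.shape, dtype=float)
    for j in range(X_sc.shape[1]):
        # Rank of each cell within gene j (0..n_cells-1)
        ranks = np.argsort(np.argsort(X_sc[:, j], kind="stable"), kind="stable")
        levels = (ranks + 0.5) / n_cells  # mid-rank quantile levels in (0, 1)
        out[:, j] = np.quantile(X_bulk[:, j], levels)
    return out


rng = np.random.default_rng(5)
X_bulk = rng.normal(loc=3.0, scale=2.0, size=(40, 3))  # toy normalized bulk
X_sc = rng.exponential(scale=0.5, size=(100, 3))  # toy cells, differently distributed

X_mm = moment_match(X_sc, X_bulk)
X_q = quantile_align(X_sc, X_bulk)
```

Moment matching aligns only the first two moments of each gene, whereas quantile alignment maps each gene's single-cell distribution onto the bulk one while preserving the ranking of cells within that gene.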
Decomposition

| Class | Description |
|---|---|
| SVDDecomposition | Truncated SVD (randomized / ARPACK / full) |
| NMFDecomposition | Non-negative matrix factorization |
| ICADecomposition | FastICA |
| PCADecomposition | PCA via sklearn |
| get_decomposition(name) | Factory function |

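
SVDDecomposition offers a randomized solver for truncated SVD. The sketch below illustrates the idea behind such solvers: a randomized range finder with power iterations, in the style of Halko, Martinsson and Tropp. It is a generic NumPy illustration, not the code scOPE calls, and `randomized_svd` here is a hypothetical helper.

```python
import numpy as np


def randomized_svd(A, k, n_oversamples=10, n_iter=4, seed=0):
    """Approximate the top-k SVD of A with a randomized range finder."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    p = min(k + n_oversamples, m, n)
    # Sample the column space of A with a Gaussian test matrix
    Y = A @ rng.normal(size=(n, p))
    # Subspace (power) iterations sharpen the spectrum; QR keeps them stable
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Y)
        Y = A @ (A.T @ Q)
    Q, _ = np.linalg.qr(Y)  # m x p orthonormal basis for the approximate range
    # Exact SVD of the small p x n matrix B = Q^T A, then lift back to m rows
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]


rng = np.random.default_rng(7)
A = rng.normal(size=(60, 5)) @ rng.normal(size=(5, 40))  # exactly rank 5
U, s, Vt = randomized_svd(A, k=5)
```

Only the small p x n matrix is decomposed exactly, which is why randomized solvers scale well to wide expression matrices; scikit-learn ships a production version as `sklearn.utils.extmath.randomized_svd`.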
Classification

| Class | Description |
|---|---|
| LogisticMutationClassifier | L1/L2/ElasticNet logistic regression |
| RandomForestMutationClassifier | Random forest |
| GBMMutationClassifier | Gradient boosting (sklearn) |
| XGBMutationClassifier | XGBoost |
| LGBMMutationClassifier | LightGBM |
| SVMMutationClassifier | SVM + Platt calibration |
| MLPMutationClassifier | Multi-layer perceptron |
| PerMutationClassifierSet | Trains/stores one classifier per mutation |
| get_classifier(name) | Factory function |

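
SVMMutationClassifier pairs an SVM with Platt calibration, which converts raw SVM decision values into probabilities by fitting a one-dimensional sigmoid. A minimal NumPy sketch of Platt scaling follows, using Platt's smoothed targets and plain gradient descent in place of the Newton-type solver usually used; `fit_platt` and the toy margins are hypothetical.

```python
import numpy as np


def fit_platt(f, y, n_iter=5000, lr=0.1):
    """Fit P(y=1 | f) = 1 / (1 + exp(a*f + b)) to decision values f.

    Gradient descent on cross-entropy against Platt's smoothed targets.
    """
    f = np.asarray(f, dtype=float)
    y = np.asarray(y)
    n_pos = int((y == 1).sum())
    n_neg = len(y) - n_pos
    # Smoothed targets reduce overconfidence on small training sets
    t = np.where(y == 1, (n_pos + 1) / (n_pos + 2), 1.0 / (n_neg + 2))
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(a * f + b))
        g = t - p  # derivative of the loss w.r.t. (a*f + b)
        a -= lr * np.mean(g * f)
        b -= lr * np.mean(g)
    return a, b


rng = np.random.default_rng(6)
y = rng.integers(0, 2, size=300)
f = np.where(y == 1, 1.0, -1.0) + rng.normal(scale=0.8, size=300)  # toy SVM margins
a, b = fit_platt(f, y)
prob = 1.0 / (1.0 + np.exp(a * f + b))  # calibrated P(mutated)
```

In practice the sigmoid should be fitted on held-out decision values (for example from cross-validation), so that calibration is not biased by margins on the training set.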
Evaluation

| Function | Description |
|---|---|
| evaluate_classifier | AUROC, AUPRC, Brier score |
| evaluate_all | Evaluate all mutations at once |
| cross_validate_classifiers | Stratified k-fold CV |
| roc_curve_data / pr_curve_data | Curve arrays for plotting |

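
For reference, the three metrics reported by evaluate_classifier can be computed from scratch as below. The definitions are standard, but the helpers are hypothetical and skip refinements such as scikit-learn's tie handling in average precision; on tie-free inputs like this example they should agree with scikit-learn's `roc_auc_score`, `average_precision_score`, and `brier_score_loss`.

```python
import numpy as np


def auroc(y, p):
    """Probability that a random positive scores above a random negative (ties count half)."""
    y, p = np.asarray(y), np.asarray(p, dtype=float)
    pos, neg = p[y == 1], p[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def average_precision(y, p):
    """AUPRC summarized as the mean precision at the rank of each positive."""
    y = np.asarray(y)
    order = np.argsort(-np.asarray(p, dtype=float), kind="stable")
    hits = y[order] == 1
    precision_at_k = np.cumsum(hits) / np.arange(1, len(y) + 1)
    return precision_at_k[hits].mean()


def brier_score(y, p):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return np.mean((np.asarray(p, dtype=float) - np.asarray(y)) ** 2)


y_true = [0, 0, 1, 1]
p_pred = [0.1, 0.4, 0.35, 0.8]
print(auroc(y_true, p_pred))              # 0.75
print(average_precision(y_true, p_pred))  # approx. 0.833
print(brier_score(y_true, p_pred))        # approx. 0.158
```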
Visualization

| Function | Description |
|---|---|
| compute_umap | UMAP on latent embedding |
| compute_tsne | t-SNE on latent embedding |
| plot_embedding | Scatter by categorical or continuous |
| plot_mutation_probabilities | Grid of per-mutation probability overlays |
| plot_scree | Singular value / EVR scree plot |
| plot_mutation_heatmap | Mean probability per cluster heatmap |

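
plot_mutation_heatmap summarizes the mean probability per cluster. The aggregation behind such a heatmap can be sketched in NumPy as follows, assuming per-cell probabilities arrive as a cells x mutations array (for instance, the mutation_prob_* columns of adata_sc.obs) and one cluster label per cell; `mean_prob_per_cluster` is a hypothetical helper, and plotting is left out.

```python
import numpy as np


def mean_prob_per_cluster(probs, clusters):
    """Average per-cell mutation probabilities within each cluster.

    Returns (sorted cluster labels, clusters x mutations matrix of means).
    """
    probs = np.asarray(probs, dtype=float)
    labels, idx = np.unique(np.asarray(clusters), return_inverse=True)
    sums = np.zeros((len(labels), probs.shape[1]))
    np.add.at(sums, idx, probs)  # add each cell's row to its cluster's row
    counts = np.bincount(idx, minlength=len(labels))[:, None]
    return labels, sums / counts


probs = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]  # 3 cells x 2 mutations (toy)
clusters = ["B", "B", "A"]
labels, means = mean_prob_per_cluster(probs, clusters)
# labels: ['A', 'B']; means approx. [[0.2, 0.8], [0.8, 0.2]]
```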
Running tests
```bash
pytest tests/ -v --cov=scope --cov-report=term-missing
```
Citation
If you use scOPE in your research, please cite:
Ashford, A. et al. (2024). scOPE: transfer-learning from bulk RNA-seq to infer per-cell mutation probabilities in single-cell transcriptomics. [Journal] doi: ...
License
MIT — see LICENSE.
Download files
File details
Details for the file scope_bio-0.1.0.tar.gz.
File metadata
- Download URL: scope_bio-0.1.0.tar.gz
- Size: 22.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 59df32371815229fc70af66152a16ab497f55a469da2a9296b2ca1e13b3a2451 |
| MD5 | c4eadca477d804e18e916bc2ae090e50 |
| BLAKE2b-256 | 0368c9c004a33649d7d6fa76fd6bd51abdb06c51e2b688a1ff62dae34adc2c67 |

File details
Details for the file scope_bio-0.1.0-py3-none-any.whl.
File metadata
- Download URL: scope_bio-0.1.0-py3-none-any.whl
- Size: 48.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | beceabbe5c7ce84282ec35629c7b652775292420c2ee82f6dbf6f94560c0e56c |
| MD5 | 93aa8c825e6ffba6ea2e7635fdb9ab54 |
| BLAKE2b-256 | 5c1cd9fad62e2646008afbfb62ede9de9f0b2a561a13636b24e0922a98d1b250 |
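
The published digests can be used to check a downloaded archive before installing it. A small standard-library sketch follows; `sha256_of` and `verify` are hypothetical helper names. pip's hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file) performs the same check during installation.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA256 hex digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example, after downloading the wheel listed above:
# verify("scope_bio-0.1.0-py3-none-any.whl",
#        "beceabbe5c7ce84282ec35629c7b652775292420c2ee82f6dbf6f94560c0e56c")
```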