scOPE: single-cell Oncological Prediction Explorer — transfer-learning from bulk RNA-seq to predict per-cell mutation probabilities in scRNA-seq.

scOPE — single-cell Oncological Prediction Explorer

scOPE workflow (transfer learning from bulk → single-cell)

scOPE is a transfer-learning framework. It uses bulk RNA-seq cohorts with known mutation status to learn a compact, biologically meaningful latent space, then projects single-cell RNA-seq into that same space to predict the probability that specific cancer-associated gene mutations are present in individual cells. The resulting per-cell probabilities give a mutation-informed view of subclonal structure that complements copy-number variation (CNV) methods and can resolve subclones at finer granularity.

Overview

scOPE proceeds in two phases:

1) Learn latent factors from bulk RNA-seq and train mutation classifiers (Panels a–c)

i. Bulk expression matrix
Construct a bulk cohort expression matrix A_bulk (rows = patient samples, columns = genes).
Normalize, center, and scale the bulk matrix gene-wise so that expression is comparable across genes, giving A′_bulk.
ii. Latent feature mapping via SVD
Decompose the normalized bulk matrix using SVD:
A′_bulk = U_bulk Σ_bulk Vᵀ
- U_bulk: sample scores (rows = patients, columns = latent factors)
- Σ_bulk: diagonal matrix of singular values
- V: gene loadings (rows = genes, columns = latent factors)
Define the bulk latent representation (patient-by-factor embedding):
Z_bulk = U_bulk Σ_bulk = A′_bulk V
iii. Train mutation-prediction models in latent space
For each mutation / gene of interest, train a supervised ML model to predict mutation presence Y from Z_bulk.
This yields one or more mutation-specific classifiers operating on the learned latent factors.
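
As a rough illustration of steps i and ii, the following NumPy sketch builds a bulk embedding from a synthetic count matrix. It is not the package's implementation: the log1p transform, the value of `k`, and every variable name are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bulk cohort: 40 patient samples x 200 genes (synthetic counts, illustration only).
A_bulk = rng.poisson(lam=20.0, size=(40, 200)).astype(float)

# Gene-wise normalization: log-transform, then centre and scale each gene (column).
logged = np.log1p(A_bulk)
gene_mean = logged.mean(axis=0)
gene_std = logged.std(axis=0) + 1e-8      # guard against zero-variance genes
A_bulk_norm = (logged - gene_mean) / gene_std

# Thin SVD: A_bulk_norm = U_bulk @ diag(S) @ Vt
U_bulk, S, Vt = np.linalg.svd(A_bulk_norm, full_matrices=False)

k = 10                                    # number of latent factors kept (assumed value)
V = Vt[:k].T                              # gene loadings, shape (genes, k)
Z_bulk = U_bulk[:, :k] * S[:k]            # patient-by-factor embedding, shape (samples, k)

# The embedding equals the projection of the normalized matrix onto V.
assert np.allclose(Z_bulk, A_bulk_norm @ V)
```

The final check is the property that makes the transfer possible: because Z_bulk can be written as a product with V alone, any other matrix over the same genes can be mapped into the same space.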

2) Project scRNA-seq into the bulk-derived latent space and predict mutations per cell (Panels d–f)

i. Single-cell expression matrix
Construct a single-cell expression matrix A_sc (rows = single cells, columns = genes).
ii. Normalize scRNA-seq using bulk-derived parameters
Apply the same gene-wise normalization / centering / scaling learned from the bulk cohort to obtain A′_sc.
(This alignment step makes the projection comparable across bulk and single-cell.)
iii. Project cells into latent space and infer mutation probabilities
Use the bulk-derived gene loadings V to compute the single-cell latent representation:
Z_sc = A′_sc V

Then apply the trained bulk models to Z_sc to predict per-cell mutation probabilities, producing mutation-informed cellular maps that can be analyzed alongside expression programs, clusters, and CNV signals.
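
The projection and prediction steps can be sketched end to end on synthetic data. This is a minimal NumPy example under stated assumptions, not scOPE's code: `fit_logistic` is a hypothetical gradient-descent helper standing in for whichever classifier is chosen, and the simulated expression shift is invented purely so the two groups are separable.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bulk, n_cells, n_genes, k = 60, 100, 50, 5

# Synthetic bulk cohort: a binary mutation label shifts the first 10 genes.
y_bulk = rng.integers(0, 2, size=n_bulk)
A_bulk = rng.normal(size=(n_bulk, n_genes))
A_bulk[:, :10] += 2.0 * y_bulk[:, None]

# Bulk-derived normalization parameters, reused unchanged for single cells.
mu = A_bulk.mean(axis=0)
sd = A_bulk.std(axis=0) + 1e-8
A_bulk_norm = (A_bulk - mu) / sd
_, _, Vt = np.linalg.svd(A_bulk_norm, full_matrices=False)
V = Vt[:k].T
Z_bulk = A_bulk_norm @ V

def fit_logistic(Z, y, lr=0.1, n_steps=2000, l2=1e-2):
    """Plain gradient-descent logistic regression (stand-in for any classifier)."""
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        w -= lr * (Z.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

w, b = fit_logistic(Z_bulk, y_bulk)

# Synthetic single cells: the first half carry the same expression shift.
is_mutant = np.arange(n_cells) < n_cells // 2
A_sc = rng.normal(size=(n_cells, n_genes))
A_sc[:, :10] += 2.0 * is_mutant[:, None]

A_sc_norm = (A_sc - mu) / sd      # bulk mean and std, not recomputed on the cells
Z_sc = A_sc_norm @ V              # Z_sc = A'_sc V
mutation_prob = 1.0 / (1.0 + np.exp(-(Z_sc @ w + b)))

print(mutation_prob[is_mutant].mean(), mutation_prob[~is_mutant].mean())
```

Note that `mu`, `sd`, `V`, `w`, and `b` are all fitted on bulk data only; the single-cell matrix is never used for fitting, which is the sense in which the model is transferred.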

Installation

From PyPI

pip install scope-bio

With optional dependencies (UMAP, XGBoost, LightGBM, SHAP)

pip install "scope-bio[full]"

From conda-forge

conda install -c conda-forge scope-bio

Development install

git clone https://github.com/Ashford-A/scOPE.git
cd scOPE
conda env create -f environments/scope-dev.yml
conda activate scope-dev
pip install -e ".[dev]"

Quick start

import anndata as ad
import pandas as pd
from scope import BulkPipeline, SingleCellPipeline
from scope.io import load_mutation_labels

# --- Phase 1: Bulk --------------------------------------------------------
adata_bulk = ad.read_h5ad("bulk_cohort.h5ad")
mutation_labels = load_mutation_labels("mutations.csv", sample_col="sample_id")

bulk_pipe = BulkPipeline(
    norm_method="cpm",
    decomposition="svd",   # "svd" | "nmf" | "ica" | "pca" | "fa" | "cnmf"
    n_components=50,
    classifier="logistic", # "logistic" | "random_forest" | "gbm" | "xgboost" | "lightgbm" | "svm" | "mlp"
)
bulk_pipe.fit(adata_bulk, mutation_labels, cv=5)
bulk_pipe.save("models/bulk_pipeline.pkl")

# --- Phase 2: Single cell --------------------------------------------------
adata_sc = ad.read_h5ad("sc_tumor.h5ad")

adata_bulk_pp = bulk_pipe.preprocessor_.transform(adata_bulk)

sc_pipe = SingleCellPipeline(
    bulk_pipeline=bulk_pipe,
    alignment_method="z_score_bulk",  # "z_score_bulk" | "moment_matching" | "quantile" | "none"
)
sc_pipe.fit(adata_bulk_pp, adata_sc)
adata_sc = sc_pipe.transform(adata_sc)

# adata_sc.obs now contains columns: mutation_prob_KRAS, mutation_prob_TP53, ...

# --- Visualise -------------------------------------------------------------
from scope.visualization import compute_umap, plot_mutation_probabilities

adata_sc = compute_umap(adata_sc, obsm_key="X_svd")
fig = plot_mutation_probabilities(adata_sc, mutations=["KRAS", "TP53"])
fig.savefig("mutation_probs.pdf", bbox_inches="tight")

Preprocessing options

BulkPreprocessor and SingleCellPreprocessor are designed to handle the full range of input states — from raw counts to already-normalized matrices — without requiring you to re-implement preprocessing outside the pipeline.

Bulk: handling already-processed inputs

bulk_pipe = BulkPipeline(
    norm_method="none",               # skip library-size normalization
    log1p=False,                      # data is already log-transformed
    decomposition="svd",
    n_components=50,
)

# Or pass flags directly to BulkPreprocessor:
from scope.preprocessing import BulkPreprocessor

preprocessor = BulkPreprocessor(
    norm_method="cpm",
    log1p=True,
    already_log_transformed=False,    # set True if input is e.g. log2-TPM from GEO
    min_samples_expressed=5,          # remove genes expressed in fewer than 5 samples
    min_expression=0.5,               # expression threshold for the above filter
    gene_blacklist=["MALAT1", "NEAT1"],
    auto_remove_mito=True,            # remove MT- genes
    auto_remove_ribo=True,            # remove RPS/RPL genes
    run_hvg=True,                     # select highly variable genes (requires scanpy)
    n_hvg=3000,
    hvg_flavor="seurat_v3",
    batch_key="cohort",               # batch correction via obs column
    batch_method="combat",            # "combat" | "harmony"
)
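
To make the gene-level options concrete, here is a small NumPy sketch of what filters of this kind typically compute. It is an assumption about the behaviour, not the library's code: whether the thresholds are strict or inclusive, and how gene prefixes are matched, may differ in `BulkPreprocessor`. `RARE1` is a made-up gene name.

```python
import numpy as np

gene_names = np.array(["KRAS", "TP53", "MT-CO1", "RPS6", "RPL13", "MALAT1", "GAPDH", "RARE1"])
rng = np.random.default_rng(2)
X = rng.uniform(1.0, 10.0, size=(8, gene_names.size))   # samples x genes, normalized expression
X[:, -1] = 0.0
X[:3, -1] = 2.0                    # RARE1 is expressed in only 3 of 8 samples

min_expression, min_samples_expressed = 0.5, 5
blacklist = ["MALAT1", "NEAT1"]

# Keep genes expressed above the threshold in enough samples.
expressed_in = (X > min_expression).sum(axis=0)
keep = expressed_in >= min_samples_expressed
# Drop mitochondrial (MT-) and ribosomal (RPS/RPL) genes by prefix.
keep &= ~np.char.startswith(gene_names, "MT-")
keep &= ~(np.char.startswith(gene_names, "RPS") | np.char.startswith(gene_names, "RPL"))
# Drop explicitly blacklisted genes.
keep &= ~np.isin(gene_names, blacklist)

X_filtered = X[:, keep]
print(gene_names[keep])            # ['KRAS' 'TP53' 'GAPDH']
```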

Single-cell: QC, mito filtering, and doublet removal

from scope.preprocessing import SingleCellPreprocessor

sc_prep = SingleCellPreprocessor(
    filter_strategy="both",           # "min_counts" | "min_genes" | "both" | "none"
    min_counts=500,
    min_genes=300,
    max_counts=25000,
    max_genes=6000,                   # upper bound (doublet proxy)
    max_mito_pct=20.0,                # remove cells with >20% mitochondrial reads
    auto_flag_mito=True,              # annotate pct_mito in adata.obs regardless
    run_doublet_detection=True,       # Scrublet-based doublet removal (pip install scrublet)
    doublet_threshold=None,           # None = automatic Scrublet threshold
    already_qc_filtered=False,        # True = skip all cell-level filters
    already_normalized=False,         # True = skip library-size normalization
    already_log_transformed=False,    # True = skip log1p
)
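
The cell-level filters reduce to a few per-cell summary statistics. The sketch below computes them with NumPy on a synthetic count matrix; it illustrates the general idea rather than `SingleCellPreprocessor` itself, and `min_genes` is lowered to 50 only because the toy matrix has 100 genes.

```python
import numpy as np

rng = np.random.default_rng(3)
gene_names = np.array([f"GENE{i}" for i in range(95)] + [f"MT-G{i}" for i in range(5)])
counts = rng.poisson(lam=10.0, size=(6, 100))   # cells x genes, raw counts (synthetic)
counts[0] = 0
counts[0, :20] = 5                               # low-complexity cell: 100 counts over 20 genes
counts[1, 95:] = 1000                            # cell dominated by mitochondrial reads

# Per-cell QC metrics.
total_counts = counts.sum(axis=1)
n_genes = (counts > 0).sum(axis=1)
is_mito = np.char.startswith(gene_names, "MT-")
pct_mito = 100.0 * counts[:, is_mito].sum(axis=1) / np.maximum(total_counts, 1)

min_counts, max_counts = 500, 25000
min_genes, max_genes = 50, 6000                  # min_genes scaled down for the toy data
max_mito_pct = 20.0

# filter_strategy="both": apply the count and gene bounds together, plus the mito cap.
keep = (
    (total_counts >= min_counts) & (total_counts <= max_counts)
    & (n_genes >= min_genes) & (n_genes <= max_genes)
    & (pct_mito <= max_mito_pct)
)
print(keep)                                      # [False False  True  True  True  True]
```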

Decomposition methods

scOPE supports several latent-space methods, all sharing the same fit / transform / components_ interface and usable as a drop-in via the decomposition= argument in BulkPipeline.

Key	Class	Notes
`"svd"`	`SVDDecomposition`	Default. Linear, interpretable. Gene loadings V enable direct sc projection.
`"nmf"`	`NMFDecomposition`	Non-negative, additive gene programs (metagenes). Requires non-negative input.
`"ica"`	`ICADecomposition`	Independent components. Useful for finding non-Gaussian expression sources.
`"pca"`	`PCADecomposition`	Standard PCA. Equivalent to SVD on centred data.
`"fa"`	`FactorAnalysisDecomposition`	Probabilistic FA. Accounts for gene-specific noise variance (heteroscedasticity).
`"cnmf"`	`ConsensusNMFDecomposition`	Consensus NMF (Kotliar et al., eLife 2019). Runs NMF n times, clusters components for stability. Recommended over single-run NMF for publication.

# Consensus NMF example
bulk_pipe = BulkPipeline(
    decomposition="cnmf",
    decomposition_kwargs={"n_iter": 50, "n_components": 20},
    classifier="logistic",
)

# Factor Analysis example
bulk_pipe = BulkPipeline(
    decomposition="fa",
    n_components=30,
    classifier="logistic",
)
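
The note that PCA is equivalent to SVD on centred data can be checked numerically. The following NumPy sketch compares the two routes on random data; it is independent of scOPE's `PCADecomposition` and `SVDDecomposition` classes.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 8)) @ rng.normal(size=(8, 8))   # samples x genes, correlated columns
Xc = X - X.mean(axis=0)                                   # centre each gene

# Route 1: PCA as an eigendecomposition of the sample covariance matrix.
cov = Xc.T @ Xc / (Xc.shape[0] - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]                         # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centred matrix.
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Explained variances match: lambda_i = s_i^2 / (n - 1).
assert np.allclose(eigvals, S**2 / (Xc.shape[0] - 1))
# Principal axes match up to sign.
assert np.allclose(np.abs(eigvecs.T @ Vt.T), np.eye(8), atol=1e-6)
```

The practical difference between the two keys is therefore whether centring is applied before the factorization, not the factorization itself.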

SVD evaluation

When using decomposition="svd", SVDEvaluator produces a comprehensive suite of plots and a gene program table that characterize which components drive classification and which genes define them. This is useful for reviewers, manuscript figures, and biological interpretation.

from scope.evaluation import SVDEvaluator

# After fitting:
adata_pp = bulk_pipe.transform_bulk(adata_bulk)
Z_bulk = adata_pp.obsm[bulk_pipe.obsm_key_]

ev = SVDEvaluator(bulk_pipe, Z_bulk, mutation="KRAS")
ev.run_all(output_dir="figures/svd_eval_KRAS")

run_all() saves the following to output_dir/:

Output	Description
`weighted_scree.png`	Scree plot with bars colour-coded by `|coef|×σ` classifier importance
`component_importance.png`	Ranked bar chart of importance; decomposed `|coef|` vs σ
`gene_loading_heatmap.png`	Hierarchically clustered heatmap of top gene loadings × top components
`top_genes_per_component.png`	Signed gene bar charts for each top-important component
`latent_scatter.png`	Pairwise scatter of top components coloured by mutation label
`separation_violins.png`	Component score distributions by mutation status + Mann-Whitney p
`component_label_correlation.png`	Spearman ρ heatmap between components and mutation label (FDR-corrected)
`roc_ablation.png`	Cross-validated ROC curve + AUC vs. n_components retained
`permutation_test.png`	Observed AUROC vs. permutation null distribution
`gene_biplot.png`	Sample scores + gene loading arrows for top-2 components
`shap_summary_dot.png`	SHAP dot summary (requires `shap`)
`shap_summary_bar.png`	SHAP bar summary (requires `shap`)
`umap_zbulk.png`	UMAP of Z_bulk coloured by mutation label (requires `umap-learn`)
`calibration_curve.png`	Reliability diagram for predicted mutation probabilities
`component_crosscorr.png`	Pearson correlation among SVD components (flags batch bleed)
`gene_program_table.csv`	Tidy table: component rank, σ, `|coef|`, importance, top genes + loadings
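
The `|coef|×σ` importance used in several of these outputs is simple to compute by hand. The sketch below uses hypothetical singular values and classifier coefficients; one reading of the score is that, because each column of Z_bulk has spread proportional to its singular value, scaling the coefficient by σ puts components on a comparable footing. How `SVDEvaluator` extracts these quantities internally is not shown here.

```python
import numpy as np

# Hypothetical values: singular values of the first five components and
# the coefficients of a linear classifier fitted on Z_bulk.
sigma = np.array([42.0, 30.5, 18.2, 9.7, 4.1])
coef = np.array([0.02, -0.15, 0.40, -0.05, 0.90])

importance = np.abs(coef) * sigma
ranking = np.argsort(importance)[::-1]    # component indices, most important first

for rank, j in enumerate(ranking, start=1):
    print(f"rank {rank}: component {j}  |coef|={abs(coef[j]):.2f}  "
          f"sigma={sigma[j]:.1f}  importance={importance[j]:.2f}")
```

In this toy case the component with the largest singular value ranks second to last, which is the situation the weighted scree plot is meant to expose: variance explained and predictive relevance need not coincide.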

Individual plots can also be called directly:

from pathlib import Path

ev.plot_weighted_scree(output_dir=Path("figures/"))
ev.plot_separation_violins(output_dir=Path("figures/"), top_components=12)
ev.export_gene_program_table(output_dir=Path("figures/"), top_components=10)

API reference

Preprocessing

Class	Description
`BulkNormalizer`	CPM / TPM / median-ratio / TMM normalisation
`BulkScaler`	Gene-wise centering and scaling
`BulkPreprocessor`	Combined normalise + scale, with gene filtering, HVG, and batch correction
`SingleCellPreprocessor`	QC filter + mito filter + doublet removal + normalise + optional scale
`BulkSCAligner`	z-score / moment-matching / quantile alignment
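
The difference between z-scoring with bulk parameters and moment matching can be shown in a few lines of NumPy. This is a sketch of the general techniques under the usual definitions; whether `BulkSCAligner` implements them exactly this way is an assumption, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
bulk = rng.normal(loc=5.0, scale=2.0, size=(50, 4))   # samples x genes (preprocessed bulk)
sc = rng.normal(loc=1.0, scale=0.5, size=(300, 4))    # cells x genes (lower mean and spread)

bulk_mean, bulk_std = bulk.mean(axis=0), bulk.std(axis=0)
sc_mean, sc_std = sc.mean(axis=0), sc.std(axis=0)

# z-score with bulk parameters: the same transform that was applied to the bulk cohort.
sc_z_bulk = (sc - bulk_mean) / bulk_std

# Moment matching: standardize each gene within the cells, then map onto the bulk moments.
sc_matched = (sc - sc_mean) / sc_std * bulk_std + bulk_mean

# After moment matching, each gene has the bulk mean and standard deviation.
assert np.allclose(sc_matched.mean(axis=0), bulk_mean)
assert np.allclose(sc_matched.std(axis=0), bulk_std)
```

The first option trusts the bulk scale outright and leaves any systematic offset between platforms in place; the second removes that offset gene by gene, at the cost of assuming the cell population should share the bulk cohort's first two moments.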

Decomposition

Class	Description
`SVDDecomposition`	Truncated SVD (randomized / ARPACK / full)
`NMFDecomposition`	Non-negative matrix factorization
`ICADecomposition`	FastICA
`PCADecomposition`	PCA via sklearn
`FactorAnalysisDecomposition`	Probabilistic factor analysis (heteroscedastic noise)
`ConsensusNMFDecomposition`	Consensus NMF for stable gene program discovery
`get_decomposition(name)`	Factory function

Classification

Class	Description
`LogisticMutationClassifier`	L1/L2/ElasticNet logistic regression
`RandomForestMutationClassifier`	Random forest
`GBMMutationClassifier`	Gradient boosting (sklearn)
`XGBMutationClassifier`	XGBoost
`LGBMMutationClassifier`	LightGBM
`SVMMutationClassifier`	SVM + Platt calibration
`MLPMutationClassifier`	Multi-layer perceptron
`PerMutationClassifierSet`	Trains/stores one classifier per mutation
`get_classifier(name)`	Factory function

Evaluation

Class / Function	Description
`SVDEvaluator`	Full SVD component interpretation suite (15 plots + gene program CSV)
`evaluate_classifier`	AUROC, AUPRC, Brier score
`evaluate_all`	Evaluate all mutations at once
`cross_validate_classifiers`	Stratified k-fold CV
`roc_curve_data` / `pr_curve_data`	Curve arrays for plotting
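
For reference, two of the metrics named above can be written directly in NumPy. The functions below (`auroc` and `brier`, both hypothetical names) follow the standard definitions and are not the package's `evaluate_classifier`; AUROC is computed through its Mann-Whitney formulation, which is quadratic in sample count and only suitable for small inputs.

```python
import numpy as np

def auroc(y_true, y_prob):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size))

def brier(y_true, y_prob):
    """Brier score: mean squared error between predicted probability and the 0/1 label."""
    return float(np.mean((y_prob - y_true) ** 2))

y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

print(auroc(y_true, y_prob))   # 0.75
print(brier(y_true, y_prob))
```

AUROC measures ranking only, while the Brier score also penalizes miscalibrated probabilities, which is why both are useful when the outputs are interpreted as per-cell mutation probabilities.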

Visualization

Function	Description
`compute_umap`	UMAP on latent embedding
`compute_tsne`	t-SNE on latent embedding
`plot_embedding`	Scatter by categorical or continuous
`plot_mutation_probabilities`	Grid of per-mutation probability overlays
`plot_scree`	Singular value / EVR scree plot
`plot_mutation_heatmap`	Mean probability per cluster heatmap
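
The quantity behind a per-cluster heatmap is a grouped mean over the `mutation_prob_*` columns. The pandas sketch below builds a small stand-in for `adata_sc.obs`; the `cluster` column name and the values are assumptions for illustration, and no scOPE function is called.

```python
import pandas as pd

# Stand-in for adata_sc.obs after transform: a cluster label plus mutation_prob_* columns.
obs = pd.DataFrame({
    "cluster": ["c0", "c0", "c1", "c1", "c1"],
    "mutation_prob_KRAS": [0.9, 0.7, 0.1, 0.2, 0.3],
    "mutation_prob_TP53": [0.2, 0.4, 0.8, 0.6, 0.7],
})

# Select every per-mutation probability column, then average within each cluster.
prob_cols = [c for c in obs.columns if c.startswith("mutation_prob_")]
cluster_means = obs.groupby("cluster")[prob_cols].mean()
print(cluster_means)
```

The resulting table has one row per cluster and one column per mutation, which is the shape a heatmap function would expect.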

Optional dependencies

Package	Purpose	Install
`umap-learn`	UMAP embeddings and SVDEvaluator UMAP plot	`pip install "scope-bio[full]"`
`shap`	SHAP component importance in SVDEvaluator	`pip install "scope-bio[full]"`
`xgboost`	XGBoost classifier	`pip install "scope-bio[full]"`
`lightgbm`	LightGBM classifier	`pip install "scope-bio[full]"`
`scrublet`	Doublet detection in SingleCellPreprocessor	`pip install scrublet`
`combat`	ComBat batch correction in BulkPreprocessor	`pip install combat`
`harmonypy`	Harmony batch correction in BulkPreprocessor	`pip install harmonypy`
`statsmodels`	FDR correction in SVDEvaluator correlation heatmap	`pip install statsmodels`

Running tests

pytest tests/ -v --cov=scope --cov-report=term-missing

Citation

If you use scOPE in your research, please cite:

Ashford, A. et al. (2024). scOPE: transfer-learning from bulk RNA-seq to infer per-cell mutation probabilities in single-cell transcriptomics. [Journal] doi: ...

(Work in progress; the full reference will be added once the preprint is posted on bioRxiv.)

License

MIT — see LICENSE.


