
scOPE: single-cell Oncological Prediction Explorer — transfer-learning from bulk RNA-seq to predict per-cell mutation probabilities in scRNA-seq.


scOPE — single-cell Oncological Prediction Explorer

PyPI version | License: MIT | Python 3.9+

[Figure: scOPE transfer-learning method overview. scOPE workflow (transfer learning from bulk → single-cell).]

scOPE is a transfer-learning framework. It uses bulk RNA-seq cohorts with known mutation status to learn a compact, biologically meaningful latent space, then projects single-cell RNA-seq into that same space to predict the probability that specific cancer-associated gene mutations are present in individual cells. The resulting per-cell probabilities give a mutation-informed view of subclonal structure that can complement CNV-based methods and increase subclonal granularity.

Overview

scOPE proceeds in two phases:

1) Learn latent factors from bulk RNA-seq and train mutation classifiers (Panels a–c)

  • i. Bulk expression matrix
    Construct a bulk cohort expression matrix A_bulk (rows = patient samples, columns = genes).
    The bulk matrix is normalized, then centered and scaled gene-wise so that genes contribute on a comparable scale. The gene-wise parameters are kept for reuse in Phase 2.

  • ii. Latent feature mapping via SVD
    Decompose the normalized bulk matrix using SVD:

    A_bulk = U_bulk Σ_bulk V^T

    • U_bulk: sample scores (rows = patients, columns = latent factors)
    • Σ_bulk: diagonal matrix of singular values
    • V: gene loadings (rows = genes, columns = latent factors)

    Define the bulk latent representation (patient-by-factor embedding):

    Z_bulk = U_bulk Σ_bulk

  • iii. Train mutation-prediction models in latent space
    For each mutation / gene-of-interest, train a supervised ML model to predict mutation presence Y from Z_bulk.
    This yields one (or multiple) mutation-specific classifiers operating on the learned latent factors.
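
The three Phase 1 steps above can be sketched with NumPy and scikit-learn. This is a minimal illustration on synthetic data, not scOPE's internal implementation: the toy matrix, the names `mu`, `sigma` and `k`, and the choice of plain logistic regression are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy bulk cohort: 60 patient samples x 200 genes (synthetic, illustration only).
n_samples, n_genes, k = 60, 200, 10
A_raw = rng.normal(size=(n_samples, n_genes))
y = rng.integers(0, 2, size=n_samples)   # hypothetical 0/1 mutation labels
A_raw[y == 1, :5] += 2.0                 # shift 5 genes in the "mutated" samples

# i. Gene-wise centering and scaling; keep mu and sigma for the single-cell phase.
mu = A_raw.mean(axis=0)
sigma = A_raw.std(axis=0)
sigma[sigma == 0] = 1.0                  # guard against constant genes
A_bulk = (A_raw - mu) / sigma

# ii. SVD: A_bulk = U @ diag(S) @ Vt, truncated to the top k latent factors.
U, S, Vt = np.linalg.svd(A_bulk, full_matrices=False)
U_bulk, S_bulk, V = U[:, :k], S[:k], Vt[:k].T   # V: genes x factors
Z_bulk = U_bulk * S_bulk                         # same as U_bulk @ np.diag(S_bulk)

# iii. Train a classifier for one mutation on the latent embedding.
clf = LogisticRegression(max_iter=1000).fit(Z_bulk, y)
train_acc = clf.score(Z_bulk, y)
```

Keeping `mu`, `sigma` and `V` around is the point of the exercise: Phase 2 reuses exactly these bulk-derived quantities on the single-cell matrix.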


2) Project scRNA-seq into the bulk-derived latent space and predict mutations per cell (Panels d–f)

  • i. Single-cell expression matrix
    Construct a single-cell expression matrix A_sc (rows = single cells, columns = genes).

  • ii. Normalize scRNA-seq using bulk-derived parameters
    Apply the same gene-wise normalization / centering / scaling learned from the bulk cohort to obtain A′_sc.
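    (This alignment step puts single-cell expression on the same gene-wise scale as the bulk data, so that projected cells are comparable to the bulk samples the classifiers were trained on.)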

  • iii. Project cells into latent space and infer mutation probabilities
    Use the bulk-derived gene loadings V to compute the single-cell latent representation:

    Z_sc = A′_sc V

    Then apply the trained bulk models to Z_sc to predict per-cell mutation probabilities, producing mutation-informed cellular maps that can be analyzed alongside expression programs, clusters, and CNV signals.
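
The projection works because V has orthonormal columns: A_bulk V = U_bulk Σ_bulk V^T V = U_bulk Σ_bulk = Z_bulk, so multiplying any normalized expression matrix by V places it in the same coordinates as the bulk embedding. The sketch below illustrates Phase 2 under that reading, again on synthetic data. It is not scOPE's own code; the variable names and the logistic-regression classifier are assumptions, and it presumes the single-cell matrix has the same genes in the same column order as the bulk matrix.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_genes, k = 100, 8

# Bulk side (synthetic): normalization parameters, loadings V, and a classifier.
A_bulk_raw = rng.normal(loc=5.0, scale=2.0, size=(50, n_genes))
y_bulk = np.arange(50) % 2                 # hypothetical 0/1 mutation labels
A_bulk_raw[y_bulk == 1, :5] += 3.0
mu, sigma = A_bulk_raw.mean(axis=0), A_bulk_raw.std(axis=0)
A_bulk = (A_bulk_raw - mu) / sigma
U, S, Vt = np.linalg.svd(A_bulk, full_matrices=False)
V = Vt[:k].T                               # genes x factors
Z_bulk = A_bulk @ V                        # equals U[:, :k] * S[:k]
clf = LogisticRegression(max_iter=1000).fit(Z_bulk, y_bulk)

# i. Single-cell matrix: 300 cells, same genes and gene order as the bulk matrix.
A_sc_raw = rng.normal(loc=5.0, scale=2.0, size=(300, n_genes))

# ii. Reuse the bulk-derived mu and sigma; they are not refit on the cells.
A_sc_prime = (A_sc_raw - mu) / sigma

# iii. Project with the bulk loadings and score every cell.
Z_sc = A_sc_prime @ V
p_mut = clf.predict_proba(Z_sc)[:, 1]      # per-cell probability of the mutated class
```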


Installation

From PyPI

pip install scope-bio

With optional dependencies (UMAP, XGBoost, LightGBM, SHAP)

pip install "scope-bio[full]"

From conda-forge

conda install -c conda-forge scope-bio

Development install

git clone https://github.com/Ashford-A/scOPE.git
cd scOPE
conda env create -f environments/scope-dev.yml
conda activate scope-dev
pip install -e ".[dev]"

Quick start

import anndata as ad
import pandas as pd
from scope import BulkPipeline, SingleCellPipeline
from scope.io import load_mutation_labels

# --- Phase 1: Bulk --------------------------------------------------------
adata_bulk = ad.read_h5ad("bulk_cohort.h5ad")
mutation_labels = load_mutation_labels("mutations.csv", sample_col="sample_id")

bulk_pipe = BulkPipeline(
    norm_method="cpm",
    decomposition="svd",   # "svd" | "nmf" | "ica" | "pca"
    n_components=50,
    classifier="logistic", # "logistic" | "random_forest" | "gbm" | "xgboost" | "lightgbm" | "svm" | "mlp"
)
bulk_pipe.fit(adata_bulk, mutation_labels, cv=5)
bulk_pipe.save("models/bulk_pipeline.pkl")

# --- Phase 2: Single cell --------------------------------------------------
adata_sc = ad.read_h5ad("sc_tumor.h5ad")

# Prepare a preprocessed bulk reference for the alignment step (e.g. moment matching)
adata_bulk_pp = bulk_pipe.preprocessor_.transform(adata_bulk)

sc_pipe = SingleCellPipeline(
    bulk_pipeline=bulk_pipe,
    alignment_method="z_score_bulk",  # "z_score_bulk" | "moment_matching" | "quantile" | "none"
)
sc_pipe.fit(adata_bulk_pp, adata_sc)
adata_sc = sc_pipe.transform(adata_sc)

# adata_sc.obs now contains columns: mutation_prob_KRAS, mutation_prob_TP53, ...

# --- Visualise -------------------------------------------------------------
from scope.visualization import compute_umap, plot_mutation_probabilities

adata_sc = compute_umap(adata_sc, obsm_key="X_svd")
fig = plot_mutation_probabilities(adata_sc, mutations=["KRAS", "TP53"])
fig.savefig("mutation_probs.pdf", bbox_inches="tight")
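
Once the per-cell probabilities are in `adata_sc.obs`, they can be handled like any other pandas columns. The sketch below turns them into boolean calls. It uses a synthetic DataFrame as a stand-in for `adata_sc.obs`, and the 0.5 cutoff is purely illustrative rather than a recommended default; only the `mutation_prob_<GENE>` column naming is taken from the quick start above.

```python
import numpy as np
import pandas as pd

# Stand-in for adata_sc.obs after sc_pipe.transform (synthetic values).
rng = np.random.default_rng(0)
obs = pd.DataFrame(
    {
        "mutation_prob_KRAS": rng.uniform(size=500),
        "mutation_prob_TP53": rng.uniform(size=500),
    },
    index=[f"cell_{i}" for i in range(500)],
)

threshold = 0.5  # illustrative cutoff, not a recommended default
prob_cols = [c for c in obs.columns if c.startswith("mutation_prob_")]

# Boolean call per cell and mutation, with the column prefix stripped.
calls = (obs[prob_cols] >= threshold).rename(
    columns=lambda c: c.removeprefix("mutation_prob_")
)

# Fraction of cells called positive for each mutation.
fraction_called = calls.mean()
```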

API reference

Preprocessing

| Class | Description |
|---|---|
| BulkNormalizer | CPM / TPM / median-ratio / TMM normalisation |
| BulkScaler | Gene-wise centering and scaling |
| BulkPreprocessor | Combined normalise + scale |
| SingleCellPreprocessor | QC filter + normalise + optional scale |
| BulkSCAligner | z-score / moment-matching / quantile alignment |
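
To make two of the terms above concrete, here is a small NumPy sketch of CPM normalisation and of moment matching. These are the textbook definitions and may differ in detail from what `BulkNormalizer` and `BulkSCAligner` do; the function names `cpm` and `moment_match` are hypothetical helpers written for this example.

```python
import numpy as np

def cpm(counts: np.ndarray) -> np.ndarray:
    """Counts per million: scale each sample (row) to a library size of 1e6."""
    lib_size = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(lib_size, 1) * 1e6

def moment_match(X_sc: np.ndarray, X_bulk: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Shift and scale each gene in X_sc so its mean and std match the bulk gene."""
    mu_sc, sd_sc = X_sc.mean(axis=0), X_sc.std(axis=0)
    mu_b, sd_b = X_bulk.mean(axis=0), X_bulk.std(axis=0)
    return (X_sc - mu_sc) / (sd_sc + eps) * sd_b + mu_b

# CPM on a tiny count matrix (2 samples x 3 genes).
counts = np.array([[10, 30, 60], [5, 5, 90]], dtype=float)
cpm_values = cpm(counts)
log_cpm = np.log1p(cpm_values)   # a common variance-stabilising follow-up

# Moment matching on synthetic bulk and single-cell matrices with 4 shared genes.
rng = np.random.default_rng(0)
X_bulk = rng.normal(loc=3.0, scale=2.0, size=(40, 4))
X_sc = rng.normal(loc=0.5, scale=0.3, size=(200, 4))
X_sc_matched = moment_match(X_sc, X_bulk)
```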

Decomposition

| Class | Description |
|---|---|
| SVDDecomposition | Truncated SVD (randomized / ARPACK / full) |
| NMFDecomposition | Non-negative matrix factorization |
| ICADecomposition | FastICA |
| PCADecomposition | PCA via sklearn |
| get_decomposition(name) | Factory function |
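
For orientation, this is roughly what a truncated SVD looks like when done directly with scikit-learn's `TruncatedSVD`. Whether `SVDDecomposition` wraps this class is an assumption; the sketch only shows the underlying technique on synthetic data.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X_bulk = rng.normal(size=(40, 30))   # synthetic, assumed already normalized and scaled
X_new = rng.normal(size=(7, 30))     # new samples with the same 30 genes

svd = TruncatedSVD(n_components=5, algorithm="arpack", random_state=0)
Z_bulk = svd.fit_transform(X_bulk)   # sample-by-factor embedding
V = svd.components_.T                # gene loadings, genes x factors
Z_new = svd.transform(X_new)         # equivalent to X_new @ V

# Per-factor share of variance, the quantity a scree plot displays.
evr = svd.explained_variance_ratio_
```

Note that `TruncatedSVD` does not center the data itself, which is why the input is assumed to be scaled beforehand.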

Classification

| Class | Description |
|---|---|
| LogisticMutationClassifier | L1/L2/ElasticNet logistic regression |
| RandomForestMutationClassifier | Random forest |
| GBMMutationClassifier | Gradient boosting (sklearn) |
| XGBMutationClassifier | XGBoost |
| LGBMMutationClassifier | LightGBM |
| SVMMutationClassifier | SVM + Platt calibration |
| MLPMutationClassifier | Multi-layer perceptron |
| PerMutationClassifierSet | Trains/stores one classifier per mutation |
| get_classifier(name) | Factory function |
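
The "one classifier per mutation" idea can be sketched as a plain dictionary of scikit-learn models. This is not the `PerMutationClassifierSet` implementation, just an illustration of the pattern; the data, the labels and the dictionary layout are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
Z_bulk = rng.normal(size=(80, 6))             # synthetic latent embedding
labels = {                                    # hypothetical 0/1 labels per mutation
    "KRAS": (Z_bulk[:, 0] > 0).astype(int),
    "TP53": (Z_bulk[:, 1] + Z_bulk[:, 2] > 0).astype(int),
}

# Fit one independent binary classifier per mutation.
models = {
    gene: LogisticRegression(C=1.0, max_iter=1000).fit(Z_bulk, y)
    for gene, y in labels.items()
}

# Score new points: one probability vector per mutation.
Z_new = rng.normal(size=(5, 6))
probs = {gene: m.predict_proba(Z_new)[:, 1] for gene, m in models.items()}
```

Treating each mutation as its own binary problem keeps the models independent, so mutations with few positive samples can be dropped or tuned separately without touching the others.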

Evaluation

| Function | Description |
|---|---|
| evaluate_classifier | AUROC, AUPRC, Brier score |
| evaluate_all | Evaluate all mutations at once |
| cross_validate_classifiers | Stratified k-fold CV |
| roc_curve_data / pr_curve_data | Curve arrays for plotting |
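
The metrics listed above are standard and can be reproduced with scikit-learn. The following sketch computes AUROC, AUPRC and Brier score from out-of-fold predictions under stratified 5-fold cross-validation. It runs on synthetic data and does not call scOPE's own evaluation functions, whose exact signatures are not shown here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, brier_score_loss, roc_auc_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
Z = rng.normal(size=(120, 5))                              # synthetic latent embedding
y = (Z[:, 0] + 0.5 * rng.normal(size=120) > 0).astype(int)  # hypothetical labels

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
oof = np.zeros(len(y))  # out-of-fold predicted probabilities
for train_idx, test_idx in cv.split(Z, y):
    model = LogisticRegression(max_iter=1000).fit(Z[train_idx], y[train_idx])
    oof[test_idx] = model.predict_proba(Z[test_idx])[:, 1]

metrics = {
    "auroc": roc_auc_score(y, oof),            # ranking quality
    "auprc": average_precision_score(y, oof),  # more informative for rare mutations
    "brier": brier_score_loss(y, oof),         # calibration; lower is better
}
```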

Visualization

| Function | Description |
|---|---|
| compute_umap | UMAP on latent embedding |
| compute_tsne | t-SNE on latent embedding |
| plot_embedding | Scatter plot coloured by a categorical or continuous variable |
| plot_mutation_probabilities | Grid of per-mutation probability overlays |
| plot_scree | Singular value / EVR scree plot |
| plot_mutation_heatmap | Mean probability per cluster heatmap |
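
The table behind a "mean probability per cluster" heatmap is a simple group-by. The sketch below builds it with pandas from a synthetic stand-in for `adata_sc.obs`; the `cluster` column name is an assumption, and the plotting step itself is left out.

```python
import numpy as np
import pandas as pd

# Stand-in for adata_sc.obs with cluster labels and predicted probabilities.
rng = np.random.default_rng(0)
obs = pd.DataFrame(
    {
        "cluster": rng.choice(["c0", "c1", "c2"], size=300),  # hypothetical column name
        "mutation_prob_KRAS": rng.uniform(size=300),
        "mutation_prob_TP53": rng.uniform(size=300),
    }
)

prob_cols = [c for c in obs.columns if c.startswith("mutation_prob_")]

# Rows = clusters, columns = mutations, values = mean predicted probability.
cluster_means = obs.groupby("cluster")[prob_cols].mean()
```

The resulting `cluster_means` frame can be handed to any heatmap routine.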

Running tests

pytest tests/ -v --cov=scope --cov-report=term-missing

Citation

If you use scOPE in your research, please cite:

Ashford, A. et al. (2024). scOPE: transfer-learning from bulk RNA-seq to infer per-cell mutation probabilities in single-cell transcriptomics. [Journal] doi: ...


License

MIT — see LICENSE.
