
Fast Linear Algebra for Scalable Hybrid Deconvolution of Spatial Transcriptomics


FlashDeconv


Spatial deconvolution with linear scalability for atlas-scale data.

FlashDeconv estimates cell type proportions from spatial transcriptomics data (Visium, Visium HD, Stereo-seq). It is designed for large-scale analyses where computational efficiency is essential, while maintaining attention to low-abundance cell populations through leverage-score-based feature weighting.

Paper: Yang, C., Chen, J. & Zhang, X. FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution via structure-preserving sketching. bioRxiv (2025). DOI: 10.64898/2025.12.22.696108


Installation

pip install flashdeconv

For development or additional I/O support, see Installation Options.


Quick Start

import scanpy as sc
import flashdeconv as fd

# Load data
adata_st = sc.read_h5ad("spatial.h5ad")
adata_ref = sc.read_h5ad("reference.h5ad")

# Deconvolve
fd.tl.deconvolve(adata_st, adata_ref, cell_type_key="cell_type")

# Results stored in adata_st.obsm["flashdeconv"]
sc.pl.spatial(adata_st, color="flashdeconv_Hepatocyte")

Overview

Spatial deconvolution methods offer different trade-offs. Probabilistic approaches like Cell2Location and RCTD provide rigorous uncertainty quantification; methods like CARD incorporate spatial structure through dense kernel matrices. FlashDeconv takes a complementary approach, prioritizing computational efficiency for million-scale datasets.

Design Principles

  1. Linear complexity — O(N) time and memory through randomized sketching and sparse graph regularization.

  2. Leverage-based feature weighting — Variance-based selection (PCA, HVG) can underweight markers of low-abundance populations. We use leverage scores from the reference SVD to identify genes that define distinct transcriptomic directions, regardless of expression magnitude.

  3. Sparse spatial regularization — Graph Laplacian smoothing with O(N) complexity, avoiding the O(N²) cost of dense kernel methods.
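To make principle 2 concrete, here is a minimal sketch of leverage scores computed from the thin SVD of a toy reference matrix. The `leverage_scores` helper, the toy values, and the rank rule are illustrative assumptions, not FlashDeconv's internal implementation. It shows a weakly expressed marker receiving the highest leverage even though its variance is the lowest.

```python
import numpy as np

def leverage_scores(X, rank=None):
    """Leverage score per gene (column) of a K x G reference matrix.

    Hypothetical helper: the score of gene j is the squared norm of
    column j of V^T from the thin SVD X = U S V^T, using the top
    `rank` right singular vectors.
    """
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    if rank is None:
        rank = int(np.sum(s > 1e-10 * s[0]))  # numerical rank
    return np.sum(vt[:rank] ** 2, axis=0)

# Toy reference: 3 cell types x 5 genes.
# Genes 0-3 are highly expressed and shared across types.
# Gene 4 is a weak marker expressed only in the third type.
X = np.array([
    [100.0,  90.0, 110.0,  95.0, 0.0],
    [ 90.0, 100.0,  95.0, 110.0, 0.0],
    [ 95.0,  95.0, 102.5, 102.5, 2.0],
])

scores = leverage_scores(X)
variances = X.var(axis=0)

print("leverage:", np.round(scores, 3))
print("variance:", np.round(variances, 3))
```

In this toy case gene 4 alone spans a direction of the row space, so its leverage is 1, while a variance ranking would place it last.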


Performance

Scalability

| Spots | Time | Memory |
|---|---|---|
| 10,000 | < 1 sec | < 1 GB |
| 100,000 | ~4 sec | ~2 GB |
| 1,000,000 | ~3 min | ~21 GB |

Benchmarked on MacBook Pro M2 Max (32GB unified memory), CPU-only.
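As a back-of-the-envelope check on why a sparse graph matters at this scale, the arithmetic below compares a dense N × N float64 kernel matrix with a directed k-NN graph in CSR form for N = 1,000,000 and k = 6. The storage layout (8-byte values, 4-byte indices) is an assumption for illustration, not a measurement of FlashDeconv.

```python
def dense_kernel_bytes(n_spots, bytes_per_value=8):
    """Memory for a dense n x n matrix of float64 values."""
    return n_spots * n_spots * bytes_per_value

def sparse_knn_bytes(n_spots, k, bytes_per_value=8, bytes_per_index=4):
    """Approximate memory for a directed k-NN graph in CSR form:
    n*k values, n*k column indices, and n+1 row pointers."""
    nnz = n_spots * k
    return (nnz * bytes_per_value
            + nnz * bytes_per_index
            + (n_spots + 1) * bytes_per_index)

n, k = 1_000_000, 6
dense_tb = dense_kernel_bytes(n) / 1e12
sparse_mb = sparse_knn_bytes(n, k) / 1e6
print(f"dense kernel: {dense_tb:.1f} TB, sparse 6-NN graph: {sparse_mb:.1f} MB")
# prints: dense kernel: 8.0 TB, sparse 6-NN graph: 76.0 MB
```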

Accuracy

On the Spotless benchmark:

| Metric | FlashDeconv | RCTD | Cell2Location |
|---|---|---|---|
| Pearson (56 datasets) | 0.944 | 0.905 | 0.895 |

Performance varies by tissue type and experimental conditions. We recommend evaluating on data similar to your use case.


Algorithm

FlashDeconv solves a graph-regularized non-negative least squares problem:

minimize  ½‖Y - βX‖²_F + λ·Tr(βᵀLβ) + ρ‖β‖₁,  subject to β ≥ 0

where Y is spatial expression, X is reference signatures, L is the graph Laplacian, and β represents cell type abundances.
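The objective can be evaluated directly for small inputs. The sketch below assumes the shapes listed under Input Formats (Y is N × G, X is K × G, β is N × K) and uses a three-spot path graph as a toy Laplacian. The `objective` function and the values of `lam` and `rho` are illustrative, not part of the FlashDeconv API.

```python
import numpy as np

def objective(Y, X, beta, L, lam, rho):
    """0.5*||Y - beta X||_F^2 + lam*Tr(beta^T L beta) + rho*||beta||_1."""
    residual = Y - beta @ X
    fit = 0.5 * np.sum(residual ** 2)
    smooth = lam * np.trace(beta.T @ L @ beta)
    sparsity = rho * np.sum(np.abs(beta))
    return fit + smooth + sparsity

# Toy problem: 3 spots on a line, 2 cell types, 2 genes.
X = np.array([[1.0, 0.0],
              [0.0, 1.0]])
beta = np.array([[1.0, 0.0],
                 [0.5, 0.5],
                 [0.0, 1.0]])
Y = beta @ X  # noiseless data, so the fit term is zero

# Laplacian L = D - A of the path graph 0 - 1 - 2.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = np.diag(A.sum(axis=1)) - A

value = objective(Y, X, beta, L, lam=0.1, rho=0.01)
print(round(value, 6))  # 0.13
```

The smoothness term equals the sum of squared differences between β rows of neighboring spots (here 0.5 + 0.5 = 1.0), which is why it penalizes abrupt changes across the spatial graph.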

[Figure: FlashDeconv framework]

Pipeline:

  1. Select informative genes (HVG ∪ markers) and compute leverage scores
  2. Compress gene space via weighted CountSketch (G → 512 dimensions)
  3. Construct sparse k-NN spatial graph
  4. Solve via block coordinate descent with spatial smoothing
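Step 2 can be sketched with a plain CountSketch: each gene is hashed to one of `sketch_dim` buckets and multiplied by a random sign, so the gene axis shrinks from G to 512 columns in a single pass. The `countsketch` helper and the way per-gene weights enter (scaling each gene before hashing) are assumptions for illustration; the library's weighting scheme may differ.

```python
import numpy as np

def countsketch(M, sketch_dim, weights=None, seed=0):
    """Compress the gene axis of M (rows x G) to sketch_dim columns.

    Each gene is assigned one bucket and one random sign. Optional
    per-gene weights scale genes before hashing (assumed weighting).
    """
    rng = np.random.default_rng(seed)
    n_genes = M.shape[1]
    buckets = rng.integers(0, sketch_dim, size=n_genes)
    signs = rng.choice([-1.0, 1.0], size=n_genes)
    if weights is not None:
        signs = signs * np.asarray(weights, dtype=float)
    sketched = np.zeros((sketch_dim, M.shape[0]))
    # Accumulate signed gene rows into their buckets; np.add.at
    # handles several genes landing in the same bucket.
    np.add.at(sketched, buckets, (M * signs).T)
    return sketched.T

rng = np.random.default_rng(1)
Y = rng.poisson(5.0, size=(4, 2000)).astype(float)  # 4 spots x 2000 genes
X = rng.poisson(5.0, size=(3, 2000)).astype(float)  # 3 cell types x 2000 genes

# The same seed must be used for both matrices so that spots and
# reference signatures are compressed with the same hash and signs.
Y_sk = countsketch(Y, 512, seed=0)
X_sk = countsketch(X, 512, seed=0)
print(Y_sk.shape, X_sk.shape)  # (4, 512) (3, 512)
```

Because the sketch is a linear map applied identically to Y and X, the regression is preserved approximately in the compressed space while each solver iteration touches 512 columns instead of G.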

API

Scanpy-style

fd.tl.deconvolve(
    adata_st,                    # Spatial AnnData
    adata_ref,                   # Reference AnnData
    cell_type_key="cell_type",   # Column in adata_ref.obs
    key_added="flashdeconv",     # Key for results
)

NumPy

from flashdeconv import FlashDeconv

model = FlashDeconv(
    sketch_dim=512,
    lambda_spatial="auto",
    n_hvg=2000,
    k_neighbors=6,
    random_state=0,
)
# Y: spatial expression (N × G); X: reference signatures (K × G); coords: spot coordinates (N × 2)
proportions = model.fit_transform(Y, X, coords)

Parameters

| Parameter | Default | Description |
|---|---|---|
| sketch_dim | 512 | Sketch dimension |
| lambda_spatial | "auto" | Spatial regularization (auto-tuned) |
| n_hvg | 2000 | Highly variable genes |
| k_neighbors | 6 | Spatial graph neighbors |
| preprocess | "log_cpm" | Normalization: "log_cpm", "pearson", or "raw" |
| random_state | 0 | Random seed for reproducibility |

Output

| Attribute | Description |
|---|---|
| proportions_ | Cell type proportions (N × K), sum to 1 |
| beta_ | Raw abundances (N × K) |
| info_ | Convergence statistics |
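One plausible relationship between the two arrays is row normalization. The sketch below assumes `proportions_` equals `beta_` divided by its row sums; the table states only that proportions sum to 1, so treat this as an illustration and check the library before relying on it. `normalize_rows` is a hypothetical helper.

```python
import numpy as np

def normalize_rows(beta, eps=1e-12):
    """Scale each row of a non-negative (N x K) array to sum to 1.

    Rows that are entirely zero stay zero instead of dividing by zero.
    """
    totals = beta.sum(axis=1, keepdims=True)
    return np.divide(beta, totals, out=np.zeros_like(beta), where=totals > eps)

beta = np.array([[2.0, 1.0, 1.0],
                 [0.0, 3.0, 1.0],
                 [0.0, 0.0, 0.0]])  # last spot has no estimated abundance
proportions = normalize_rows(beta)
print(proportions)
```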

Input Formats

  • Spatial data: AnnData, NumPy array (N × G), or SciPy sparse matrix
  • Reference: AnnData (aggregated by cell type) or NumPy array (K × G)
  • Coordinates: Extracted from adata.obsm["spatial"] or NumPy array (N × 2)
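When passing NumPy arrays, the K × G reference has to be built from single-cell data first. A minimal sketch, assuming signatures are per-type mean profiles (the aggregation FlashDeconv applies to an AnnData reference may differ); `aggregate_by_cell_type` and the toy labels are hypothetical:

```python
import numpy as np

def aggregate_by_cell_type(counts, labels):
    """Average single-cell profiles (cells x G) within each label.

    Returns the sorted unique labels and a K x G signature matrix
    whose rows follow the same order.
    """
    labels = np.asarray(labels)
    types = np.unique(labels)
    signatures = np.vstack([counts[labels == t].mean(axis=0) for t in types])
    return types, signatures

# Toy data: 4 cells x 3 genes, two cell types.
counts = np.array([[4.0, 0.0, 1.0],
                   [6.0, 0.0, 3.0],
                   [0.0, 5.0, 1.0],
                   [0.0, 7.0, 1.0]])
labels = ["Hepatocyte", "Hepatocyte", "Kupffer", "Kupffer"]

types, X = aggregate_by_cell_type(counts, labels)
print(types)    # row order of the signature matrix
print(X.shape)  # (2, 3), i.e. K x G
```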

Reference Quality

Deconvolution accuracy depends on reference quality:

| Requirement | Guideline |
|---|---|
| Cells per type | ≥ 500 recommended |
| Marker fold-change | ≥ 5× for distinguishability |
| Signature correlation | < 0.95 between types |
| No Unknown cells | Filter before deconvolution |

Critical: Always remove cells labeled "Unknown", "Unassigned", or similar. These cells act as universal signatures that absorb proportions from specific types—a fundamental property of regression-based deconvolution, not a FlashDeconv limitation.
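Two of these checks are easy to script before running deconvolution. The sketch below drops placeholder labels and flags pairs of signatures whose correlation reaches 0.95. It assumes Pearson correlation on the signature rows and a small set of placeholder spellings; `keep_mask` and `correlated_pairs` are hypothetical helpers, not FlashDeconv functions.

```python
import numpy as np

EXCLUDED_LABELS = {"unknown", "unassigned"}  # assumed spellings; extend as needed

def keep_mask(labels):
    """Boolean mask that drops cells with placeholder labels (case-insensitive)."""
    return np.array([str(label).strip().lower() not in EXCLUDED_LABELS
                     for label in labels])

def correlated_pairs(signatures, names, threshold=0.95):
    """Pairs of cell types whose Pearson signature correlation reaches threshold."""
    corr = np.corrcoef(signatures)
    k = len(names)
    return [(names[i], names[j], float(corr[i, j]))
            for i in range(k) for j in range(i + 1, k)
            if corr[i, j] >= threshold]

labels = ["T cell", "Unknown", "B cell", "unassigned", "T cell"]
mask = keep_mask(labels)
print(mask.tolist())  # [True, False, True, False, True]

signatures = np.array([[10.0, 1.0, 0.0,  5.0],   # type A
                       [20.0, 2.0, 0.0, 10.0],   # type B: A scaled by 2
                       [ 0.0, 8.0, 9.0,  1.0]])  # type C
flagged = correlated_pairs(signatures, ["A", "B", "C"])
print(flagged)  # only the (A, B) pair is flagged
```

Flagged pairs are candidates for merging into one label or for refining with additional markers before building the reference.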

See Reference Data Guide for details.


Installation Options

# Standard
pip install flashdeconv

# With AnnData support
pip install "flashdeconv[io]"

# Development
git clone https://github.com/cafferychen777/flashdeconv.git
cd flashdeconv && pip install -e ".[dev]"

Requirements: Python ≥ 3.9, numpy, scipy, numba. Optional: scanpy, anndata.


Citation

@article{yang2025flashdeconv,
  title={FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution
         via structure-preserving sketching},
  author={Yang, Chen and Chen, Jun and Zhang, Xianyang},
  journal={bioRxiv},
  year={2025},
  doi={10.64898/2025.12.22.696108}
}


Acknowledgments

We thank the developers of Spotless, Cell2Location, RCTD, CARD, and other deconvolution methods whose work contributed to this field.
