Fast Linear Algebra for Scalable Hybrid Deconvolution of Spatial Transcriptomics

Maintainer: cafferychen777

FlashDeconv

Fast Linear Algebra for Scalable Hybrid Deconvolution

Unlocking atlas-scale spatial biology with randomized numerical linear algebra.

FlashDeconv is a high-performance spatial transcriptomics deconvolution method designed for atlas-scale and subcellular-resolution platforms (Visium HD, Stereo-seq, Xenium). It leverages structure-preserving randomized sketching to estimate cell type proportions with linear scalability—processing 1 million spots in ~3 minutes on commodity hardware.

Paper: Chen Yang*, Xianyang Zhang*, Jun Chen*. FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution via structure-preserving sketching. Preprint, 2025.

Reproducibility: To reproduce figures and benchmarks from the paper, visit the flashdeconv-reproducibility repository.

Key Features

Ultra-fast & Scalable: Deconvolve 1 million spots in ~3 minutes. Time and memory scale linearly O(N) with dataset size.
Hardware Friendly: No GPU required. Runs efficiently on laptops (e.g., 32GB RAM handles 1M spots).
Rare Cell Detection: Uses leverage-score sampling to preserve transcriptomically distinct but low-abundance cell types (e.g., Tuft cells, endothelial cells) that variance-based methods systematically miss.
Spatially Aware: Sparse graph Laplacian regularization ensures spatial coherence without the O(N²) cost of dense kernel methods.
Visium HD Ready: Specifically optimized for the extreme sparsity and scale of high-resolution technologies (2µm–16µm bin sizes).
Statistically Rigorous: Log-CPM normalization with leverage-weighted gene selection preserves both common and rare cell populations.

Installation

```bash
# From PyPI
pip install flashdeconv

# From source (recommended for the latest features)
git clone https://github.com/cafferychen777/flashdeconv.git
cd flashdeconv
pip install -e .

# With development dependencies
pip install -e ".[dev]"

# With scanpy/anndata integration
pip install -e ".[io]"
```

Requirements: Python ≥ 3.8, numpy, scipy, numba. Optional: scanpy and anndata for the AnnData workflow (installed by the `[io]` extra).

Quick Start

1. The NumPy Way

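```python
import numpy as np
from flashdeconv import FlashDeconv

# Random placeholders with the expected shapes; replace them with your own data
rng = np.random.default_rng(0)
n_spots, n_genes, n_cell_types = 1000, 3000, 5
# Y: Spatial count matrix (n_spots × n_genes), can be sparse
Y = rng.poisson(1.0, size=(n_spots, n_genes))
# X: Single-cell reference signatures (n_cell_types × n_genes)
X = rng.gamma(1.0, 1.0, size=(n_cell_types, n_genes))
# coords: Spatial coordinates (n_spots × 2)
coords = rng.uniform(0, 100, size=(n_spots, 2))

# Initialize and run
model = FlashDeconv(
    sketch_dim=512,       # Sketch dimension (default: 512)
    lambda_spatial=5000,  # Spatial regularization strength (default: 5000)
    verbose=True
)
proportions = model.fit_transform(Y, X, coords)

# Returns: (n_spots × n_cell_types) normalized proportions
print(proportions[:5])
```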

2. The Scanpy/AnnData Way

FlashDeconv integrates seamlessly with the scanpy ecosystem.

```python
import scanpy as sc
from flashdeconv import FlashDeconv
from flashdeconv.io import prepare_data, result_to_anndata

# Load data
adata_st = sc.read_h5ad("visium_hd_slide.h5ad")
adata_ref = sc.read_h5ad("scrna_reference.h5ad")

# 1. Prepare inputs (auto-aligns genes)
Y, X, coords, cell_type_names, gene_names = prepare_data(
    adata_st,
    adata_ref,
    cell_type_key="cell_type"  # Column name in adata_ref.obs
)

# 2. Run deconvolution
model = FlashDeconv(
    lambda_spatial=5000,  # Adjust based on platform (see Best Practices)
    verbose=True
)
proportions = model.fit_transform(Y, X, coords, cell_type_names=cell_type_names)

# 3. Save results back to AnnData
adata_st = result_to_anndata(proportions, adata_st, cell_type_names)

# Access results
adata_st.obsm["flashdeconv"]         # DataFrame with proportions
adata_st.obs["flashdeconv_dominant"] # Dominant cell type per spot

# Visualization: copy per-cell-type proportions into .obs so scanpy can color by them
props = adata_st.obsm["flashdeconv"]
for ct in props.columns:
    adata_st.obs[ct] = props[ct].to_numpy()
# "Hepatocyte" and "Kupffer_cell" are example names from a liver reference
sc.pl.spatial(adata_st, color=["Hepatocyte", "Kupffer_cell"], img_key="hires")
```

Best Practices: Tuning `lambda_spatial`

While FlashDeconv works well with its defaults, tuning lambda_spatial (the spatial regularization strength) to your platform's spot size and counts per spot can substantially improve results.

Platform	Spot Size	Typical UMI/Spot	Recommended `lambda_spatial`	Rationale
Standard Visium	55µm	10,000–30,000	`1000–10000` (default: 5000)	Strong signal; minimal smoothing needed
Visium HD (16µm)	16µm	500–2,000	`5000–20000`	Moderate sparsity; leverage neighbors
Visium HD (8µm)	8µm	100–500	`10000–50000`	Very sparse; rely on spatial priors
Visium HD (2µm)	2µm	10–100	`50000–100000`	Extreme sparsity; heavy smoothing
Stereo-seq / Seq-Scope	0.5–1µm	5–50	`50000–200000`	Single-cell/subcellular resolution; extreme sparsity

Note:

If cell type maps look "salt-and-pepper" noisy, increase lambda_spatial

If maps look overly blurred, decrease lambda_spatial

Use lambda_spatial="auto" for automatic tuning (may underestimate for real data; best for initial exploration)

For non-grid layouts (e.g., Xenium, MERFISH), use spatial_method="knn" (the default)
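As a toy illustration of that trade-off (a one-dimensional stand-in, not FlashDeconv's solver), the sketch below smooths a noisy piecewise-constant track along a chain of spots by solving (I + λL) b = y, which minimizes ‖b − y‖² + λ·bᵀLb for the chain's graph Laplacian L. The helper names are hypothetical, and the λ values are not on the same scale as `lambda_spatial`, whose effective strength also depends on how the data term is normalized.

```python
import numpy as np

def chain_laplacian(n):
    """Graph Laplacian L = D - W of a path graph (each spot linked to its neighbors)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = W[idx + 1, idx] = 1.0
    return np.diag(W.sum(axis=1)) - W

def smooth(y, lam):
    """Minimize ||b - y||^2 + lam * b^T L b, i.e. solve (I + lam * L) b = y."""
    L = chain_laplacian(len(y))
    return np.linalg.solve(np.eye(len(y)) + lam * L, y)

def roughness(b):
    """Sum of squared differences between adjacent spots (equals b^T L b)."""
    return float(np.sum(np.diff(b) ** 2))

rng = np.random.default_rng(0)
truth = np.repeat([0.2, 0.8, 0.4], 20)            # piecewise-constant "proportion" track
noisy = truth + rng.normal(scale=0.2, size=truth.size)
for lam in (0.0, 1.0, 10.0, 100.0):
    print(f"lambda={lam:>6}: roughness={roughness(smooth(noisy, lam)):.3f}")
```

Roughness drops monotonically as λ grows: too small a value leaves salt-and-pepper noise, while very large values also flatten the genuine steps between regions.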

Algorithm Under the Hood

FlashDeconv reformulates spatial deconvolution as Graph-Regularized Non-Negative Least Squares, solved in a compressed "sketch" space via randomized numerical linear algebra (RandNLA):

Figure 1. Overview of the FlashDeconv framework. (A) Input data preprocessing with Log-CPM normalization and gene selection. (B) Structure-preserving randomized sketching using leverage-score weighting to compress gene space while preserving rare cell signals. (C) Spatial graph construction and regularized optimization via Block Coordinate Descent. (D) Final cell type proportion estimates for each spatial location.
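Read together with the parameter names in the API reference, this description suggests an objective of roughly the following form. It is our reconstruction, not the published formulation, and the exact scaling of each term in the implementation may differ. Here Y is the preprocessed n_spots × n_genes matrix, X the n_cell_types × n_genes signature matrix, Ω a n_genes × `sketch_dim` sketching matrix, B the n_spots × n_cell_types abundance matrix, L the spatial graph Laplacian, λ = `lambda_spatial`, and ρ = `rho_sparsity`:

```latex
\min_{B \ge 0} \;
  \frac{1}{2}\,\lVert Y\Omega - B X \Omega \rVert_F^2
  \;+\; \frac{\lambda}{2}\,\operatorname{tr}\!\left(B^\top L B\right)
  \;+\; \rho \sum_{i,k} B_{ik}
```

Because B ≥ 0, the last term equals the L1 norm of B. Proportions then follow by normalizing each row of B to sum to 1.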

Three-Stage Framework

Preprocessing & Gene Selection
- Log-CPM normalization: Stabilizes variance and prevents high-expression genes from dominating the sketch
- Leverage-weighted gene selection: Combines highly variable genes (HVGs) with cell-type-specific markers, weighted by statistical leverage scores. Unlike variance (which conflates abundance with informativeness), leverage scores identify genes that define transcriptomically distinct directions, preserving rare cell type markers.
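A minimal sketch of the two preprocessing ingredients, assuming log-CPM means log1p of counts scaled to one million per spot, and that gene leverage is computed from the signature matrix X alone. Both are assumptions: FlashDeconv may also normalize X first, combine leverage with HVG statistics, or use a different scale factor. The helper names `log_cpm` and `gene_leverage_scores` are hypothetical.

```python
import numpy as np

def log_cpm(counts):
    """log1p(counts per million), computed per spot (row); zeros stay zero."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)
    lib_size[lib_size == 0] = 1.0          # empty spots: avoid division by zero
    return np.log1p(counts / lib_size * 1e6)

def gene_leverage_scores(X, rtol=1e-10):
    """Statistical leverage of each gene (column of the cell_types x genes matrix X).

    The leverage of gene g is the squared norm of row g of U, where X^T = U S V^T
    is a thin SVD; scores lie in [0, 1] and sum to rank(X).
    """
    U, s, _ = np.linalg.svd(np.asarray(X, dtype=float).T, full_matrices=False)
    rank = int(np.sum(s > rtol * s[0]))
    return np.sum(U[:, :rank] ** 2, axis=1)

# Toy reference: genes 0-1 are housekeeping (high everywhere); gene 2 marks
# type A only and gene 3 marks type B only, both at low expression.
X = np.array([
    [50.0, 50.0, 1.0, 0.0],   # type A
    [50.0, 50.0, 0.0, 1.0],   # type B
    [50.0, 50.0, 0.0, 0.0],   # type C
])
print(gene_leverage_scores(X))  # ≈ [0.5, 0.5, 1.0, 1.0]
```

The weakly expressed markers reach the maximum leverage of 1, while the two redundant housekeeping genes split a single unit between them, so expression level alone does not drive the ranking.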
Structure-Preserving Sketching
- Randomized projection: Compress gene space (~20,000 genes → 512 dimensions) using CountSketch with leverage-score importance sampling
- Johnson-Lindenstrauss guarantee: Approximately preserves Euclidean distances between cell type signatures (within a factor of 1 ± ε) with high probability
- Key innovation: Leverage-weighted sampling amplifies rare cell type markers relative to housekeeping genes, so they are less likely to be drowned out when several genes collide in the same hash bucket
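The sketching step can be illustrated with a plain CountSketch: each gene is hashed to one of `sketch_dim` buckets with a random sign, and genes sharing a bucket are summed. The optional per-gene `weights` stand in for leverage-based importance weighting; how FlashDeconv actually derives and applies those weights is not described here, so treat this as a sketch under that assumption (the function name `count_sketch` is hypothetical).

```python
import numpy as np

def count_sketch(A, d, weights=None, seed=0):
    """Return A @ S for a random CountSketch S of shape (n_genes, d).

    A: (n_rows, n_genes) matrix, e.g. cell-type signatures or spot profiles.
    Gene g goes to bucket h(g) with sign s(g); S is never materialized.
    Calls with the same seed reuse the same hash, so spots and signatures
    land in a common sketch space.
    """
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    n_genes = A.shape[1]
    buckets = rng.integers(0, d, size=n_genes)     # h(g)
    signs = rng.choice([-1.0, 1.0], size=n_genes)  # s(g)
    scale = signs if weights is None else signs * np.asarray(weights, dtype=float)
    out = np.empty((A.shape[0], d))
    for r in range(A.shape[0]):
        # Sum the signed (optionally weighted) genes that fall in each bucket
        out[r] = np.bincount(buckets, weights=A[r] * scale, minlength=d)
    return out

rng = np.random.default_rng(1)
X = rng.gamma(shape=0.5, scale=1.0, size=(5, 20_000))  # 5 toy signatures
Xs = count_sketch(X, d=512)
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Xs[0] - Xs[1]))
```

The two printed distances typically agree to within a few percent; the guarantee is probabilistic, and a larger `sketch_dim` tightens it.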
Spatial Graph Regularization
- Sparse graph Laplacian: Constructs a k-NN spatial graph (O(N·k) memory vs. O(N²) for dense kernels such as CARD's)
- Numba-accelerated Block Coordinate Descent (BCD): Fast closed-form updates with non-negativity constraints
- Linear scalability: Spatial term complexity O(N·k) enables million-spot analysis
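A compact, pure-Python sketch of the two spatial ingredients follows: a symmetric k-NN graph Laplacian built with `scipy.spatial.cKDTree`, and cyclic block coordinate descent for ½‖Y − BX‖²_F + (λ/2)·tr(BᵀLB) + ρ·Σ B_ik subject to B ≥ 0. Each spot is one block, and within it every cell-type coefficient gets an exact closed-form update clipped at zero. This mirrors the description above but is not FlashDeconv's Numba implementation; the function names, the objective's scaling, and the fixed iteration count are our assumptions. In practice Y and X would be the sketched matrices.

```python
import numpy as np
import scipy.sparse as sp
from scipy.spatial import cKDTree

def knn_laplacian(coords, k=6):
    """Sparse Laplacian L = D - W of a symmetrized k-NN graph (O(N*k) nonzeros)."""
    n = coords.shape[0]
    _, idx = cKDTree(coords).query(coords, k=k + 1)  # column 0 is normally the spot itself
    rows = np.repeat(np.arange(n), k)
    W = sp.csr_matrix((np.ones(n * k), (rows, idx[:, 1:].ravel())), shape=(n, n))
    W = W.maximum(W.T)                               # keep an edge if either spot lists the other
    deg = np.asarray(W.sum(axis=1)).ravel()
    return (sp.diags(deg) - W).tocsr()

def objective(Y, X, B, L, lam, rho):
    resid = Y - B @ X
    return 0.5 * np.sum(resid ** 2) + 0.5 * lam * np.sum(B * (L @ B)) + rho * B.sum()

def bcd_graph_nnls(Y, X, L, lam, rho=0.0, n_iter=50):
    """Cyclic block coordinate descent; block = one spot's abundance vector b_i.

    Runs a fixed number of sweeps; a real solver would also stop at a tolerance.
    """
    n, K = Y.shape[0], X.shape[0]
    G = X @ X.T                      # K x K Gram matrix of signatures
    R = Y @ X.T                      # n x K correlation of spots with signatures
    L = sp.csr_matrix(L)
    diag = L.diagonal()              # L_ii (the degree when there are no self-loops)
    B = np.zeros((n, K))
    for _ in range(n_iter):
        for i in range(n):
            lo, hi = L.indptr[i], L.indptr[i + 1]
            nbrs, vals = L.indices[lo:hi], L.data[lo:hi]
            off = nbrs != i
            nbr_sum = -vals[off] @ B[nbrs[off]]   # sum_j w_ij * b_j, since w_ij = -L_ij
            for k in range(K):
                # Exact minimizer over B[i, k] with every other entry held fixed
                cross = B[i] @ G[:, k] - B[i, k] * G[k, k]
                num = R[i, k] - cross + lam * nbr_sum[k] - rho
                B[i, k] = max(0.0, num / (G[k, k] + lam * diag[i]))
    return B

rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
L = knn_laplacian(coords, k=6)
X = rng.gamma(1.0, size=(4, 60))
B_true = rng.dirichlet(np.ones(4), size=200)
Y = B_true @ X + rng.normal(scale=0.05, size=(200, 60))
B_hat = bcd_graph_nnls(Y, X, L, lam=1.0, n_iter=30)
print(objective(Y, X, np.zeros_like(B_hat), L, 1.0, 0.0), objective(Y, X, B_hat, L, 1.0, 0.0))
```

Looping over spots in Python is slow; compiled kernels such as the Numba ones mentioned above are what make this practical at scale. The work per spot depends only on k and the number of cell types, which is where the O(N·k) cost of the spatial term comes from.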

Why This Works

Log-CPM compresses the dynamic range while preserving sparsity (log1p(0) = 0)
Leverage scores decouple biological identity from population abundance: markers of rare cell types (0.1% frequency) receive weight comparable to markers of abundant types (30% frequency)
Sparse graph Laplacian encodes spatial autocorrelation as a Gaussian Markov Random Field (GMRF) without dense matrix operations

Benchmarks

FlashDeconv exhibits linear O(N) scaling for both time and memory:

Dataset Size	Runtime	Memory
10K spots	< 1 sec	< 1 GB
100K spots	~4 sec	~2 GB
1M spots	~3 min	~21 GB

Hardware: MacBook Pro M2 Max (32GB unified memory); no GPU required.

Accuracy on Synthetic Benchmarks (Spotless suite):

Pearson correlation: 0.944 (mean across 56 datasets spanning 6 tissues)
RMSE: 0.065 (median)
Rare cell detection (AUPR): 0.960 ± 0.036 (standard deviation)

Real-world validation:

Mouse liver (Visium): JSD = 0.056, ranking 3rd among 13 methods
Melanoma tumor (Visium): JSD = 0.027, ranking 5th among 13 methods
Reference stability: Ranked 1st for robustness to different scRNA-seq protocols
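For reference, metrics of this kind can be computed from a ground-truth and an estimated proportion matrix (n_spots × n_cell_types) roughly as follows. How the Spotless benchmark aggregates them (per cell type or per spot, mean or median, log base for JSD) is an assumption here, so numbers from this sketch are not directly comparable to the reported values.

```python
import numpy as np

def rmse(P_true, P_est):
    """Root-mean-square error over all spot x cell-type entries."""
    return float(np.sqrt(np.mean((np.asarray(P_true) - np.asarray(P_est)) ** 2)))

def pearson(P_true, P_est):
    """Pearson correlation between the flattened proportion matrices."""
    return float(np.corrcoef(np.ravel(P_true), np.ravel(P_est))[0, 1])

def mean_jsd(P_true, P_est):
    """Mean per-spot Jensen-Shannon divergence (base-2 logs, so 0 <= JSD <= 1)."""
    P = np.asarray(P_true, dtype=float)
    Q = np.asarray(P_est, dtype=float)
    P = P / P.sum(axis=1, keepdims=True)   # rows must be probability vectors
    Q = Q / Q.sum(axis=1, keepdims=True)
    M = 0.5 * (P + Q)

    def kl(A):
        # KL(A || M) with the convention 0 * log(0 / m) = 0
        mask = A > 0
        ratio = np.where(mask, A, 1.0) / np.where(mask, M, 1.0)
        return np.sum(np.where(mask, A * np.log2(ratio), 0.0), axis=1)

    return float(np.mean(0.5 * kl(P) + 0.5 * kl(Q)))

P_true = np.array([[0.7, 0.3, 0.0], [0.1, 0.1, 0.8]])
P_est = np.array([[0.6, 0.3, 0.1], [0.2, 0.0, 0.8]])
print(rmse(P_true, P_est), pearson(P_true, P_est), mean_jsd(P_true, P_est))
```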

FlashDeconv performs on par with leading probabilistic methods (Cell2Location, RCTD) in accuracy while running orders of magnitude faster.

API Reference

FlashDeconv Class

```python
import numpy as np

class FlashDeconv:
    def __init__(
        self,
        sketch_dim=512,              # Sketch space dimension
        lambda_spatial=5000.0,       # Spatial regularization (or "auto")
        rho_sparsity=0.01,           # L1 sparsity penalty
        n_hvg=2000,                  # Number of highly variable genes
        n_markers_per_type=50,       # Marker genes per cell type
        spatial_method="knn",        # "knn", "radius", or "grid"
        k_neighbors=6,               # k for k-NN graph
        max_iter=100,                # BCD max iterations
        tol=1e-4,                    # Convergence tolerance
        preprocess="log_cpm",        # "log_cpm", "pearson", or "raw"
        random_state=None,
        verbose=False,
    ): ...

    # Method signatures (bodies omitted)
    def fit(self, Y, X, coords, gene_names=None, cell_type_names=None) -> "FlashDeconv": ...
    def fit_transform(self, Y, X, coords, **kwargs) -> np.ndarray: ...
    def get_cell_type_proportions(self) -> np.ndarray: ...
    def get_abundances(self) -> np.ndarray: ...
    def get_dominant_cell_type(self) -> np.ndarray: ...
    def summary(self) -> dict: ...
```

Parameters

Parameter	Type	Default	Description
`sketch_dim`	int	512	Dimension of sketch space (higher preserves more information but runs slower)
`lambda_spatial`	float or "auto"	5000.0	Spatial regularization strength (see Best Practices)
`rho_sparsity`	float	0.01	L1 sparsity penalty
`n_hvg`	int	2000	Number of highly variable genes to select
`n_markers_per_type`	int	50	Top markers per cell type
`spatial_method`	str	"knn"	Spatial graph construction: "knn", "radius", or "grid"
`k_neighbors`	int	6	Neighbors for spatial graph
`max_iter`	int	100	Maximum BCD iterations
`tol`	float	1e-4	Convergence tolerance
`preprocess`	str	"log_cpm"	Preprocessing: "log_cpm" (recommended), "pearson", or "raw"
`random_state`	int or None	None	Random seed
`verbose`	bool	False	Print progress information

Attributes (After Fitting)

Attribute	Shape	Description
`beta_`	(n_spots, n_cell_types)	Raw cell type abundances
`proportions_`	(n_spots, n_cell_types)	Normalized proportions (sum to 1)
`gene_idx_`	(n_selected,)	Indices of genes used
`lambda_used_`	float	Actual λ value used
`info_`	dict	Optimization info (converged, n_iterations, final_objective)
`cell_type_names_`	array	Cell type names (if provided)
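The relationship between `beta_`, `proportions_`, and the dominant cell type can be sketched as a row normalization followed by an argmax. We assume here that `proportions_` is `beta_` divided by its row sums and that `get_dominant_cell_type()` picks the largest proportion per spot; the helper functions below are hypothetical and not part of the package.

```python
import numpy as np

def normalize_abundances(beta):
    """Row-normalize non-negative abundances so each spot sums to 1."""
    beta = np.asarray(beta, dtype=float)
    totals = beta.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0   # all-zero spots stay all-zero instead of NaN
    return beta / totals

def dominant_cell_type(proportions, cell_type_names):
    """Name of the cell type with the largest proportion in each spot."""
    return np.asarray(cell_type_names)[np.argmax(proportions, axis=1)]

beta = np.array([[2.0, 6.0, 0.0],
                 [1.0, 1.0, 2.0]])
props = normalize_abundances(beta)   # rows: [0.25, 0.75, 0.0] and [0.25, 0.25, 0.5]
print(dominant_cell_type(props, ["T_cell", "B_cell", "Macrophage"]))  # ['B_cell' 'Macrophage']
```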

Input Data Formats

FlashDeconv accepts multiple input formats:

Spatial Data (Y)

NumPy array: Dense (n_spots, n_genes)
SciPy sparse matrix: CSR/CSC format (recommended for Visium HD to reduce memory usage)
AnnData: .X or specified layer (e.g., adata.layers["counts"])
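To see why a sparse format matters, compare the memory of a dense and a CSR copy of the same toy count matrix. The Poisson rate and matrix size are arbitrary stand-ins for a sparse high-resolution sample, not measurements from any platform, and `csr_nbytes` is a hypothetical helper.

```python
import numpy as np
import scipy.sparse as sp

def csr_nbytes(M):
    """Bytes used by the three arrays backing a CSR matrix."""
    return M.data.nbytes + M.indices.nbytes + M.indptr.nbytes

rng = np.random.default_rng(0)
# Toy counts: most entries are zero, as in small-bin spatial data
Y_dense = rng.poisson(0.05, size=(2_000, 1_000)).astype(np.float32)
Y_csr = sp.csr_matrix(Y_dense)

print(f"dense: {Y_dense.nbytes / 1e6:.1f} MB, CSR: {csr_nbytes(Y_csr) / 1e6:.2f} MB, "
      f"nonzero fraction: {Y_csr.nnz / np.prod(Y_csr.shape):.3f}")
```

The CSR copy holds exactly the same values (`Y_csr.toarray()` round-trips) in a fraction of the memory, and the saving grows as the nonzero fraction shrinks.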

Reference (X)

NumPy array: Dense (n_cell_types, n_genes) signature matrix
AnnData: Automatically aggregated from single-cell data via prepare_data() using mean expression per cell type
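Building X from a single-cell reference by averaging expression per cell type can be sketched as below. The helper `mean_signatures` is hypothetical; whether `prepare_data()` normalizes counts before averaging, and how it orders cell types, are not specified here (this sketch averages whatever matrix it is given and sorts the labels).

```python
import numpy as np

def mean_signatures(expr, labels):
    """Average a cells x genes matrix within each cell type.

    Returns (X, cell_types): X has shape (n_cell_types, n_genes) and row t
    is the mean profile of cells labelled cell_types[t]. `expr` may be a
    dense array or a SciPy CSR matrix.
    """
    labels = np.asarray(labels)
    cell_types = np.unique(labels)  # sorted unique labels
    X = np.vstack([
        np.asarray(expr[labels == ct].mean(axis=0)).ravel() for ct in cell_types
    ])
    return X, cell_types

expr = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 5.0]])
X, cell_types = mean_signatures(expr, ["T_cell", "T_cell", "B_cell"])
print(cell_types)  # ['B_cell' 'T_cell']
print(X)           # rows: B_cell -> [0, 5], T_cell -> [2, 0]
```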

Coordinates

NumPy array: (n_spots, 2) for 2D spatial coordinates, or (n_spots, 3) for 3D (e.g., z-stacked sections)
From AnnData: Automatically extracted from .obsm["spatial"], .obsm["X_spatial"], or .obs[["x", "y"]]
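The coordinate lookup order listed above amounts to a small fallback chain. This hypothetical helper (not the package's code) takes the `.obsm` and `.obs` containers of an AnnData object; any container supporting `in` and `[...]`, such as a dict or a pandas DataFrame, works.

```python
import numpy as np

def extract_coords(obsm, obs):
    """Return an (n_spots, 2 or 3) coordinate array using the README's lookup order."""
    for key in ("spatial", "X_spatial"):
        if key in obsm:
            return np.asarray(obsm[key], dtype=float)
    if "x" in obs and "y" in obs:
        return np.column_stack([np.asarray(obs["x"], dtype=float),
                                np.asarray(obs["y"], dtype=float)])
    raise KeyError("no coordinates in obsm['spatial'], obsm['X_spatial'] or obs[['x', 'y']]")

# Usage with an AnnData object (not executed here):
#   coords = extract_coords(adata_st.obsm, adata_st.obs)
```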

Citation

If you use FlashDeconv in your research, please cite:

Plain text:

Yang, C., Zhang, X. & Chen, J. FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution via structure-preserving sketching. Preprint, 2025.

BibTeX:

```bibtex
@article{flashdeconv2025,
  title={FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution via structure-preserving sketching},
  author={Yang, Chen and Zhang, Xianyang and Chen, Jun},
  note={Preprint},
  year={2025}
}
```

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the GPL-3.0 License.

Related Resources

Paper Reproducibility: flashdeconv-reproducibility — Complete code to reproduce all figures and benchmarks
Documentation: ReadTheDocs (coming soon)
Issues & Support: GitHub Issues

Acknowledgments

We thank the developers of Spotless, Cell2Location, and RCTD for their benchmarking frameworks and methodological contributions to the spatial transcriptomics field.


Release history

0.1.6 (Feb 10, 2026)
0.1.5 (Feb 10, 2026)
0.1.4 (Jan 15, 2026)
0.1.3 (Dec 31, 2025)
0.1.2 (Dec 26, 2025)
0.1.1 (Dec 13, 2025), this version
0.1.0 (Dec 13, 2025)

Download files


Source Distribution

flashdeconv-0.1.1.tar.gz (52.7 kB)

Uploaded Dec 13, 2025 Source

Built Distribution


flashdeconv-0.1.1-py3-none-any.whl (44.3 kB)

Uploaded Dec 13, 2025 Python 3

File details

Details for the file flashdeconv-0.1.1.tar.gz.

File metadata

Download URL: flashdeconv-0.1.1.tar.gz
Upload date: Dec 13, 2025
Size: 52.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for flashdeconv-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`5876a815891d58ddbb363d3841df683b0ffaf52ae8a26bb2ee00c15d8eb8b2ad`
MD5	`1d996ef864bd160a6399e3f681782bf4`
BLAKE2b-256	`3c55c0c039dbbfbe4299d34517580b86bb8e4b5f3ed406a755d0943e1d9b1c9d`


Provenance

The following attestation bundles were made for flashdeconv-0.1.1.tar.gz:

Publisher: publish.yml on cafferychen777/flashdeconv

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: flashdeconv-0.1.1.tar.gz
- Subject digest: 5876a815891d58ddbb363d3841df683b0ffaf52ae8a26bb2ee00c15d8eb8b2ad
- Sigstore transparency entry: 763547202
- Sigstore integration time: Dec 13, 2025
Source repository:
- Permalink: cafferychen777/flashdeconv@b33a75a6bcfa35d23e3e35ab96076ef5ab5cad8d
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/cafferychen777
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b33a75a6bcfa35d23e3e35ab96076ef5ab5cad8d
- Trigger Event: release

File details

Details for the file flashdeconv-0.1.1-py3-none-any.whl.

File metadata

Download URL: flashdeconv-0.1.1-py3-none-any.whl
Upload date: Dec 13, 2025
Size: 44.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for flashdeconv-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c86f7816935ddff0da0058f338dbb8dab85a7724823fab68d5c295a761037c33`
MD5	`679b1fa5fd03b0cc657354cf36edf2cd`
BLAKE2b-256	`58daad2225973f936d02e22b22600d95ec017425a43c9c548d3adf58b7891194`


Provenance

The following attestation bundles were made for flashdeconv-0.1.1-py3-none-any.whl:

Publisher: publish.yml on cafferychen777/flashdeconv

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: flashdeconv-0.1.1-py3-none-any.whl
- Subject digest: c86f7816935ddff0da0058f338dbb8dab85a7724823fab68d5c295a761037c33
- Sigstore transparency entry: 763547205
- Sigstore integration time: Dec 13, 2025
Source repository:
- Permalink: cafferychen777/flashdeconv@b33a75a6bcfa35d23e3e35ab96076ef5ab5cad8d
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/cafferychen777
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b33a75a6bcfa35d23e3e35ab96076ef5ab5cad8d
- Trigger Event: release

flashdeconv 0.1.1
