Fast Linear Algebra for Scalable Hybrid Deconvolution of Spatial Transcriptomics
FlashDeconv
Fast Linear Algebra for Scalable Hybrid Deconvolution
FlashDeconv is a high-performance spatial transcriptomics deconvolution method that estimates cell type proportions from spatial gene expression data using single-cell reference signatures.
Key Features
- Ultra-fast: Process 1 million spots in ~3 minutes on CPU
- Memory-efficient: O(N) linear scaling via structure-preserving sketching
- No GPU required: Runs on commodity hardware (32GB RAM sufficient for 1M spots)
- Statistically rigorous: Log-CPM normalization with leverage-weighted gene selection
- Spatially-aware: Sparse graph Laplacian regularization for spatial coherence
- Rare cell detection: Leverage scores prioritize marker genes over high-variance genes
Installation
# From source
git clone https://github.com/cafferychen777/flashdeconv.git
cd flashdeconv
pip install -e .
# With development dependencies
pip install -e ".[dev]"
# With scanpy integration
pip install -e ".[scanpy]"
Quick Start
from flashdeconv import FlashDeconv
# Initialize model
model = FlashDeconv(
    sketch_dim=512,          # Sketch dimension (default: 512)
    lambda_spatial="auto",   # Spatial regularization (auto-tuned)
    rho_sparsity=0.01,       # Sparsity regularization
)
# Fit and get cell type proportions
proportions = model.fit_transform(Y, X, coords)
# Y: spatial count matrix (n_spots x n_genes)
# X: reference signatures (n_cell_types x n_genes)
# coords: spatial coordinates (n_spots x 2)
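To make the expected input shapes concrete, the following sketch builds small synthetic inputs with NumPy only. The sizes, the gamma/Dirichlet/Poisson generative choices, and the `depth` scaling are illustrative assumptions and are not part of FlashDeconv.

```python
import numpy as np

rng = np.random.default_rng(0)
n_spots, n_genes, n_cell_types = 200, 500, 4  # arbitrary toy sizes

# X: reference signatures, one row per cell type (n_cell_types x n_genes)
X = rng.gamma(shape=0.5, scale=2.0, size=(n_cell_types, n_genes))

# Hypothetical ground-truth mixing proportions; each row sums to 1
true_props = rng.dirichlet(np.ones(n_cell_types), size=n_spots)

# Y: spot-level counts drawn around the mixed signatures (n_spots x n_genes)
depth = 20.0  # arbitrary sequencing-depth scaling
Y = rng.poisson(depth * true_props @ X)

# coords: spots laid out on a square grid (n_spots x 2)
side = int(np.ceil(np.sqrt(n_spots)))
rows, cols = np.unravel_index(np.arange(n_spots), (side, side))
coords = np.stack([rows, cols], axis=1).astype(float)

print(Y.shape, X.shape, coords.shape)
```

With FlashDeconv installed, arrays of this form are what `model.fit_transform(Y, X, coords)` expects according to the comments above.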
With AnnData
from flashdeconv import FlashDeconv
from flashdeconv.io import prepare_data, result_to_anndata
# Prepare data from AnnData objects
Y, X, coords, cell_type_names, gene_names = prepare_data(
    adata_st,                  # Spatial AnnData
    adata_ref,                 # Single-cell reference AnnData
    cell_type_key="cell_type",
)
# Run deconvolution
model = FlashDeconv(verbose=True)
proportions = model.fit_transform(Y, X, coords, cell_type_names=cell_type_names)
# Store results back in AnnData
adata_st = result_to_anndata(proportions, adata_st, cell_type_names)
# Access results
adata_st.obsm["flashdeconv"] # Cell type proportions DataFrame
adata_st.obs["flashdeconv_dominant"] # Dominant cell type per spot
Method Overview
FlashDeconv introduces a three-stage framework:
1. Gene Selection & Preprocessing
- Leverage-weighted Gene Selection: Selects informative genes using leverage scores that prioritize cell-type-specific markers over high-variance genes, enabling accurate detection of rare cell populations.
- Log-CPM Normalization: Default preprocessing that normalizes counts per million and applies log1p transformation for variance stabilization.
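The two preprocessing ideas above can be sketched with NumPy. This is a minimal illustration, not FlashDeconv's code: `log_cpm` and `gene_leverage_scores` are hypothetical helper names, and the leverage definition used here (squared column norms of the right singular vectors of the signature matrix) is the textbook one, assumed rather than taken from the package.

```python
import numpy as np

def log_cpm(counts):
    """Scale each row to counts per million, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # keep all-zero rows at zero instead of dividing by 0
    return np.log1p(counts / totals * 1e6)

def gene_leverage_scores(X):
    """Statistical leverage of each gene (column) of X.

    Uses the thin SVD X = U S V^T and sums the squared entries of V^T over
    the numerically non-zero singular directions.
    """
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    rank = int((s > s.max() * 1e-10).sum())
    return (Vt[:rank] ** 2).sum(axis=0)

# Toy signatures: 3 cell types x 6 genes. Genes 0-2 are type-specific,
# genes 3-5 are expressed at similar levels in every type.
signatures = np.array([
    [50.0,  2.0, 1.0, 30.0, 30.0, 8.0],
    [ 3.0, 40.0, 2.0, 30.0, 28.0, 9.0],
    [ 1.0,  2.0, 6.0, 30.0, 31.0, 7.0],
])
scores = gene_leverage_scores(log_cpm(signatures))
top_genes = np.argsort(scores)[::-1]  # gene indices, highest leverage first
print(np.round(scores, 3), top_genes)
```

Each score lies between 0 and 1 and the scores sum to the rank of the matrix, so they can be read as each gene's share of the signature subspace.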
2. Structure-Preserving Sketching
- CountSketch with Importance Sampling: Compresses gene dimension (~20,000) to sketch space (~512) using sparse random projections weighted by leverage scores.
- Theoretical Guarantees: Approximately preserves pairwise distances, in the sense of the Johnson-Lindenstrauss lemma, so that rare cell type markers are retained with high probability.
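As an illustration of the sketching step, here is a plain CountSketch over the gene dimension. It is a sketch under stated assumptions: the helper name `countsketch` is hypothetical, and the leverage-based importance weighting mentioned above is deliberately left out because its exact form is not described here.

```python
import numpy as np

def countsketch(Y, sketch_dim, seed=0):
    """Compress the columns (genes) of Y into `sketch_dim` buckets.

    Each gene is assigned one random bucket and one random sign; genes that
    share a bucket are added together after sign flipping.
    """
    rng = np.random.default_rng(seed)
    n_spots, n_genes = Y.shape
    buckets = rng.integers(0, sketch_dim, size=n_genes)
    signs = rng.choice([-1.0, 1.0], size=n_genes)
    out = np.zeros((sketch_dim, n_spots))
    # np.add.at accumulates correctly when several genes hit the same bucket
    np.add.at(out, buckets, (Y * signs).T)
    return out.T

rng = np.random.default_rng(1)
Y = rng.poisson(2.0, size=(5, 2000)).astype(float)  # toy counts: 5 spots x 2000 genes
Y_sketch = countsketch(Y, sketch_dim=512, seed=0)

# Squared row norms are preserved in expectation, so the ratio should be near 1
ratio = (Y_sketch ** 2).sum(axis=1) / (Y ** 2).sum(axis=1)
print(Y_sketch.shape, np.round(ratio, 3))
```

The same buckets and signs must be applied to both the spatial matrix and the reference signatures so that the two sketches remain comparable.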
3. Spatial Graph Regularized Optimization
- Sparse Graph Laplacian: O(N) memory complexity via k-NN graph construction, enabling million-scale analysis without dense covariance matrices.
- Block Coordinate Descent (BCD): Numba-accelerated solver with closed-form updates and non-negativity constraints for extreme speed.
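The third stage can be illustrated with a small, self-contained solver. Everything below is an assumption made for illustration: the objective 0.5·||Y − BX||² + (λ/2)·tr(BᵀLB) + ρ·ΣB with B ≥ 0 is one plausible reading of "graph Laplacian regularization with L1 sparsity", the function names are hypothetical, and the Laplacian is stored densely for readability (O(N²) memory, so toy sizes only), whereas the method described above relies on a sparse graph and Numba.

```python
import numpy as np

def knn_laplacian(coords, k):
    """Dense Laplacian L = D - A of a symmetrized k-NN graph."""
    n = len(coords)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)              # a spot is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)                    # make the graph undirected
    return np.diag(A.sum(axis=1)) - A

def objective(B, Y, X, L, lam, rho):
    fit = 0.5 * np.sum((Y - B @ X) ** 2)
    smooth = 0.5 * lam * np.trace(B.T @ L @ B)
    return fit + smooth + rho * B.sum()       # B >= 0, so the sum is the L1 norm

def coordinate_descent(Y, X, L, lam=1.0, rho=0.01, n_iter=20):
    """Exact non-negative coordinate updates, one (spot, cell type) at a time."""
    n_spots, n_types = Y.shape[0], X.shape[0]
    deg = np.diag(L).copy()
    A = np.diag(deg) - L                      # adjacency recovered from L = D - A
    sq_norm = (X ** 2).sum(axis=1)
    B = np.zeros((n_spots, n_types))
    for _ in range(n_iter):
        for i in range(n_spots):
            r = Y[i] - B[i] @ X               # residual of spot i
            for k in range(n_types):
                num = X[k] @ r + B[i, k] * sq_norm[k] + lam * (A[i] @ B[:, k]) - rho
                new = max(0.0, num / (sq_norm[k] + lam * deg[i]))
                r -= (new - B[i, k]) * X[k]   # keep the residual in sync
                B[i, k] = new
    return B

# Toy problem: 5 x 5 grid of spots, 3 cell types, 30 (sketched) features
rng = np.random.default_rng(0)
side, n_types, n_feats = 5, 3, 30
coords = np.array([(r, c) for r in range(side) for c in range(side)], dtype=float)
X = rng.gamma(2.0, 1.0, size=(n_types, n_feats))
B_true = rng.dirichlet(np.ones(n_types), size=side * side)
Y = B_true @ X + 0.1 * rng.standard_normal((side * side, n_feats))

L = knn_laplacian(coords, k=4)
B_hat = coordinate_descent(Y, X, L, lam=0.5, rho=0.01, n_iter=20)
print(objective(np.zeros_like(B_hat), Y, X, L, 0.5, 0.01),
      objective(B_hat, Y, X, L, 0.5, 0.01))
```

Because every update minimizes the objective exactly in one coordinate subject to non-negativity, the objective cannot increase from one sweep to the next, which gives a simple sanity check for any faster implementation.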
Parameters
| Parameter | Default | Description |
|---|---|---|
| `sketch_dim` | 512 | Dimension of sketch space |
| `lambda_spatial` | 5000.0 | Spatial regularization strength (use "auto" for automatic tuning) |
| `rho_sparsity` | 0.01 | L1 sparsity regularization |
| `n_hvg` | 2000 | Number of highly variable genes |
| `n_markers_per_type` | 50 | Markers per cell type |
| `k_neighbors` | 6 | Neighbors for spatial graph |
| `max_iter` | 100 | Maximum BCD iterations |
| `tol` | 1e-4 | Convergence tolerance |
| `preprocess` | "log_cpm" | Preprocessing method: "log_cpm", "pearson", or "raw" |
Benchmarks
| Dataset Size | FlashDeconv Runtime | Memory |
|---|---|---|
| 10K spots | < 1 sec | < 1 GB |
| 100K spots | ~4 sec | ~2 GB |
| 1M spots | ~3 min | ~21 GB |
Benchmarks were run on an Apple MacBook Pro (M2 Max, 32 GB unified memory) without a GPU. FlashDeconv exhibits O(N) linear scaling for both time and memory.
API Reference
FlashDeconv Class
class FlashDeconv:
    def __init__(
        self,
        sketch_dim=512,
        lambda_spatial=5000.0,  # or "auto" for automatic tuning
        rho_sparsity=0.01,
        n_hvg=2000,
        n_markers_per_type=50,
        spatial_method="knn",
        k_neighbors=6,
        max_iter=100,
        tol=1e-4,
        preprocess="log_cpm",  # "log_cpm", "pearson", or "raw"
        random_state=None,
        verbose=False,
    ): ...

    # fit returns the fitted instance itself
    def fit(self, Y, X, coords, gene_names=None, cell_type_names=None) -> "FlashDeconv": ...
    def fit_transform(self, Y, X, coords, **kwargs) -> np.ndarray: ...
    def get_cell_type_proportions(self) -> np.ndarray: ...
    def get_abundances(self) -> np.ndarray: ...
    def get_dominant_cell_type(self) -> np.ndarray: ...
    def summary(self) -> dict: ...
Attributes (after fitting)
- beta_: Raw cell type abundances (n_spots, n_cell_types)
- proportions_: Normalized proportions that sum to 1 (n_spots, n_cell_types)
- gene_idx_: Indices of genes used for deconvolution
- lambda_used_: Actual spatial regularization value used
- info_: Optimization information (converged, n_iterations, final_objective)
- cell_type_names_: Names of cell types (if provided)
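The relationship between raw abundances and normalized proportions can be sketched as a row-wise normalization. This is an illustration with made-up numbers, assuming proportions are abundances divided by their row sum; how FlashDeconv treats an all-zero row is not stated here, so the sketch simply leaves it at zero.

```python
import numpy as np

cell_type_names = np.array(["T cell", "B cell", "Fibroblast"])  # hypothetical labels

# Toy stand-in for beta_: non-negative abundances, shape (n_spots, n_cell_types)
beta = np.array([
    [2.0, 1.0, 1.0],
    [0.0, 3.0, 1.0],
    [0.0, 0.0, 0.0],   # a spot with no signal
])

row_sums = beta.sum(axis=1, keepdims=True)
# Divide only where the row sum is positive; all-zero rows stay at zero
proportions = np.divide(beta, row_sums, out=np.zeros_like(beta), where=row_sums > 0)

dominant = proportions.argmax(axis=1)   # index of the largest proportion per spot
print(proportions)
print(cell_type_names[dominant])
```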
Citation
If you use FlashDeconv in your research, please cite:
@article{flashdeconv2024,
title={FlashDeconv: Fast Linear Algebra for Scalable Hybrid Deconvolution},
author={FlashDeconv Team},
journal={bioRxiv},
year={2024}
}
License
GPL-3.0 License