High-Performance Parallel Differential Expression Analysis for Single-Cell Data
hpdex
High-performance differential expression analysis for single-cell data
🔎 Overview
hpdex now ships a C++ backend with careful memory layout and true multithreading, delivering large speedups for Mann–Whitney U–based DE on single-cell matrices:
- ⚡ Sparse CSC: in DE scenarios (gene × group pairs), the pure Mann–Whitney U kernel is often hundreds of times faster than SciPy when multi-threaded.
- 🧩 End-to-end DE (`parallel_differential_expression`): with scheduling and reuse, typical pipelines are 2–3 orders of magnitude faster than pdex-style Python loops.
- 📦 Dense: even single-thread is commonly ~3× faster than SciPy on dense inputs; multi-thread scales further.
- 📈 Statistical parity: tie-aware U, continuity correction, and asymptotic p-values match `scipy.stats.mannwhitneyu` under equivalent settings (within float tolerance).
Key capabilities:
- 🧵 C++ multithreading, per-thread workspaces, low allocator pressure
- 🧮 Float & histogram kernels (auto-selected by data type)
- 🧠 Tie & continuity corrections; exact or asymptotic p-values in backend
- 🧱 Robust sparse support (CSC highly recommended)
- 🧰 Drop-in friendly design & clean Python API
⚙️ Installation
From PyPI
pip install hpdex
From source
git clone --recurse-submodules https://github.com/AI4Cell/hpdex.git # for highway submodule
cd hpdex
pip install -e .
Requirements
- Python ≥ 3.10
- `numpy`, `scipy`, `pandas`, `anndata`, `tqdm`
- Building from source requires a C++17 compiler; OpenMP recommended for threading
🚀 Quick Start
import anndata as ad
from hpdex import parallel_differential_expression
adata = ad.read_h5ad("data.h5ad")
df = parallel_differential_expression(
    adata,
    groupby_key="perturbation",
    reference="control",
    threads=8,              # C++ multithreading
    tie_correction=True,
    use_continuity=True,    # continuity correction for U -> Z
    show_progress=True,
)
df.to_csv("de_results.csv", index=False)
Output schema (DataFrame):
| column | description |
|---|---|
| `target` | target group name |
| `feature` | gene / feature id |
| `p_value` | two-sided p-value from Mann–Whitney U |
| `u_statistic` | U₁ statistic (reference vs target) |
| `fold_change` | mean(target) / mean(reference) |
| `log2_fold_change` | log2(fold_change) |
| `fdr` | BH-adjusted p-value |
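As a rough illustration of how the `fold_change` and `log2_fold_change` columns relate to group means, here is a minimal plain-Python sketch. The helper name `fold_change` and the handling of a zero reference mean (returning NaN) are assumptions made for this example only; the library may treat that case differently, for instance through `clip_value`.

```python
import math

def fold_change(target_values, reference_values):
    """Return (fold_change, log2_fold_change) for one gene and one target group."""
    mean_t = sum(target_values) / len(target_values)
    mean_r = sum(reference_values) / len(reference_values)
    if mean_r == 0:
        # Assumption for this sketch: undefined ratio is reported as NaN
        return float("nan"), float("nan")
    fc = mean_t / mean_r
    # log2 is undefined at 0, so a zero ratio maps to -inf here
    lfc = math.log2(fc) if fc > 0 else float("-inf")
    return fc, lfc

# Zeros are counted in the means, as they would be for UMI counts
fc, lfc = fold_change([4.0, 0.0, 8.0], [1.0, 2.0, 0.0])
```

Here the target mean is 4.0 and the reference mean is 1.0, so the ratio is 4 and its log2 is 2.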
📚 API Reference
parallel_differential_expression
parallel_differential_expression(
    adata: ad.AnnData,
    groupby_key: str,
    reference: str | None,          # None -> treated as "non-targeting" baseline
    groups: list[str] | None = None,
    metric: str = "wilcoxon",       # currently only "wilcoxon" (Mann–Whitney U)
    tie_correction: bool = True,
    use_continuity: bool = True,
    min_samples: int = 2,
    threads: int = -1,              # -1 -> use all logical cores
    clip_value: float = 20.0,
    show_progress: bool = True,
) -> pd.DataFrame
Notes
- `threads` controls the C++ backend concurrency.
- `reference=None` creates a “non-targeting” baseline for missing labels.
- Internally the matrix is converted to CSC (`scipy.sparse.csc_matrix`) for best performance.
- The backend currently defaults to zero-handling = "min" (zeros at head), suitable for UMI counts.
Low-level backend (advanced)
from hpdex.backend import mannwhitneyu, group_mean
U1, P = mannwhitneyu(
    matrix_csc,             # scipy.sparse.csc_matrix or ndarray (dense handled internally)
    group_id_int32,         # shape: (n_cells,), 0=reference, 1..G-1=targets, -1=ignored
    n_targets,              # int
    ref_sorted=False,       # set True if your ref slice is already sorted
    tar_sorted=False,       # set True if target slices are sorted
    use_continuity=True,
    tie_correction=True,
    zero_handling="min",    # "none" | "min" | "max" | "mix"
    threads=-1,
    show_progress=False,
)
means = group_mean(
    matrix_csc,
    group_id_int32,
    G,                      # total groups (reference + targets)
    include_zeros=True,
    threads=-1,
)  # returns array shape: (G, n_genes)
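The group-id convention in the comments above (0 for the reference, 1..G-1 for targets, -1 for ignored cells) can be built from per-cell labels along these lines. This is a sketch only: `encode_groups` is a hypothetical helper, not part of hpdex, and it returns a plain list.

```python
def encode_groups(labels, reference, targets):
    """Map per-cell labels to integer ids: 0=reference, 1..G-1=targets, -1=ignored."""
    code = {reference: 0}
    for i, name in enumerate(targets, start=1):
        code[name] = i
    # Any label that is neither the reference nor a listed target is ignored
    return [code.get(label, -1) for label in labels]

labels = ["control", "geneA", "geneB", "control", "other"]
group_id = encode_groups(labels, "control", ["geneA", "geneB"])
# group_id == [0, 1, 2, 0, -1]
```

Before passing such a list to the backend, it would presumably need converting to an int32 array, for example with `numpy.asarray(group_id, dtype=numpy.int32)`.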
🧪 Statistical Kernels
Float kernel
- Merge-rank U with tie tracking; contiguous slices; vector-friendly
- Memory: `O(n)` per slice; scales well with threads
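To make "merge-rank U with tie tracking" concrete, the following plain-Python sketch walks two already-sorted slices in a single merge pass, assigns midranks to tied blocks, and accumulates the tie term used later for the variance correction. It illustrates the general technique only; the function name and the choice of the reference as sample 1 are assumptions, and the C++ kernel may be organized differently.

```python
def merge_rank_u(ref_sorted, tar_sorted):
    """Return (U1, tie_sum) for two ascending lists.

    U1 is the U statistic of the first list; tie_sum is sum(t**3 - t)
    over all blocks of t tied values in the pooled sample.
    """
    i = j = 0
    n1, n2 = len(ref_sorted), len(tar_sorted)
    rank_base = 0        # number of pooled values already ranked
    rank_sum_ref = 0.0   # sum of midranks assigned to the reference slice
    tie_sum = 0
    while i < n1 or j < n2:
        # Pick the smallest value not yet ranked across both slices
        if j >= n2 or (i < n1 and ref_sorted[i] <= tar_sorted[j]):
            v = ref_sorted[i]
        else:
            v = tar_sorted[j]
        a = b = 0
        while i < n1 and ref_sorted[i] == v:
            i += 1
            a += 1
        while j < n2 and tar_sorted[j] == v:
            j += 1
            b += 1
        t = a + b
        # All t tied values share the average of ranks rank_base+1 .. rank_base+t
        rank_sum_ref += a * (rank_base + (t + 1) / 2)
        tie_sum += t**3 - t
        rank_base += t
    u1 = rank_sum_ref - n1 * (n1 + 1) / 2
    return u1, tie_sum

u1, tie_sum = merge_rank_u([1, 2, 2, 5], [2, 3, 4])
```

In this example the three pooled 2s share midrank 3, so the reference rank sum is 1 + 3 + 3 + 7 = 14, giving U1 = 14 - 10 = 4 and a tie term of 3³ - 3 = 24.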
Histogram kernel
- For integer/UMI counts with limited range
- Bucketized ranks reduce sorting; memory `O(bins)` (≪ n)
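The idea behind a histogram kernel can be sketched in plain Python: for small non-negative integer counts, one histogram per group replaces sorting, and every bin is a tie block whose midrank follows from the running count. This is an illustrative sketch under the assumption that values lie in `0..n_bins-1`; `histogram_u` is a hypothetical name and not the library's API.

```python
def histogram_u(ref_counts, tar_counts, n_bins):
    """Return (U1, tie_sum) for integer data in the range 0..n_bins-1, without sorting."""
    hist_ref = [0] * n_bins
    hist_tar = [0] * n_bins
    for v in ref_counts:
        hist_ref[v] += 1
    for v in tar_counts:
        hist_tar[v] += 1

    rank_base = 0        # pooled values ranked so far
    rank_sum_ref = 0.0
    tie_sum = 0
    # Bins are visited in ascending value order, so each bin is one tie block
    for a, b in zip(hist_ref, hist_tar):
        t = a + b
        if t == 0:
            continue
        rank_sum_ref += a * (rank_base + (t + 1) / 2)
        tie_sum += t**3 - t
        rank_base += t
    n1 = len(ref_counts)
    return rank_sum_ref - n1 * (n1 + 1) / 2, tie_sum

# Input order does not matter because the histogram discards it
u1, tie_sum = histogram_u([5, 2, 1, 2], [4, 2, 3], n_bins=6)
```

The work after building the histograms is proportional to the number of bins rather than the number of cells, which is where the `O(bins)` memory and reduced sorting come from.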
Common
- Exact/asymptotic p-values in backend
- Tie & continuity corrections
- Batch-wise scheduling across (gene × group)
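For reference, the usual normal approximation with tie and continuity corrections can be written in a few lines of plain Python. This is a sketch of the textbook formula, not the backend's code; the function name and argument layout are assumptions, and `tie_sum` denotes the sum of t³ - t over tie blocks of size t in the pooled sample.

```python
import math

def asymptotic_p(u1, n1, n2, tie_sum=0, use_continuity=True, tie_correction=True):
    """Two-sided asymptotic p-value for a Mann-Whitney U statistic."""
    n = n1 + n2
    mu = n1 * n2 / 2
    # Ties shrink the variance of U under the null hypothesis
    tie_term = tie_sum / (n * (n - 1)) if tie_correction else 0.0
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:
        return 1.0  # every value is tied, so there is no evidence either way
    z = abs(u1 - mu)
    if use_continuity:
        z -= 0.5    # continuity correction moves U half a unit toward its mean
    z /= sigma
    # erfc(z / sqrt(2)) equals twice the upper normal tail; cap at 1 for z < 0
    return min(1.0, math.erfc(z / math.sqrt(2)))

p = asymptotic_p(4.0, n1=4, n2=3, tie_sum=24)
```

With U equal to its null mean the corrected z becomes negative, and the cap returns a p-value of exactly 1.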
📈 Benchmarks
Typical observations (indicative; varies by CPU, threads, sparsity):
| Scenario | Matrix | Speedup vs SciPy |
|---|---|---|
| Dense single-thread U per gene | dense | ~3× |
| Sparse multi-thread U (DE setting) | CSC, 64T | 100×–300× |
| End-to-end DE vs pdex-style baseline | CSC, 64T | 10²–10³× |
Why fast? Thread-local workspaces (no per-column allocations), cache-friendly slices, zero-aware rank merge, and reduced Python overhead.
🧷 Testing
cd test && cat bench_mwu.txt | xargs python bench_mwu.py
❓ FAQ
Do you correct for multiple testing?
Yes. The `fdr` column in the output applies the Benjamini–Hochberg (BH) adjustment.
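For readers unfamiliar with the procedure, a minimal plain-Python sketch of the standard BH adjustment follows. It shows the textbook step-up rule; `bh_adjust` is a hypothetical helper, and how hpdex groups p-values before adjusting (per target or across all tests) is not stated here.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

fdr = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

Each p-value is scaled by m / rank, and the running minimum ensures a smaller p-value never receives a larger adjusted value than a bigger one.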
Why are p-values sometimes ~0?
With very large samples and strong effects, the p-value can underflow to 0 in floating point. This is expected; prefer `fdr` for decisions.
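The underflow is easy to reproduce with the normal tail alone. The sketch below uses only the standard library and assumes double precision; `two_sided_p` is a helper written for this example.

```python
import math

def two_sided_p(z):
    """Two-sided normal tail probability for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

p_moderate = two_sided_p(10.0)  # tiny, but still representable as a double
p_extreme = two_sided_p(40.0)   # smaller than any positive double, so it becomes 0.0
```

A z-score of 40 is not unusual when tens of thousands of cells separate cleanly, which is why exact zeros show up in practice.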
How are zeros handled?
The backend supports `"none"`, `"min"`, `"max"`, and `"mix"`. The high-level API currently uses `"min"` by default, which is suitable for UMI counts.
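One way to picture the `"min"` mode: a CSC column stores only nonzero entries, so the implicit zeros of a group can be placed at the head of its sorted slice without being materialized by a sort. The sketch below assumes non-negative data, where zero is the smallest possible value; the function name is hypothetical and the other modes are not covered.

```python
def sorted_slice_zeros_at_head(nonzero_values, n_cells):
    """Rebuild one group's sorted values from its stored nonzeros and its cell count."""
    n_zeros = n_cells - len(nonzero_values)
    # Only the stored nonzeros need sorting; zeros are prepended as a block
    return [0] * n_zeros + sorted(nonzero_values)

col = sorted_slice_zeros_at_head([3, 1, 7], n_cells=6)
# col == [0, 0, 0, 1, 3, 7]
```

For sparse UMI data most entries are zero, so sorting only the nonzeros keeps the per-gene cost tied to the number of stored values.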
📄 License
MIT — see LICENSE.