High-Performance Parallel Differential Expression Analysis for Single-Cell Data

hpdex

High-performance differential expression analysis for single-cell data

Python 3.10+ · MIT License · Status: stable · C++ backend

Overview · Installation · Quick Start · API · Kernels · Benchmarks · FAQ · License


🔎 Overview

hpdex now ships a C++ backend with careful memory layout and true multithreading, delivering large speedups for Mann–Whitney U–based DE on single-cell matrices:

  • Sparse CSC: in DE scenarios (gene × group pairs), the pure Mann–Whitney U kernel is often hundreds of times faster than SciPy when multi-threaded.
  • 🧩 End-to-end DE (parallel_differential_expression): with scheduling and reuse, typical pipelines are 2–3 orders of magnitude faster than pdex-style Python loops.
  • 📦 Dense: even single-thread is commonly ~3× faster than SciPy on dense inputs; multi-thread scales further.
  • 📈 Statistical parity: tie-aware U, continuity correction, and asymptotic p-values match scipy.stats.mannwhitneyu under equivalent settings (within float tolerance).

Key capabilities:

  • 🧵 C++ multithreading, per-thread workspaces, low allocator pressure
  • 🧮 Float & histogram kernels (auto-selected by data type)
  • 🧠 Tie & continuity corrections; exact or asymptotic p-values in backend
  • 🧱 Robust sparse support (CSC highly recommended)
  • 🧰 Drop-in friendly design & clean Python API

⚙️ Installation

From PyPI

pip install hpdex

From source

git clone --recurse-submodules https://github.com/AI4Cell/hpdex.git # for highway submodule
cd hpdex
pip install -e .

Requirements

  • Python ≥ 3.10
  • numpy, scipy, pandas, anndata, tqdm
  • Building from source requires a C++17 compiler; OpenMP recommended for threading

🚀 Quick Start

import anndata as ad
from hpdex import parallel_differential_expression

adata = ad.read_h5ad("data.h5ad")

df = parallel_differential_expression(
    adata,
    groupby_key="perturbation",
    reference="control",
    threads=8,                 # C++ multithreading
    tie_correction=True,
    use_continuity=True,       # continuity correction for U -> Z
    show_progress=True,
)

df.to_csv("de_results.csv", index=False)

Output schema (DataFrame):

| column | description |
| --- | --- |
| target | target group name |
| feature | gene / feature id |
| p_value | two-sided p-value from Mann–Whitney U |
| u_statistic | U₁ statistic (reference vs target) |
| fold_change | mean(target) / mean(reference) |
| log2_fold_change | log2(fold_change) |
| fdr | BH-adjusted p-value |
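
The fdr column is described as a Benjamini–Hochberg adjustment of p_value. The following is a minimal pure-Python sketch of the standard BH step-up procedure, included only to show what such a column contains. The function name `bh_adjust` is hypothetical, and whether hpdex adjusts across all rows or per target group is not stated here, so treat the grouping as an assumption.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Hypothetical helper for illustration; not part of the hpdex API.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


fdr = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

Each adjusted value is at least as large as the raw p-value and never exceeds 1.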

📚 API Reference

parallel_differential_expression

parallel_differential_expression(
    adata: ad.AnnData,
    groupby_key: str,
    reference: str | None,          # None -> treated as "non-targeting" baseline
    groups: list[str] | None = None,
    metric: str = "wilcoxon",       # currently only "wilcoxon" (Mann–Whitney U)
    tie_correction: bool = True,
    use_continuity: bool = True,
    min_samples: int = 2,
    threads: int = -1,              # -1 -> use all logical cores
    clip_value: float = 20.0,
    show_progress: bool = True,
) -> pd.DataFrame

Notes

  • threads controls the C++ backend concurrency.
  • reference=None creates a “non-targeting” baseline for missing labels.
  • Internally the matrix is converted to CSC (scipy.sparse.csc_matrix) for best performance.
  • The backend currently defaults to zero-handling = "min" (zeros at head), suitable for UMI counts.
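
To make the zero-handling note concrete, here is a small sketch of how U can be computed from a sparse column without materializing the zeros. It assumes that "min" means the implicit zeros rank below every stored value, and that stored values are strictly positive (as with UMI counts). The function names are hypothetical and the brute-force pair count stands in for the real kernel, which is not shown in this README.

```python
def pair_u(x, y):
    """Brute-force U for x: each pair with x_i > y_j counts 1, each tie 0.5."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)


def u_with_implicit_zeros(ref_nz, n_ref, tar_nz, n_tar):
    """U for the reference group given only the stored (positive) values.

    ref_nz / tar_nz: stored nonzero values of each group.
    n_ref / n_tar:   total number of cells in each group, zeros included.
    """
    ref_zeros = n_ref - len(ref_nz)
    tar_zeros = n_tar - len(tar_nz)
    u = pair_u(ref_nz, tar_nz)          # nonzero vs nonzero
    u += len(ref_nz) * tar_zeros        # every positive ref beats every tar zero
    u += 0.5 * ref_zeros * tar_zeros    # zero vs zero pairs are ties
    return u                            # ref zeros vs positive tar contribute 0


dense_ref = [0, 0, 3, 1]
dense_tar = [0, 2, 2, 0, 5]
sparse_u = u_with_implicit_zeros([3, 1], 4, [2, 2, 5], 5)
dense_u = pair_u(dense_ref, dense_tar)
```

Both routes give the same statistic, but the sparse route only touches the stored entries.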

Low-level backend (advanced)

from hpdex.backend import mannwhitneyu, group_mean

U1, P = mannwhitneyu(
    matrix_csc,           # scipy.sparse.csc_matrix or ndarray (dense handled internally)
    group_id_int32,       # shape: (n_cells,), 0=reference, 1..G-1=targets, -1=ignored
    n_targets,            # int: number of target groups
    ref_sorted=False,     # set True if your ref slice is already sorted
    tar_sorted=False,     # set True if target slices are sorted
    use_continuity=True,
    tie_correction=True,
    zero_handling="min",  # "none" | "min" | "max" | "mix"
    threads=-1,
    show_progress=False,
)

means = group_mean(
    matrix_csc,
    group_id_int32,
    G,                    # total groups (reference + targets)
    include_zeros=True,
    threads=-1,
)  # returns array shape: (G, n_genes)
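
The group id convention in the comments above (0 = reference, 1..G-1 = targets, -1 = ignored) can be produced from per-cell labels along these lines. This is a sketch only: `encode_groups` is a hypothetical helper, not an hpdex function, and the label names are made up.

```python
def encode_groups(labels, reference, targets):
    """Map per-cell labels to integer codes.

    reference -> 0, targets -> 1..len(targets) in the given order,
    any other label -> -1 (ignored).
    """
    code_of = {reference: 0}
    for i, name in enumerate(targets, start=1):
        code_of[name] = i
    return [code_of.get(label, -1) for label in labels]


labels = ["control", "geneA", "geneB", "control", "unassigned", "geneA"]
codes = encode_groups(labels, reference="control", targets=["geneA", "geneB"])
print(codes)  # [0, 1, 2, 0, -1, 1]
```

Before passing the codes to the backend, convert them with `numpy.asarray(codes, dtype=numpy.int32)`. Under this encoding `n_targets` would be `len(targets)` and `G` would be `len(targets) + 1`.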

🧪 Statistical Kernels

Float kernel

  • Merge-rank U with tie tracking; contiguous slices; vector-friendly
  • Memory: O(n) per slice; scales well with threads
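
As a reference for what a rank-based U with tie tracking computes, here is a plain-Python sketch. It sorts the pooled sample and assigns midranks rather than merging pre-sorted slices, so it illustrates the statistic and not the C++ kernel itself. The name `rank_sum_u` is hypothetical.

```python
def rank_sum_u(ref, tar):
    """Return (U1, tie_term) for `ref` versus `tar` using midranks.

    U1 equals the number of pairs with ref > tar, with ties counted as 0.5.
    tie_term is sum(t**3 - t) over runs of t equal values in the pooled sample.
    """
    # Tag each value with its group (0 = ref, 1 = tar) and sort once.
    pooled = sorted([(v, 0) for v in ref] + [(v, 1) for v in tar])
    n_ref = len(ref)
    rank_sum_ref = 0.0
    tie_term = 0
    i = 0
    while i < len(pooled):
        # Find the end of the run of equal values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        ref_in_run = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_ref += midrank * ref_in_run
        tie_term += t**3 - t
        i = j
    u1 = rank_sum_ref - n_ref * (n_ref + 1) / 2.0
    return u1, tie_term


u1, ties = rank_sum_u([1, 2, 2, 5], [2, 3, 4])
```

Swapping the arguments gives U2, and U1 + U2 always equals len(ref) * len(tar).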

Histogram kernel

  • For integer/UMI counts with limited range
  • Bucketized ranks reduce sorting; memory O(bins) (≪ n)
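
The histogram idea can be sketched for non-negative integer counts as follows: count how often each value occurs per group, then accumulate U in one pass over the bins, with no sorting of cells. This is an illustrative sketch with a hypothetical function name, assuming one bin per integer value; the binning used by the actual kernel is not described in this README.

```python
def histogram_u(ref_counts, tar_counts):
    """Return (U1, tie_term) for non-negative integer data via histograms."""
    n_bins = max(max(ref_counts, default=0), max(tar_counts, default=0)) + 1
    hist_ref = [0] * n_bins
    hist_tar = [0] * n_bins
    for v in ref_counts:
        hist_ref[v] += 1
    for v in tar_counts:
        hist_tar[v] += 1

    u1 = 0.0
    tie_term = 0
    tar_below = 0  # number of target values strictly below the current bin
    for r, t in zip(hist_ref, hist_tar):
        # Each ref value in this bin beats tar_below targets and ties with t.
        u1 += r * (tar_below + 0.5 * t)
        tied = r + t
        tie_term += tied**3 - tied
        tar_below += t
    return u1, tie_term


u1_hist, ties_hist = histogram_u([1, 2, 2, 5], [2, 3, 4])
```

Working memory is proportional to the number of bins rather than the number of cells, which is the trade-off the bullet above describes.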

Common

  • Exact/asymptotic p-values in backend
  • Tie & continuity corrections
  • Batch-wise scheduling across (gene × group)
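
For reference, the asymptotic two-sided p-value with tie and continuity corrections follows the usual normal approximation. The sketch below uses only the standard library. The function name and the choice to return 1.0 when every value is tied are assumptions made for this example, not documented hpdex behavior.

```python
import math


def asymptotic_p_value(u1, n_ref, n_tar, tie_term=0,
                       tie_correction=True, use_continuity=True):
    """Two-sided normal-approximation p-value for a Mann-Whitney U statistic.

    tie_term is sum(t**3 - t) over runs of t tied values in the pooled sample.
    """
    n = n_ref + n_tar
    mean_u = n_ref * n_tar / 2.0
    var_u = n_ref * n_tar * (n + 1) / 12.0
    if tie_correction:
        var_u -= n_ref * n_tar * tie_term / (12.0 * n * (n - 1))
    if var_u <= 0:
        return 1.0  # assumption: all values tied, so report no evidence
    dist = abs(u1 - mean_u)
    if use_continuity:
        dist = max(dist - 0.5, 0.0)  # shift toward the mean by 0.5
    z = dist / math.sqrt(var_u)
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)), the two-sided tail area.
    return math.erfc(z / math.sqrt(2.0))


p = asymptotic_p_value(4.0, 4, 3, tie_term=24)
```

Because the formula depends on U only through its distance from the mean, passing U1 or U2 gives the same p-value.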

📈 Benchmarks

Typical observations (indicative; varies by CPU, threads, sparsity):

| Scenario | Matrix | Speedup vs SciPy |
| --- | --- | --- |
| Dense single-thread U per gene | dense | ~3× |
| Sparse multi-thread U (DE setting) | CSC, 64T | 100×–300× |
| End-to-end DE vs pdex-style baseline | CSC, 64T | 10²–10³× |

Why fast? Thread-local workspaces (no per-column allocations), cache-friendly slices, zero-aware rank merge, and reduced Python overhead.


🧷 Testing

cd test && cat bench_mwu.txt | xargs python bench_mwu.py

❓ FAQ

Do you correct for multiple testing? Yes. The output fdr applies Benjamini–Hochberg (BH).
Why are p-values sometimes ~0? With very large samples and strong effects, the p-value can underflow floating-point precision and be reported as 0. This is expected; prefer fdr for decisions.
How are zeros handled? Backend supports "none", "min", "max", "mix". The high-level API currently uses "min" by default, which is suitable for UMI counts.

📄 License

MIT — see LICENSE.


Built for large-scale single-cell perturbation analysis, now powered by a C++ backend.
