High-Performance Parallel Differential Expression Analysis for Single-Cell Data

hpdex

High-performance differential expression analysis for single-cell data

PyPI version · Python 3.10+ · MIT License · Status: stable · C++ backend

Overview · Installation · Quick Start · API · Kernels · Benchmarks · FAQ · License


🔎 Overview

hpdex now ships a C++ backend with careful memory layout and true multithreading, delivering large speedups for Mann–Whitney U–based DE on single-cell matrices:

  • Sparse CSC: in DE scenarios (gene × group pairs), the pure Mann–Whitney U kernel is often hundreds of times faster than SciPy when multi-threaded.
  • 🧩 End-to-end DE (parallel_differential_expression): with scheduling and reuse, typical pipelines are 2–3 orders of magnitude faster than pdex-style Python loops.
  • 📦 Dense: even single-thread is commonly ~3× faster than SciPy on dense inputs; multi-thread scales further.
  • 📈 Statistical parity: tie-aware U, continuity correction, and asymptotic p-values match scipy.stats.mannwhitneyu under equivalent settings (within float tolerance).
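
As a reference point for the parity claim, the sketch below spells out the standard tie-corrected, continuity-corrected normal approximation and checks it against scipy.stats.mannwhitneyu. The helper name mwu_asymptotic is hypothetical and this is not hpdex's implementation; it only shows what "equivalent settings" could mean on the SciPy side (two-sided, method="asymptotic").

```python
import numpy as np
from scipy.stats import mannwhitneyu, norm, rankdata


def mwu_asymptotic(x, y, use_continuity=True, tie_correction=True):
    """Two-sided Mann-Whitney U via the normal approximation (illustrative).

    Assumes both samples are non-empty and not all values are identical.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.size, y.size
    n = n1 + n2

    # Average ranks over the pooled sample; ties share the same rank.
    ranks = rankdata(np.concatenate([x, y]))
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0

    # Mean and tie-corrected standard deviation of U under the null.
    mu = n1 * n2 / 2.0
    tie_term = 0.0
    if tie_correction:
        _, counts = np.unique(ranks, return_counts=True)
        tie_term = (counts**3 - counts).sum() / (n * (n - 1))
    sigma = np.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term))

    # Two-sided test: take the larger of U1 and U2, then double the tail.
    u = max(u1, n1 * n2 - u1)
    z = (u - mu - (0.5 if use_continuity else 0.0)) / sigma
    p = min(1.0, 2.0 * norm.sf(z))
    return u1, p


x = [1, 2, 2, 3, 5]
y = [2, 4, 4, 6, 7, 7]
u1, p = mwu_asymptotic(x, y)
ref = mannwhitneyu(x, y, use_continuity=True,
                   alternative="two-sided", method="asymptotic")
print(np.isclose(u1, ref.statistic), np.isclose(p, ref.pvalue))
```

SciPy always applies the tie correction, so the comparison is only meaningful with tie_correction=True.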

Key capabilities:

  • 🧵 C++ multithreading, per-thread workspaces, low allocator pressure
  • 🧮 Float & histogram kernels (auto-selected by data type)
  • 🧠 Tie & continuity corrections; exact or asymptotic p-values in backend
  • 🧱 Robust sparse support (CSC highly recommended)
  • 🧰 Drop-in friendly design & clean Python API

⚙️ Installation

From PyPI

pip install hpdex

From source

git clone --recurse-submodules https://github.com/AI4Cell/hpdex.git # for highway submodule
cd hpdex
pip install -e .

Requirements

  • Python ≥ 3.10
  • numpy, scipy, pandas, anndata, tqdm
  • Building from source requires a C++17 compiler; OpenMP recommended for threading

🚀 Quick Start

import anndata as ad
from hpdex import parallel_differential_expression

adata = ad.read_h5ad("data.h5ad")

df = parallel_differential_expression(
    adata,
    groupby_key="perturbation",
    reference="control",
    threads=8,                 # C++ multithreading
    tie_correction=True,
    use_continuity=True,       # continuity correction for U -> Z
    show_progress=True,
)

df.to_csv("de_results.csv", index=False)

Output schema (DataFrame):

column             description
target             target group name
feature            gene / feature id
p_value            two-sided p-value from Mann–Whitney U
u_statistic        U₁ statistic (reference vs target)
fold_change        mean(target) / mean(reference)
log2_fold_change   log2(fold_change)
fdr                BH-adjusted p-value
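
The derived columns above can be reproduced outside hpdex. The following is a minimal sketch of the textbook Benjamini–Hochberg adjustment and of the log2 fold change, using only NumPy. The function names bh_adjust and log2_fold_change are hypothetical, and the sketch does not show how hpdex itself treats zero means or the clip_value parameter.

```python
import numpy as np


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # p_(i) * m / i for the i-th smallest p-value.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out


def log2_fold_change(mean_target, mean_reference):
    """log2(mean(target) / mean(reference)); zero means give inf or nan."""
    t = np.asarray(mean_target, dtype=float)
    r = np.asarray(mean_reference, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log2(t / r)


fdr = bh_adjust([0.01, 0.04, 0.03, 0.005])
lfc = log2_fold_change(4.0, 1.0)
print(fdr, lfc)
```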

📚 API Reference

parallel_differential_expression

parallel_differential_expression(
    adata: ad.AnnData,
    groupby_key: str,
    reference: str | None,          # None -> treated as "non-targeting" baseline
    groups: list[str] | None = None,
    metric: str = "wilcoxon",       # currently only "wilcoxon" (Mann–Whitney U)
    tie_correction: bool = True,
    use_continuity: bool = True,
    min_samples: int = 2,
    threads: int = -1,              # -1 -> use all logical cores
    clip_value: float = 20.0,
    show_progress: bool = True,
) -> pd.DataFrame

Notes

  • threads controls the C++ backend concurrency.
  • reference=None creates a “non-targeting” baseline for missing labels.
  • Internally the matrix is converted to CSC (scipy.sparse.csc_matrix) for best performance.
  • The backend currently defaults to zero-handling = "min" (zeros at head), suitable for UMI counts.
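
To make the last two notes concrete, here is a small sketch of ranking one CSC column with the implicit zeros placed at the head. It assumes that "min" means zeros are treated as the smallest values, which only holds when every stored value is positive (as with UMI counts). The helper column_ranks_zeros_first is hypothetical and is not part of hpdex.

```python
import numpy as np
from scipy import sparse
from scipy.stats import rankdata


def column_ranks_zeros_first(col_data, n_rows):
    """Average ranks for one sparse column without densifying it.

    Assumes all stored values are > 0, so the implicit zeros form one
    tied block at the head of the ordering.
    """
    n_zero = n_rows - col_data.size
    zero_rank = (n_zero + 1) / 2.0            # shared rank of every zero
    nonzero_ranks = rankdata(col_data) + n_zero  # shift past the zero block
    return zero_rank, nonzero_ranks


dense = np.array([[0, 3], [2, 0], [0, 3], [5, 1], [0, 0]], dtype=np.float64)
X = sparse.csc_matrix(dense)          # CSC keeps each gene column contiguous

j = 0
col = X.data[X.indptr[j]:X.indptr[j + 1]]   # stored values of column j
zero_rank, nz_ranks = column_ranks_zeros_first(col, X.shape[0])
print(zero_rank, nz_ranks)
```

Only the stored values need sorting, which is why sparsity translates directly into less work per column.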

Low-level backend (advanced)

from hpdex.backend import mannwhitneyu, group_mean

U1, P = mannwhitneyu(
    matrix_csc,           # scipy.sparse.csc_matrix or ndarray (dense handled internally)
    group_id_int32,       # shape: (n_cells,), 0=reference, 1..G-1=targets, -1=ignored
    n_targets,            # int: number of target groups (G - 1)
    ref_sorted=False,     # set True if your ref slice is already sorted
    tar_sorted=False,     # set True if target slices are sorted
    use_continuity=True,
    tie_correction=True,
    zero_handling="min",  # "none" | "min" | "max" | "mix"
    threads=-1,
    show_progress=False,
)

means = group_mean(
    matrix_csc,
    group_id_int32,
    G,                    # total groups (reference + targets)
    include_zeros=True,
    threads=-1,
)  # returns array shape: (G, n_genes)
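
The group-id vector expected above (0 = reference, 1..G-1 = targets, -1 = ignored) can be built from string labels with pandas. This is a sketch under that encoding only; encode_groups is a hypothetical helper, not an hpdex function.

```python
import numpy as np
import pandas as pd


def encode_groups(labels, reference, targets=None):
    """Map labels to int32 ids: 0 = reference, 1.. = targets, -1 = ignored."""
    labels = pd.Series(labels)
    if targets is None:
        # Every non-missing label other than the reference becomes a target.
        targets = [g for g in labels.dropna().unique() if g != reference]
    categories = [reference, *targets]
    # Labels outside `categories` (and missing labels) get code -1.
    codes = pd.Categorical(labels, categories=categories).codes
    return codes.astype(np.int32), categories


labels = ["control", "A", "B", "control", "A", "other"]
group_id, categories = encode_groups(labels, "control", targets=["A", "B"])
n_targets = len(categories) - 1
print(group_id, n_targets)
```

With this encoding, G = len(categories) and n_targets = G - 1.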

🧪 Statistical Kernels

Float kernel

  • Merge-rank U with tie tracking; contiguous slices; vector-friendly
  • Memory: O(n) per slice; scales well with threads

Histogram kernel

  • For integer/UMI counts with limited range
  • Bucketized ranks reduce sorting; memory O(bins) (≪ n)
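
The idea behind a histogram kernel can be sketched in a few lines: for non-negative integer counts, U₁ follows from per-value counts and one cumulative sum, with no sorting. The function below is illustrative only (the name u1_from_histograms is hypothetical, and the hpdex kernel itself is C++); it assumes both groups are non-empty.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def u1_from_histograms(ref_counts, tar_counts):
    """U1 of ref vs tar for non-negative integer data, via histograms."""
    ref = np.asarray(ref_counts, dtype=np.int64)
    tar = np.asarray(tar_counts, dtype=np.int64)
    n_bins = int(max(ref.max(), tar.max())) + 1

    h_ref = np.bincount(ref, minlength=n_bins)
    h_tar = np.bincount(tar, minlength=n_bins)

    # Number of target values strictly below each bin value.
    tar_below = np.cumsum(h_tar) - h_tar
    # Each ref value scores 1 per smaller target value and 0.5 per tie.
    return float(np.sum(h_ref * (tar_below + 0.5 * h_tar)))


ref = [0, 0, 1, 3, 3, 2]
tar = [0, 1, 1, 4, 0]
u1_hist = u1_from_histograms(ref, tar)
u1_scipy = mannwhitneyu(ref, tar).statistic
print(u1_hist, u1_scipy)
```

Memory here is proportional to the number of bins rather than to the number of cells, matching the O(bins) note above.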

Common

  • Exact/asymptotic p-values in backend
  • Tie & continuity corrections
  • Batch-wise scheduling across (gene × group)

📈 Benchmarks

Typical observations (indicative; varies by CPU, threads, sparsity):

Scenario                               Matrix     Speedup (vs SciPy unless noted)
Dense, single-thread U per gene        dense      ~3×
Sparse, multi-thread U (DE setting)    CSC, 64T   100×–300×
End-to-end DE                          CSC, 64T   10²–10³× (vs pdex-style baseline)

Why fast? Thread-local workspaces (no per-column allocations), cache-friendly slices, zero-aware rank merge, and reduced Python overhead.


🧷 Testing

cd test && cat bench_mwu.txt | xargs python bench_mwu.py

❓ FAQ

Do you correct for multiple testing?
Yes. The fdr column in the output holds Benjamini–Hochberg (BH) adjusted p-values.

Why are p-values sometimes ~0?
With very large samples and strong effects, the p-value can fall below the smallest positive floating-point number and underflow to 0. This is expected; prefer fdr for decisions.

How are zeros handled?
The backend supports "none", "min", "max", and "mix". The high-level API currently uses "min" by default, which is suitable for UMI counts.
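
The underflow mentioned in the FAQ is a property of double-precision arithmetic, not of any particular library. The sketch below reproduces it with SciPy's normal distribution for an arbitrary large z-score (z = 40 is an illustrative value) and shows that the tail probability is still recoverable on the log scale. Whether hpdex exposes log p-values is not stated here, so this is shown with SciPy only.

```python
import numpy as np
from scipy.stats import norm

z = 40.0  # illustrative z-score from a very strong effect

# The two-sided tail probability is far below the float64 range.
p_two_sided = 2.0 * norm.sf(z)

# The same quantity on the log10 scale stays finite.
log10_p = (np.log(2.0) + norm.logsf(z)) / np.log(10.0)

print(p_two_sided, log10_p)
```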

📄 License

MIT — see LICENSE.


Built for large-scale single-cell perturbation analysis, now powered by a C++ backend.

