Parallel differential expression for single-cell perturbation sequencing
pdex
Installation
Add pdex to your pyproject.toml with uv:
uv add pdex
Summary
pdex is a Python package for running differential expression in parallel between multiple groups and a control.
It is optimized for very large datasets with very large numbers of perturbations.
It uses shared memory to parallelize the computation across many threads and minimizes inter-process communication (IPC) to reduce overhead.
It supports the following metrics:
- Wilcoxon Rank Sum
- Anderson-Darling
- T-Test
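To make the three tests concrete, the sketch below runs each of them on a single gene for one target group versus the control, using SciPy. It only illustrates what the statistics measure and is not pdex's implementation; the function name compare_gene, the Poisson toy data, and the choice of Welch's variant of the t-test are assumptions made for this example.

```python
import warnings

import numpy as np
from scipy import stats


def compare_gene(target: np.ndarray, control: np.ndarray) -> dict[str, float]:
    """Compare one gene's counts between a target group and the control."""
    # Wilcoxon rank-sum (Mann-Whitney U), two-sided.
    wilcoxon = stats.mannwhitneyu(target, control, alternative="two-sided")
    # Welch's t-test (unequal variances); an assumption for this sketch.
    ttest = stats.ttest_ind(target, control, equal_var=False)
    # k-sample Anderson-Darling. SciPy warns when its approximate p-value
    # is capped or floored, so the warning is silenced and only the
    # statistic is reported.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        anderson = stats.anderson_ksamp([target, control])
    return {
        "wilcoxon_pvalue": float(wilcoxon.pvalue),
        "ttest_pvalue": float(ttest.pvalue),
        "anderson_statistic": float(anderson.statistic),
    }


# Toy data: the target group has a higher mean count than the control.
rng = np.random.default_rng(0)
control = rng.poisson(lam=5.0, size=200)
target = rng.poisson(lam=8.0, size=200)
scores = compare_gene(target, control)
print(scores)
```

pdex performs this kind of comparison for every gene and every perturbation, which is why the parallel execution strategy matters at scale.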
Backed vs In-Memory AnnData
pdex adapts its execution strategy based on how the AnnData object is stored:
- In-memory AnnData (e.g., loaded via sc.read_h5ad(path) without backed="r"): pdex uses a shared-memory multiprocessing workflow. Each worker process gets access to the full expression matrix through shared memory, which minimizes serialization overhead. Parallelism is configured via the num_workers parameter (process count). num_threads is ignored in this mode because numba kernels operate on per-target slices entirely in memory.
- Backed AnnData (adata.X is an on-disk HDF5 dataset): pdex automatically switches to the low-memory chunked implementation. Gene chunks are streamed from disk, the reference group is computed once per chunk, and targets are processed in parallel via num_workers (thread pool). Within each target, Wilcoxon metrics can additionally use numba parallelization controlled by num_threads. This mode avoids loading the entire matrix into RAM while still enabling both target-level and gene-level parallelism.
If a backed dataset is supplied without enabling low-memory mode, pdex raises a
helpful error explaining that chunked processing is required. Conversely, you can
force the chunked path for large in-memory matrices by passing low_memory=True.
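The two storage modes can be told apart with AnnData's isbacked attribute. The following is a minimal sketch, assuming you want to set low_memory automatically from how the file was opened; the helper names pdex_kwargs_for and run_from_file are hypothetical, and the reference and groupby_key values simply mirror the usage example in this README.

```python
def pdex_kwargs_for(adata) -> dict:
    """Pick the low_memory flag from how the AnnData object is stored."""
    # AnnData.isbacked is True when the object was opened with backed="r".
    return {"low_memory": bool(adata.isbacked)}


def run_from_file(path: str, backed: bool = True):
    """Load an .h5ad file (optionally backed) and run pdex on it."""
    # Imported lazily so the helper above stays usable without these packages.
    import anndata as ad
    from pdex import parallel_differential_expression

    adata = ad.read_h5ad(path, backed="r") if backed else ad.read_h5ad(path)
    return parallel_differential_expression(
        adata,
        reference="control",
        groupby_key="perturbation",
        **pdex_kwargs_for(adata),
    )
```

Passing low_memory=True explicitly for backed data keeps the call valid whether or not pdex would have selected the chunked path on its own.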
Parallelization
parallel_differential_expression exposes two orthogonal knobs for controlling
parallel execution:
- num_workers controls the number of Python threads that process targets within each gene chunk. None (default in low-memory mode) enables an auto-detected worker count based on available CPUs, while 1 disables thread-level parallelism.
- num_threads controls the numba thread pool used by the Wilcoxon kernel. None lets numba auto-detect the optimal size, whereas 1 turns numba parallelization off. This setting is only used in low-memory mode and only when metric="wilcoxon". When pdex detects non-integer expression values in a gene chunk (for example, after log-normalization), it automatically disables numba for that chunk, logs a warning, and falls back to the SciPy implementation to preserve correct rank ordering.
These strategies can be combined: for example, num_workers=2, num_threads=8
runs two target threads that share an eight-thread numba pool. When the metric
does not support numba acceleration, pdex automatically logs a warning and
falls back to thread-only execution.
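The conditions under which the numba kernel applies can be condensed into a small predicate. This is a sketch of the behaviour described in this section, not pdex's source; the function names is_integer_valued and numba_applies are hypothetical, and the np.mod check is only one way such an integer test might be written.

```python
import numpy as np


def is_integer_valued(chunk: np.ndarray) -> bool:
    """Return True when every value in the chunk is a whole number."""
    return bool(np.all(np.mod(chunk, 1) == 0))


def numba_applies(low_memory: bool, metric: str, chunk: np.ndarray) -> bool:
    """Mirror the documented conditions for using the numba Wilcoxon kernel."""
    # num_threads only matters in low-memory mode with metric="wilcoxon",
    # and non-integer (e.g. log-normalized) chunks fall back to SciPy.
    return low_memory and metric == "wilcoxon" and is_integer_valued(chunk)


counts = np.array([[0, 3, 12], [7, 0, 1]])
lognorm = np.log1p(counts)

print(numba_applies(True, "wilcoxon", counts))    # True
print(numba_applies(True, "wilcoxon", lognorm))   # False
print(numba_applies(True, "anderson", counts))    # False
print(numba_applies(False, "wilcoxon", counts))   # False
```

Only the first case would benefit from a setting such as num_workers=2, num_threads=8; in the other three, num_threads has no effect according to the rules described here.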
Usage
import anndata as ad
import numpy as np
import pandas as pd

from pdex import parallel_differential_expression

PERT_COL = "perturbation"
CONTROL_VAR = "control"

N_CELLS = 1000
N_GENES = 100
N_PERTS = 10
MAX_UMI = 1_000_000  # integer upper bound for the random counts


def build_random_anndata(
    n_cells: int = N_CELLS,
    n_genes: int = N_GENES,
    n_perts: int = N_PERTS,
    pert_col: str = PERT_COL,
    control_var: str = CONTROL_VAR,
) -> ad.AnnData:
    """Sample a random AnnData object."""
    return ad.AnnData(
        # Random integer counts, one row per cell and one column per gene
        X=np.random.randint(0, MAX_UMI, size=(n_cells, n_genes)),
        obs=pd.DataFrame(
            {
                # Assign each cell to one perturbation or to the control
                pert_col: np.random.choice(
                    [f"pert_{i}" for i in range(n_perts)] + [control_var],
                    size=n_cells,
                    replace=True,
                ),
            }
        ),
    )


def main():
    adata = build_random_anndata()

    # Run pdex with default metric (wilcoxon)
    results = parallel_differential_expression(
        adata,
        reference=CONTROL_VAR,
        groupby_key=PERT_COL,
    )
    # One row per (gene, perturbation) pair
    assert results.shape[0] == N_GENES * N_PERTS

    # Run pdex with alt metric (anderson)
    results = parallel_differential_expression(
        adata,
        reference=CONTROL_VAR,
        groupby_key=PERT_COL,
        metric="anderson",
    )
    assert results.shape[0] == N_GENES * N_PERTS


if __name__ == "__main__":
    main()
File details
Details for the file pdex-0.1.28.tar.gz.
File metadata
- Download URL: pdex-0.1.28.tar.gz
- Upload date:
- Size: 27.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.9.22
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 52b0f442eaa73705074f388c2070427b1259e28442addd27e05e645da63e7390 |
| MD5 | 13ff7575d4f52a35416784a79db4f72a |
| BLAKE2b-256 | f4d37a8c31f5fb40b88eb4d3dc21b8f3721f153969a695d554399b9b9a49cd87 |
File details
Details for the file pdex-0.1.28-py3-none-any.whl.
File metadata
- Download URL: pdex-0.1.28-py3-none-any.whl
- Upload date:
- Size: 20.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.9.22
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a57bde53bb8c0b3fcb23d207a7612f633031e25d2d9298a014aa6a0d69ca7405 |
| MD5 | 0b2dd10c80dafb28494b5aae3a758db8 |
| BLAKE2b-256 | 6bb0f90033e5cc1650fcc5b4fabdc67b14ba59faa70cebb5bd3acef31ce96f0f |