Parallel differential expression for single-cell perturbation sequencing

pdex

Parallel differential expression for single-cell perturbation sequencing.

Installation

# add as a project dependency (recorded in pyproject.toml)
uv add pdex

# or install into the active environment
uv pip install pdex

Overview

pdex computes per-gene differential expression statistics between perturbation groups in single-cell data using Mann-Whitney U tests with FDR correction. It was originally designed for CRISPR screen and perturbation sequencing datasets with many groups and large cell counts.

It supports dense and sparse (CSR) expression matrices, and uses numba-mwu for Numba-accelerated Mann-Whitney U computation.
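
To make the per-gene test concrete, here is a minimal pure-Python sketch of a two-sided Mann-Whitney U test for one gene. It is an illustration only and not pdex's implementation (pdex delegates to numba-mwu): the function names are hypothetical, and the p-value uses a tie-corrected normal approximation without continuity correction, which may differ from what pdex reports.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(target, reference):
    """Return (U statistic of the target group, two-sided p-value)."""
    n1, n2 = len(target), len(reference)
    pooled = list(target) + list(reference)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Variance of U under the null hypothesis, corrected for ties.
    n = n1 + n2
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(c**3 - c for c in tie_counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # every value identical: no evidence of a shift
        return u1, 1.0

    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u1, p


# Expression of one gene in perturbed cells vs reference cells.
u, p = mann_whitney_u([5, 6, 7, 8], [1, 2, 3, 4])
print(u, p)
```

pdex runs one such test per (group, gene) pair, which is why the work parallelizes well across genes and groups.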

Modes

Mode         Description
"ref"        Each group vs a single reference group (default: "non-targeting")
"all"        Each group vs all remaining cells (1-vs-rest)
"on_target"  Each group vs the reference at its single target gene only
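
The difference between the modes comes down to which cells form the comparison set. The sketch below shows that selection on a plain list of group labels. The helper `comparison_pairs` is hypothetical and not part of pdex, and it assumes the reference group itself is skipped as a target in "ref" and "on_target" modes.

```python
def comparison_pairs(labels, mode="ref", reference="non-targeting"):
    """Map each target group to (target cell indices, comparison cell indices)."""
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)

    pairs = {}
    for name, cells in groups.items():
        if mode in ("ref", "on_target"):
            # Both modes compare against the reference group; "on_target"
            # additionally restricts testing to the group's target gene.
            if name == reference:
                continue
            pairs[name] = (cells, groups[reference])
        elif mode == "all":
            # 1-vs-rest: every cell outside the group is the comparison set.
            members = set(cells)
            rest = [i for i in range(len(labels)) if i not in members]
            pairs[name] = (cells, rest)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return pairs


labels = ["non-targeting", "gA", "gA", "gB", "non-targeting"]
print(comparison_pairs(labels, mode="ref"))
print(comparison_pairs(labels, mode="all"))
```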

Usage

Reference mode (default)

import anndata as ad
from pdex import pdex

adata = ad.read_h5ad("screen.h5ad")

results = pdex(
    adata,
    groupby="guide",
    mode="ref",
    is_log1p=False,
)

1-vs-rest mode

results = pdex(
    adata,
    groupby="guide",
    mode="all",
    is_log1p=False,
)

On-target mode

Requires a column in adata.obs mapping each group to its target gene:

results = pdex(
    adata,
    groupby="guide",
    mode="on_target",
    gene_col="target_gene",
    is_log1p=False,
)

Parameters

Parameter       Type          Default          Description
adata           AnnData       required         Annotated data matrix (dense or sparse CSR)
groupby         str           required         Column in adata.obs defining groups
mode            str           "ref"            Comparison mode: "ref", "all", or "on_target"
threads         int           0                Numba thread count (0 = all CPUs)
is_log1p        bool | None   None             Whether data is log1p-transformed. Auto-detected if None
geometric_mean  bool          True             Use geometric mean for pseudobulk (vs arithmetic)
as_pandas       bool          False            Return a pandas DataFrame instead of Polars
epsilon         float         1e-9             Pseudocount used for log2_fold_change and percent_change; pass 0.0 for legacy one-sided ±inf values
cpm_filter      float | None  None             Optional pooled-CPM floor filter; drops rows where both target and reference CPM are at or below the threshold
reference       str           "non-targeting"  Reference group name (modes: ref, on_target)
gene_col        str                            Column mapping groups to target genes (mode: on_target)
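
The geometric_mean parameter switches between two ways of averaging a gene over a group's cells. The text above does not give pdex's exact formula, so the sketch below is an assumption: it uses one common zero-tolerant definition of the geometric mean, expm1(mean(log1p(x))), and the function name `pseudobulk_mean` is hypothetical.

```python
import math


def pseudobulk_mean(counts, geometric=True):
    """Average one gene's counts over the cells of a group."""
    if not counts:
        raise ValueError("group has no cells")
    if geometric:
        # Assumed definition: average in log1p space, then map back to counts.
        return math.expm1(sum(math.log1p(c) for c in counts) / len(counts))
    return sum(counts) / len(counts)


counts = [0, 3]
print(pseudobulk_mean(counts, geometric=True))   # close to 1.0
print(pseudobulk_mean(counts, geometric=False))  # 1.5
```

Under this definition the geometric variant is less sensitive to a few cells with very high counts, at the cost of sitting below the arithmetic mean.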

CPM filter

cpm_filter is an opt-in floor filter. When set to a threshold T, a (target, feature) row is dropped only when the gene's pooled CPM is <= T in both the target group and the reference group. Rows are kept when either side has CPM > T.
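
The keep/drop rule can be written as a single predicate. In this sketch the helper names are hypothetical, and "pooled CPM" is assumed to mean the gene's counts summed over a group's cells, divided by that group's total counts, times one million; pdex's exact pooling is not spelled out here.

```python
def pooled_cpm(gene_count, total_count):
    """Counts per million for one gene, from counts pooled over a group."""
    if total_count == 0:
        return 0.0
    return gene_count * 1_000_000 / total_count


def keep_row(target_cpm, ref_cpm, threshold):
    """Keep a (target, feature) row unless both sides are at or below T."""
    return target_cpm > threshold or ref_cpm > threshold


T = 1.0
print(keep_row(0.5, 2.0, T))  # True: the reference side exceeds T
print(keep_row(1.0, 1.0, T))  # False: both sides are <= T
```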

The CPM view is used only for filtering: reported means, fold changes, MWU statistics, and p-values are still computed from the original expression scale. When is_log1p=True, counts are recovered with expm1 before CPM is computed. FDR is corrected over the surviving genes only.
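
Correcting over surviving genes only changes the adjusted values, because the number of tests shrinks. The sketch below illustrates this with the Benjamini-Hochberg procedure. That choice is an assumption: the text above says only "FDR correction" and does not name the method, and `benjamini_hochberg` is a hypothetical helper, not a pdex function.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_all = [0.01, 0.04, 0.03, 0.5]
survivors = p_all[:3]  # suppose the last gene was dropped by cpm_filter

print(benjamini_hochberg(p_all))
print(benjamini_hochberg(survivors))
```

With four tests the smallest p-value adjusts to 0.04; with three it adjusts to 0.03, so filtering low-expression genes first can recover power.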

Output

Returns a Polars DataFrame (or pandas if as_pandas=True) with one row per (group, gene) pair:

Column             Description
target             Perturbation group name
feature            Gene name
target_mean        Pseudobulk mean for the target group (count space)
ref_mean           Pseudobulk mean for the reference (count space)
target_membership  Number of cells in the target group
ref_membership     Number of cells in the reference
fold_change        Deprecated alias for log2_fold_change (identical values). Will be removed in pdex 0.3.0.
log2_fold_change   log2((target_mean + epsilon) / (ref_mean + epsilon)). With default epsilon=1e-9, one-sided zeros are large finite values; with epsilon=0.0, one-sided zeros yield ±inf. Genes unexpressed in both groups (0/0) report 0.0, not NaN.
percent_change     (target_mean - ref_mean) / (ref_mean + epsilon). With default epsilon=1e-9, zero-reference cases are finite; with epsilon=0.0, a zero reference with nonzero target yields +inf. Genes unexpressed in both groups (0/0) report 0.0, not NaN.
p_value            Mann-Whitney U p-value
statistic          Mann-Whitney U statistic
fdr                FDR-corrected p-value (per-group, across genes). For on_target mode, this is applied across all groups. When cpm_filter is set, FDR is corrected over surviving genes only.
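
The edge cases of log2_fold_change and percent_change are easiest to see in code. This sketch follows the formulas and special cases in the table above. The function bodies are an illustration and not pdex's source, and they assume non-negative means (count space).

```python
import math


def log2_fold_change(target_mean, ref_mean, epsilon=1e-9):
    """log2((target_mean + epsilon) / (ref_mean + epsilon)), 0/0 -> 0.0."""
    num = target_mean + epsilon
    den = ref_mean + epsilon
    if num == 0 and den == 0:
        return 0.0  # unexpressed in both groups
    if den == 0:
        return math.inf  # expressed only in the target (epsilon=0.0)
    if num == 0:
        return -math.inf  # expressed only in the reference (epsilon=0.0)
    return math.log2(num / den)


def percent_change(target_mean, ref_mean, epsilon=1e-9):
    """(target_mean - ref_mean) / (ref_mean + epsilon), 0/0 -> 0.0."""
    diff = target_mean - ref_mean
    den = ref_mean + epsilon
    if den == 0:
        # Zero reference: 0.0 if the target is also zero, otherwise +inf.
        return 0.0 if diff == 0 else math.inf
    return diff / den


print(log2_fold_change(4.0, 1.0, epsilon=0.0))  # 2.0
print(log2_fold_change(2.0, 0.0))               # large but finite
print(log2_fold_change(2.0, 0.0, epsilon=0.0))  # inf
print(percent_change(3.0, 2.0, epsilon=0.0))    # 0.5
```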
