Metaprogram discovery via non-negative matrix factorization for single-cell RNA-seq data

mpnmf

Overview

mpnmf is a Python implementation of the metaprogram discovery method described in Gavish et al. (2023, Nature). It identifies recurrent transcriptional programs — "metaprograms" — across samples in single-cell RNA-seq data through three steps:

Per-sample NMF (run) — factorizes each sample's expression matrix across a range of ranks.
Program refinement (refine) — retains programs that are intra-sample reproducible and inter-sample recurrent while removing intra-sample redundancy.
Program clustering (cluster) — iteratively merges filtered programs sharing gene overlap into metaprograms.

Differences from the original method

Deterministic NMF initialization

The original R implementation runs NMF with random initialization and averages results across multiple runs. For each sample, mpnmf instead fits NMF once per rank using NNDSVDa initialization (Boutsidis & Gallopoulos, 2008), which is deterministic, so repeated runs on the same input produce identical output. Consensus averaging is therefore unnecessary, and the cost drops to a single fit per sample and rank.
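The following is a minimal sketch of the idea, assuming scikit-learn's `NMF` as the solver (the source does not state which NMF backend mpnmf uses). The random matrix stands in for one sample's preprocessed expression matrix, and `fit_once` is a hypothetical helper. The sketch pins `random_state` because scikit-learn computes the SVD behind NNDSVD with a randomized solver.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((60, 40))  # stand-in for one sample: cells x genes, non-negative


def fit_once(X, k):
    """Fit NMF a single time at rank k with NNDSVDa initialization."""
    model = NMF(n_components=k, init="nndsvda", random_state=0, max_iter=500)
    W = model.fit_transform(X)   # cells x k
    H = model.components_        # k x genes
    return W, H


# One fit per rank, repeated twice to compare the outputs.
fits_a = {k: fit_once(X, k) for k in range(3, 6)}
fits_b = {k: fit_once(X, k) for k in range(3, 6)}

identical = all(
    np.array_equal(fits_a[k][0], fits_b[k][0])
    and np.array_equal(fits_a[k][1], fits_b[k][1])
    for k in fits_a
)
print("repeated fits identical:", identical)
```

With a random initialization and no fixed seed, the two sets of fits would generally differ, which is what makes consensus averaging necessary in that setting.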

Preprocessing modes

Before NMF, mpnmf applies gene selection, centering, and optional scaling. Two modes are offered, with defaults matching common analytical conventions:

HEG mode (mode='heg', scale=False by default): top genes by mean expression, centered per gene, clipped to non-negative. Matches the original method.
HVG mode (mode='hvg', scale=True by default): highly variable genes (dispersion-based) filtered further by minimum mean expression, centered and divided by gene standard deviation, clipped to non-negative. Full z-normalization amplifies lowly expressed but strongly variable genes, enabling detection of rare or trace signals.

The default scaling behavior in each mode can be overridden via the scale argument.
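As an illustration, the two modes can be sketched in NumPy. The function name `preprocess_sketch`, the dispersion definition (variance divided by mean), and the order of the two HVG filters are assumptions made for this example and may differ from what mpnmf does internally.

```python
import numpy as np


def preprocess_sketch(X, mode="hvg", n_top_genes=10, min_exp_pct=0.2, scale="auto"):
    """X: dense cells x genes log-normalized array. Returns (matrix, kept gene indices)."""
    mean = X.mean(axis=0)
    if mode == "heg":
        # Top genes by mean expression.
        keep = np.argsort(mean)[::-1][:n_top_genes]
        default_scale = False
    elif mode == "hvg":
        # Top genes by dispersion, then drop the lowest-mean fraction of those.
        dispersion = X.var(axis=0) / np.maximum(mean, 1e-12)
        top = np.argsort(dispersion)[::-1][:n_top_genes]
        cutoff = np.quantile(mean[top], min_exp_pct)
        keep = top[mean[top] >= cutoff]
        default_scale = True
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    do_scale = default_scale if scale == "auto" else bool(scale)

    Z = X[:, keep] - mean[keep]              # centering is always applied
    if do_scale:
        std = X[:, keep].std(axis=0)
        Z = Z / np.where(std > 0, std, 1.0)  # avoid division by zero
    return np.clip(Z, 0, None), keep         # NMF needs non-negative input


rng = np.random.default_rng(0)
X = rng.gamma(2.0, 1.0, size=(100, 20))      # toy non-negative data
Z_heg, genes_heg = preprocess_sketch(X, mode="heg")
Z_hvg, genes_hvg = preprocess_sketch(X, mode="hvg")
print(Z_heg.shape, Z_hvg.shape)
```

Clipping after centering discards below-average expression, so each retained value measures how far a gene sits above its per-sample mean (in standard deviations when scaling is on).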

Installation

Requirements: Python ≥ 3.9.

We recommend installing mpnmf in a dedicated conda environment to avoid dependency conflicts with other single-cell tools:

conda create -n mpnmf "python>=3.9" pip
conda activate mpnmf
pip install mpnmf

Or install into an existing environment:

pip install mpnmf

Usage

import scanpy as sc
import mpnmf

adata = sc.read_h5ad("your_data.h5ad")        # anndata should be log-normalized

sample_key  = "batch"
sample_list = adata.obs[sample_key].unique().tolist()
krange      = range(7, 13)

# NMF run, option A: HVG mode
nmf_run = mpnmf.run(
    adata, krange=krange, sample_key=sample_key, sample_list=sample_list,
    mode="hvg", n_top_genes=7000, scale=True, title="test",
)
# NMF run, option B: HEG mode (run this instead of option A; it overwrites nmf_run)
nmf_run = mpnmf.run(
    adata, krange=krange, sample_key=sample_key, sample_list=sample_list,
    mode="heg", n_top_genes=7000, scale=False, title="test",
)

# Program refinement: intra-sample reproducibility, inter-sample recurrence, intra-sample non-redundancy
nmf_refined = mpnmf.refine(nmf_run, thres_intra=0.7, thres_inter=0.2, thres_redun=0.2, title="test")

# Metaprogram clustering: iteratively merge programs into metaprograms
nmf_df      = mpnmf.cluster(nmf_refined, thres_overlap=0.3, min_overlap=5, title="test")

Input requirements

adata.X is log-normalized expression (not raw counts, not z-scored).
adata.var_names contains unique gene symbols.
adata.obs contains a column identifying the sample of each cell.
Each sample has enough cells to factorize at max(krange) (rule of thumb: ≥ 50 cells per sample).
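These requirements can be checked before calling `mpnmf.run`. The sketch below uses heuristics only: `check_inputs` is a hypothetical helper, not part of mpnmf, and it takes plain arrays and lists so it runs without AnnData. In practice you would pass a dense copy of `adata.X`, `adata.var_names`, and `adata.obs[sample_key]`.

```python
from collections import Counter

import numpy as np


def check_inputs(X, var_names, sample_labels, min_cells=50):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    X = np.asarray(X, dtype=float)
    if X.min() < 0:
        problems.append("negative values: X looks z-scored or centered")
    if np.all(X == np.round(X)):
        problems.append("integer-only values: X looks like raw counts")
    if len(set(var_names)) != len(var_names):
        problems.append("duplicate gene names")
    for sample, n_cells in Counter(sample_labels).items():
        if n_cells < min_cells:
            problems.append(f"sample {sample!r} has only {n_cells} cells")
    return problems


rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(120, 5))

# Log-transformed counts, unique gene names, 60 cells per sample.
good = check_inputs(np.log1p(counts), ["g0", "g1", "g2", "g3", "g4"],
                    ["s1"] * 60 + ["s2"] * 60)
# Raw counts, a duplicated gene name, and an undersized sample.
bad = check_inputs(counts, ["g0", "g1", "g2", "g3", "g3"],
                   ["s1"] * 110 + ["s2"] * 10)
print(good)
print(bad)
```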

APIs

`mpnmf.run(adata, krange, sample_key, sample_list, ...)`

Runs NMF per sample across a range of ranks.

Parameter	Default	Description
`adata`	—	Log-normalized AnnData object.
`krange`	—	Iterable of NMF ranks to try (e.g., `range(4, 10)`).
`sample_key`	—	Column in `adata.obs` used to split cells by sample.
`sample_list`	—	List of sample values to run NMF on.
`n_genes`	`50`	Number of top genes retained per program.
`max_iter`	`5000`	Max NMF iterations per fit.
`mode`	`'hvg'`	Gene selection: `'hvg'` (dispersion-based) or `'heg'` (mean expression).
`n_top_genes`	`7000`	Number of genes kept after selection.
`min_exp_pct`	`0.2`	In HVG mode, fraction of genes with the lowest mean expression to drop.
`scale`	`'auto'`	Whether to divide by gene std after centering. `'auto'` = `True` for HVG, `False` for HEG. Centering is always applied regardless.
`title`	`None`	Prefix for output files; defaults to `"mpnmf"`.
`savepath`	`None`	Output directory; defaults to `./mpnmf/`.

Returns: nmf_run dict, keyed by sample → rank → {W, H, rank}.
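The nested layout can be walked with two loops. The sketch below uses a mock dictionary with placeholder values in place of real factor matrices, since the source does not state the orientation of `W` and `H`; `iter_fits` is a hypothetical helper.

```python
def iter_fits(nmf_run):
    """Yield (sample, rank, fit) for every fit stored in an nmf_run-style dict."""
    for sample, fits_by_rank in nmf_run.items():
        for rank, fit in fits_by_rank.items():
            yield sample, rank, fit


# Mock structure: sample -> rank -> {W, H, rank}; None stands in for the matrices.
mock_run = {
    "S1": {7: {"W": None, "H": None, "rank": 7},
           8: {"W": None, "H": None, "rank": 8}},
    "S2": {7: {"W": None, "H": None, "rank": 7}},
}

summary = [(sample, rank) for sample, rank, _ in iter_fits(mock_run)]
print(summary)
```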

`mpnmf.refine(nmf_run, ...)`

Filters programs through three sequential criteria: intra-sample reproducibility, inter-sample recurrence, and intra-sample non-redundancy.

Parameter	Default	Description
`nmf_run`	—	Output of `mpnmf.run`.
`samples`	`None`	Subset of samples to use; defaults to all.
`krange`	`None`	Subset of ranks to use; defaults to all.
`n_genes`	`None`	Program length; inferred from `nmf_run` if not given.
`thres_intra`	`0.7`	Min fraction of top genes shared with another rank in the same sample.
`thres_inter`	`0.2`	Min fraction of top genes shared with the best-matching program in another sample.
`thres_redun`	`0.2`	Max allowed overlap with programs already kept in the same sample.
`title`	`None`	Prefix for output files; defaults to `"mpnmf"`.
`savepath`	`None`	Output directory; defaults to `./mpnmf/`.

Returns: nmf_refined dict, keyed by program name → {genes, scores}.
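A simplified sketch of the three criteria, using gene sets only. The function `refine_sketch`, the input layout, and the greedy ordering of the redundancy step (highest inter-sample overlap first, ties broken by name) are assumptions for illustration; mpnmf's actual procedure may differ in these details.

```python
def overlap(a, b):
    """Fraction of the genes in program a that also appear in program b."""
    return len(a & b) / len(a)


def refine_sketch(programs, thres_intra=0.7, thres_inter=0.2, thres_redun=0.2):
    """programs: {name: (sample, rank, set_of_top_genes)}. Returns kept program names."""
    candidates = []
    for name, (sample, rank, genes) in programs.items():
        # Reproducibility: best match at a different rank in the same sample.
        intra = max((overlap(genes, g) for s, r, g in programs.values()
                     if s == sample and r != rank), default=0.0)
        # Recurrence: best match in any other sample.
        inter = max((overlap(genes, g) for s, r, g in programs.values()
                     if s != sample), default=0.0)
        if intra >= thres_intra and inter >= thres_inter:
            candidates.append((inter, name))

    # Non-redundancy: greedily keep programs that do not overlap too much
    # with programs already kept from the same sample.
    kept = []
    for _, name in sorted(candidates, key=lambda t: (-t[0], t[1])):
        sample, _, genes = programs[name]
        redundant = any(programs[k][0] == sample
                        and overlap(genes, programs[k][2]) > thres_redun
                        for k in kept)
        if not redundant:
            kept.append(name)
    return kept


toy = {
    "A.k2.p1": ("A", 2, set("abcdefghij")),
    "A.k3.p1": ("A", 3, set("abcdefghik")),   # near-duplicate of A.k2.p1
    "A.k3.p2": ("A", 3, set("qrstuvwxyz")),   # not reproduced at another rank
    "B.k2.p1": ("B", 2, set("abc") | set("1234567")),
    "B.k3.p1": ("B", 3, set("abc") | set("1234568")),
}
kept = refine_sketch(toy)
print(kept)
```

In the toy data, one program per sample survives: the near-duplicates fall to the redundancy filter and the unreproduced program fails the intra-sample criterion.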

`mpnmf.cluster(nmf_refined, ...)`

Iteratively merges refined programs into metaprograms by gene overlap.

Parameter	Default	Description
`nmf_refined`	—	Output of `mpnmf.refine`.
`n_genes`	`50`	Expected program length; all programs must match.
`thres_overlap`	`0.3`	Min fraction of shared genes to merge two programs.
`min_overlap`	`5`	Min number of qualifying partners for a program to seed a new metaprogram.
`title`	`None`	Prefix for output files; defaults to `"mpnmf"`.
`savepath`	`None`	Output directory; defaults to `./mpnmf/`.

Returns: nmf_df, a DataFrame of genes × metaprograms.
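The merging loop can be sketched as below. This is a simplified reading of the description above, not mpnmf's implementation: `cluster_sketch`, the tie-breaking rules, and the way the metaprogram gene list is refreshed (most frequent genes across its members) are assumptions made for the example.

```python
from collections import Counter


def cluster_sketch(programs, thres_overlap=0.3, min_overlap=5, n_genes=50):
    """programs: {name: list of n_genes genes}. Returns a list of metaprogram dicts."""
    min_shared = thres_overlap * n_genes
    remaining = {name: set(genes) for name, genes in programs.items()}
    metaprograms = []
    while remaining:
        # Seed: the program with the most partners above the overlap threshold.
        partners = {
            name: sum(1 for other, g in remaining.items()
                      if other != name and len(genes & g) >= min_shared)
            for name, genes in remaining.items()
        }
        seed = max(partners, key=lambda n: (partners[n], n))
        if partners[seed] < min_overlap:
            break
        members = [seed]
        mp_genes = remaining.pop(seed)
        while remaining:
            # Add the remaining program that best matches the current metaprogram.
            best = max(remaining, key=lambda n: (len(mp_genes & remaining[n]), n))
            if len(mp_genes & remaining[best]) < min_shared:
                break
            members.append(best)
            del remaining[best]
            # Refresh the metaprogram genes: most frequent across its members.
            counts = Counter(g for m in members for g in programs[m])
            ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
            mp_genes = {g for g, _ in ranked[:n_genes]}
        metaprograms.append({"members": members, "genes": sorted(mp_genes)})
    return metaprograms


toy = {
    "P1": list("abcd"), "P2": list("abce"), "P3": list("abdf"),
    "Q1": list("mnop"), "Q2": list("mnoq"), "Q3": list("mnpr"),
    "Z": list("wxyz"),  # shares no genes with anything, stays unassigned
}
mps = cluster_sketch(toy, thres_overlap=0.5, min_overlap=2, n_genes=4)
for mp in mps:
    print(mp["members"], mp["genes"])
```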

Output files

Function	File	Content
`run`	`{prefix}_run.pkl`	`nmf_run` dict
`refine`	`{prefix}_refined.pkl`	`nmf_refined` dict
`cluster`	`{prefix}_clustered.pkl`	`MP_dict` (genes + scores + freq per MP)
`cluster`	`{prefix}.csv`	Gene × MP table

{prefix} = title if given, else "mpnmf".
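To reload saved results in a later session, the file names can be rebuilt from the table above. The helpers `output_paths` and `load_pickle` are hypothetical, and reading the `.pkl` files with the standard `pickle` module is an assumption based on the file extension.

```python
import pickle
from pathlib import Path


def output_paths(title=None, savepath=None):
    """Build the expected output file paths from the naming rules in the table."""
    prefix = title if title is not None else "mpnmf"
    outdir = Path(savepath) if savepath is not None else Path("mpnmf")
    return {
        "run": outdir / f"{prefix}_run.pkl",
        "refined": outdir / f"{prefix}_refined.pkl",
        "clustered": outdir / f"{prefix}_clustered.pkl",
        "table": outdir / f"{prefix}.csv",
    }


def load_pickle(path):
    """Load one pickled result; only unpickle files you created yourself."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


paths = output_paths(title="test")
print(paths["run"])      # file written by run when title="test"
print(paths["table"])    # gene x MP table written by cluster
```

After running the Usage example, `load_pickle(paths["refined"])` would give back the `nmf_refined` dict without recomputing the factorization.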

Citation

If you use mpnmf in your research, please cite the original paper:

Gavish, A., Tyler, M., Greenwald, A.C., et al. Hallmarks of transcriptional intratumour heterogeneity across a thousand tumours. Nature 618, 598–606 (2023).

Release history

0.1.2: Apr 27, 2026
0.1.1: Apr 27, 2026
0.1.0: Apr 22, 2026 (this version)

