Rust port of R NMF package — bit-equivalent multiplicative updates (brunet/lee/offset/nsNMF), parallelised with rayon.
Project description
rust-NMF
A modern Rust port of R's NMF package plus
2019-2024 SOTA algorithms (HALS, E-HALS, RcppML-style diagonalised NMF). Six
algorithms are bit-equivalent to R within f64 round-off; four are modern
single-cell-tuned methods. Parallelised with rayon + matrixmultiply, on
omicverse pbmc8k it's 76-280× faster than R depending on configuration.
Quick decision table
| Use case | Recommended | Why |
|---|---|---|
| Single-cell exploratory (default) | lee + NNDSVD, 25 iter |
ARI 0.89 vs cell types — best biology and bit-eq R |
| Pure speed with strong biology | dnmf + NNDSVD, 25 iter |
ARI 0.85, ~26% faster than lee+NNDSVD |
Reproduce existing R NMF::nmf() analysis |
lee, brunet, offset, nsNMF, snmf/r, snmf/l |
bit-eq R within 1e-12 |
| Largest speed (loss only matters) | hals + NNDSVD, 25 iter |
~280× R, ~3× sklearn cd; ARI lower (0.55) |
| Synthetic / ground-truth recovery | lee or brunet, 100+ iter |
lowest Amari error |
| Sparse factors with bit-eq R | snmf/r or snmf/l |
FCNNLS port + L1² regularisation |
| Missing-value imputation | ls-nmf with weight=mask |
weight=0 drops entries from loss |
| Large atlas (>500k cells) | lee + NNDSVD or dnmf + NNDSVD |
NNDSVD warm-start dominant at scale |
Key empirical finding (from examples/benchmark_scaling.ipynb init-pairing section,
real pbmc8k 3000 cells × 1500 HVG, rank 8): NNDSVD init is the magic, not the algorithm.
With NNDSVD seeding, every algorithm reaches its biological-signal ceiling in ~25
iterations. Random init takes 100 iters and still ends ~30% worse on ARI. Pair NNDSVD
with lee/brunet/dnmf and you get the best of both worlds (R-equivalence + best
biology, or speed + best biology).
Four one-liner recipes
import numpy as np
from nmf_rs import nmf, nndsvd_init
# 1. EXPLORATORY single-cell, best ARI vs cell types (0.89), still bit-eq R
W0, H0 = nndsvd_init(V, rank=10, fill='mean')
res = nmf(V, rank=10, method='lee', # 'brunet' also reaches ARI 0.86
W0=W0, H0=H0, max_iter=25, num_threads=16)
# 2. SPEED + biology, modern algorithm (ARI 0.85, 26% faster than recipe 1)
W0, H0 = nndsvd_init(V, rank=10, fill='mean')
res = nmf(V, rank=10, method='dnmf',
W0=W0, H0=H0, max_iter=25, num_threads=16)
# 3. REPRODUCE R analysis exactly — bit-eq, max|ΔW| < 1e-12
res = nmf(V, rank=10, method='brunet', # or 'lee', 'offset', 'nsNMF', 'snmf/r', 'snmf/l'
W0=W0_from_R, H0=H0_from_R, max_iter=200, num_threads=16)
# 4. PUREST SPEED at atlas scale (>500k cells), ARI tradeoff
W0, H0 = nndsvd_init(V, rank=20, fill='mean')
res = nmf(V, rank=20, method='hals',
W0=W0, H0=H0, max_iter=25, num_threads=16)
Algorithm catalogue
method |
Origin | Year | Update rule | Bit-eq R | Best for |
|---|---|---|---|---|---|
brunet (alias KL) |
Brunet et al. PNAS | 2004 | KL divergence multiplicative | ✅ 1e-12 | R reproduction; count-like data |
lee (alias Frobenius) |
Lee & Seung | 2001 | Frobenius multiplicative | ✅ 1e-12 | R reproduction; ground-truth recovery |
offset |
Badea | 2008 | Lee + per-feature baseline | ✅ 1e-12 | per-gene background subtraction |
nsNMF |
Pascual-Montano | 2006 | Brunet + smoothing S | ✅ 1e-12 | sparser, more interpretable factors |
ls-nmf (alias lsnmf) |
Wang, Kossenkov, Ochs | 2006 | Weighted Frobenius; needs weight= |
≈ (BLAS variance) | missing-value imputation |
snmf/r |
Kim & Park | 2007 | FCNNLS-based ANLS; sparse H | ✅ 1e-12 | sparse coefficients (cells) |
snmf/l |
Kim & Park | 2007 | FCNNLS-based ANLS; sparse W | ✅ 1e-12 | sparse basis (genes) |
hals |
Cichocki & Phan | 2009 | Block-coord least-squares | ❌ (custom) | speed |
ehals |
Andersen-Ang & Gillis | 2019 | HALS + Nesterov extrapolation | ❌ | speed + slight per-iter accuracy |
dnmf (alias rcppml) |
DeBruine (RcppML) | 2024 | Diagonalised W·diag(d)·H + L1 | ❌ | single-cell biological signal |
The six "✅ 1e-12" rows are bitwise-identical to R NMF within f64 round-off
on every parity test (pytest tests/).
Speed (omicverse pbmc8k, HVG=2000, cells=7750, rank=10)
| config | iters | 16t (s) | speed-up vs R lee (125 s) |
|---|---|---|---|
R lee + R seed |
100 | 125 s | 1× (baseline) |
lee + matched seed |
100 | 1.65 s | 76× ✅ bit-eq |
brunet + matched seed |
100 | 4.30 s | 18× ✅ bit-eq |
hals + random |
100 | 1.61 s | 78× |
dnmf + NNDSVD |
25 | ~0.5 s | ~250× |
hals + NNDSVD |
25 | 0.58 s | 215× |
hals + NNDSVD (loose) |
10 | 0.45 s | 280× |
| sklearn cd (BLAS, 17t) | 100 | 0.79 s | 158× |
See examples/benchmark_pbmc8k.ipynb and examples/comparison_vs_cnmf.ipynb.
Objective accuracy (anchor-free, no algorithm bias)
The standard NMF evaluation metrics from Brunet 2004 and Kim & Park 2007 on real pbmc8k (3000 cells × 1500 HVG, rank 8, K=5 random inits per algorithm):
ARI vs cell types (real pbmc8k, 3k cells × 1.5k HVG, rank 8)
Pivoted by (algorithm × init) to expose the NNDSVD effect:
| algo | NNDSVD (25 iter, K=1) | random (K=5 runs) |
|---|---|---|
lee |
🥇 0.889 | 0.568 |
brunet |
0.864 | 0.621 |
dnmf |
🥈 0.854 | 0.726 |
hals |
0.547 | 0.428 |
ehals |
0.486 | 0.539 |
Wall-clock at the same configuration (16 threads)
| algo | NNDSVD@25 (s) | random×5 (mean s/run) |
|---|---|---|
lee |
0.19 | 0.39 |
brunet |
0.41 | 0.96 |
dnmf |
0.14 | 0.24 |
hals |
0.13 | 0.26 |
ehals |
0.12 | 0.26 |
Three takeaways
-
NNDSVD init dominates the choice of algorithm. Every algorithm gets higher ARI with NNDSVD than with random — typically 30-60% higher. So the question "which algorithm" is downstream of "use NNDSVD".
-
lee + NNDSVDis the gold standard for biological signal. On real single-cell data withpredicted_celltypelabels,lee + NNDSVD@25reaches ARI = 0.89 — the best across all (algorithm × init) combinations. Andleeis bit-equivalent to RNMF::nmf(method="lee")so you can reproduce R analyses exactly. -
dnmf + NNDSVDis the fast alternative. Same NNDSVD trick on the modern (DeBruine 2024) diagonalised algorithm: 0.14 s, ARI 0.85. Trades 0.04 ARI for 26% speed.
Caveat: on synthetic data without label structure, the same recipes
undertrain in 25 iters — for ground-truth recovery prefer lee/brunet
with 100+ iters (medium tier in examples/benchmark_scaling.ipynb).
Tutorial roadmap
| Notebook | Audience | What it covers |
|---|---|---|
tutorial_quickstart.ipynb |
New users | API basics, all 4 R-bit-eq algorithms, stop criteria |
benchmark_vs_R.ipynb |
R NMF users |
Drop-in parity check vs R, max|ΔW|<1e-12 |
benchmark_pbmc8k.ipynb |
Single-cell users | Real pbmc8k thread sweep, factor inspection |
hals_nndsvd_demo.ipynb |
Speed-focused users | HALS + NNDSVD convergence trace, factor quality |
comparison_vs_cnmf.ipynb |
cNMF users | sklearn / cNMF apples-to-apples, factor correlation |
benchmark_scaling.ipynb |
Anyone choosing an algorithm | 3 scales × 7 algos × 6 objective metrics + init pairing |
tutorial_full.ipynb |
Comprehensive | R baseline → all algorithms → init strategies → threading → production recipe |
Suggested reading order for new users:
- Start with
tutorial_quickstart.ipynbfor API. - Then
benchmark_scaling.ipynbto pick the right algorithm for your scale + use case. - If reproducing an R analysis,
benchmark_vs_R.ipynb. - If you've used cNMF before,
comparison_vs_cnmf.ipynb. - For deep-dive:
tutorial_full.ipynb.
Install
# Sherlock or any host with rust toolchain:
module load rust/1.90.0
maturin develop --release --manifest-path rust/Cargo.toml
# Or via pip into an active venv (release wheels coming to PyPI as `nmf-rs`):
VIRTUAL_ENV=/path/to/venv pip install -e .
Reproducing an R NMF analysis
# 1. Generate W0, H0 in R with set.seed(s); export as TSV.
# 2. Load same V, W0, H0 in Python:
import numpy as np
from nmf_rs import nmf
V = np.loadtxt("V.tsv")
W0 = np.loadtxt("W0.tsv")
H0 = np.loadtxt("H0.tsv")
res = nmf(V, rank=W0.shape[1], W0=W0, H0=H0, method="brunet", max_iter=200)
W_R = np.loadtxt("W_from_R.tsv")
H_R = np.loadtxt("H_from_R.tsv")
np.testing.assert_allclose(res.W, W_R, atol=1e-12) # passes exactly
np.testing.assert_allclose(res.H, H_R, atol=1e-12)
See tests/test_parity.py and tests/test_snmf_parity.py for the
auto-regenerated reference fixtures + R-parity asserts.
License
GPL ≥ 2 (inherited from R NMF).
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file nmf_rs-0.1.0.tar.gz.
File metadata
- Download URL: nmf_rs-0.1.0.tar.gz
- Upload date:
- Size: 36.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
008cc4b8032afc0f88668e4ff2701244aed0bce3cf3b472cf2045bb5c9bccf61
|
|
| MD5 |
84b028cf51b84596786a5deb5ba8654b
|
|
| BLAKE2b-256 |
53d749c1f3516245c06da14895667ae549f080f697f7150e916b8a1a45f92f78
|
Provenance
The following attestation bundles were made for nmf_rs-0.1.0.tar.gz:
Publisher:
release.yml on omicverse/rust-NMF
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
nmf_rs-0.1.0.tar.gz -
Subject digest:
008cc4b8032afc0f88668e4ff2701244aed0bce3cf3b472cf2045bb5c9bccf61 - Sigstore transparency entry: 1470670068
- Sigstore integration time:
-
Permalink:
omicverse/rust-NMF@df4ddf986807fa6864b367fbcf1717cbb8f95257 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/omicverse
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@df4ddf986807fa6864b367fbcf1717cbb8f95257 -
Trigger Event:
push
-
Statement type:
File details
Details for the file nmf_rs-0.1.0-cp39-abi3-win_amd64.whl.
File metadata
- Download URL: nmf_rs-0.1.0-cp39-abi3-win_amd64.whl
- Upload date:
- Size: 347.9 kB
- Tags: CPython 3.9+, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
aa1db26e64c06a760ba835bef194e20a484171562944c9a3523df9121b5be0da
|
|
| MD5 |
bea10f734e9e7577b98c0959b6dda70d
|
|
| BLAKE2b-256 |
992f32e061603cd422cc480b7efefd5f4d6983d624a3d2a4096c058dd7af302e
|
Provenance
The following attestation bundles were made for nmf_rs-0.1.0-cp39-abi3-win_amd64.whl:
Publisher:
release.yml on omicverse/rust-NMF
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
nmf_rs-0.1.0-cp39-abi3-win_amd64.whl -
Subject digest:
aa1db26e64c06a760ba835bef194e20a484171562944c9a3523df9121b5be0da - Sigstore transparency entry: 1470670220
- Sigstore integration time:
-
Permalink:
omicverse/rust-NMF@df4ddf986807fa6864b367fbcf1717cbb8f95257 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/omicverse
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@df4ddf986807fa6864b367fbcf1717cbb8f95257 -
Trigger Event:
push
-
Statement type:
File details
Details for the file nmf_rs-0.1.0-cp39-abi3-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: nmf_rs-0.1.0-cp39-abi3-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 477.2 kB
- Tags: CPython 3.9+, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
93608cb2f4ab78c9ece2fcdc2b7feb14536f913db01e8f8cb4f186a970c0eed3
|
|
| MD5 |
b380244dbf9dd2aa7273ee5814fe8e43
|
|
| BLAKE2b-256 |
c7a60d1aef67e0ba64eb908f3a8efcf76f2ef877028468837ef54b78fa900870
|
Provenance
The following attestation bundles were made for nmf_rs-0.1.0-cp39-abi3-manylinux_2_28_x86_64.whl:
Publisher:
release.yml on omicverse/rust-NMF
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
nmf_rs-0.1.0-cp39-abi3-manylinux_2_28_x86_64.whl -
Subject digest:
93608cb2f4ab78c9ece2fcdc2b7feb14536f913db01e8f8cb4f186a970c0eed3 - Sigstore transparency entry: 1470670365
- Sigstore integration time:
-
Permalink:
omicverse/rust-NMF@df4ddf986807fa6864b367fbcf1717cbb8f95257 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/omicverse
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@df4ddf986807fa6864b367fbcf1717cbb8f95257 -
Trigger Event:
push
-
Statement type:
File details
Details for the file nmf_rs-0.1.0-cp39-abi3-manylinux_2_28_aarch64.whl.
File metadata
- Download URL: nmf_rs-0.1.0-cp39-abi3-manylinux_2_28_aarch64.whl
- Upload date:
- Size: 443.5 kB
- Tags: CPython 3.9+, manylinux: glibc 2.28+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9ad81407bbc9335219f31a0b6746264761733494c6b16697e060729fa749dd17
|
|
| MD5 |
da52895b8a4cc1adcff712a653cd9ddb
|
|
| BLAKE2b-256 |
333cf47bb2fa049de0887f0f36c6acc62bcbf10c1f1da9d3f3de2736f69fbbe8
|
Provenance
The following attestation bundles were made for nmf_rs-0.1.0-cp39-abi3-manylinux_2_28_aarch64.whl:
Publisher:
release.yml on omicverse/rust-NMF
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
nmf_rs-0.1.0-cp39-abi3-manylinux_2_28_aarch64.whl -
Subject digest:
9ad81407bbc9335219f31a0b6746264761733494c6b16697e060729fa749dd17 - Sigstore transparency entry: 1470670611
- Sigstore integration time:
-
Permalink:
omicverse/rust-NMF@df4ddf986807fa6864b367fbcf1717cbb8f95257 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/omicverse
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@df4ddf986807fa6864b367fbcf1717cbb8f95257 -
Trigger Event:
push
-
Statement type:
File details
Details for the file nmf_rs-0.1.0-cp39-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.
File metadata
- Download URL: nmf_rs-0.1.0-cp39-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
- Upload date:
- Size: 823.6 kB
- Tags: CPython 3.9+, macOS 10.12+ universal2 (ARM64, x86-64), macOS 10.12+ x86-64, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
92a60270238ed3eb0e61266c40fbbc86d7526417cde06425c1c8d1f2b1cc86ac
|
|
| MD5 |
31cf3c103d317bb5017c825b812422ff
|
|
| BLAKE2b-256 |
208093bded0b8e1db491a1c4e3cdde3098ac289d7bc25aa9f01cfb6f76034bb7
|
Provenance
The following attestation bundles were made for nmf_rs-0.1.0-cp39-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:
Publisher:
release.yml on omicverse/rust-NMF
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
nmf_rs-0.1.0-cp39-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl -
Subject digest:
92a60270238ed3eb0e61266c40fbbc86d7526417cde06425c1c8d1f2b1cc86ac - Sigstore transparency entry: 1470670499
- Sigstore integration time:
-
Permalink:
omicverse/rust-NMF@df4ddf986807fa6864b367fbcf1717cbb8f95257 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/omicverse
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@df4ddf986807fa6864b367fbcf1717cbb8f95257 -
Trigger Event:
push
-
Statement type: