NBMF-MM
NBMF‑MM is a fast, scikit‑learn‑style implementation of mean‑parameterized Bernoulli (binary) matrix factorization using a Majorization–Minimization (MM) solver.
- Two symmetric orientations:
  - orientation="dir-beta" (default, Aspect Bernoulli): columns of H lie on the simplex; Beta prior on W.
  - orientation="beta-dir" (Binary ICA): rows of W lie on the simplex; Beta prior on H.
- Masked training for matrix completion / hold‑out validation.
- Optional acceleration: NumExpr (elementwise ops) and Numba (simplex projection).
- Projection options: default Duchi simplex projection (fast) with an opt‑in legacy “normalize” method for parity with older behavior.
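The default "duchi" method refers to the classic sort-based Euclidean projection onto the probability simplex. A minimal NumPy sketch (the function name here is illustrative, not the package's internal API):

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a vector onto the probability simplex.

    Sort-based O(d log d) algorithm (Duchi et al., 2008).
    """
    u = np.sort(v)[::-1]                  # sort entries in descending order
    css = np.cumsum(u)                    # cumulative sums of the sorted entries
    d = np.arange(1, v.size + 1)
    rho = np.nonzero(u + (1.0 - css) / d > 0)[0][-1]   # size of the support
    theta = (css[rho] - 1.0) / (rho + 1)               # shift that normalizes it
    return np.maximum(v - theta, 0.0)

p = project_to_simplex(np.array([0.5, 1.2, -0.3]))     # -> [0.15, 0.85, 0.0]
```

The result is nonnegative and sums to 1; negative inputs are clipped to zero rather than merely renormalized, which is what distinguishes it from the legacy "normalize" method.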
Installation
From PyPI (when released):
pip install nbmf-mm
From source:
pip install "git+https://github.com/siddC/nbmf_mm"
Optional extras:
# scikit-learn integration & NNDSVD-style init (if you enable it later)
pip install "nbmf-mm[sklearn]"
# docs build stack
pip install "nbmf-mm[docs]"
Quick Start
import numpy as np
from nbmf_mm import NBMF
rng = np.random.default_rng(0)
X = (rng.random((100, 500)) < 0.25).astype(float) # binary {0,1} or probabilities in [0,1]
# Aspect Bernoulli (default): H columns on simplex; W has a Beta prior
model = NBMF(
n_components=20,
orientation="dir-beta",
alpha=1.2, beta=1.2,
random_state=0,
max_iter=2000, tol=1e-6,
# fast defaults:
projection_method="duchi", # Euclidean simplex projection (recommended)
projection_backend="auto", # prefer Numba if installed
use_numexpr=True, # use NumExpr if installed
).fit(X)
W = model.W_ # shape (n_samples, n_components)
H = model.components_ # shape (n_components, n_features)
Xhat = model.inverse_transform(W) # probabilities in (0,1)
# Transform new data using fixed components H
X_new = (rng.random((10, 500)) < 0.25).astype(float)
W_new = model.transform(X_new) # shape (10, n_components)
# Masked training / hold-out validation
mask = (rng.random(X.shape) < 0.9).astype(float) # observe 90% of entries
model = NBMF(n_components=20).fit(X, mask=mask)
print("score (−NLL per observed entry):", model.score(X, mask=mask))
print("perplexity:", model.perplexity(X, mask=mask))
Why two orientations?
- dir-beta (Aspect Bernoulli): H columns lie on the simplex, so each feature (e.g., gene) has interpretable mixture memberships across latent aspects; W carries sample-specific propensities with a Beta prior.
- beta-dir (Binary ICA): W rows lie on the simplex; H is Beta-constrained.

Both solve the same Bernoulli mean-parameterized factorization under different geometric constraints; pick the one that best matches your interpretability needs.
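A pure-NumPy illustration of the dir-beta geometry (hand-built factors, not a call into the library): when H's columns are simplex-constrained and W's entries are Beta-distributed in [0,1], every entry of the reconstruction W @ H is a convex combination of W's entries, so it is automatically a valid Bernoulli mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 6, 8, 3

# dir-beta: W entries in (0,1) (Beta-distributed), H columns on the simplex
W = rng.beta(1.2, 1.2, size=(n, k))
H = rng.random((k, m))
H /= H.sum(axis=0, keepdims=True)   # each column of H now sums to 1

Xhat = W @ H                        # Bernoulli means: guaranteed in [0, 1]
```

The beta-dir orientation is the mirror image: rows of W on the simplex, Beta-constrained H, with the same guarantee on W @ H.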
API (scikit-learn style)
- NBMF(...).fit(X, mask=None) -> self
- fit_transform(X, mask=None) -> W
- transform(X, mask=None, max_iter=500, tol=1e-6) -> W (estimate W for new X with the learned H fixed)
- inverse_transform(W) -> Xhat (reconstructed probabilities in (0,1))
- score(X, mask=None) -> float (negative NLL per observed entry; higher is better)
- perplexity(X, mask=None) -> float (exp of average NLL per observed entry; lower is better)
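The score and perplexity quantities are straightforward to compute by hand from a reconstruction. The helpers below are an illustrative sketch of those definitions, not part of the package API:

```python
import numpy as np

def bernoulli_score(X, Xhat, mask=None, eps=1e-10):
    """Mean Bernoulli log-likelihood per observed entry (higher is better)."""
    if mask is None:
        mask = np.ones_like(X)
    P = np.clip(Xhat, eps, 1 - eps)          # guard log(0)
    ll = X * np.log(P) + (1 - X) * np.log(1 - P)
    return (mask * ll).sum() / mask.sum()

def bernoulli_perplexity(X, Xhat, mask=None):
    """exp of the average negative log-likelihood (lower is better)."""
    return np.exp(-bernoulli_score(X, Xhat, mask))

# A maximally uninformative reconstruction (all means 0.5) gives
# score = log(0.5) per entry and perplexity = 2 for binary data:
X = np.array([[1.0, 0.0], [0.0, 1.0]])
Xhat = np.full((2, 2), 0.5)
s = bernoulli_score(X, Xhat)        # log(0.5) ~= -0.693
ppl = bernoulli_perplexity(X, Xhat) # 2.0
```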
Key parameters
- n_components (int): rank K.
- orientation ∈ {"dir-beta", "beta-dir"}.
- alpha, beta (float > 0): Beta prior hyperparameters (on W for "dir-beta", on H for "beta-dir").
- projection_method ∈ {"duchi", "normalize"}: default "duchi" (fast and stable); "normalize" gives legacy behavior (nonnegativity + renormalization).
- projection_backend ∈ {"auto", "numba", "numpy"}: backend for the "duchi" projection.
- use_numexpr (bool): use NumExpr if available.
Command-line (CLI)
After installation, a console script nbmf-mm is available:
nbmf-mm fit \
--input X.npz --rank 30 \
--orientation dir-beta --alpha 1.2 --beta 1.2 \
--max-iter 2000 --tol 1e-6 --seed 0 --n-init 1 \
--mask train_mask.npz \
--out model_rank30.npz
This writes an .npz with W, H, Xhat, objective_history, and n_iter.
Input formats: .npz (expects key arr_0) or .npy. Masks are optional and must match X shape.
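The output format described above can be consumed with plain NumPy. The snippet below writes synthetic stand-ins for those arrays (they are not produced by the actual solver here) and reads them back the way you would read a real `--out` file:

```python
import os
import tempfile
import numpy as np

# Synthetic stand-ins for the arrays `nbmf-mm fit ... --out model_rank30.npz` saves.
rng = np.random.default_rng(0)
W, H = rng.random((100, 30)), rng.random((30, 500))
path = os.path.join(tempfile.mkdtemp(), "model_rank30.npz")
np.savez(path, W=W, H=H, Xhat=W @ H,
         objective_history=np.array([10.0, 5.0, 4.9]), n_iter=3)

model = np.load(path)
print(sorted(model.files))   # ['H', 'W', 'Xhat', 'n_iter', 'objective_history']
```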
Data requirements
- X must be in [0,1] (binary recommended; probabilistic inputs are allowed).
- mask (optional) must have the same shape as X, with values in [0,1] (typically {0,1}).
- Sparse inputs (scipy.sparse) and masks are accepted but densified internally in this version.
Performance notes
- The default Duchi projection performs an O(d log d) per-row/column simplex projection and is accelerated with Numba when installed.
- NumExpr speeds up large elementwise expressions.
- Both accelerations are optional and degrade gracefully if not installed.
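The graceful-fallback pattern looks roughly like this (a sketch of the idea, not the package's internal code): try the accelerated path at import time and fall back to plain NumPy when the optional dependency is missing, so both paths compute the same value.

```python
import numpy as np

try:
    import numexpr as ne

    def weighted_log_sum(X, P):
        # NumExpr compiles the elementwise expression and evaluates it
        # in threaded C loops, avoiding large NumPy temporaries.
        return float(ne.evaluate("sum(X * log(P))"))
except ImportError:

    def weighted_log_sum(X, P):
        # Pure-NumPy fallback: identical result, just slower on big arrays.
        return float((X * np.log(P)).sum())

X = np.ones((4, 4))
P = np.full((4, 4), 0.5)
val = weighted_log_sum(X, P)   # 16 * log(0.5)
```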
Reproducibility
- Set random_state (int) for reproducible initialization.
- Use n_init > 1 to run several random restarts and keep the best NLL.
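The n_init > 1 behavior amounts to a best-of-n loop over seeds. A toy sketch of that idea, where toy_fit is a stand-in for one solver run (the real solver runs MM iterations; here we just draw random feasible factors and score them):

```python
import numpy as np

def toy_fit(X, k, seed):
    """Stand-in for one NBMF run: random feasible factors and their NLL."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    H /= H.sum(axis=0, keepdims=True)            # dir-beta-style constraint
    P = np.clip(W @ H, 1e-10, 1 - 1e-10)
    nll = -(X * np.log(P) + (1 - X) * np.log(1 - P)).mean()
    return nll, W, H

X = (np.random.default_rng(0).random((20, 30)) < 0.25).astype(float)

# Run several restarts and keep the factorization with the lowest NLL.
best = min((toy_fit(X, 5, seed) for seed in range(4)), key=lambda t: t[0])
```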
References
Simplex projection (default):

- J. Duchi, S. Shalev-Shwartz, Y. Singer, T. Chandra (2008). Efficient Projections onto the ℓ₁-Ball for Learning in High Dimensions. ICML 2008.
- W. Wang, M. Á. Carreira-Perpiñán (2013). Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application. arXiv:1309.1541.

Bayesian NBMF (related, slower but fully Bayesian):

- See the NBMF project by alumbreras for reference implementations of Bayesian variants.
Project details

Release: nbmf-mm 0.1.1, published via Trusted Publishing (GitHub Actions workflow release.yml on siddC/nbmf_mm, tag refs/tags/v0.1.1, commit e0698dc25cb935d2200d5d8026d98523491ee185; uploaded with twine/6.1.0 on CPython/3.12.9).

Files:

- nbmf_mm-0.1.1.tar.gz (source distribution, 63.2 kB)
  SHA256: 958a9d045f3829bdc9d3f55486608d05a37efe3874519031353a76d1946e9508
- nbmf_mm-0.1.1-py3-none-any.whl (built distribution, Python 3, 15.5 kB)
  SHA256: 873ce8107ffee0ffeb522b0242ea74a2606972cafdc6b1c5760f704f7bae19c4