
Bernoulli (binary) mean-parameterized NMF (NBMF) w/ Majorization–Minimization (MM)


NBMF‑MM


NBMF‑MM is a fast, scikit‑learn‑style implementation of mean‑parameterized Bernoulli (binary) matrix factorization using a Majorization–Minimization (MM) solver.

  • Two symmetric orientations:
    • orientation="dir-beta" (default, Aspect Bernoulli): columns of H lie on the simplex; Beta prior on W.
    • orientation="beta-dir" (Binary ICA): rows of W lie on the simplex; Beta prior on H.
  • Masked training for matrix completion / hold‑out validation.
  • Optional acceleration: NumExpr (elementwise ops) and Numba (simplex projection).
  • Projection options: default Duchi simplex projection (fast) with an opt‑in legacy “normalize” method for parity with older behavior.

Installation

From PyPI:

pip install nbmf-mm

From source:

pip install "git+https://github.com/siddC/nbmf_mm"

Optional extras:

# scikit-learn integration & NNDSVD-style init (if you enable it later)
pip install "nbmf-mm[sklearn]"

# docs build stack
pip install "nbmf-mm[docs]"

Quick Start

import numpy as np
from nbmf_mm import NBMF

rng = np.random.default_rng(0)
X = (rng.random((100, 500)) < 0.25).astype(float)   # binary {0,1} or probabilities in [0,1]

# Aspect Bernoulli (default): H columns on simplex; W has a Beta prior
model = NBMF(
    n_components=20,
    orientation="dir-beta",
    alpha=1.2, beta=1.2,
    random_state=0,
    max_iter=2000, tol=1e-6,
    # fast defaults:
    projection_method="duchi",      # Euclidean simplex projection (recommended)
    projection_backend="auto",      # prefer Numba if installed
    use_numexpr=True,               # use NumExpr if installed
).fit(X)

W = model.W_                 # shape (n_samples, n_components)
H = model.components_        # shape (n_components, n_features)
Xhat = model.inverse_transform(W)  # probabilities in (0,1)

# Transform new data using fixed components H
X_new = (rng.random((10, 500)) < 0.25).astype(float)
W_new = model.transform(X_new)     # shape (10, n_components)

# Masked training / hold-out validation
mask = (rng.random(X.shape) < 0.9).astype(float)  # observe 90% of entries
model = NBMF(n_components=20).fit(X, mask=mask)

print("score (−NLL per observed entry):", model.score(X, mask=mask))
print("perplexity:", model.perplexity(X, mask=mask))

Why two orientations?

  • dir-beta (Aspect Bernoulli): H columns are on the simplex, so each feature (e.g., gene) has interpretable mixture memberships across latent aspects. W carries sample‑specific propensities with a Beta prior.

  • beta-dir (Binary ICA): W rows are on the simplex, so each sample has mixture memberships across latent components. H carries feature‑specific propensities with a Beta prior.

Both solve the same Bernoulli mean‑parameterized factorization with different geometric constraints; pick the one that best matches your interpretability needs.
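
To make the two constraint sets concrete, here is a minimal NumPy sketch that does not use the library. It draws random factors satisfying each orientation's constraints and forms the mean matrix W @ H. The sizes and the Dirichlet/Beta draws are illustrative assumptions, not the library's initialization.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, k = 6, 8, 3   # illustrative sizes

# dir-beta: W entries lie in (0, 1); every column of H sums to 1.
W = rng.beta(1.2, 1.2, size=(n_samples, k))
H = rng.dirichlet(np.ones(k), size=n_features).T      # shape (k, n_features)
P_dir_beta = W @ H

# beta-dir: every row of W sums to 1; H entries lie in (0, 1).
W2 = rng.dirichlet(np.ones(k), size=n_samples)        # shape (n_samples, k)
H2 = rng.beta(1.2, 1.2, size=(k, n_features))
P_beta_dir = W2 @ H2

# Each entry of P is a convex combination of numbers in [0, 1],
# so it is itself a valid Bernoulli mean.
print(P_dir_beta.min(), P_dir_beta.max())
print(P_beta_dir.min(), P_beta_dir.max())
```

In both cases the product needs no link function to stay in [0, 1], which is what "mean‑parameterized" refers to here.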


API (scikit-learn style)

  • NBMF(...).fit(X, mask=None) -> self
  • fit_transform(X, mask=None) -> W
  • transform(X, mask=None, max_iter=500, tol=1e-6) -> W (estimate W for new X with learned H fixed)
  • inverse_transform(W) -> Xhat (reconstructed probabilities in (0,1))
  • score(X, mask=None) -> float (negative NLL per observed entry; higher is better)
  • perplexity(X, mask=None) -> float (exp of average NLL per observed entry; lower is better)
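
As a rough guide to how score and perplexity relate, the sketch below computes a masked average Bernoulli negative log-likelihood in plain NumPy. The helper name masked_bernoulli_nll and the eps clipping are assumptions made for illustration; the library's internals may differ in detail.

```python
import numpy as np

def masked_bernoulli_nll(X, P, mask=None, eps=1e-12):
    """Average Bernoulli NLL over observed entries (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Clip probabilities away from 0 and 1 so the logs stay finite.
    P = np.clip(np.asarray(P, dtype=float), eps, 1.0 - eps)
    M = np.ones_like(X) if mask is None else np.asarray(mask, dtype=float)
    loglik = X * np.log(P) + (1.0 - X) * np.log1p(-P)
    return -(M * loglik).sum() / M.sum()

X = np.array([[1.0, 0.0], [0.0, 1.0]])
P = np.full_like(X, 0.5)            # an uninformative model

nll = masked_bernoulli_nll(X, P)
score = -nll                        # higher is better
perplexity = np.exp(nll)            # lower is better
print(score, perplexity)
```

For this uninformative model every entry contributes log 2 to the NLL, so the perplexity is 2; a model that fits the observed entries better moves it toward 1.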

Key parameters

  • n_components (int) — rank 𝐾
  • orientation ∈ {"dir-beta", "beta-dir"}
  • alpha, beta (float > 0) — Beta prior hyperparameters (on W for dir-beta, on H for beta-dir).
  • projection_method ∈ {"duchi", "normalize"} — default "duchi" (fast & stable). "normalize" gives legacy behavior (nonnegativity + renormalization).
  • projection_backend ∈ {"auto", "numba", "numpy"} — backend for "duchi" projection.
  • use_numexpr (bool) — use NumExpr if available.
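
The difference between the two projection_method values can be sketched for a single vector as follows. The first function is the sort-based Euclidean projection described by Duchi et al. (2008). The second is one plausible reading of "nonnegativity + renormalization". Both function names are hypothetical and this is not the library's code.

```python
import numpy as np

def project_simplex_duchi(v):
    """Euclidean projection of a 1-D vector onto the probability simplex."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u) - 1.0                   # cumulative sums minus the target sum
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1] # last index that stays positive
    theta = css[rho] / (rho + 1)               # shift applied to all entries
    return np.maximum(v - theta, 0.0)

def project_simplex_normalize(v, eps=1e-12):
    """Clip negatives to zero, then rescale to sum to 1 (assumed legacy behavior)."""
    v = np.maximum(np.asarray(v, dtype=float), 0.0)
    s = v.sum()
    return v / s if s > eps else np.full(v.size, 1.0 / v.size)

v = np.array([0.5, 0.8, -0.2])
print(project_simplex_duchi(v))      # closest point on the simplex in L2 distance
print(project_simplex_normalize(v))  # on the simplex, but generally not the closest
```

Both results are nonnegative and sum to 1. Only the first minimizes the Euclidean distance to v, which is why the two options can give different fits.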

Command-line (CLI)

After installation, a console script nbmf-mm is available:

nbmf-mm fit \
  --input X.npz --rank 30 \
  --orientation dir-beta --alpha 1.2 --beta 1.2 \
  --max-iter 2000 --tol 1e-6 --seed 0 --n-init 1 \
  --mask train_mask.npz \
  --out model_rank30.npz

This writes an .npz with W, H, Xhat, objective_history, and n_iter.

Input formats: .npz (expects key arr_0) or .npy. Masks are optional and must match X shape.
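
One way to produce input files in the expected layout is shown below. np.savez stores positional arrays under the keys arr_0, arr_1, and so on, which matches the arr_0 key mentioned above. That the mask file uses the same key is an assumption here, and the temporary directory is only for illustration.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
X = (rng.random((100, 500)) < 0.25).astype(float)
train_mask = (rng.random(X.shape) < 0.9).astype(float)   # 1 = observed entry

out_dir = tempfile.mkdtemp()
x_path = os.path.join(out_dir, "X.npz")
mask_path = os.path.join(out_dir, "train_mask.npz")

# Arrays passed positionally to np.savez are saved as arr_0, arr_1, ...
np.savez(x_path, X)
np.savez(mask_path, train_mask)

# Sanity check: reload and confirm the key and shape.
with np.load(x_path) as data:
    X_loaded = data["arr_0"]
print(X_loaded.shape)
```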

Data requirements

  • X must be in [0,1] (binary recommended; probabilistic inputs are allowed).
  • mask (optional) must be the same shape as X, with values in [0,1] (typically {0,1}). Sparse inputs (scipy.sparse) and masks are accepted and densified internally in this version.
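
If you want to fail early on bad inputs, a pre-flight check along these lines mirrors the requirements listed above. check_inputs is a hypothetical helper, not part of the package, and it densifies sparse inputs via their toarray() method.

```python
import numpy as np

def check_inputs(X, mask=None):
    """Validate X and an optional mask against the documented requirements."""
    # scipy.sparse matrices expose .toarray(); densify them first.
    if hasattr(X, "toarray"):
        X = X.toarray()
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError("X must be a 2-D array")
    if not np.isfinite(X).all() or X.min() < 0.0 or X.max() > 1.0:
        raise ValueError("X must contain finite values in [0, 1]")
    if mask is None:
        return X, None
    if hasattr(mask, "toarray"):
        mask = mask.toarray()
    mask = np.asarray(mask, dtype=float)
    if mask.shape != X.shape:
        raise ValueError("mask must have the same shape as X")
    if not np.isfinite(mask).all() or mask.min() < 0.0 or mask.max() > 1.0:
        raise ValueError("mask must contain finite values in [0, 1]")
    return X, mask

X_ok, mask_ok = check_inputs([[0, 1], [0.5, 1]], mask=[[1, 0], [1, 1]])
print(X_ok.dtype, mask_ok.shape)
```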

Performance notes

  • The default Duchi projection costs O(d log d) per row/column simplex projection and is accelerated with Numba when installed.
  • NumExpr speeds up large elementwise expressions.
  • Both accelerations are optional; the code falls back gracefully when they are not installed.
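
The optional-dependency pattern can be sketched as below: try to import NumExpr, and fall back to plain NumPy if the import fails. The function name and flag handling are illustrative assumptions and not the package's actual internals.

```python
import numpy as np

try:
    import numexpr as ne
    HAVE_NUMEXPR = True
except ImportError:          # NumExpr is optional
    ne = None
    HAVE_NUMEXPR = False

def bernoulli_loglik_terms(X, P, use_numexpr=True):
    """Elementwise Bernoulli log-likelihood, using NumExpr when available."""
    if use_numexpr and HAVE_NUMEXPR:
        # numexpr.evaluate reads X and P from the calling frame's locals.
        return ne.evaluate("X * log(P) + (1 - X) * log(1 - P)")
    return X * np.log(P) + (1.0 - X) * np.log(1.0 - P)

X = np.array([[1.0, 0.0]])
P = np.array([[0.9, 0.2]])
print(bernoulli_loglik_terms(X, P))
```

Either branch returns the same values up to floating-point rounding, so callers do not need to know which backend ran.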

Reproducibility

  • Set random_state (int) for reproducible initialization.
  • Use n_init > 1 to run several random restarts and keep the best NLL.

References

  • Simplex projection (default):
    • J. Duchi, S. Shalev‑Shwartz, Y. Singer, T. Chandra (2008). Efficient Projections onto the ℓ₁‑Ball for Learning in High Dimensions. ICML 2008.
    • W. Wang, M. Á. Carreira‑Perpiñán (2013). Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application. arXiv:1309.1541.
  • Bayesian NBMF (related, slower but fully Bayesian):
    • See the NBMF project by alumbreras for reference implementations of Bayesian variants.
