
SSNMF


SSNMF provides a class for (SS)NMF models and several multiplicative update methods for training different model variants.


Installation

To install SSNMF, run this command in your terminal:

    $ pip install -U ssnmf

This is the preferred method to install SSNMF, as it will always install the most recent stable release.

If you don't have pip installed, these installation instructions can guide you through the process.

Usage

First, import the ssnmf package and the relevant class SSNMF. We also import `numpy` and `scipy` for experimentation.

>>> import ssnmf
>>> from ssnmf import SSNMF
>>> import numpy as np
>>> import scipy
>>> import scipy.sparse as sparse
>>> import scipy.optimize

Training an unsupervised model

Declare an unsupervised NMF model with data matrix X and number of topics k.

>>> X = np.random.rand(100,100)
>>> k = 10
>>> model = SSNMF(X,k)

You may access the factor matrices initialized in the model, e.g., to check relative reconstruction error ||X-AS||_F/||X||_F.

>>> rel_error = np.linalg.norm(model.X - model.A @ model.S, 'fro')/np.linalg.norm(model.X,'fro')

Run the multiplicative updates method for this unsupervised model for N iterations. This method tries to minimize the objective function ||X-AS||_F.

>>> N = 100
>>> model.mult(numiters = N)

This method updates the factor matrices N times. You can see how much the relative reconstruction error improves.

>>> rel_error = np.linalg.norm(model.X - model.A @ model.S, 'fro')/np.linalg.norm(model.X,'fro')
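For intuition about what mult is doing: multiplicative updates of this kind go back to Lee and Seung, and a minimal standalone sketch of those classic rules (not the package's implementation, which may differ in details) looks like this:

```python
import numpy as np

def nmf_mult(X, k, numiters=100, eps=1e-10, seed=0):
    """Classic Lee-Seung multiplicative updates for min ||X - A S||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    A = rng.random((m, k))
    S = rng.random((k, n))
    for _ in range(numiters):
        S *= (A.T @ X) / (A.T @ A @ S + eps)   # S <- S * (A^T X) / (A^T A S)
        A *= (X @ S.T) / (A @ S @ S.T + eps)   # A <- A * (X S^T) / (A S S^T)
    return A, S

X = np.random.rand(100, 100)
A, S = nmf_mult(X, k=10)
rel_error = np.linalg.norm(X - A @ S, 'fro') / np.linalg.norm(X, 'fro')
```

Because the updates only ever multiply by nonnegative ratios, A and S stay entrywise nonnegative throughout, which is the point of this update scheme.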

Training a supervised model

We begin by generating some synthetic data for testing.

>>> labelmat = np.concatenate((np.concatenate((np.ones([1,10]),np.zeros([1,30])),axis=1),np.concatenate((np.zeros([1,10]),np.ones([1,10]),np.zeros([1,20])),axis=1),np.concatenate((np.zeros([1,20]),np.ones([1,10]),np.zeros([1,10])),axis=1),np.concatenate((np.zeros([1,30]),np.ones([1,10])),axis=1)))
>>> B = sparse.random(4,10,density=0.2).toarray()
>>> S = np.zeros([10,40])
>>> for i in range(40):
...     S[:,i] = scipy.optimize.nnls(B,labelmat[:,i])[0]
>>> A = np.random.rand(40,10)
>>> X = A @ S
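For orientation, a quick sanity check on the dimensions involved (a sketch, not part of the package): Y = labelmat is 4 x 40 (classes x data points), S is 10 x 40 (topics x data points), and X is 40 x 40, with columns indexing the shared data points. Rerunning the same construction confirms this:

```python
import numpy as np
import scipy.sparse as sparse
import scipy.optimize

# Same synthetic construction as above: 4 one-hot label rows over 40 points.
labelmat = np.concatenate(
    (np.concatenate((np.ones([1, 10]), np.zeros([1, 30])), axis=1),
     np.concatenate((np.zeros([1, 10]), np.ones([1, 10]), np.zeros([1, 20])), axis=1),
     np.concatenate((np.zeros([1, 20]), np.ones([1, 10]), np.zeros([1, 10])), axis=1),
     np.concatenate((np.zeros([1, 30]), np.ones([1, 10])), axis=1)))
B = sparse.random(4, 10, density=0.2).toarray()
S = np.zeros([10, 40])
for i in range(40):
    # Nonnegative least squares fits each column of S so that B @ S ~ labelmat.
    S[:, i] = scipy.optimize.nnls(B, labelmat[:, i])[0]
A = np.random.rand(40, 10)
X = A @ S
```

Each column of labelmat is one-hot, so every data point belongs to exactly one of the four classes.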

Declare a supervised NMF model with data matrix X, number of topics k, label matrix Y, and weight parameter lam.

>>> k = 10
>>> model = SSNMF(X,k,Y = labelmat,lam=100*np.linalg.norm(X,'fro'))

You may access the factor matrices initialized in the model, e.g., to check relative reconstruction error ||X-AS||_F/||X||_F and classification accuracy.

>>> rel_error = np.linalg.norm(model.X - model.A @ model.S, 'fro')/np.linalg.norm(model.X,'fro')
>>> acc = model.accuracy()
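Under the hood, accuracy() presumably compares the per-column argmax of B @ S against the true labels in Y; a hypothetical standalone version (label_accuracy is illustrative, not a package function) would look like:

```python
import numpy as np

def label_accuracy(Y, B, S):
    """Fraction of columns where the argmax of B @ S matches the argmax of Y."""
    pred = np.argmax(B @ S, axis=0)   # predicted class per data point
    true = np.argmax(Y, axis=0)       # true class per data point
    return np.mean(pred == true)

# With a perfect reconstruction (B @ S == Y) the accuracy is 1.0.
Y = np.eye(4)[:, [0, 1, 2, 3, 0, 1]]   # one-hot labels for 6 points
B = np.eye(4)
S = Y.copy()
acc = label_accuracy(Y, B, S)
```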

Run the multiplicative updates method for this supervised model for N iterations. This method tries to minimize the objective function ||X-AS||_F^2 + lam ||Y - BS||_F^2. This also saves the errors and accuracies at each iteration.

>>> N = 100
>>> [errs,reconerrs,classerrs,classaccs] = model.snmfmult(numiters = N,saveerrs = True)

This method updates the factor matrices N times. You can see how much the relative reconstruction error and classification accuracy improve.

>>> rel_error = reconerrs[99]/np.linalg.norm(X,'fro')
>>> acc = classaccs[99]
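The supervised objective can also be evaluated directly from the factor matrices; a sketch under the stated objective (ssnmf_objective is illustrative, not part of the package):

```python
import numpy as np

def ssnmf_objective(X, Y, A, B, S, lam):
    """Evaluate ||X - AS||_F^2 + lam * ||Y - BS||_F^2."""
    recon = np.linalg.norm(X - A @ S, 'fro') ** 2
    classif = np.linalg.norm(Y - B @ S, 'fro') ** 2
    return recon + lam * classif

# If X and Y factor exactly through a shared S, the objective is zero.
rng = np.random.default_rng(0)
A = rng.random((40, 10))
S = rng.random((10, 40))
B = rng.random((4, 10))
X = A @ S
Y = B @ S
obj = ssnmf_objective(X, Y, A, B, S, lam=10.0)
```

The shared factor S is what couples the reconstruction and classification terms: improving one can degrade the other, with lam setting the trade-off.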

Training a supervised model with KL-divergence

We begin by generating some synthetic data for testing.

>>> labelmat = np.concatenate((np.concatenate((np.ones([1,10]),np.zeros([1,30])),axis=1),np.concatenate((np.zeros([1,10]),np.ones([1,10]),np.zeros([1,20])),axis=1),np.concatenate((np.zeros([1,20]),np.ones([1,10]),np.zeros([1,10])),axis=1),np.concatenate((np.zeros([1,30]),np.ones([1,10])),axis=1)))
>>> B = sparse.random(4,10,density=0.2).toarray()
>>> S = np.zeros([10,40])
>>> for i in range(40):
...     S[:,i] = scipy.optimize.nnls(B,labelmat[:,i])[0]
>>> A = np.random.rand(40,10)
>>> X = A @ S

Declare a supervised NMF model with data matrix X, number of topics k, label matrix Y, and weight parameter lam.

>>> k = 10
>>> model = SSNMF(X,k,Y = labelmat,lam=100*np.linalg.norm(X,'fro'))

You may access the factor matrices initialized in the model, e.g., to check the relative reconstruction error ||X-AS||_F/||X||_F, the classification accuracy, and the KL-divergence.

>>> rel_error = np.linalg.norm(model.X - model.A @ model.S, 'fro')/np.linalg.norm(model.X,'fro')
>>> acc = model.accuracy()
>>> div = model.kldiv()

Run the multiplicative updates method for this supervised model for N iterations. This method tries to minimize the objective function ||X-AS||_F^2 + lam D(Y||BS). This also saves the errors and accuracies in each iteration.

>>> N = 100
>>> [errs,reconerrs,classerrs,classaccs] = model.klsnmfmult(numiters = N,saveerrs = True)

This method updates the factor matrices N times. You can see how much the relative reconstruction error and classification accuracy improve.

>>> rel_error = reconerrs[99]/np.linalg.norm(X,'fro')
>>> acc = classaccs[99]
>>> div = classerrs[99]
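The divergence term D(Y||BS) is presumably the generalized (information-theoretic) KL divergence standard in the NMF literature; a sketch (gen_kl_div is illustrative, and the package's kldiv may differ in details):

```python
import numpy as np

def gen_kl_div(Y, Z, eps=1e-12):
    """Generalized KL divergence D(Y || Z) = sum( Y*log(Y/Z) - Y + Z )."""
    Y = np.maximum(Y, eps)   # clip to avoid log(0) / division by zero
    Z = np.maximum(Z, eps)
    return np.sum(Y * np.log(Y / Z) - Y + Z)

Y = np.random.rand(4, 40) + 0.1
d_same = gen_kl_div(Y, Y)        # zero when the arguments agree
d_diff = gen_kl_div(Y, Y + 0.5)  # strictly positive otherwise
```

Unlike the Frobenius term, this divergence is not symmetric in its arguments, and it penalizes multiplicative rather than additive deviations of BS from Y.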

Citing

If you use our code in an academic setting, please consider citing our code.

Development

See CONTRIBUTING.md for information related to developing the code.

Suggested Git Branch Strategy

  1. master is for the most up-to-date development; very rarely should you commit directly to this branch. Your day-to-day work should exist on branches separate from master. It is recommended to commit to development branches and make pull requests to master.
  4. It is recommended to use "Squash and Merge" commits when merging PRs. It makes each set of changes to master atomic and, as a side effect, naturally encourages small, well-defined PRs.

Additional Optional Setup Steps:

  • Create an initial release on test.PyPI and PyPI.

    • Follow this PyPA tutorial, starting from the "Generating distribution archives" section.
  • Create a blank GitHub repository (without a README or .gitignore) and push the code to it.

  • Delete these setup instructions from README.md when you are finished with them.

