Implementation of Topic-Supervised Non-Negative Matrix Factorization

tsnmf-sparse

This repository contains a Python implementation of Topic-Supervised Non-Negative Matrix Factorization (TS-NMF) [1] with sparse matrices, using a scikit-learn-compatible API.

How it Works

From [1]: Suppose that one supervises k << n documents and identifies l << t topics that were contained in a subset of the documents. One can supervise the NMF method using this information, represented by an n×t topic supervision matrix L. The elements of L constrain the importance weights of matrix W and are of the following form:

$$L_{ij} = \begin{cases} 1 & \text{if topic } j \text{ is permitted in document } i \\ 0 & \text{otherwise} \end{cases}$$

Then, for a term-document matrix V and supervision matrix L, TS-NMF seeks matrices W and H that minimize

$$D_{TS}(W, H) = \lVert V - (W \circ L) H \rVert_F^2$$

where ○ denotes the Hadamard (element-wise) product operator.
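
To make the role of L concrete, here is a minimal sketch of the objective, assuming it is the squared Frobenius norm of V − (W ○ L)H as described in [1]. The function name `ts_nmf_objective` and the toy matrices are illustrative only and are not part of the tsnmf package; matrices are plain lists of rows so the sketch has no dependencies.

```python
def ts_nmf_objective(V, W, H, L):
    """Squared Frobenius norm of V - (W ○ L) H, with matrices as lists of rows.

    Illustrative helper only; not part of the tsnmf API.
    """
    n, t, p = len(V), len(H), len(H[0])
    # Hadamard product: weights of non-permitted topics are zeroed out.
    WL = [[W[i][j] * L[i][j] for j in range(t)] for i in range(n)]
    total = 0.0
    for i in range(n):
        for c in range(p):
            approx = sum(WL[i][j] * H[j][c] for j in range(t))
            total += (V[i][c] - approx) ** 2
    return total


# Toy data: 2 documents, 2 topics, 2 terms.
V = [[1.0, 0.0], [0.0, 2.0]]
W = [[1.0, 5.0], [0.0, 1.0]]
H = [[1.0, 0.0], [0.0, 2.0]]

L_supervised = [[1, 0], [1, 1]]    # topic 1 is not permitted in document 0
L_unsupervised = [[1, 1], [1, 1]]  # every topic permitted: plain NMF objective

masked = ts_nmf_objective(V, W, H, L_supervised)
unmasked = ts_nmf_objective(V, W, H, L_unsupervised)
print(masked, unmasked)
```

With an all-ones L the expression reduces to the ordinary NMF reconstruction error, so supervision only changes the result for documents that carry labels.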

Installation

You can install TS-NMF via pip:

pip install tsnmf

Or by cloning this repository and running setup.py:

python setup.py install

Usage

TS-NMF is used in much the same way as decomposition.NMF from scikit-learn. The only extra input is a list of lists containing the labels used to build the matrix L.

Suppose you want to extract 3 topics from 5 documents. The 5 documents must be represented as a matrix V; the most common way is to apply a TF-IDF vectorizer, which reflects how important a word is to a document.
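
One possible way to build V is scikit-learn's `TfidfVectorizer`. The snippet below is a sketch under that assumption; the five example documents are made up for illustration, and default vectorizer settings are used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Five made-up documents, one string per document.
documents = [
    "the cat sat on the mat",
    "stock markets fell as interest rates rose",
    "the dog chased the cat around the garden",
    "the team won the match in extra time",
    "central banks raised interest rates again",
]

vectorizer = TfidfVectorizer()
# V is a sparse matrix with one row per document and one column per term.
V = vectorizer.fit_transform(documents)
print(V.shape)
```

The result is a SciPy sparse matrix with one row per document, which matches the row order expected for the list of labels.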

Each element of the list of lists of labels corresponds to a document and contains the list of topics to which that document is constrained. For example:

labels = [[],
          [0,2], # document 1
          [],
          [],
          [1]] # document 4

means that document 1 is constrained to topics 0 or 2 and document 4 to topic 1. All topics are permitted for the other documents.
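
As an illustration of how such labels could translate into the supervision matrix L, the sketch below builds L as a list of rows, with 1 for a permitted topic and 0 otherwise. The helper `build_supervision_matrix` is hypothetical and is not part of the tsnmf API; the library may construct L differently internally (for example, as a sparse matrix).

```python
def build_supervision_matrix(labels, n_topics):
    """Return L as a list of rows; L[i][j] is 1 if topic j is permitted in document i.

    Hypothetical helper for illustration; not part of the tsnmf API.
    """
    L = []
    for doc_topics in labels:
        if not doc_topics:
            # Unlabeled document: every topic is permitted.
            row = [1] * n_topics
        else:
            permitted = set(doc_topics)
            row = [1 if j in permitted else 0 for j in range(n_topics)]
        L.append(row)
    return L


labels = [[], [0, 2], [], [], [1]]
L = build_supervision_matrix(labels, n_topics=3)
for row in L:
    print(row)
# [1, 1, 1]
# [1, 0, 1]
# [1, 1, 1]
# [1, 1, 1]
# [0, 1, 0]
```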

Finally, to run TS-NMF:

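from tsnmf import TSNMF

# V is the document matrix (e.g. TF-IDF) and labels is the list of lists described above.
tsnmf = TSNMF(n_components=3, random_state=1)
W = tsnmf.fit_transform(V, labels=labels)  # document-topic weights
H = tsnmf.components_  # topic-term weights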

Credits

  • Developed mainly by Victor Navarro (@vokturz), under the guidance of Eduardo Graells-Garrido (@carnby), in the context of CONICYT Fondo de Fomento al Desarrollo Científico y Tecnológico (FONDECYT) Proyecto de Iniciación 11180913.
  • Based on scikit-learn's NMF code and the original ws-nmf.

References

  1. MacMillan, Kelsey, and James D. Wilson. "Topic supervised non-negative matrix factorization." arXiv preprint arXiv:1706.05084 (2017).


Source Distribution

tsnmf-1.0.4.tar.gz (8.1 kB)

Uploaded Source

Built Distribution

tsnmf-1.0.4-py3-none-any.whl (9.0 kB)

Uploaded Python 3
