Implementation of Topic-Supervised Non-Negative Matrix Factorization

Project description

tsnmf-sparse

This repository contains a Python implementation of Topic-Supervised Non-Negative Matrix Factorization (TS-NMF) [1] that supports sparse matrices and exposes a scikit-learn-compatible API.

How it Works

From [1]: Suppose that one supervises k << n documents and identifies l << t topics that were contained in a subset of the documents. One can supervise the NMF method using this information, represented by an n×t topic supervision matrix L. The elements of L constrain the importance weights of matrix W and are of the following form:

$$L_{ij} = \begin{cases} 1 & \text{if topic } j \text{ is permitted in document } i \\ 0 & \text{otherwise} \end{cases}$$

Then, for a term-document matrix V and supervision matrix L, TS-NMF seeks matrices W and H that minimize

$$\lVert V - (W \circ L) H \rVert_F^2$$

where ∘ denotes the Hadamard (element-wise) product operator.
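
To make the role of the Hadamard product concrete, here is a minimal sketch that evaluates a masked reconstruction error of the form ||V − (W ∘ L) H||² on tiny matrices. It assumes the squared Frobenius norm as the loss and that rows of V, W and L are documents; the function name `ts_nmf_loss` is hypothetical and is not part of the tsnmf package. Plain Python lists are used only so the example runs without dependencies.

```python
def ts_nmf_loss(V, W, H, L):
    """Squared Frobenius norm of V - (W ∘ L) H for matrices given as lists of rows.

    Hypothetical helper for illustration; not the package's implementation.
    """
    # Element-wise (Hadamard) product: zero out topic weights that L forbids.
    masked_W = [[w * l for w, l in zip(w_row, l_row)] for w_row, l_row in zip(W, L)]
    loss = 0.0
    for v_row, w_row in zip(V, masked_W):
        for j, v in enumerate(v_row):
            # Entry (i, j) of the product (W ∘ L) H.
            approx = sum(w * H[k][j] for k, w in enumerate(w_row))
            loss += (v - approx) ** 2
    return loss


V = [[1, 0], [3, 4]]   # 2 documents x 2 terms
W = [[1, 2], [3, 4]]   # 2 documents x 2 topics
H = [[1, 0], [0, 1]]   # 2 topics x 2 terms
L = [[1, 0], [1, 1]]   # document 0 may only use topic 0

supervised = ts_nmf_loss(V, W, H, L)
unsupervised = ts_nmf_loss(V, W, H, [[1, 1], [1, 1]])
print(supervised, unsupervised)
```

With an all-ones L the expression reduces to the ordinary NMF reconstruction error, which is why unsupervised documents are unaffected by the mask.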

Installation

You can install TS-NMF via pip:

pip install tsnmf

Or clone this repository and run setup.py:

python setup.py install

Usage

TS-NMF is used in much the same way as decomposition.NMF from Scikit-Learn. The only extra input you need is a list of lists containing the labels used to build the matrix L.

Suppose you want to extract 3 topics from 5 documents. The 5 documents must be represented as a matrix V; the most common approach is to apply a TF-IDF vectorizer, which weights each word by how important it is to a document.
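
As a rough sketch of that step, the snippet below builds V for 5 toy documents with scikit-learn's `TfidfVectorizer`. The example sentences are made up for illustration. Note that scikit-learn returns a sparse matrix with one row per document and one column per term.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Five made-up documents, used only to illustrate the shape of V.
documents = [
    "the team scored a late goal in the match",
    "the senate will vote on the new budget",
    "the striker missed a penalty in the final",
    "new phone released with a faster chip",
    "parliament debates the election law",
]

vectorizer = TfidfVectorizer()
V = vectorizer.fit_transform(documents)  # SciPy sparse matrix: documents x terms

print(V.shape)
```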

Each element of the outer list corresponds to one document and holds the list of topics that the document is constrained to. For example:

labels = [[],
          [0,2], # document 1
          [],
          [],
          [1]] # document 4

means that document 1 is constrained to topics 0 or 2 and document 4 to topic 1. For the other documents, all topics are permitted.
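
The mapping from `labels` to the supervision matrix L can be sketched as follows. This is an illustration of the convention described above (an empty list leaves every topic permitted), not necessarily how the package builds L internally; the function name `build_supervision_matrix` is hypothetical.

```python
def build_supervision_matrix(labels, n_topics):
    """Return L as a list of rows: L[i][j] is 1 if topic j is permitted in document i.

    Hypothetical helper for illustration; not part of the tsnmf API.
    """
    L = []
    for doc_topics in labels:
        if not doc_topics:
            # Empty list: unsupervised document, every topic is permitted.
            L.append([1] * n_topics)
        else:
            allowed = set(doc_topics)
            L.append([1 if j in allowed else 0 for j in range(n_topics)])
    return L


labels = [[], [0, 2], [], [], [1]]
L = build_supervision_matrix(labels, n_topics=3)
for row in L:
    print(row)
```

Rows 1 and 4 are the only ones containing zeros, matching the two supervised documents in the example.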

Finally, to run TS-NMF:

from tsnmf import TSNMF

tsnmf = TSNMF(n_components=3, random_state=1)
W = tsnmf.fit_transform(V, labels=labels)
H = tsnmf.components_
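
Once W and H are available, a common next step is to read off the highest-weighted terms of each topic and the dominant topic of each document. The sketch below uses small hand-written matrices in place of the real outputs and assumes W and H are dense array-likes, as they are in scikit-learn's NMF. The helper names and the toy vocabulary are hypothetical.

```python
def top_terms_per_topic(H, feature_names, n_top=3):
    """For each topic (row of H), return the n_top terms with the largest weights."""
    result = []
    for topic_weights in H:
        ranked = sorted(range(len(topic_weights)),
                        key=lambda j: topic_weights[j], reverse=True)
        result.append([feature_names[j] for j in ranked[:n_top]])
    return result


def dominant_topic_per_document(W):
    """For each document (row of W), return the index of its largest topic weight."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in W]


# Toy values standing in for tsnmf.components_, the fitted W and the vocabulary.
H_toy = [[0.9, 0.1, 0.0, 0.4],
         [0.0, 0.8, 0.7, 0.1]]
W_toy = [[0.7, 0.1],
         [0.0, 0.9]]
feature_names = ["goal", "vote", "senate", "match"]

topics = top_terms_per_topic(H_toy, feature_names, n_top=2)
dominant = dominant_topic_per_document(W_toy)
print(topics)
print(dominant)
```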

Credits

  • Developed mainly by Victor Navarro (@vokturz), under the guidance of Eduardo Graells-Garrido (@carnby), in the context of CONICYT Fondo de Fomento al Desarrollo Científico y Tecnológico (FONDECYT) Proyecto de Iniciación 11180913.
  • Based on scikit-learn's NMF code and the original ws-nmf.

References

  1. MacMillan, Kelsey, and James D. Wilson. "Topic supervised non-negative matrix factorization." arXiv preprint arXiv:1706.05084 (2017).

Download files

Source Distribution

tsnmf-1.0.4.tar.gz (8.1 kB view details)

Uploaded Source

Built Distribution

tsnmf-1.0.4-py3-none-any.whl (9.0 kB view details)

Uploaded Python 3

File details

Details for the file tsnmf-1.0.4.tar.gz.

File metadata

  • Download URL: tsnmf-1.0.4.tar.gz
  • Upload date:
  • Size: 8.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.2

File hashes

Hashes for tsnmf-1.0.4.tar.gz
Algorithm Hash digest
SHA256 d1fdf999bfac51a468b207c948d1f30d8cd867c94c2ce4e089d0d807b63126e1
MD5 d65511683165e60c10a49a55335e4226
BLAKE2b-256 f724f06c37341efe39b8d535de0b241a7f5122f36c79b59d158641b241b954b8

File details

Details for the file tsnmf-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: tsnmf-1.0.4-py3-none-any.whl
  • Upload date:
  • Size: 9.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.2

File hashes

Hashes for tsnmf-1.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 fbf11e3e772c80045ab960dc65405a9041c6e5dcc342ba115eedf84c65e02a40
MD5 c7032e401136a5443e5722a39b0c8d78
BLAKE2b-256 b89c12f73495bc33ccd0e15c1ecee27c1f58e4c98bb2a8ff837702d99dc5c58c
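
A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. This is a minimal sketch; the helper names are hypothetical, and it assumes the files were saved under their original names.

```python
import hashlib

# SHA256 digests copied from the tables above.
EXPECTED_SHA256 = {
    "tsnmf-1.0.4.tar.gz":
        "d1fdf999bfac51a468b207c948d1f30d8cd867c94c2ce4e089d0d807b63126e1",
    "tsnmf-1.0.4-py3-none-any.whl":
        "fbf11e3e772c80045ab960dc65405a9041c6e5dcc342ba115eedf84c65e02a40",
}


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("tsnmf-1.0.4.tar.gz", EXPECTED_SHA256["tsnmf-1.0.4.tar.gz"])` should return True for an intact download of the source distribution.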
