Implementation of Topic-Supervised Non-Negative Matrix Factorization
tsnmf-sparse
This repository contains a Python implementation of Topic-Supervised Non-Negative Matrix Factorization (TS-NMF) [1] that works with sparse matrices and exposes a scikit-learn-compatible API.
How it Works
From [1]: Suppose that one supervises k << n documents and identifies l << t topics that were contained in a subset of the documents. One can supervise the NMF method using this information, represented by an n×d topic supervision matrix L. The elements of L constrain the importance weights of matrix W and are of the following form:
$$L_{ij} = \begin{cases} 1 & \text{if topic } j \text{ is permitted in document } i \\ 0 & \text{if topic } j \text{ is not permitted in document } i \end{cases}$$
Then, for a term-document matrix V and supervision matrix L, TS-NMF seeks matrices W and H that minimize
$$D_{TS}(W, H) = \lVert V - (W \circ L) H \rVert_F^2$$
where ∘ represents the Hadamard (element-wise) product operator.
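To make the objective concrete, here is a minimal NumPy sketch that evaluates it on small dense matrices. It assumes the objective is the squared Frobenius norm of V − (W ∘ L)H, as described in [1], and that rows of V are documents. The function name `ts_nmf_objective` and the toy matrices are illustrative only and are not part of this package.

```python
import numpy as np


def ts_nmf_objective(V, W, H, L):
    """Squared Frobenius norm of V - (W * L) @ H (assumed TS-NMF objective)."""
    masked_W = W * L              # Hadamard product: zero out forbidden topics
    residual = V - masked_W @ H   # reconstruction error
    return float(np.sum(residual ** 2))


rng = np.random.default_rng(0)
n_docs, n_terms, n_topics = 4, 6, 2

W = rng.random((n_docs, n_topics))
H = rng.random((n_topics, n_terms))

L = np.ones((n_docs, n_topics))
L[1, 0] = 0.0  # forbid topic 0 in document 1

# Build V so that it is exactly representable under the mask.
V = (W * L) @ H

loss_masked = ts_nmf_objective(V, W, H, L)
loss_unmasked = ts_nmf_objective(V, W, H, np.ones_like(L))
print(loss_masked, loss_unmasked)
```

With the mask applied the reconstruction is exact, while ignoring the mask leaves a non-zero error in the row of document 1. This illustrates how L changes which factorizations count as good fits.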
Installation
You can install TS-NMF via pip:
pip install tsnmf
Or by cloning this repository and running setup.py:
python setup.py install
Usage
TS-NMF is used in much the same way as decomposition.NMF from scikit-learn. The only extra input is a list of lists containing the labels used to build the matrix L.
Suppose you want to extract 3 topics from 5 documents. The 5 documents must be represented as a matrix V; the most common way to build it is with a TF-IDF vectorizer, which weights each word by how important it is to a document.
Each element of the list of labels corresponds to one document and is itself a list of the topics that the document is constrained to. For example:
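As a rough illustration of building V, the sketch below applies scikit-learn's `TfidfVectorizer` to five short documents. The document texts are made up for the example, and the default vectorizer settings are an assumption rather than a recommendation from this project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Five toy documents (hypothetical content, for illustration only).
documents = [
    "the team scored a late goal in the match",
    "parliament passed the new election law",
    "the striker missed a penalty in the final",
    "scientists sequenced the gene in the cell",
    "voters went to the polls for the election",
]

vectorizer = TfidfVectorizer()
# V is a sparse matrix with one row per document and one column per term.
V = vectorizer.fit_transform(documents)

print(V.shape)
```

The resulting sparse matrix can be passed as V in the usage example that follows.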
labels = [[],
          [0, 2],  # document 1
          [],
          [],
          [1]]     # document 4
means that document 1 is constrained to topics 0 or 2 and document 4 to topic 1. For the other documents, all topics are permitted.
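One way to picture how such labels could translate into the supervision matrix L is sketched below. It assumes that an empty list means every topic is permitted and that a non-empty list permits only the listed topics. `build_supervision_matrix` is a hypothetical helper written for this example, not a function exported by tsnmf.

```python
import numpy as np


def build_supervision_matrix(labels, n_components):
    """Return a binary (n_documents, n_components) mask from a list of label lists."""
    L = np.ones((len(labels), n_components))
    for doc_idx, topics in enumerate(labels):
        if topics:  # non-empty list: permit only the listed topics
            L[doc_idx, :] = 0.0
            L[doc_idx, topics] = 1.0
    return L


labels = [[], [0, 2], [], [], [1]]
L = build_supervision_matrix(labels, n_components=3)
print(L)
```

Under these assumptions, the rows for documents 1 and 4 contain zeros for the forbidden topics, and every other row is all ones.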
Finally, to run TS-NMF:
from tsnmf import TSNMF
tsnmf = TSNMF(n_components=3, random_state=1)
W = tsnmf.fit_transform(V, labels=labels)
H = tsnmf.components_
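Once H is available, a common way to read the topics is to list the highest-weighted terms in each of its rows. The sketch below uses a small hand-made H and vocabulary as stand-ins (both hypothetical). In practice H would come from `tsnmf.components_` and the vocabulary from the vectorizer used to build V.

```python
import numpy as np

# Hypothetical vocabulary and topic-term matrix (3 topics x 6 terms).
vocabulary = np.array(["ball", "goal", "vote", "law", "cell", "gene"])
H = np.array([
    [0.9, 0.8, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.7, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.8, 0.6],
])


def top_terms(H, vocabulary, n_top=2):
    """Return the n_top highest-weighted terms for each topic (row of H)."""
    # Sort each row ascending, reverse to descending, keep the first n_top columns.
    order = np.argsort(H, axis=1)[:, ::-1][:, :n_top]
    return [vocabulary[row].tolist() for row in order]


topics = top_terms(H, vocabulary)
for topic_idx, terms in enumerate(topics):
    print(topic_idx, terms)
```

The same idea applies to W: each row holds the topic weights for one document, so its largest entry points to the document's dominant topic.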
Credits
- Developed mainly by Victor Navarro (@vokturz), under the guidance of Eduardo Graells-Garrido (@carnby), in the context of CONICYT Fondo de Fomento al Desarrollo Científico y Tecnológico (FONDECYT) Proyecto de Iniciación 11180913.
- Based on scikit-learn's NMF code and the original ws-nmf.
References
- [1] MacMillan, Kelsey, and James D. Wilson. "Topic supervised non-negative matrix factorization." arXiv preprint arXiv:1706.05084 (2017).
File details
Details for the file tsnmf-1.0.4.tar.gz.
File metadata
- Download URL: tsnmf-1.0.4.tar.gz
- Upload date:
- Size: 8.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d1fdf999bfac51a468b207c948d1f30d8cd867c94c2ce4e089d0d807b63126e1 |
| MD5 | d65511683165e60c10a49a55335e4226 |
| BLAKE2b-256 | f724f06c37341efe39b8d535de0b241a7f5122f36c79b59d158641b241b954b8 |
File details
Details for the file tsnmf-1.0.4-py3-none-any.whl.
File metadata
- Download URL: tsnmf-1.0.4-py3-none-any.whl
- Upload date:
- Size: 9.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fbf11e3e772c80045ab960dc65405a9041c6e5dcc342ba115eedf84c65e02a40 |
| MD5 | c7032e401136a5443e5722a39b0c8d78 |
| BLAKE2b-256 | b89c12f73495bc33ccd0e15c1ecee27c1f58e4c98bb2a8ff837702d99dc5c58c |