Hierarchical NMF
hierarchical-nmf-python
- A fork of https://github.com/rudvlf0413/hierarchical-nmf-python
- Provides a familiar scikit-learn-style interface
Installation
pip install hnmf
Usage
20 Newsgroups
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

from hnmf.model import HierarchicalNMF

n_features = 1000
n_leaves = 20

# Load the raw documents; the target labels are not needed here.
data, _ = fetch_20newsgroups(
    shuffle=True,
    random_state=1,
    remove=('headers', 'footers', 'quotes'),
    return_X_y=True,
)

# Use tf-idf features for NMF.
tfidf = TfidfVectorizer(
    max_df=0.95,
    min_df=2,
    max_features=n_features,
    stop_words='english',
)
X = tfidf.fit_transform(data)

# Map each column index of X back to its token.
id2feature = dict(enumerate(tfidf.get_feature_names_out()))

# Fit hNMF with n_leaves leaf clusters.
model = HierarchicalNMF(k=n_leaves)
model.fit(X)
model.cluster_features(id2feature=id2feature)
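
To make the role of `id2feature` concrete, here is a minimal sketch of how an index-to-token mapping can turn rows of feature weights into readable term lists. It does not call hnmf and makes no claim about what `cluster_features` returns; the vocabulary, the `weights` array, and the `top_terms` helper are all hypothetical and exist only for illustration.

```python
import numpy as np

# Hypothetical vocabulary and weights (illustration only, not hnmf output).
vocab = ["space", "nasa", "orbit", "hockey", "team", "game"]
id2feature = dict(enumerate(vocab))

# One row per cluster, one column per feature index.
weights = np.array([
    [0.9, 0.7, 0.5, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.8, 0.6, 0.9],
])


def top_terms(row, id2feature, n=3):
    """Return the n tokens with the largest weights in `row` (hypothetical helper)."""
    order = np.argsort(row)[::-1][:n]  # feature indices, highest weight first
    return [id2feature[int(i)] for i in order]


top = [top_terms(row, id2feature) for row in weights]
for cluster_id, terms in enumerate(top):
    print(cluster_id, terms)
```

The same mapping built from `tfidf.get_feature_names_out()` in the example above serves this purpose for the real tf-idf matrix.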
Reference
- Paper: Fast rank-2 nonnegative matrix factorization for hierarchical document clustering
- Original MATLAB code: https://github.com/dakuang/hiernmf2
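
For intuition about the rank-2 splits named in the paper title, the following is a rough sketch of a single two-way split of a nonnegative document-term matrix. It is an assumption-laden illustration: it uses plain multiplicative updates rather than the fast solver from the paper, it is not the code used by hnmf, and the function names `rank2_nmf` and `split_in_two` and the toy matrix are hypothetical.

```python
import numpy as np


def rank2_nmf(X, n_iter=500, seed=0, eps=1e-9):
    """Approximate nonnegative X (docs x terms) as W @ H with two components.

    Uses multiplicative updates for the Frobenius-norm objective; this is a
    stand-in for illustration, not the algorithm from the referenced paper.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    W = rng.random((n_docs, 2)) + eps
    H = rng.random((2, n_terms)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def split_in_two(X, **kwargs):
    """Assign each document to the component with the larger weight in W."""
    W, _ = rank2_nmf(X, **kwargs)
    return W.argmax(axis=1)


# Toy term-count matrix: the first three documents use only the first two
# terms, the last three documents use only the last two terms.
X = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [4, 1, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 3, 2],
    [0, 0, 1, 4],
], dtype=float)

labels = split_in_two(X)
print(labels)
```

A hierarchical scheme would apply a split like this repeatedly, choosing which group to divide next until the desired number of leaves is reached; the selection rule is described in the paper and is not reproduced here.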