Topic modeling with latent Dirichlet allocation
lda aims for simplicity.
lda implements latent Dirichlet allocation (LDA) using collapsed Gibbs sampling. LDA is described in Blei et al. (2003) and Pritchard et al. (2000).
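To make the term concrete, here is a minimal sketch of a collapsed Gibbs sampler for LDA in plain Python. It is a toy illustration of the general technique, not the code lda itself uses. The function name `gibbs_lda`, the toy corpus, and the hyperparameter values `alpha` and `eta` are all assumptions made for this example.

```python
import random


def gibbs_lda(docs, n_topics, vocab_size, n_iter=50, alpha=0.1, eta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative sketch only).

    docs: list of documents, each a list of word ids in range(vocab_size).
    Returns (ndz, nzw): document-topic and topic-word count tables.
    """
    rng = random.Random(seed)
    ndz = [[0] * n_topics for _ in docs]                # topic counts per document
    nzw = [[0] * vocab_size for _ in range(n_topics)]   # word counts per topic
    nz = [0] * n_topics                                 # total tokens per topic
    z = []                                              # topic of every token

    # Start from a random topic assignment for each token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndz[d][k] += 1
            nzw[k][w] += 1
            nz[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndz[d][k] -= 1
                nzw[k][w] -= 1
                nz[k] -= 1
                # Conditional probability of each topic given all other
                # assignments, up to a normalising constant.
                weights = [
                    (ndz[d][t] + alpha) * (nzw[t][w] + eta) / (nz[t] + vocab_size * eta)
                    for t in range(n_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndz[d][k] += 1
                nzw[k][w] += 1
                nz[k] += 1
    return ndz, nzw


# Hypothetical corpus: four short documents over a four-word vocabulary.
docs = [[0, 0, 1, 1], [0, 1, 0], [2, 3, 3], [3, 2, 2, 3]]
ndz, nzw = gibbs_lda(docs, n_topics=2, vocab_size=4)
```

Smoothing and normalising the rows of `ndz` and `nzw` would give estimates of the document-topic and topic-word distributions. The sampler never represents those distributions explicitly while it runs, which is what "collapsed" refers to.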
Installation
pip install lda
Getting started
lda.LDA implements latent Dirichlet allocation (LDA). The interface follows conventions found in scikit-learn.
>>> import numpy as np
>>> import lda
>>> X = np.array([[1,1], [2, 1], [3, 1], [4, 1], [5, 8], [6, 1]])
>>> model = lda.LDA(n_topics=2, n_iter=100, random_state=1)  # n_iter=100 is an illustrative value
>>> doc_topic = model.fit_transform(X) # estimate of document-topic distributions
>>> model.components_  # estimate of topic-word distributions; model.topic_word_ is an alias
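The matrix X above is a document-term count matrix: one row per document, one column per vocabulary word. As a rough sketch of how such a matrix might be built from raw text, the following uses only the standard library. The corpus, the whitespace tokenisation, and the names `texts`, `vocab`, and `rows` are assumptions for illustration.

```python
from collections import Counter

# Hypothetical toy corpus; any tokenisation scheme would do.
texts = ["the cat sat", "the dog sat", "the cat saw the dog"]

tokenized = [text.split() for text in texts]

# Sort the vocabulary so that the column order is fixed and reproducible.
vocab = sorted({word for doc in tokenized for word in doc})

# One row per document, one column per vocabulary word; entries are counts.
rows = []
for doc in tokenized:
    counts = Counter(doc)
    rows.append([counts[word] for word in vocab])

# np.array(rows) would then give an integer matrix of the same kind as X.
```

Keeping `vocab` alongside the matrix makes it possible to map the columns of the topic-word estimates back to words.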
Requirements
Python 3 is required, as is NumPy.
Caveat
lda aims for simplicity over speed. If you are working with large corpora or want faster, more sophisticated topic models, consider using hca or MALLET. hca is written in C and MALLET is written in Java.
Important links
Documentation: http://pythonhosted.org/lda
Source code: https://github.com/ariddell/lda/
Issue tracker: https://github.com/ariddell/lda/issues
License
lda is licensed under Version 2.0 of the Mozilla Public License.