pyLDAvis
========

Python library for interactive topic model visualization.
This is a port of the fabulous `R package <https://github.com/cpsievert/LDAvis>`__ by Carson Sievert and Kenny Shirley.

.. figure:: http://www.kennyshirley.com/figures/ldavis-pic.png
   :alt: LDAvis icon

**pyLDAvis** is designed to help users interpret the topics in a topic model that has been fit to a corpus of text data. The package extracts information from a fitted LDA topic model to inform an interactive web-based visualization.

The visualization is intended to be used within an IPython notebook but can also be saved to a stand-alone HTML file for easy sharing.

|version status| |build status| |docs|

Installation
~~~~~~~~~~~~~~~~~~~~~~

- Stable version using pip:

::

    pip install pyldavis

- Development version on GitHub

Clone the repository and run ``python setup.py install``.

Usage
~~~~~~~~~~~~~~~~~~~~~~

The best way to learn how to use **pyLDAvis** is to see it in action.
Check out this `notebook for an overview <http://nbviewer.ipython.org/github/bmabey/pyLDAvis/blob/master/notebooks/pyLDAvis_overview.ipynb>`__.
Refer to the `documentation <https://pyLDAvis.readthedocs.org>`__ for details.

For a concise explanation of the visualization see this
`vignette <http://cran.r-project.org/web/packages/LDAvis/vignettes/details.pdf>`__ from the LDAvis R package.

Video demos
~~~~~~~~~~~

Ben Mabey walked through the visualization in this short talk using a Hacker News corpus:

- `Visualizing Topic Models <https://www.youtube.com/watch?v=tGxW2BzC_DU&index=4&list=PLykRMO7ZuHwP5cWnbEmP_mUIVgzd5DZgH>`__
- `Notebook and visualization used in the demo <http://nbviewer.ipython.org/github/bmabey/hacker_news_topic_modelling/blob/master/HN%20Topic%20Model%20Talk.ipynb>`__
- `Slide deck <https://speakerdeck.com/bmabey/visualizing-topic-models>`__


Carson Sievert created a video demoing the R package. The visualization is identical, so the demo applies equally to pyLDAvis:

- `Visualizing & Exploring the Twenty Newsgroup Data <http://stat-graphics.org/movies/ldavis.html>`__

More documentation
~~~~~~~~~~~~~~~~~~

To read about the methodology behind pyLDAvis, see `the original
paper <http://nlp.stanford.edu/events/illvi2014/papers/sievert-illvi2014.pdf>`__,
which was presented at the `2014 ACL Workshop on Interactive Language
Learning, Visualization, and
Interfaces <http://nlp.stanford.edu/events/illvi2014/>`__ in Baltimore
on June 27, 2014.




.. |version status| image:: https://img.shields.io/pypi/v/pyLDAvis.svg
   :target: https://pypi.python.org/pypi/pyLDAvis
.. |build status| image:: https://travis-ci.org/bmabey/pyLDAvis.png?branch=master
   :target: https://travis-ci.org/bmabey/pyLDAvis
.. |docs| image:: https://readthedocs.org/projects/pyldavis/badge/?version=latest
   :target: https://pyLDAvis.readthedocs.org




History
-------

1.5.0 (2016-02-20)
---------------------

* Red Bar Width bug fix

In some cases, the widths of the red topic-term bars did not decrease (as they should have) from term #1 to
term #R under the relevance ranking with :math:`\lambda = 1`. In other words, when :math:`\lambda = 1`, there were topics
in which a narrow red bar was displayed above a wider red bar, which should never happen. The issue had to do
with the way topic-term bar widths are computed, and is discussed in detail in #32.
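
To see why the bars should narrow from top to bottom, recall the relevance measure defined in the LDAvis paper: the relevance of term ``w`` to topic ``k`` is ``lambda * log(phi_kw) + (1 - lambda) * log(phi_kw / p_w)``, where ``phi_kw`` is the probability of the term within the topic and ``p_w`` is its marginal probability in the corpus. With ``lambda = 1`` this reduces to ``log(phi_kw)``, so terms are ordered by ``phi_kw`` alone, and a red bar that reflects the term's weight within the topic should never grow as you move down the list. The sketch below illustrates this with made-up numbers. The function and variable names are illustrative assumptions and are not part of the pyLDAvis API.

```python
import math


def relevance(phi_kw, p_w, lam):
    """Relevance of a term to a topic, as defined in the LDAvis paper."""
    return lam * math.log(phi_kw) + (1.0 - lam) * math.log(phi_kw / p_w)


def rank_terms(phi_k, p, lam):
    """Return the terms of one topic sorted by decreasing relevance."""
    return sorted(phi_k, key=lambda w: relevance(phi_k[w], p[w], lam), reverse=True)


# Hypothetical toy values: in-topic probabilities and corpus-wide marginals.
phi_k = {"model": 0.40, "topic": 0.35, "the": 0.25}
p = {"model": 0.05, "topic": 0.02, "the": 0.50}

ranked_lam1 = rank_terms(phi_k, p, lam=1.0)  # ordered by phi_kw only
ranked_lam0 = rank_terms(phi_k, p, lam=0.0)  # ordered by lift, phi_kw / p_w

# Under lambda = 1 the in-topic weights are non-increasing down the list.
widths_lam1 = [phi_k[w] for w in ranked_lam1]
```

Lowering ``lam`` toward 0 reorders the terms by lift, so the widths need not be monotone in that case. The guarantee discussed in this entry applies only to ``lambda = 1``.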


In the end, we implemented a quick fix in which we compute term frequencies implicitly, rather than using those
supplied in the ``createJSON()`` function. The upside is that the red bar widths are now explicitly controlled to
produce the correct visualization. The downside is that the blue bar widths do not necessarily match the
user-supplied term frequencies exactly; in fact, the new version of LDAvis ignores the user-supplied term
frequencies entirely. In a few experiments, the differences were small and decreased (as a proportion of the true
term frequencies) as the true term frequencies increased.
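
One plausible reading of "compute term frequencies implicitly" is that the counts are derived from the fitted model itself: estimate how many tokens belong to each topic from the document-topic distributions and document lengths, spread those tokens over terms using the topic-term distributions, and sum over topics to get corpus-wide term counts. The sketch below works through that arithmetic on a toy model. It is an illustration under that assumption, not the package's code, and every name and number in it is made up.

```python
# Toy model: 2 topics, 3 terms, 2 documents (all values are hypothetical).
phi = [[0.5, 0.3, 0.2],   # topic-term distributions, each row sums to 1
       [0.1, 0.2, 0.7]]
theta = [[0.80, 0.20],    # document-topic distributions, each row sums to 1
         [0.25, 0.75]]
doc_lengths = [100, 200]  # tokens per document

n_docs, n_topics, n_terms = len(theta), len(phi), len(phi[0])

# Estimated number of tokens assigned to each topic.
topic_freq = [sum(theta[d][k] * doc_lengths[d] for d in range(n_docs))
              for k in range(n_topics)]

# Estimated count of each term within each topic (the red bars).
term_topic_freq = [[phi[k][w] * topic_freq[k] for w in range(n_terms)]
                   for k in range(n_topics)]

# Implied corpus-wide count of each term (the blue bars).
term_freq = [sum(term_topic_freq[k][w] for k in range(n_topics))
             for w in range(n_terms)]
```

Under this construction each red bar is proportional to ``phi[k][w]`` within its topic, so sorting by ``phi`` gives non-increasing widths, and each blue bar is a sum that includes the matching red bar, so it can never be narrower. The implied counts add up to the corpus length but need not equal observed per-term counts, which is the trade-off described above.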



1.4.1 (2016-01-31)
---------------------

* Included requirements.txt in MANIFEST to (hopefully) fix bad release.

1.4.0 (2016-01-31)
---------------------

* Updated to newest version of scikit-bio for PCoA MDS.
* requirements.txt cleanup
* New 'tsne' option for prepare, see docs and notebook for more info.


1.3.5 (2015-12-18)
---------------------

* Add explicit version info for scikit-bio since the API has changed.


1.3.4 (2015-11-16)
---------------------

* Gensim Python typo fix in imports. :/

1.3.3 (2015-11-13)
---------------------

* Gensim Python 2.x fix for absolute imports.

1.3.2 (2015-11-09)
---------------------

* Gensim prepare 25% speed increase, thanks @mattilyra!
* Pandas deprecation warnings are now gone.
* Pandas v0.17 is now being used.

1.3.1 (2015-11-02)
---------------------

* Updates gensim and other logic to be Python 3 compatible.

1.3.0 (2015-08-20)
---------------------

* Fixes gensim logic and makes it more robust.
* Faster graphlab processing.
* kwargs for gensim and graphlab are passed down to the underlying prepare function.
* Requires a recent version of pandas to avoid problems with our use of the newer ``DataFrame.to_dict`` API.

1.2.0 (2015-06-13)
---------------------

* Updates gensim logic to be clearer and work with Python 3.x.

1.1.0 (2015-06-02)
---------------------

* Fixes bug with GraphLab function that was producing bogus visualizations.

1.0.0 (2015-05-29)
---------------------

* First release on PyPI. Faithful port of R version with IPython support and helper functions for GraphLab & gensim.
