
InPhO Topic Explorer


This interactive visualization displays information from the LDA topic models generated using the InPhO VSM module. Live demos trained on the Stanford Encyclopedia of Philosophy, a selection of books from the HathiTrust Digital Library, and the original LDA training set of Associated Press articles are available at http://inphodata.cogs.indiana.edu.

The color bands within each article’s row show the topic distribution within that article, and the relative size of each band indicates the weight of that topic in the article. The total width of each row indicates similarity to the focal topic or document, measured by the quantity sim(doc) = 1 – JSD(doc, focus entity), where JSD is the Jensen-Shannon distance between the word probability distributions of the two items. Each topic’s label and color are arbitrarily assigned but consistent across articles in the browser.
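The similarity measure can be illustrated with a minimal sketch. It is not the project's implementation: the function names are hypothetical, and it assumes base-2 logarithms with the distance taken as the square root of the Jensen-Shannon divergence, which keeps the result between 0 and 1.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi, 2) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance (assumed: square root of the base-2 divergence)."""
    # Midpoint distribution between p and q.
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


def similarity(doc, focus):
    """sim(doc) = 1 - JSD(doc, focus entity), as described in the text."""
    return 1.0 - js_distance(doc, focus)
```

Under these assumptions, identical distributions give a similarity of 1 and distributions with no shared support give a similarity of 0.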

Display options include topic normalization, alphabetical sort, and topic sort. Normalizing topics expands each bar to full width so that per-document topic weights can be compared. Clicking a topic reorders the documents according to that topic’s weight and reorders the topic bars according to the topic weights in the highest-weighted document. When a topic is selected, clicking “Top Documents for [Topic]” opens a new page showing the documents most similar to that topic’s word distribution. The original sort order can be restored with the “Reset Topic Sort” button.
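The display options can be sketched in a few lines. This is an illustration only: the data layout (a dictionary mapping a document label to a list of topic weights) and all function names are assumptions, not the explorer's actual code.

```python
def normalize(weights):
    """Rescale one document's topic weights so they sum to 1."""
    total = float(sum(weights))
    return [w / total for w in weights] if total > 0 else list(weights)


def sort_by_topic(docs, topic):
    """Return document labels ordered by descending weight of one topic."""
    return sorted(docs, key=lambda label: docs[label][topic], reverse=True)


def topic_order(docs, topic):
    """Order topic indices by their weight in the highest-weighted document."""
    top_doc = sort_by_topic(docs, topic)[0]
    weights = docs[top_doc]
    return sorted(range(len(weights)), key=lambda k: weights[k], reverse=True)


# Hypothetical rows: each document has two topic weights.
docs = {"a": [0.1, 0.3], "b": [0.6, 0.2], "c": [0.05, 0.15]}
print(sort_by_topic(docs, 1))  # ['a', 'b', 'c']
print(topic_order(docs, 1))    # [1, 0]
```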

Installation

There are two types of install: Default and Developer.

Default Install

  1. Install the Anaconda Python 2.7 Distribution.

  2. Open a terminal and run pip install --pre topicexplorer.

  3. Test installation by typing vsm -h to print usage instructions.

Developer Install

  1. Set up Git

  2. Install the Anaconda Python 2.7 Distribution.

  3. Open a terminal and run pip install --src . -e git+https://github.com/inpho/topic-explorer#egg=topicexplorer

  4. Test installation by typing vsm -h to print usage instructions.

Usage

Workflow

  1. Initialize the Topic Explorer on a file, folder of text files, or folder of folders:

    vsm init PATH [CONFIG]

    This will generate a configuration file called CONFIG.

  2. Train LDA models using the on-screen instructions:

    vsm train CONFIG

  3. Launch the topic explorer:

    vsm launch CONFIG

  4. Press Ctrl+C to quit all servers.

See the sample configuration files in the config directory for examples of how to extend the topic explorer.
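For scripted runs, the three workflow commands above could be wrapped in Python. This is a rough sketch: the helper names and the example paths are hypothetical, and it assumes vsm is on the PATH. The init and train steps may still prompt for input, and launch keeps running until it is interrupted with Ctrl+C.

```python
import subprocess


def workflow_commands(corpus_path, config_path):
    """Build the init, train, and launch commands in workflow order."""
    return [
        ["vsm", "init", corpus_path, config_path],
        ["vsm", "train", config_path],
        ["vsm", "launch", config_path],
    ]


def run_workflow(corpus_path, config_path):
    """Run each step in turn; stops at the first command that fails."""
    for command in workflow_commands(corpus_path, config_path):
        subprocess.check_call(command)


# Example (hypothetical paths): run_workflow("corpus/", "corpus.ini")
```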

Bug Reports

Please report issues on the issue tracker or contact Jaimie directly (contact info at bottom of README).

In your report, please include the error message, the command you ran, your operating system, and the output of the command vsm --version. This will ensure that we can quickly diagnose your issue.

Note: When using a developer install, vsm --version will print in the following format: 1.0b39-1-g7c834bf-dirty.

  • The first part is the most recent release tag (1.0b39).
  • The second part is the number of commits since the tag (1).
  • The third part is the hash of the most recent commit (g7c834bf).
  • The optional -dirty flag indicates that the local repository has uncommitted changes.
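When triaging reports, the developer version string can be split into its parts with a small parser. The sketch below is not part of the package: the function name is hypothetical, and it assumes the string follows the git describe convention, in which the leading g on the hash is a prefix rather than part of the hash itself.

```python
import re

# tag, then optional "-<commits>-g<hash>", then optional "-dirty".
VERSION_PATTERN = re.compile(
    r"^(?P<tag>.+?)"
    r"(?:-(?P<commits>\d+)-g(?P<commit>[0-9a-f]+))?"
    r"(?P<dirty>-dirty)?$"
)


def parse_version(version):
    """Split a version string such as 1.0b39-1-g7c834bf-dirty into parts."""
    match = VERSION_PATTERN.match(version.strip())
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return {
        "tag": match.group("tag"),
        "commits_since_tag": int(match.group("commits") or 0),
        "commit": match.group("commit"),  # None for a plain release
        "dirty": match.group("dirty") is not None,
    }


print(parse_version("1.0b39-1-g7c834bf-dirty"))
```

A plain release string such as 1.0b39 parses with zero commits since the tag, no commit hash, and dirty set to False.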

Alternate Installs

We highly recommend using the Anaconda Python 2.7 Distribution. Straightforward instructions are provided above for Anaconda Python 2.7 for both end users and developers. Both of these installs are officially supported.

Below we offer guidance for installing side-by-side with an Anaconda Python 3.5 install or for installing it without Anaconda, with notes on dependencies.

Python 3 Install

The InPhO Topic Explorer is only compatible with Python 2.7. However, Anaconda for Python 3.5 makes it easy to set up a side-by-side install of Python 2.7 so you can use both Python 3.5 and Python 2.7.

  1. Install the Anaconda Python 3.5 Distribution.

  2. Open a terminal and run conda create -n py27 python=2.7 anaconda. This will create a Python 2.7 Anaconda environment.

  3. Run source activate py27 to activate the Python 2.7 bindings. You should see (py27) before your prompt.

  4. Use either the Default or Developer install instructions, skipping the step to install Anaconda Python 2.7.

  5. Run source deactivate to deactivate Python 2.7 bindings and reactivate Python 3.5 bindings. Note that the vsm command will only work when the Python 2.7 bindings are activated.

Non-Anaconda Install

  • Miniconda

  1. If using Miniconda (a small version of Anaconda), install the necessary packages with: conda install numpy scipy nltk matplotlib ipython networkx

  • Debian/Ubuntu

  1. sudo apt-get install build-essential python-dev python-pip python-numpy python-matplotlib python-scipy python-ipython

  2. IPython Notebooks

  • Windows

  1. Install Microsoft Visual C++ Compiler for Python 2.7

  2. Install the Python packages below:

Licensing and Attribution

The project is released under an Open-Source Initiative-approved MIT License.

The InPhO Topic Explorer may be cited as:

  • Jaimie Murdock and Colin Allen. (2015) Visualization Techniques for Topic Model Checking in Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI-15). Austin, Texas, USA, January 25-29, 2015. http://inphodata.cogs.indiana.edu/

A BibTeX file is included in the repository for easier attribution.

Collaboration and Maintenance

The InPhO Topic Explorer is maintained by Jaimie Murdock.

Please report issues on the issue tracker or contact Jaimie directly.

We are open to collaboration! If there’s a feature you’d like to see implemented, please contact us and we can lend advice and technical assistance.
