Skip to main content

Visualization of Topic Modeling Results

Project description

tmplot

Documentation Status Downloads PyPI

tmplot is a Python package for visualizing topic modeling results. It provides the interactive report interface that borrows much from LDAvis/pyLDAvis and builds upon it offering a number of metrics for calculating topics distances and a number of algorithms for calculating scatter coordinates of topics.

Plots

Features

  • Supported models:

    • tomotopy: LDAModel, LLDAModel, CTModel, DMRModel, HDPModel, PTModel, SLDAModel, GDMRModel
    • gensim: LdaModel, LdaMulticore
    • bitermplus: BTM
  • Supported distance metrics:

    • Kullback-Leibler (symmetric and non-symmetric) divergence
    • Jenson-Shannon divergence
    • Jeffrey's divergence
    • Hellinger distance
    • Bhattacharyya distance
    • Total variation distance
    • Jaccard inversed index
  • Supported algorithms for calculating topics scatter coordinates:

    • t-SNE
    • SpectralEmbedding
    • MDS
    • LocallyLinearEmbedding
    • Isomap

Installation

The package can be installed from PyPi:

pip install tmplot

Or directly from this repository:

pip install git+https://github.com/maximtrp/tmplot.git

Dependencies

  • numpy
  • scipy
  • scikit-learn
  • pandas
  • altair
  • ipywidgets
  • tomotopy, gensim, and bitermplus

Quick example

# Importing packages
import tmplot as tmp
import pickle as pkl
import pandas as pd

# Reading a model from a file
with open('data/model.pkl', 'rb') as file:
    model = pkl.load(file)

# Reading documents from a file
docs = pd.read_csv('data/docs.txt.gz', header=None).values.ravel()

# Plotting topics as a scatter plot
topics_coords = tmp.prepare_coords(model)
tmp.plot_scatter_topics(topics_coords, size_col='size', label_col='label')

# Plotting terms probabilities
terms_probs = tmp.calc_terms_probs_ratio(phi, topic=0, lambda_=1)
tmp.plot_terms(terms_probs)

# Running report interface
tmp.report(model, docs=docs, width=250)

You can find more examples in the tutorial.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tmplot-0.0.4.tar.gz (14.0 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page