Visualization of Topic Modeling Results

tmplot

tmplot is a Python package for analyzing and visualizing topic modeling results. It provides an interactive report interface that borrows heavily from LDAvis/pyLDAvis and extends it with several metrics for calculating topic distances and several algorithms for calculating topic scatter coordinates. It can also be used to select the closest and most stable topics across multiple models.
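To make the idea of "closest" and "stable" topics concrete, here is a minimal NumPy sketch. It is not tmplot's implementation: the function names, the topics-by-terms matrix layout, the choice of Hellinger distance, and the 0.2 threshold are all assumptions made for illustration, and the package's own helpers may work differently.

```python
import numpy as np


def hellinger_matrix(a, b):
    """Pairwise Hellinger distances between rows of a and rows of b.

    Each row is assumed to be a topic's probability distribution over terms.
    """
    diff = np.sqrt(a)[:, None, :] - np.sqrt(b)[None, :, :]
    return np.sqrt(0.5 * np.sum(diff ** 2, axis=2))


def closest_topics(reference, other):
    """For each reference topic, return the index of and distance to
    its nearest topic in another model (hypothetical helper)."""
    dist = hellinger_matrix(reference, other)
    idx = dist.argmin(axis=1)
    return idx, dist[np.arange(len(reference)), idx]


def stable_topics(reference, others, threshold=0.2):
    """Indices of reference topics that have a close match
    (distance <= threshold) in every other model (hypothetical helper)."""
    keep = np.ones(len(reference), dtype=bool)
    for other in others:
        _, distances = closest_topics(reference, other)
        keep &= distances <= threshold
    return np.flatnonzero(keep)


# Toy topic-term matrices (rows: topics, columns: terms), invented for this sketch
model_a = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
model_b = np.array([[0.1, 0.3, 0.6], [0.6, 0.3, 0.1]])
model_c = np.array([[0.7, 0.2, 0.1], [0.34, 0.33, 0.33]])

match_idx, match_dist = closest_topics(model_a, model_b)
stable = stable_topics(model_a, [model_b, model_c])
```

Here `match_idx` pairs each topic of the first model with its nearest counterpart in the second, and `stable` keeps only the topics that reappear in every model.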

Plots

Features

  • Supported models:

    • tomotopy: LDAModel, LLDAModel, CTModel, DMRModel, HDPModel, PTModel, SLDAModel, GDMRModel
    • gensim: LdaModel, LdaMulticore
    • bitermplus: BTM
  • Supported distance metrics:

    • Kullback-Leibler (symmetric and non-symmetric) divergence
    • Jensen-Shannon divergence
    • Jeffreys divergence
    • Hellinger distance
    • Bhattacharyya distance
    • Total variation distance
    • Inverse Jaccard index
  • Supported algorithms for calculating topic scatter coordinates:

    • t-SNE
    • SpectralEmbedding
    • MDS
    • LocallyLinearEmbedding
    • Isomap
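
The two lists above describe a two-step pipeline: compute a topic-by-topic distance matrix, then project it to 2D scatter coordinates. The sketch below illustrates that pipeline with NumPy only. It is an illustration under stated assumptions, not the package's code: the function names and the toy topics-by-terms matrix are invented, and classical (Torgerson) MDS is used as a simple stand-in for the scikit-learn algorithms listed above.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def total_variation(p, q):
    """Total variation distance: half of the L1 distance."""
    return float(0.5 * np.sum(np.abs(p - q)))


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (natural log)."""
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b = m is positive wherever a is
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))


def pairwise_distances(topics, metric):
    """Symmetric distance matrix between the rows of a topics-by-terms matrix."""
    n = len(topics)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = metric(topics[i], topics[j])
    return dist


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS from a precomputed distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by floating-point error
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Toy topics (rows) over four terms (columns), invented for this sketch
topics = np.array([
    [0.70, 0.20, 0.05, 0.05],
    [0.60, 0.30, 0.05, 0.05],
    [0.05, 0.05, 0.20, 0.70],
])

dist = pairwise_distances(topics, hellinger)
coords = classical_mds(dist)  # one (x, y) pair per topic
```

Swapping `hellinger` for `total_variation` or `jensen_shannon` changes the geometry of the resulting scatter plot, which is why the choice of metric and projection algorithm matters when comparing topics.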

Donate

If you find this package useful, please consider donating any amount of money. This will help me spend more time on supporting open-source software.

Buy Me A Coffee

Installation

The package can be installed from PyPI:

pip install tmplot

Or directly from this repository:

pip install git+https://github.com/maximtrp/tmplot.git

Dependencies

  • numpy
  • scipy
  • scikit-learn
  • pandas
  • altair
  • ipywidgets
  • tomotopy, gensim, and bitermplus (optional)

Quick example

# Importing packages
import tmplot as tmp
import pickle as pkl
import pandas as pd

# Reading a model from a file
with open('data/model.pkl', 'rb') as file:
    model = pkl.load(file)

# Reading documents from a file
docs = pd.read_csv('data/docs.txt.gz', header=None).values.ravel()

# Plotting topics as a scatter plot
topics_coords = tmp.prepare_coords(model)
tmp.plot_scatter_topics(topics_coords, size_col='size', label_col='label')

# Plotting term probabilities
# phi is the words-by-topics probability matrix extracted from the model
phi = tmp.get_phi(model)
terms_probs = tmp.calc_terms_probs_ratio(phi, topic=0, lambda_=1)
tmp.plot_terms(terms_probs)

# Running report interface
tmp.report(model, docs=docs, width=250)

You can find more examples in the tutorial.
