
Visualization of Topic Modeling Results


tmplot


tmplot is a Python package for analyzing and visualizing topic modeling results. It provides an interactive report interface that borrows much from LDAvis/pyLDAvis and extends it with a choice of metrics for calculating topic distances and a choice of algorithms for calculating the scatter coordinates of topics. It can also be used to select the closest and most stable topics across multiple models.
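Selecting the closest topics across models can be pictured as an assignment problem: compute a distance between every topic of one model and every topic of another, then pick the pairing with the smallest total distance. The following is a minimal sketch of that idea and not tmplot's own implementation. The function name `match_topics` and the toy matrices are hypothetical, and it assumes each row of a topic matrix is a probability distribution over the same vocabulary.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def match_topics(phi_a, phi_b):
    """Pair each topic of model A with one topic of model B (hypothetical helper).

    phi_a, phi_b: arrays of shape (n_topics, n_terms) whose rows sum to 1.
    Returns the list of (topic_in_a, topic_in_b) pairs and their distances.
    """
    n_a, n_b = phi_a.shape[0], phi_b.shape[0]
    # Pairwise topic distances between the two models
    dist = np.array(
        [[hellinger(phi_a[i], phi_b[j]) for j in range(n_b)] for i in range(n_a)]
    )
    # Hungarian algorithm: pairing with the minimal total distance
    rows, cols = linear_sum_assignment(dist)
    pairs = list(zip(rows.tolist(), cols.tolist()))
    return pairs, dist[rows, cols]


# Toy data: model B holds the same two topics as model A, in reversed order
phi_a = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.2, 0.7]])
phi_b = phi_a[::-1]

pairs, dists = match_topics(phi_a, phi_b)
print(pairs, dists)
```

A topic could then be called stable if its matched counterparts stay within some distance threshold in most of the models; the threshold is a modeling choice.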

Plots

Features

  • Supported models:

    • tomotopy: LDAModel, LLDAModel, CTModel, DMRModel, HDPModel, PTModel, SLDAModel, GDMRModel
    • gensim: LdaModel, LdaMulticore
    • bitermplus: BTM
  • Supported distance metrics:

    • Kullback-Leibler (symmetric and non-symmetric) divergence
    • Jensen-Shannon divergence
    • Jeffreys divergence
    • Hellinger distance
    • Bhattacharyya distance
    • Total variation distance
    • Inverse Jaccard index
  • Supported algorithms for calculating topic scatter coordinates:

    • t-SNE
    • SpectralEmbedding
    • MDS
    • LocallyLinearEmbedding
    • Isomap
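To illustrate how a distance metric and a coordinate algorithm fit together, here is a small NumPy-only sketch. It is not tmplot's code: the helper names are hypothetical, the toy `phi` matrix is made up, and classical (Torgerson) MDS is swapped in for the iterative scikit-learn algorithms listed above to keep the example dependency-free. All probabilities are assumed to be strictly positive so that the logarithms are defined.

```python
import numpy as np


def kl_divergence(p, q):
    """Non-symmetric Kullback-Leibler divergence KL(p || q)."""
    return float(np.sum(p * np.log(p / q)))


def jensen_shannon(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the midpoint."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def hellinger(p, q):
    """Hellinger distance, bounded between 0 and 1."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def distance_matrix(phi, metric):
    """Symmetric matrix of pairwise distances between the rows of phi."""
    n = phi.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = metric(phi[i], phi[j])
    return dist


def classical_mds(dist, n_components=2):
    """Classical MDS: embed points so that distances are approximately kept."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centered squared distances give a Gram (inner product) matrix
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the directions with the largest eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Toy topics-by-terms matrix: 3 topics over a 4-term vocabulary
phi = np.array([
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.1, 0.2, 0.6],
])

dist = distance_matrix(phi, hellinger)
coords = classical_mds(dist)  # one (x, y) pair per topic
print(coords)
```

Swapping `hellinger` for `jensen_shannon` in the call to `distance_matrix` changes the metric without touching the embedding step, which is the same separation of concerns the two feature lists above suggest.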

Donate

If you find this package useful, please consider donating any amount of money. This will help me spend more time on supporting open-source software.

Buy Me A Coffee

Installation

The package can be installed from PyPI:

pip install tmplot

Or directly from this repository:

pip install git+https://github.com/maximtrp/tmplot.git

Dependencies

  • numpy
  • scipy
  • scikit-learn
  • pandas
  • altair
  • ipywidgets
  • tomotopy, gensim, and bitermplus (optional)
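Because the three modeling libraries are optional, it can be handy to check which of them are present before loading a model. The snippet below is a small sketch that relies only on the standard library; the helper name `available_backends` is hypothetical and not part of tmplot.

```python
from importlib.util import find_spec

# Optional modeling libraries named in the dependency list
OPTIONAL_BACKENDS = ("tomotopy", "gensim", "bitermplus")


def available_backends(names=OPTIONAL_BACKENDS):
    """Return the subset of top-level packages that can be imported."""
    return [name for name in names if find_spec(name) is not None]


print(available_backends())
```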

Quick example

# Importing packages
import tmplot as tmp
import pickle as pkl
import pandas as pd

# Reading a model from a file
with open('data/model.pkl', 'rb') as file:
    model = pkl.load(file)

# Reading documents from a file
docs = pd.read_csv('data/docs.txt.gz', header=None).values.ravel()

# Plotting topics as a scatter plot
topics_coords = tmp.prepare_coords(model)
tmp.plot_scatter_topics(topics_coords, size_col='size', label_col='label')

# Plotting term probabilities
# (phi is the terms-by-topics probability matrix extracted from the model)
phi = tmp.get_phi(model)
terms_probs = tmp.calc_terms_probs_ratio(phi, topic=0, lambda_=1)
tmp.plot_terms(terms_probs)

# Running report interface
tmp.report(model, docs=docs, width=250)
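
The `lambda_` argument in the example above looks like the relevance weight used by LDAvis, which the package borrows from. Assuming that is the case, the sketch below shows how such a weight trades off a term's probability within a topic against its lift over the overall term probability. It is an illustration, not tmplot's implementation: the function `term_relevance`, the uniform topic weights, and the toy matrix are assumptions.

```python
import numpy as np


def term_relevance(phi, topic, lambda_=0.6, topic_weights=None):
    """LDAvis-style relevance of every term for one topic (hypothetical helper).

    phi: array of shape (n_terms, n_topics); each column sums to 1.
    topic_weights: topic proportions; uniform if not given (an assumption).
    """
    n_topics = phi.shape[1]
    if topic_weights is None:
        topic_weights = np.full(n_topics, 1.0 / n_topics)
    p_term = phi @ topic_weights        # marginal probability of each term
    p_term_given_topic = phi[:, topic]  # probability of each term in the topic
    lift = p_term_given_topic / p_term
    return lambda_ * np.log(p_term_given_topic) + (1 - lambda_) * np.log(lift)


terms = np.array(["cat", "dog", "fish"])
phi = np.array([
    [0.5, 0.5],  # "cat" is equally common in both topics
    [0.4, 0.1],  # "dog" is specific to topic 0
    [0.1, 0.4],  # "fish" is specific to topic 1
])

# lambda_=1 ranks terms purely by their probability within the topic
by_probability = terms[np.argsort(term_relevance(phi, 0, lambda_=1.0))[::-1]]
# lambda_=0 ranks terms purely by lift, favoring topic-specific terms
by_lift = terms[np.argsort(term_relevance(phi, 0, lambda_=0.0))[::-1]]
print(by_probability, by_lift)
```

With `lambda_=1`, as in the quick example, frequent terms come first even if they are common to every topic; lower values push topic-specific terms up the ranking.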

You can find more examples in the tutorial.


