Visualize the weights (i.e., topics) and hidden units (i.e., topic proportions) of topic models

Toplot

Reference docs

Visualizations for topic models.

Installation

pip3 install toplot

Getting started

Topic modelling is a Bayesian endeavour. After training your topic model with $K$ components, you have inferred posterior distributions over two latent variables:

  • The posterior over the weights (i.e., the topics) of the model $\pmb{W} = [\pmb{w}_1, \dots, \pmb{w}_K]^T$. We assume that the weights have a two-level structure: each weight is composed of categorical variables (or actually, multinomials), each consisting of a set of categories.
  • Per training example $i$, the posterior over the hidden units $\pmb{h}^{(i)}$ (topic loadings, also denoted as $\pmb{\theta}_i$ in LDA).

Visualizing weights (the topic/cluster, $\pmb{w}$ or $\pmb{\phi}$)

Toplot expects your topic model's posterior samples to be organized in specific ways. As an example, we draw 1000 samples from "fake" topic weights $\pmb{W}$ containing two multinomials, body mass index (BMI) and sex, with three and two categories, respectively. The samples are stored in a dataframe with two-level columns: the first level names the multinomial and the second level names the category.

import pandas as pd
from numpy.random import dirichlet

# Draw 1000 samples from "posterior" distribution.
weight_bmi = dirichlet([16.0, 32.0, 32.0], size=1_000)
weight_sex = dirichlet([8.1, 4.1], size=1_000)
weight = pd.concat(
    {
        "BMI": pd.DataFrame(
            weight_bmi, columns=["Underweight", "Healthy Weight", "Overweight"]
        ),
        "sex": pd.DataFrame(weight_sex, columns=["Male", "Female"]),
    },
    axis="columns",
)
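To make the two-level layout concrete, the sketch below rebuilds the same kind of dataframe and inspects it. The seeded generator and the sum-to-one check are our own additions for illustration, not something Toplot requires.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded for reproducibility (our choice)

weight = pd.concat(
    {
        "BMI": pd.DataFrame(
            rng.dirichlet([16.0, 32.0, 32.0], size=1_000),
            columns=["Underweight", "Healthy Weight", "Overweight"],
        ),
        "sex": pd.DataFrame(
            rng.dirichlet([8.1, 4.1], size=1_000), columns=["Male", "Female"]
        ),
    },
    axis="columns",
)

# Column level 0 holds the multinomial, level 1 the category.
print(weight.columns.tolist())

# Within each multinomial, every posterior sample (row) should sum to one.
row_sums = weight.T.groupby(level=0).sum().T
assert np.allclose(row_sums, 1.0)
```

Each row is one posterior sample, so the dataframe has 1000 rows and five columns (three BMI categories plus two sex categories).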

Use bar_plot to visualize the topic weights, including the 95% quantile range:

Visualization of topic weights with bar_plot.

from toplot import bar_plot

bar_plot(weight)
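To see the kind of summary a 95% quantile range refers to, you can compute it by hand with pandas. This is only a sketch: we assume a central interval (2.5% to 97.5% quantiles) and use the posterior mean as the point estimate, which may differ from what bar_plot does internally.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded for reproducibility (our choice)

# One multinomial is enough to illustrate the summary.
weight_bmi = pd.DataFrame(
    rng.dirichlet([16.0, 32.0, 32.0], size=1_000),
    columns=["Underweight", "Healthy Weight", "Overweight"],
)

# Per category: posterior mean and central 95% quantile range (assumed).
summary = pd.DataFrame(
    {
        "mean": weight_bmi.mean(),
        "q2.5": weight_bmi.quantile(0.025),
        "q97.5": weight_bmi.quantile(0.975),
    }
)
print(summary.round(3))
```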

If you have many multinomials, you can use bar_plot_stacked to reduce the width of the plot. This plot folds the categories (e.g., "Underweight", "Healthy Weight", and "Overweight") belonging to the same multinomial (BMI) into a single bar.

Visualization of topic weights with bar_plot_stacked.

from toplot import bar_plot_stacked

bar_plot_stacked(weight)

To visualize more than one topic at a time, you can make a scattermap with scattermap.

Visualizing hidden units (topic proportions, $\pmb{h}$ or $\pmb{\theta}$)

Next, we plot the hidden units/topic identities $[\pmb{h}^{(1)}, \dots, \pmb{h}^{(m)}]^T$: that is, for each record $i$, the proportion over the components/topics. Let's generate the (average) proportion for $m=30$ records to visualize:

hidden = pd.DataFrame(
    dirichlet([0.6, 0.8, 0.2], size=30),  # 30 records
    columns=["Topic_1", "Topic_2", "Topic_3"],
)
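In practice, the average proportions come from posterior samples rather than from a single Dirichlet draw. A minimal sketch, assuming your samples are stored in an array of shape (n_samples, m, K); this layout is hypothetical and not prescribed by Toplot:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded for reproducibility (our choice)
n_samples, m, K = 200, 30, 3

# Stand-in for posterior samples of the hidden units: shape (n_samples, m, K).
samples = rng.dirichlet([0.6, 0.8, 0.2], size=(n_samples, m))

# Average over the sample axis to get one row of proportions per record.
hidden_mean = pd.DataFrame(
    samples.mean(axis=0),
    columns=[f"Topic_{k + 1}" for k in range(K)],
)

# Averages of points on the simplex stay on the simplex.
assert np.allclose(hidden_mean.sum(axis="columns"), 1.0)
```

The resulting dataframe has the same shape and column names as hidden above.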

The function plot_cohort computes the pairwise distances between all examples (the cohort) and, by default, sorts the examples by solving a travelling salesman problem over those distances, so that similar examples end up next to each other. Unlike bar_plot, plot_cohort does not currently support uncertainty visualization, so you need to pass the posterior average.

Visualization of hidden units, or topic identities, with plot_cohort.

from toplot import plot_cohort

plot_cohort(hidden)
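To build intuition for sorting records with a travelling salesman tour, here is a greedy nearest-neighbour heuristic in plain NumPy. It is not Toplot's implementation: the Euclidean metric, the greedy strategy, and the function name nearest_neighbour_order are all assumptions made for illustration.

```python
import numpy as np


def nearest_neighbour_order(points: np.ndarray) -> list[int]:
    """Greedy tour: start at record 0, then keep visiting the closest unvisited record."""
    # Pairwise Euclidean distances between records, shape (m, m).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    order = [0]
    unvisited = set(range(1, len(points)))
    while unvisited:
        last = order[-1]
        nearest = min(unvisited, key=lambda j: dist[last, j])
        order.append(nearest)
        unvisited.remove(nearest)
    return order


rng = np.random.default_rng(0)  # seeded for reproducibility (our choice)
hidden_array = rng.dirichlet([0.6, 0.8, 0.2], size=30)  # 30 records, 3 topics

order = nearest_neighbour_order(hidden_array)
hidden_sorted = hidden_array[order]  # neighbouring rows now tend to be similar
```

A greedy tour is generally not optimal, but it shows the idea: records with similar topic proportions end up adjacent.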

You can emphasize the periodicity inherent in the travelling salesman solution by visualizing all the examples using a polar plot:

Visualization of hidden units, or topic identities, emphasizing the periodicity with plot_polar_cohort

from toplot import plot_polar_cohort

plot_polar_cohort(hidden)
