Visualize the weights (i.e., topics) and hidden units (i.e., topic proportions) of topic models
Project description
Toplot
Visualizations for topic models.
Installation
pip3 install toplot
Getting started
Topic modelling is a Bayesian endeavour. After training your topic model with $K$ components, you have inferred the posterior distribution over two latent variables:
- The posterior over the weights (i.e., the topics) of the model $\pmb{W} = [\pmb{w}_1, \dots, \pmb{w}_K]^T$. We assume that the weights have a two-level structure: each weight is composed of categorical variables (or actually, multinomials), each consisting of a set of categories.
- Per training example $i$, the posterior over the hidden units $\pmb{h}^{(i)}$ (topic loadings, also denoted as $\pmb{\theta}_i$ in LDA).
Visualizing weights (the topic/cluster, $\pmb{w}$ or $\pmb{\phi}$)
Toplot expects your topic model's posterior samples to be organized in a specific way. As an example, we draw 1000 samples from "fake" topic weights $\pmb{W}$ containing two categorical variables (multinomials), body mass index (BMI) and sex, with three and two categories, respectively.
import pandas as pd
from numpy.random import dirichlet

# Draw 1000 samples from "posterior" distribution.
weight_bmi = dirichlet([16.0, 32.0, 32.0], size=1_000)
weight_sex = dirichlet([8.1, 4.1], size=1_000)
weight = pd.concat(
    {
        "BMI": pd.DataFrame(
            weight_bmi, columns=["Underweight", "Healthy Weight", "Overweight"]
        ),
        "sex": pd.DataFrame(weight_sex, columns=["Male", "Female"]),
    },
    axis="columns",
)
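To make the expected layout concrete, the following minimal sketch rebuilds the same kind of frame and inspects it. It only uses NumPy and pandas; the seeded generator `rng` and the helper name `block_sums` are illustrative choices, not part of toplot.

```python
import numpy as np
import pandas as pd
from numpy.random import default_rng

rng = default_rng(0)  # seeded only to make this sketch reproducible
weight = pd.concat(
    {
        "BMI": pd.DataFrame(
            rng.dirichlet([16.0, 32.0, 32.0], size=1_000),
            columns=["Underweight", "Healthy Weight", "Overweight"],
        ),
        "sex": pd.DataFrame(
            rng.dirichlet([8.1, 4.1], size=1_000), columns=["Male", "Female"]
        ),
    },
    axis="columns",
)

# Rows are posterior samples; columns form a two-level index
# of (multinomial, category) pairs.
print(weight.columns.tolist())

# Within each multinomial, every sample is a probability vector,
# so the categories of one block sum to one.
block_sums = weight.T.groupby(level=0).sum().T
assert np.allclose(block_sums, 1.0)
```

The dictionary keys passed to `pd.concat` become the outer column level, which is how the two-level structure (multinomial, then category) is encoded.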
Use bar_plot to visualize the topic weight, including the 95% quantile range:
from toplot import bar_plot
bar_plot(weight)
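For readers who want the numbers behind such a plot, here is a rough sketch of a per-category summary computed directly with pandas. It assumes the 95% quantile range means the interval between the 2.5% and 97.5% sample quantiles; how bar_plot computes its interval internally is not stated here, so treat this as an illustration rather than the library's method.

```python
import pandas as pd
from numpy.random import default_rng

rng = default_rng(0)  # seeded only to make this sketch reproducible
samples = pd.DataFrame(
    rng.dirichlet([16.0, 32.0, 32.0], size=1_000),
    columns=["Underweight", "Healthy Weight", "Overweight"],
)

# One row per category: posterior mean and the central 95% interval
# (assumed here to be the 2.5% and 97.5% sample quantiles).
summary = pd.DataFrame(
    {
        "mean": samples.mean(),
        "lower": samples.quantile(0.025),
        "upper": samples.quantile(0.975),
    }
)
print(summary.round(3))
```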
If you have many multinomials, you can use bar_plot_stacked to reduce the width of the plot. This plot folds the categories (e.g., "Underweight", "Healthy Weight", and "Overweight") belonging to the same multinomial (BMI) into a single bar.
from toplot import bar_plot_stacked
bar_plot_stacked(weight)
To visualize more than one topic at a time, you can use scattermap to draw a scattermap.
Visualizing hidden units (topic proportions, $\pmb{h}$ or $\pmb{\theta}$)
Next, we plot the hidden units/topic identities $[\pmb{h}^{(1)}, \dots, \pmb{h}^{(m)}]^T$: that is, for each record $i$, the proportion over the components/topics. Let's generate the (average) proportion for $m=30$ records to visualize:
hidden = pd.DataFrame(
    dirichlet([0.6, 0.8, 0.2], size=30),  # 30 records
    columns=["Topic_1", "Topic_2", "Topic_3"],
)
The function plot_cohort computes the pairwise distances between all examples (the cohort) and, by default, orders the examples by solving a travelling salesman problem over those distances, so that similar examples end up next to each other.
Currently, no uncertainty visualization is supported for plot_cohort (like in bar_plot), so you need to pass the posterior average.
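If your model gives you posterior samples of the hidden units rather than a point estimate, the average can be taken along the sample axis. The sketch below assumes a hypothetical array `hidden_samples` of shape (samples, records, topics); the array name and layout are illustrative and not prescribed by toplot.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make this sketch reproducible
n_samples, n_records, n_topics = 500, 30, 3

# Hypothetical posterior samples of the hidden units,
# shape: (n_samples, n_records, n_topics).
hidden_samples = rng.dirichlet([0.6, 0.8, 0.2], size=(n_samples, n_records))

# Average over the sample axis to get one proportion vector per record.
hidden_mean = pd.DataFrame(
    hidden_samples.mean(axis=0),
    columns=[f"Topic_{k + 1}" for k in range(n_topics)],
)
```

Because every sample lies on the probability simplex, the averaged rows still sum to one and have the same layout as the `hidden` frame above.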
from toplot import plot_cohort
plot_cohort(hidden)
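To give an intuition for distance-based ordering, the sketch below sorts records with a greedy nearest-neighbour tour, a simple heuristic for the travelling salesman problem. This is not necessarily what plot_cohort does: the Euclidean distance, the greedy heuristic, and the function name `nearest_neighbour_order` are all assumptions made for illustration.

```python
import numpy as np


def nearest_neighbour_order(points: np.ndarray) -> list[int]:
    """Greedy tour: start at row 0, then repeatedly visit the closest unvisited row."""
    # Pairwise Euclidean distances between all rows.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))

    order = [0]
    unvisited = set(range(1, len(points)))
    while unvisited:
        last = order[-1]
        nearest = min(unvisited, key=lambda j: dist[last, j])
        order.append(nearest)
        unvisited.remove(nearest)
    return order


rng = np.random.default_rng(0)  # seeded only to make this sketch reproducible
proportions = rng.dirichlet([0.6, 0.8, 0.2], size=30)

order = nearest_neighbour_order(proportions)
sorted_proportions = proportions[order]  # neighbouring rows are now similar
```

A greedy tour is generally not optimal, but it illustrates why records with similar topic proportions end up adjacent once they are ordered along a short tour.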
You can emphasize the periodicity inherent in the travelling salesman solution by visualizing all the examples using a polar plot:
from toplot import plot_polar_cohort
plot_polar_cohort(hidden)
File details
Details for the file toplot-1.1.0.tar.gz.
File metadata
- Download URL: toplot-1.1.0.tar.gz
- Upload date:
- Size: 84.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a934fd7729d4304e039c55dd746cacdccc0ac5a195f4924f569acbb1f73985b2 |
| MD5 | bfc94646148fae940f6c7fe9e1adc0ca |
| BLAKE2b-256 | 98abc93d96bf83e9ef6301b55feaad235ed9f658de85e78fb70076f743470c8b |
File details
Details for the file toplot-1.1.0-py3-none-any.whl.
File metadata
- Download URL: toplot-1.1.0-py3-none-any.whl
- Upload date:
- Size: 15.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3abe26ceeeaefadeed8096c54e389e0dfc9446a3e3e80d89f8c22c1c1e5a20a9 |
| MD5 | 9e99480f633f82431e5ced0536b86f80 |
| BLAKE2b-256 | b3c7f7217563f8d085cf44fca32df82c697b26bf8cde926a640e04a3fce35899 |