topicwizard
Pretty and opinionated topic model visualization in Python.
New in version 0.4.0 🌟 🌟
- Introduced topic pipelines that make it easier and safer to use topic models in downstream tasks and interpretation.
New in version 0.3.1 🌟 🌟
- You can now investigate relations of pre-existing labels to your topics and words :mag:
New in version 0.3.0 🌟
- Exclude pages that are not needed :bird:
- Self-contained interactive figures :gift:
- Topic name inference is now default behavior and is done implicitly.
Features
- Investigate complex relations between topics, words, documents and groups/genres/labels
- Sklearn, Gensim and BERTopic compatible :nut_and_bolt:
- Highly interactive web app
- Interactive and composable Plotly figures
- Automatically infer topic names, oooor...
- Name topics manually
- Easy deployment :earth_africa:
Installation
Install from PyPI:
pip install topic-wizard
Usage (documentation)
Step 0:
Have a corpus ready for analysis. In this example I am going to use the 20 Newsgroups dataset from scikit-learn.
from sklearn.datasets import fetch_20newsgroups
newsgroups = fetch_20newsgroups(subset="all")
corpus = newsgroups.data
# Sklearn gives the labels back as integers, we have to map them back to
# the actual textual label.
group_labels = [newsgroups.target_names[label] for label in newsgroups.target]
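Before training anything, it can help to check that the decoded labels line up with the corpus. The snippet below is a minimal sketch that uses a small stand-in for `newsgroups.target` and `newsgroups.target_names`. The helper name `decode_labels` and the toy values are assumptions made for illustration; they are not part of topicwizard or scikit-learn.

```python
from collections import Counter


def decode_labels(target, target_names):
    """Map integer class indices to their textual labels."""
    return [target_names[index] for index in target]


# Toy stand-ins for newsgroups.target_names and newsgroups.target
# (hypothetical values, chosen only for this example).
toy_target_names = ["rec.autos", "sci.space", "talk.politics.misc"]
toy_target = [1, 0, 1, 2, 1]

toy_group_labels = decode_labels(toy_target, toy_target_names)

# There should be exactly one label per document, in corpus order.
label_counts = Counter(toy_group_labels)
print(toy_group_labels)
print(label_counts.most_common())
```

With the real dataset, the same count gives a quick view of how many documents fall into each newsgroup.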
Step 1:
Train a scikit-learn compatible topic model. (If you want to use non-scikit-learn topic models, check compatibility)
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
# Create topic pipeline
pipeline = make_pipeline(
    CountVectorizer(stop_words="english", min_df=10),
    NMF(n_components=30),
)
# Then fit it on the given texts
pipeline.fit(corpus)
From version 0.4.0 you can also use TopicPipelines, which are functionally almost identical but come with a set of built-in conveniences and safeguards.
from topicwizard.pipeline import make_topic_pipeline
pipeline = make_topic_pipeline(
    CountVectorizer(stop_words="english", min_df=10),
    NMF(n_components=30),
)
Step 2a:
Visualize with the topicwizard webapp :bulb:
import topicwizard
topicwizard.visualize(corpus, pipeline=pipeline)
From version 0.3.0 you can also disable pages you do not wish to display, saving yourself a lot of time:
# A large corpus takes a looong time to compute 2D projections for,
# so you can speed up preprocessing by disabling it altogether.
topicwizard.visualize(corpus, pipeline=pipeline, exclude_pages=["documents"])
From version 0.3.1 you can investigate groups/labels by passing them along to the webapp.
topicwizard.visualize(corpus, pipeline=pipeline, group_labels=group_labels)
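If the corpus is large and you still want the documents page, another option is to visualize only a random sample. The snippet below is a minimal sketch of drawing such a sample while keeping documents and group labels aligned. The helper `subsample` and the toy data are assumptions made for illustration and are not part of topicwizard; whether a sample is representative enough depends on your data.

```python
import random


def subsample(corpus, group_labels, n_samples, seed=0):
    """Draw the same random subset from the corpus and its labels."""
    if len(corpus) != len(group_labels):
        raise ValueError("corpus and group_labels must have the same length")
    n_samples = min(n_samples, len(corpus))
    rng = random.Random(seed)
    # Sorting the sampled indices keeps the original document order.
    indices = sorted(rng.sample(range(len(corpus)), n_samples))
    sampled_corpus = [corpus[i] for i in indices]
    sampled_labels = [group_labels[i] for i in indices]
    return sampled_corpus, sampled_labels


# Hypothetical toy data standing in for the real corpus and labels.
toy_corpus = [f"document number {i}" for i in range(10)]
toy_labels = ["even" if i % 2 == 0 else "odd" for i in range(10)]

small_corpus, small_labels = subsample(toy_corpus, toy_labels, n_samples=4)
print(list(zip(small_corpus, small_labels)))
```

The sampled lists could then be passed in place of the full ones, for example `topicwizard.visualize(small_corpus, pipeline=pipeline, group_labels=small_labels)`.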
Ooooor...
Step 2b:
Produce high quality self-contained HTML plots and create your own dashboards/reports :strawberry:
Map of words
from topicwizard.figures import word_map
word_map(corpus, pipeline=pipeline)
Timelines of topic distributions
from topicwizard.figures import document_topic_timeline
document_topic_timeline(
    "Joe Biden takes over presidential office from Donald Trump.",
    pipeline=pipeline,
)
Wordclouds of your topics :cloud:
from topicwizard.figures import topic_wordclouds
topic_wordclouds(corpus, pipeline=pipeline)
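To build a report from these figures, one option is to write each of them to its own HTML file. The sketch below assumes that the functions in `topicwizard.figures` return Plotly figure objects, which the feature list suggests but this page does not state outright. It relies only on Plotly's `write_html` method. The helper `save_figures` and the `StubFigure` class are hypothetical; the stub exists only so that the example runs without topicwizard installed.

```python
import tempfile
from pathlib import Path


def save_figures(figures, out_dir):
    """Write each named figure to <out_dir>/<name>.html and return the paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for name, figure in figures.items():
        target = out_path / f"{name}.html"
        # Plotly figures expose write_html; by default the output embeds
        # the plotting library, so the file works on its own.
        figure.write_html(str(target))
        written.append(target)
    return written


class StubFigure:
    """Hypothetical stand-in for a Plotly figure, used only in this sketch."""

    def write_html(self, file):
        Path(file).write_text("<html><body>stub figure</body></html>", encoding="utf-8")


report_dir = tempfile.mkdtemp()
saved = save_figures(
    {"word_map": StubFigure(), "topic_wordclouds": StubFigure()},
    report_dir,
)
print([path.name for path in saved])
```

With real figures, the dictionary values would be the objects returned by calls such as `word_map(corpus, pipeline=pipeline)`.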
And much more... (documentation)