Pretty and opinionated topic model visualization in Python.
topicwizard
New in version 0.3.0 🌟 🌟
- Exclude pages that are not needed :bird:
- Self-contained interactive figures :gift:
- Topic name inference is now the default behavior and happens implicitly.
Features
- Investigate complex relations between topics, words and documents
- Highly interactive
- Automatically infer topic names
- Name topics manually
- Pretty :art:
- Intuitive :cow:
- Clean API :candy:
- Sklearn, Gensim and BERTopic compatible :nut_and_bolt:
- Easy deployment :earth_africa:
Installation
Install from PyPI:
pip install topic-wizard
Usage (documentation)
Step 1:
Train a scikit-learn compatible topic model. (If you want to use non-scikit-learn topic models, check compatibility)
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
# Create topic pipeline
topic_pipeline = make_pipeline(
    CountVectorizer(),
    NMF(n_components=10),
)
# Then fit it on the given texts
topic_pipeline.fit(texts)
Step 2a:
Visualize with the topicwizard webapp :bulb:
import topicwizard
topicwizard.visualize(pipeline=topic_pipeline, corpus=texts)
From version 0.3.0 you can also exclude pages you do not want to display, which can save a lot of preprocessing time:
import topicwizard
# Computing 2D projections for a large corpus takes a long time,
# so you can speed up preprocessing by disabling that page altogether.
topicwizard.visualize(pipeline=topic_pipeline, corpus=texts, exclude_pages=["documents"])
Ooooor...
Step 2b:
Produce high-quality, self-contained HTML plots and create your own dashboards/reports :strawberry:
Map of words
from topicwizard.figures import word_map
# Pass the pipeline fitted in Step 1
word_map(corpus=texts, pipeline=topic_pipeline)
Timelines of topic distributions
from topicwizard.figures import document_topic_timeline
document_topic_timeline(
    "Joe Biden takes over presidential office from Donald Trump.",
    pipeline=topic_pipeline,  # the pipeline fitted in Step 1
)
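To give a rough idea of what a timeline of topic distributions represents, here is a minimal sketch that scores overlapping word windows of one document with a fitted pipeline. This is not topicwizard's implementation: the `window_topic_distributions` helper, its `window_size` and `step` parameters, and the toy corpus are all assumptions made for illustration, and only scikit-learn calls are used.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus and pipeline, standing in for your own.
texts = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "the stock market fell sharply today",
    "investors sold shares as markets dropped",
]
topic_pipeline = make_pipeline(
    CountVectorizer(),
    NMF(n_components=2, random_state=0),
).fit(texts)


def window_topic_distributions(document, pipeline, window_size=4, step=2):
    """Score overlapping word windows of a document with a fitted pipeline.

    Returns the window strings and one normalized topic distribution per
    window. Windows with no known vocabulary keep an all-zero row.
    """
    tokens = document.split()
    windows = [
        " ".join(tokens[start:start + window_size])
        for start in range(0, len(tokens), step)
    ]
    weights = pipeline.transform(windows)
    totals = weights.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid dividing by zero for empty windows
    return windows, weights / totals


document = "the cat chased the dog while the stock market fell and investors sold shares"
windows, dists = window_topic_distributions(document, topic_pipeline)
for window, dist in zip(windows, dists):
    print(f"{window!r}: {dist.round(2)}")
```

Each row describes how the topic mix shifts as the window moves through the text, which is the kind of information a topic timeline plots along its x-axis.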
Wordclouds of your topics :cloud:
from topicwizard.figures import topic_wordclouds
# Pass the pipeline fitted in Step 1
topic_wordclouds(corpus=texts, pipeline=topic_pipeline)
And much more (documentation)