Pretty and opinionated topic model visualization in Python.

topicwizard


https://user-images.githubusercontent.com/13087737/234209888-0d20ede9-2ea1-4d6e-b69b-71b863287cc9.mp4

New in version 0.3.0 🌟 🌟

  • Exclude pages that are not needed :bird:
  • Self-contained interactive figures :gift:
  • Topic names are now inferred implicitly by default.

Features

  • Investigate complex relations between topics, words and documents
  • Highly interactive
  • Automatically infer topic names
  • Name topics manually
  • Pretty :art:
  • Intuitive :cow:
  • Clean API :candy:
  • Sklearn, Gensim and BERTopic compatible :nut_and_bolt:
  • Easy deployment :earth_africa:

Installation

Install from PyPI:

pip install topic-wizard
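
To confirm which version ended up installed (for instance, whether it is at least 0.3.0), something like the following sketch can help. It uses only the standard library; the helper names `installed_version` and `at_least` are made up for this example, and the comparison only looks at the leading numeric parts, so pre-release suffixes are ignored.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile


def installed_version(dist_name="topic-wizard"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def at_least(version_string, minimum=(0, 3, 0)):
    """Compare the leading numeric parts of a version string with a tuple."""
    parts = []
    for piece in version_string.split(".")[: len(minimum)]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    # Pad short versions so that "1" compares as (1, 0, 0).
    parts += [0] * (len(minimum) - len(parts))
    return tuple(parts) >= tuple(minimum)


found = installed_version()
if found is None:
    print("topic-wizard is not installed")
else:
    print(found, "is 0.3.0 or newer:", at_least(found))
```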

Usage (documentation)

Step 1:

Train a scikit-learn compatible topic model. (If you want to use non-scikit-learn topic models, check compatibility)

from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Create topic pipeline
topic_pipeline = make_pipeline(
    CountVectorizer(),
    NMF(n_components=10),
)

# Then fit it on the given texts
topic_pipeline.fit(texts)

Step 2a:

Visualize with the topicwizard webapp :bulb:

import topicwizard

topicwizard.visualize(pipeline=topic_pipeline, corpus=texts)

From version 0.3.0 you can also disable pages you do not wish to display, which can save a lot of preprocessing time:

import topicwizard

# Computing 2D projections for a large corpus takes a long time,
# so you can speed up preprocessing by disabling that page altogether.
topicwizard.visualize(pipeline=topic_pipeline, corpus=texts, exclude_pages=["documents"])
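
Another way to cut preprocessing time, if you still want to see every page, is to hand the app a random subsample of the corpus. This is only a sketch using the standard library: the `subsample` helper and its defaults are hypothetical, and whether a subsample represents your corpus well enough is for you to judge.

```python
import random


def subsample(texts, max_docs=1000, seed=0):
    """Return at most max_docs texts, drawn uniformly without replacement."""
    texts = list(texts)
    if len(texts) <= max_docs:
        return texts
    # A seeded generator keeps the selection reproducible between runs.
    return random.Random(seed).sample(texts, max_docs)


# Stand-in corpus, invented for illustration only.
all_texts = [f"document number {i}" for i in range(50)]
small_corpus = subsample(all_texts, max_docs=10)
print(len(small_corpus))
```

The result could then be passed as `corpus=small_corpus`, while the pipeline itself stays fitted on the full corpus.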

Ooooor...

Step 2b:

Produce high-quality, self-contained HTML plots and create your own dashboards/reports :strawberry:

Map of words

from topicwizard.figures import word_map

word_map(corpus=texts, pipeline=topic_pipeline)

Timelines of topic distributions

from topicwizard.figures import document_topic_timeline

document_topic_timeline(
    "Joe Biden takes over presidential office from Donald Trump.",
    pipeline=topic_pipeline,
)
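
To get a feel for what a timeline of topic distributions contains, the sketch below slides a window over the words of a document and asks a fitted pipeline for the topic weights of each window. This is an illustration in plain scikit-learn under stated assumptions, not necessarily how `document_topic_timeline` computes its plot; the toy corpus, the `window_topics` helper and the window size are all made up for the example.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Toy corpus, invented for illustration only.
texts = [
    "the president takes office after the election",
    "senators debate the new budget in office",
    "the team won the football match",
    "fans cheered as the striker scored",
]

topic_pipeline = make_pipeline(
    CountVectorizer(),
    NMF(n_components=2, random_state=0),
)
topic_pipeline.fit(texts)


def window_topics(text, pipeline, window_size=4):
    """Return one row of topic weights per sliding window of words."""
    words = text.split()
    last_start = max(len(words) - window_size, 0)
    windows = [
        " ".join(words[start:start + window_size])
        for start in range(last_start + 1)
    ]
    return pipeline.transform(windows)


timeline = window_topics(
    "Joe Biden takes over presidential office from Donald Trump.",
    topic_pipeline,
)
print(timeline.shape)  # (number of windows, number of topics)
```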

Wordclouds of your topics :cloud:

from topicwizard.figures import topic_wordclouds

topic_wordclouds(corpus=texts, pipeline=topic_pipeline)

And much more (documentation)
