topicwizard


Pretty and opinionated topic model visualization in Python.

https://user-images.githubusercontent.com/13087737/234209888-0d20ede9-2ea1-4d6e-b69b-71b863287cc9.mp4

New in version 0.3.1 🌟 🌟

  • You can now investigate relations of pre-existing labels to your topics and words 🔍

New in version 0.3.0 🌟

  • Exclude pages that are not needed 🐦
  • Self-contained interactive figures 🎁
  • Topic name inference is now default behavior and is done implicitly.

Features

  • Investigate complex relations between topics, words, documents and groups/genres/labels
  • Sklearn, Gensim and BERTopic compatible 🔩
  • Highly interactive web app
  • Interactive and composable Plotly figures
  • Automatically infer topic names, oooor...
  • Name topics manually
  • Easy deployment 🌍

Installation

Install from PyPI:

pip install topic-wizard
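
To confirm that the install worked, you can ask Python which version of the distribution it sees. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration and is not part of topicwizard.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing.

    `installed_version` is a hypothetical helper, not a topicwizard function.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# The PyPI distribution is called "topic-wizard", while the import name is "topicwizard".
print(installed_version("topic-wizard"))
```

If this prints `None`, the package is not visible to the interpreter you are running, which usually means pip installed it into a different environment.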

Usage (documentation)

Step 0:

Have a corpus ready for analysis. In this example I am going to use the 20 Newsgroups dataset from scikit-learn.

from sklearn.datasets import fetch_20newsgroups

newsgroups = fetch_20newsgroups(subset="all")
corpus = newsgroups.data

# Sklearn gives the labels back as integers, we have to map them back to
# the actual textual label.
group_labels = [newsgroups.target_names[label] for label in newsgroups.target]
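
The label mapping above can be tried on a toy example without downloading anything. The values below are stand-ins for `newsgroups.target_names` and `newsgroups.target`, chosen only for illustration. The sketch also counts documents per group, which is a quick sanity check before passing labels along.

```python
from collections import Counter

# Toy stand-ins for newsgroups.target_names and newsgroups.target (illustrative values).
target_names = ["sci.space", "rec.autos", "talk.politics.misc"]
targets = [0, 2, 2, 1, 0, 2]

# Same list comprehension as above: integer label -> textual label.
group_labels = [target_names[label] for label in targets]

# There should be exactly one label per document.
assert len(group_labels) == len(targets)

# Count how many documents fall into each group.
label_counts = Counter(group_labels)
print(group_labels)
print(label_counts.most_common())
```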

Step 1:

Train a scikit-learn compatible topic model. (If you want to use non-scikit-learn topic models, check compatibility)

from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Create topic pipeline
pipeline = make_pipeline(
    CountVectorizer(stop_words="english", min_df=10),
    NMF(n_components=30),
)

# Then fit it on the given texts
pipeline.fit(corpus)
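
Before visualizing anything, it can help to see what a fitted topic model contains: a topic-term weight matrix and a vocabulary. The sketch below ranks the highest-weighted words per topic on a toy matrix. The function name `top_words_per_topic` and the toy values are made up for illustration. With the pipeline above you would presumably pass `pipeline[-1].components_` and `pipeline[0].get_feature_names_out()` instead.

```python
def top_words_per_topic(components, vocabulary, n_words=3):
    """Return the n_words highest-weighted terms for each topic (one row per topic).

    Hypothetical helper for illustration, not part of topicwizard or scikit-learn.
    """
    topics = []
    for weights in components:
        # Indices of the terms, sorted from highest to lowest weight.
        ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        topics.append([vocabulary[i] for i in ranked[:n_words]])
    return topics


# Toy vocabulary and topic-term matrix (2 topics x 5 terms), illustrative values only.
vocabulary = ["space", "orbit", "engine", "car", "nasa"]
components = [
    [0.9, 0.7, 0.1, 0.0, 0.8],
    [0.0, 0.1, 0.9, 0.8, 0.0],
]

print(top_words_per_topic(components, vocabulary, n_words=3))
```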

Step 2a:

Visualize with the topicwizard webapp 💡

import topicwizard

topicwizard.visualize(corpus, pipeline=pipeline)

From version 0.3.0 you can also disable pages you do not wish to display, which can save a lot of preprocessing time:

# A large corpus takes a looong time to compute 2D projections for,
# so you can speed up preprocessing by disabling the documents page altogether.
topicwizard.visualize(corpus, pipeline=pipeline, exclude_pages=["documents"])
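
If you switch between small and large corpora, the decision to skip the documents page can be made from the corpus size. This is only a sketch: the helper `pages_to_exclude` and its threshold are hypothetical, and the right cutoff depends on your machine and your patience.

```python
def pages_to_exclude(n_documents, max_documents_for_projection=50_000):
    """Return the list of pages to skip for a corpus of the given size.

    Hypothetical helper; the default threshold is an arbitrary illustration,
    not a value recommended by topicwizard.
    """
    if n_documents > max_documents_for_projection:
        # Only the "documents" page name is taken from the example above.
        return ["documents"]
    return []


print(pages_to_exclude(1_000))      # small corpus: keep every page
print(pages_to_exclude(2_000_000))  # large corpus: skip the documents page

# Possible usage, assuming exclude_pages also accepts an empty list:
# topicwizard.visualize(
#     corpus, pipeline=pipeline, exclude_pages=pages_to_exclude(len(corpus))
# )
```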

From version 0.3.1 you can investigate groups/labels by passing them along to the webapp.

topicwizard.visualize(corpus, pipeline=pipeline, group_labels=group_labels)

Ooooor...

Step 2b:

Produce high-quality self-contained HTML plots and create your own dashboards/reports 🍓

Map of words

from topicwizard.figures import word_map

word_map(corpus, pipeline=pipeline)
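
To turn a figure into a standalone HTML file for a report, one option is Plotly's `write_html` method. The sketch below assumes the functions in `topicwizard.figures` return Plotly figure objects, which the feature list suggests but which you should verify for your version. The helper name `save_self_contained` is made up for illustration.

```python
from pathlib import Path


def save_self_contained(fig, path):
    """Write a Plotly-style figure to a standalone HTML file and return its path.

    Hypothetical helper. It assumes `fig` exposes Plotly's
    `write_html(file, include_plotlyjs=...)` method.
    """
    path = Path(path)
    # Create the target folder if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # include_plotlyjs=True embeds plotly.js, so the file works offline.
    fig.write_html(str(path), include_plotlyjs=True)
    return path


# Possible usage, assuming word_map returns a Plotly figure:
# fig = word_map(corpus, pipeline=pipeline)
# save_self_contained(fig, "reports/word_map.html")
```

Embedding plotly.js makes each file a few megabytes larger, but the result can be opened or shared without an internet connection.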

Timelines of topic distributions

from topicwizard.figures import document_topic_timeline

document_topic_timeline(
    "Joe Biden takes over presidential office from Donald Trump.",
    pipeline=pipeline,
)
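
For intuition about what a timeline over a single document can mean, one plausible approach is to slide a fixed-size window over the document's tokens and compute a topic distribution for each window. This describes the general technique under that assumption. It is not a description of how topicwizard computes its timeline, and the helper `sliding_windows` and its parameters are hypothetical.

```python
def sliding_windows(tokens, window_size=5, step=2):
    """Split a token list into overlapping windows of window_size tokens.

    Hypothetical helper for illustration. Trailing tokens that do not fill
    a complete window are dropped.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    if len(tokens) <= window_size:
        return [list(tokens)]
    last_start = len(tokens) - window_size
    return [tokens[start:start + window_size] for start in range(0, last_start + 1, step)]


text = "Joe Biden takes over presidential office from Donald Trump."
windows = sliding_windows(text.split(), window_size=5, step=2)
for window in windows:
    print(" ".join(window))

# Each window could then be scored by the fitted pipeline, for example:
# pipeline.transform([" ".join(window) for window in windows])
```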

Wordclouds of your topics ☁️

from topicwizard.figures import topic_wordclouds

topic_wordclouds(corpus, pipeline=pipeline)

And much more... (documentation)
