FASTopic
FASTopic is a fast, adaptive, stable, and transferable topic modeling package. It uses pretrained Transformers to produce document embeddings and discovers latent topics through optimal transport between document, topic, and word embeddings. The result is a neat and efficient topic modeling paradigm that differs from traditional probabilistic, VAE-based, and clustering-based models.
Installation
Install FASTopic with pip:
pip install fastopic
Alternatively, install FASTopic from source:
git clone https://github.com/bobxwu/FASTopic.git
cd FASTopic && python setup.py install
Quick Start
Discover topics from 20newsgroups.
from fastopic import FASTopic
from sklearn.datasets import fetch_20newsgroups
from topmost.preprocessing import Preprocessing
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
preprocessing = Preprocessing(vocab_size=10000)
model = FASTopic(50, preprocessing)
topic_top_words, doc_topic_dist = model.fit_transform(docs)
topic_top_words is a list of the top words in the discovered topics.
doc_topic_dist is the topic distributions of the documents (doc-topic distributions), a numpy array with shape $N \times K$ (number of documents $N$ and number of topics $K$).
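To make the $N \times K$ layout concrete, the minimal sketch below picks the highest-weight topic for each document. The helper name dominant_topics and the toy values are hypothetical and not part of FASTopic; the sketch only assumes that each row holds one document's topic weights.

```python
def dominant_topics(doc_topic_dist):
    """Return, for each document (row), the index of its highest-weight topic."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in doc_topic_dist]


# Toy doc-topic matrix with N = 3 documents and K = 2 topics (made-up values).
toy_doc_topic_dist = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
]

dominant = dominant_topics(toy_doc_topic_dist)
print(dominant)
```

The helper iterates row by row, so it should also accept the numpy array returned by fit_transform; on a numpy array, doc_topic_dist.argmax(axis=1) gives the same indices more directly.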
Usage
1. Try FASTopic on your dataset
from fastopic import FASTopic
from topmost.preprocessing import Preprocessing
# Prepare your dataset.
your_dataset = [
'doc 1',
'doc 2', # ...
]
# Preprocess the dataset.
# This step tokenizes docs, removes stopwords, sets the maximum vocabulary size, etc.
# To use your own tokenizer: preprocessing = Preprocessing(vocab_size=5000, tokenizer=your_tokenizer)
preprocessing = Preprocessing(vocab_size=10000)
model = FASTopic(50, preprocessing)
topic_top_words, doc_topic_dist = model.fit_transform(your_dataset)
2. Topic activity over time
After training, we can compute the activity of each topic at each time slice.
topic_activity = model.topic_activity_over_time(time_slices)
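The text does not define the format of time_slices or how activity is computed. As a rough illustration of the idea only, the sketch below assumes one time label per document, aligned with the rows of doc_topic_dist, and averages topic weights within each slice. The helper mean_topic_weight_by_slice and the toy data are hypothetical; this is not FASTopic's implementation, and the library's own definition of activity may differ.

```python
from collections import defaultdict


def mean_topic_weight_by_slice(doc_topic_dist, time_slices):
    """Average the topic weights of the documents that fall in each time slice.

    Assumes time_slices[i] is the time label of document i (an assumption,
    not something the FASTopic documentation states here).
    """
    sums = {}
    counts = defaultdict(int)
    for row, t in zip(doc_topic_dist, time_slices):
        if t not in sums:
            sums[t] = [0.0] * len(row)
        for k, weight in enumerate(row):
            sums[t][k] += weight
        counts[t] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


# Toy data: 3 documents, 2 topics, 2 time slices (made-up values).
toy_doc_topic_dist = [
    [0.9, 0.1],
    [0.5, 0.5],
    [0.2, 0.8],
]
toy_time_slices = [2020, 2020, 2021]

activity = mean_topic_weight_by_slice(toy_doc_topic_dist, toy_time_slices)
for t, weights in sorted(activity.items()):
    print(t, weights)
```

In this toy example the first topic dominates the 2020 slice and the second dominates 2021, which is the kind of trend a topic-activity view is meant to surface.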
Citation
If you use our package, please cite:
@article{wu2024fastopic,
title={FASTopic: A Fast, Adaptive, Stable, and Transferable Topic Modeling Paradigm},
author={Wu, Xiaobao and Nguyen, Thong and Zhang, Delvin Ce and Wang, William Yang and Luu, Anh Tuan},
journal={arXiv preprint arXiv:2405.17978},
year={2024}
}