Official implementation of TopicGPT: A Prompt-based Topic Modeling Framework (NAACL'24)
TopicGPT
This repository contains scripts and prompts for our paper "TopicGPT: Topic Modeling by Prompting Large Language Models" (NAACL'24).
📣 Updates
- [11/09/24] Python package topicgpt_python is released! You can install it via pip install topicgpt_python. We support OpenAI API, Vertex AI, and vLLM (requires GPUs for inference).
- [11/18/23] Second-level topic generation code and refinement code are uploaded.
- [11/11/23] Basic pipeline is uploaded. Refinement and second-level topic generation code are coming soon.
📦 Using TopicGPT
Getting Started
- Make a new Python 3.9+ environment using virtualenv or conda.
- Install the required packages:
pip install topicgpt_python
- Set your API key:
export OPENAI_API_KEY={your_openai_api_key}
export VERTEX_PROJECT={your_vertex_project}
export VERTEX_LOCATION={your_vertex_location}
export HF_TOKEN={your_huggingface_token}
- Refer to https://openai.com/pricing/ for OpenAI API pricing or to https://cloud.google.com/vertex-ai/pricing for Vertex AI pricing.
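Before running anything, it can help to confirm that the variables exported above are visible to Python. The snippet below is a minimal sketch and is not part of topicgpt_python: the helper name missing_env_vars and the mapping of variables to backends (in particular, pairing HF_TOKEN with vLLM) are assumptions made for illustration, inferred only from the export lines above.

```python
import os

# Hypothetical grouping inferred from the export lines above; which backend
# actually needs which variable is an assumption, not documented behaviour.
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "vertex": ["VERTEX_PROJECT", "VERTEX_LOCATION"],
    "vllm": ["HF_TOKEN"],
}


def missing_env_vars(backend, environ=None):
    """Return the variables for `backend` that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS[backend] if not environ.get(name)]


if __name__ == "__main__":
    # Report the status of every backend using the real process environment.
    for backend in REQUIRED_VARS:
        missing = missing_env_vars(backend)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{backend}: {status}")
```

Only the variables for the backend you intend to use need to be set; the others can be reported as missing without consequence.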
Data
- Prepare your .jsonl data file in the following format:
{ "id": "IDs (optional)", "text": "Documents", "label": "Ground-truth labels (optional)" }
- Put the data file in data/input. There is also a sample data file, data/input/sample.jsonl, to debug the code.
- #TODO: fix - If you want to sample a subset of the data for topic generation, run:
python script/data.py --data <data_file> --num_samples 1000 --output <output_file>
This samples 1000 documents from the data file and saves them to <output_file>. Change --num_samples to sample a different number of documents; see the paper for more detail.
- Raw datasets used in the paper (Bills and Wiki): [link].
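The data format and the sampling step described above can be sketched in plain Python. This is an illustrative stand-in, not the repository's script/data.py: the function names read_jsonl, sample_documents, and write_jsonl, the fixed random seed, and the choice to reject records that lack a "text" field are all assumptions.

```python
import json
import random


def read_jsonl(path):
    """Load one JSON object per non-empty line; require a "text" field."""
    docs = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if "text" not in record:
                raise ValueError(f"line {line_number}: missing 'text' field")
            docs.append(record)
    return docs


def sample_documents(docs, num_samples, seed=0):
    """Return up to `num_samples` documents chosen without replacement."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(docs, min(num_samples, len(docs)))


def write_jsonl(docs, path):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in docs:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

For example, write_jsonl(sample_documents(read_jsonl("data/input/sample.jsonl"), 1000), "data/input/subset.jsonl") would write a subset of the sample file, where the output name subset.jsonl is hypothetical. If the file holds fewer than 1000 documents, all of them are kept.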
Pipeline
📜 Citation
@misc{pham2023topicgpt,
title={TopicGPT: A Prompt-based Topic Modeling Framework},
author={Chau Minh Pham and Alexander Hoyle and Simeng Sun and Mohit Iyyer},
year={2023},
eprint={2311.01449},
archivePrefix={arXiv},
primaryClass={cs.CL}
}