Official implementation of TopicGPT: A Prompt-based Topic Modeling Framework (NAACL'24)

Project description

TopicGPT

This repository contains scripts and prompts for our paper "TopicGPT: A Prompt-based Topic Modeling Framework" (NAACL'24).

📣 Updates

  • [11/09/24] Python package topicgpt_python is released! You can install it via pip install topicgpt_python. We support OpenAI API, Vertex AI, and vLLM (requires GPUs for inference).
  • [11/18/23] Second-level topic generation code and refinement code are uploaded.
  • [11/11/23] Basic pipeline is uploaded. Refinement and second-level topic generation code are coming soon.

📦 Using TopicGPT

Getting Started

  1. Make a new Python 3.9+ environment using virtualenv or conda.
  2. Install the required packages:
pip install topicgpt_python
  3. Set your API keys as environment variables:
export OPENAI_API_KEY={your_openai_api_key}
export VERTEX_PROJECT={your_vertex_project}
export VERTEX_LOCATION={your_vertex_location}
export HF_TOKEN={your_huggingface_token}
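
Before running anything, it can help to confirm that the variables above are visible to Python. The following is a minimal sketch, not part of topicgpt_python: the helper `missing_vars` and the `REQUIRED_VARS` grouping are hypothetical, and pairing HF_TOKEN with vLLM is an assumption rather than something these instructions state.

```python
import os

# Assumed grouping of the variables listed above by backend. The pairing of
# HF_TOKEN with vLLM is a guess, not something the instructions state.
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "vertex": ["VERTEX_PROJECT", "VERTEX_LOCATION"],
    "vllm": ["HF_TOKEN"],
}


def missing_vars(backend, env=None):
    """Return the variables for `backend` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[backend] if not env.get(name)]


if __name__ == "__main__":
    # Report, per backend, whether its variables are set in this shell.
    for backend in REQUIRED_VARS:
        missing = missing_vars(backend)
        status = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{backend}: {status}")
```

You likely only need the variables for the backend you plan to use, so a "missing" line for another backend is not necessarily a problem.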

Data

  • Prepare your .jsonl data file in the following format:
    {
        "id": "IDs (optional)",
        "text": "Documents",
        "label": "Ground-truth labels (optional)"
    }
    
  • Put the data file in data/input. There is also a sample data file data/input/sample.jsonl to debug the code.
  • To sample a subset of the data for topic generation, run python script/data.py --data <data_file> --num_samples 1000 --output <output_file>. This samples 1000 documents from the data file and saves them to <output_file>. Change the value of --num_samples to sample a different number of documents; see the paper for more detail. (The authors have marked this step as a TODO to fix, so the command may change.)
  • Raw dataset used in the paper (Bills and Wiki): [link].
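
As an illustration of the data format above, the sketch below writes documents to a .jsonl file (one JSON object per line, even though the format is shown pretty-printed), reads them back, and draws a random subset. It is not the package's script/data.py: the function names `write_jsonl`, `read_jsonl`, and `sample_docs`, the fixed seed, and the file name `my_data.jsonl` are all hypothetical choices made for this example.

```python
import json
import random


def write_jsonl(records, path):
    """Write one JSON object per line, as the .jsonl format requires."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a .jsonl file and check that every record has a 'text' field."""
    docs = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            if "text" not in record:
                raise ValueError(f"line {line_no}: missing 'text' field")
            docs.append(record)
    return docs


def sample_docs(docs, num_samples, seed=0):
    """Sample without replacement; return all docs if fewer are available."""
    rng = random.Random(seed)
    return rng.sample(docs, min(num_samples, len(docs)))


if __name__ == "__main__":
    records = [
        {"id": str(i), "text": f"Document number {i}", "label": "example"}
        for i in range(10)
    ]
    write_jsonl(records, "my_data.jsonl")
    subset = sample_docs(read_jsonl("my_data.jsonl"), num_samples=3)
    print(len(subset))
```

Only "text" is checked here because "id" and "label" are described as optional.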

Pipeline

📜 Citation

@misc{pham2023topicgpt,
      title={TopicGPT: A Prompt-based Topic Modeling Framework}, 
      author={Chau Minh Pham and Alexander Hoyle and Simeng Sun and Mohit Iyyer},
      year={2023},
      eprint={2311.01449},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

topicgpt_python-0.1.5.tar.gz (22.0 kB view details)

Uploaded Source

Built Distribution

topicgpt_python-0.1.5-py3-none-any.whl (28.6 kB view details)

Uploaded Python 3

File details

Details for the file topicgpt_python-0.1.5.tar.gz.

File metadata

  • Download URL: topicgpt_python-0.1.5.tar.gz
  • Upload date:
  • Size: 22.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for topicgpt_python-0.1.5.tar.gz
Algorithm Hash digest
SHA256 5fe78069ecfaf928d233a25752ff02aecb85611e1f2a1cafe6469ca2887087d9
MD5 676bd10c45294d94b3aa009e7878dbc7
BLAKE2b-256 567709442b99880222ce9061b1585d960637acf65e967d00b7a83de77b748c8c

File details

Details for the file topicgpt_python-0.1.5-py3-none-any.whl.

File hashes

Hashes for topicgpt_python-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 5b0e33e89bdb4f49d78cd293fe264cba3a4beef7e1dc73f81e237878b38e8489
MD5 42a5560478ed19c67b9e02a72b4a4532
BLAKE2b-256 ffa86233341fce2e095a9ad732ec5cbbf418cb1233c797621eb433832cae8ef1
