A package for generating topics from a text corpus.

Project description

TopicGPT package

How to install this package?

pip install wm_topicgpt==0.0.8

How to use this package?

Step 1: Set up global parameters

from topicgpt import config

# For NameFilter
config.azure_key = ""
config.azure_endpoint = ""

# For GPT3.5 or GPT4 or Ada-002
config.consumer_id = ""
config.private_key_path = ""
config.mso_llm_env = ""

Step 2: Load your dataset

Load your data; it must be a pandas.DataFrame.

import pandas as pd

data_df = pd.read_csv("dataset.csv")
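For a quick trial without a CSV file, a small DataFrame can be built inline. The column name "userInput" here matches the text_col_name passed to the pipeline in Step 3; the example texts are illustrative only.

```python
import pandas as pd

# A minimal stand-in dataset; the "userInput" column name matches the
# text_col_name argument used in Step 3 below.
data_df = pd.DataFrame({
    "userInput": [
        "How do I reset my password?",
        "The checkout page keeps timing out.",
        "Can I change my delivery address after ordering?",
    ]
})
print(data_df.shape)
```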

Step 3: Run the code

# If you are running in a Jupyter notebook, include these two lines.
import nest_asyncio
nest_asyncio.apply()


# Set the parameters for this approach. If you don't need a part, simply omit it.
params = {
    # preprocessing part
    'preprocessing': {'words_range': (1, 500)},
    # name filter part
    'name_filter': {},
    # extracting keywords part
    'extract_keywords': {'llm_model': 'gpt-35-turbo', 'temperature': 0., 'batch_size': 300},
    # embedding part (must have)
    'embedding': {'model': 'bge', 'batch_size': 500, 'device': 'mps'},
    # hdbscan clustering part (must have)
    'hdbscan': {'reduced_dim': 5, 'n_neighbors': 10, 'min_cluster_percent': 0.02, 'topk': 5,
                'llm_model': 'gpt-35-turbo', 'temperature': 0.5, 'verbose': True},
}
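As a reading aid: a plausible interpretation of 'min_cluster_percent' (an assumption based on the name, not confirmed by the package docs) is the smallest cluster HDBSCAN may form, expressed as a fraction of the dataset size. For example:

```python
# Assumed meaning of min_cluster_percent: minimum cluster size as a
# fraction of the number of rows in the dataset.
n_rows = 10_000
min_cluster_percent = 0.02
min_cluster_size = round(n_rows * min_cluster_percent)
print(min_cluster_size)  # 200
```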

from topicgpt.pipeline import topic_modeling_by_hdbscan

# data_df: your pd.DataFrame dataset
# text_col_name: the column name of texts in the data_df
# params: some parameters for this approach
root = topic_modeling_by_hdbscan(data=data_df, text_col_name='userInput', params=params)
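The pipeline returns a root node of a topic hierarchy. As a hypothetical sketch of how such a tree could be walked: the attribute names `name` and `children` below are assumptions, not the package's documented API, so adjust them to whatever the returned object actually exposes.

```python
# Hypothetical sketch only: `name` and `children` are assumed attribute
# names, not the package's documented API.
from dataclasses import dataclass, field

@dataclass
class TopicNode:
    # Stand-in for the node type the pipeline returns.
    name: str
    children: list = field(default_factory=list)

def topic_tree_lines(node, depth=0):
    # Collect one indented line per topic, depth-first.
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(topic_tree_lines(child, depth + 1))
    return lines

# Demo with a stand-in tree:
demo_root = TopicNode("root", [
    TopicNode("billing", [TopicNode("refunds")]),
    TopicNode("shipping"),
])
print("\n".join(topic_tree_lines(demo_root)))
```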


Download files

Download the file for your platform.

Source Distribution

wm_topicgpt-0.0.8.tar.gz (20.1 kB)

Uploaded Source

Built Distribution


wm_topicgpt-0.0.8-py3-none-any.whl (24.7 kB)

Uploaded Python 3

File details

Details for the file wm_topicgpt-0.0.8.tar.gz.

File metadata

  • Download URL: wm_topicgpt-0.0.8.tar.gz
  • Upload date:
  • Size: 20.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.7

File hashes

Hashes for wm_topicgpt-0.0.8.tar.gz
  • SHA256: 8a3286fd55040e329d5e06f120d6c3435e1b2d56a7b91cecb4d01cc5ea2f9f3d
  • MD5: bb0ec305523856cfa9d79f6acee14875
  • BLAKE2b-256: 03491798a005ed620e7d610c3edd3b3506405a38ad0f5f3e8d034e51e31ac55e


File details

Details for the file wm_topicgpt-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: wm_topicgpt-0.0.8-py3-none-any.whl
  • Upload date:
  • Size: 24.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.7

File hashes

Hashes for wm_topicgpt-0.0.8-py3-none-any.whl
  • SHA256: 8237a0e3c598091654327de278a12f9680f72301924bfa5de14735d019876d10
  • MD5: 3fc3dd31ccf111c585a3c78a6fa2a2a0
  • BLAKE2b-256: 2d6263505256b6ad75dea74c5df715af57097525abca732d109a1cee4f10d2ca

