A package for generating topics from a text corpus.
TopicGPT package
How to install this package?
pip install wm_topicgpt==0.0.8
How to use this package?
Step 1: Set up global parameters
from topicgpt import config
# For NameFilter
config.azure_key = ""
config.azure_endpoint = ""
# For GPT-3.5, GPT-4, or Ada-002
config.consumer_id = ""
config.private_key_path = ""
config.mso_llm_env = ""
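Hard-coding keys in a script or notebook makes them easy to leak. As a minimal sketch, the settings above could instead be read from environment variables. The variable names (`TOPICGPT_AZURE_KEY` and so on) and the helper `apply_settings` are hypothetical and not part of the package; only the five `config` attribute names come from the example above.

```python
import os

# Hypothetical environment variable names (an assumption, not defined by the
# package) mapped to the config attributes used in Step 1.
ENV_TO_ATTR = {
    "TOPICGPT_AZURE_KEY": "azure_key",
    "TOPICGPT_AZURE_ENDPOINT": "azure_endpoint",
    "TOPICGPT_CONSUMER_ID": "consumer_id",
    "TOPICGPT_PRIVATE_KEY_PATH": "private_key_path",
    "TOPICGPT_MSO_LLM_ENV": "mso_llm_env",
}


def apply_settings(target, env=None):
    """Set each attribute on `target` from `env`; return the unset variable names."""
    env = os.environ if env is None else env
    missing = []
    for env_name, attr in ENV_TO_ATTR.items():
        value = env.get(env_name, "")
        if not value:
            missing.append(env_name)
        setattr(target, attr, value)
    return missing


# Usage with the real module (commented out so the sketch runs without it):
# from topicgpt import config
# missing = apply_settings(config)
# if missing:
#     print("Unset variables:", ", ".join(missing))
```

Returning the list of unset variables lets the caller decide whether a missing value matters, for example when the name filter is not used and the Azure settings can stay empty.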
Step 2: Load your dataset
Load your data as a 'pandas.DataFrame'.
import pandas as pd
data_df = pd.read_csv("dataset.csv")
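Before running the pipeline it can help to confirm that the text column exists and to drop rows with no usable text. The sketch below shows one way to do this; `prepare_texts` is a hypothetical helper, not part of the package, and the column name `userInput` is taken from the Step 3 example.

```python
import pandas as pd


def prepare_texts(df: pd.DataFrame, text_col: str) -> pd.DataFrame:
    """Return a copy of `df` without rows whose text is missing or blank."""
    if text_col not in df.columns:
        raise KeyError(
            f"column {text_col!r} not found; available columns: {list(df.columns)}"
        )
    # Drop missing values first so they are not turned into the string "None".
    out = df.dropna(subset=[text_col]).copy()
    # Normalise to stripped strings, then remove rows that are now empty.
    out[text_col] = out[text_col].astype(str).str.strip()
    out = out[out[text_col] != ""]
    return out.reset_index(drop=True)


# A tiny in-memory frame standing in for dataset.csv.
example_df = pd.DataFrame({"userInput": ["hello world", None, "   ", " ok "]})
clean_df = prepare_texts(example_df, "userInput")
```

Here `clean_df` keeps only the two rows that contain text, with surrounding whitespace removed.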
Step 3: Run the code
# If you are running in a Jupyter notebook, include these two lines.
import nest_asyncio
nest_asyncio.apply()
# Set the parameters for each stage. To skip an optional stage, remove its entry.
params = {
    # preprocessing
    'preprocessing': {'words_range': (1, 500)},
    # name filter
    'name_filter': {},
    # keyword extraction
    'extract_keywords': {'llm_model': 'gpt-35-turbo', 'temperature': 0.0, 'batch_size': 300},
    # embedding (required)
    'embedding': {'model': 'bge', 'batch_size': 500, 'device': 'mps'},
    # HDBSCAN clustering (required)
    'hdbscan': {'reduced_dim': 5, 'n_neighbors': 10, 'min_cluster_percent': 0.02, 'topk': 5,
                'llm_model': 'gpt-35-turbo', 'temperature': 0.5, 'verbose': True},
}
from topicgpt.pipeline import topic_modeling_by_hdbscan
# data_df: your pd.DataFrame dataset
# text_col_name: the column name of texts in the data_df
# params: some parameters for this approach
root = topic_modeling_by_hdbscan(data=data_df, text_col_name='userInput', params=params)
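The two `nest_asyncio` lines at the top of Step 3 are needed in notebooks because Jupyter already runs an asyncio event loop, and the standard library refuses to start a second one inside it. The sketch below reproduces that failure using only the standard library. It assumes, without confirmation from the package, that the pipeline drives its LLM calls through `asyncio` internally; `inner` and `notebook_like_cell` are illustrative names.

```python
import asyncio


async def inner():
    return "done"


async def notebook_like_cell():
    # Inside this coroutine an event loop is already running, which is the
    # situation code finds itself in when executed from a Jupyter cell.
    coro = inner()
    try:
        asyncio.run(coro)  # tries to start a second, nested event loop
    except RuntimeError as exc:
        coro.close()  # the coroutine never ran; close it to avoid a warning
        return str(exc)
    return "no error"


message = asyncio.run(notebook_like_cell())
print(message)
```

`nest_asyncio.apply()` patches asyncio so that such nested calls are allowed, which is why it must run before the pipeline is called in a notebook. In a plain Python script no loop is running yet, so the two lines can be left out.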