Automatic topic labeling using LLMs
topic-autolabel
Given text data, topic-autolabel generates labels that classify the data into a set number of topics, with no supervision required.
Example usage:
First, install the package with pip: pip install topic_autolabel
# Labelling with supplied labels
from topic_autolabel import process_file
import pandas as pd
df = pd.read_csv('path/to/file')
candidate_labels = ["positive", "negative"]
# labelling column "review" with "positive" or "negative"
new_df = process_file(
    df=df,
    text_column="review",
    candidate_labels=candidate_labels,
    model_name="meta-llama/Llama-3.1-8B-Instruct",  # default model to pull from Hugging Face Hub
)
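The example above reads a CSV that contains a "review" column. To try it without your own data, a file of that shape can be produced with the standard library alone. This is a minimal sketch: the helper name `write_sample_csv`, the file name, and the review texts are all hypothetical and invented for illustration.

```python
import csv
from pathlib import Path

# Hypothetical sample reviews, invented purely for illustration.
SAMPLE_REVIEWS = [
    "The battery lasts all day and the screen is gorgeous.",
    "Stopped working after a week, very disappointed.",
    "Fast shipping, and the product matches the description.",
]


def write_sample_csv(path):
    """Write a one-column CSV whose header is 'review' and return its path."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["review"])  # column name used by the examples
        for text in SAMPLE_REVIEWS:
            writer.writerow([text])  # csv quotes fields that contain commas
    return path
```

Calling `write_sample_csv("sample_reviews.csv")` yields a file that `pd.read_csv` can load in place of 'path/to/file'.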
Alternatively, you can label text without any supervision by omitting the candidate_labels argument:
from topic_autolabel import process_file
import pandas as pd
df = pd.read_csv('path/to/file')
# labelling column "review" with open-ended labels (best results when dataset talks about many topics)
new_df = process_file(
    df=df,
    text_column="review",
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    num_labels=5,  # generate up to 5 labels for each of the rows
)
Ollama integration:
Provided you have an Ollama server running, you can pass the tag of the model you want to use for label generation as model_name.
from topic_autolabel import process_file
import pandas as pd
df = pd.read_csv('path/to/file')
# labelling column "review" with open-ended labels, using llama3.1 hosted with Ollama
# (llama3.1 must be running; run `ollama ps` to verify)
new_df = process_file(
    df=df,
    text_column="review",
    model_name="llama3.1",
    num_labels=5,  # generate up to 5 labels for each of the rows
)
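The `ollama ps` check mentioned above can also be done from Python before labelling starts. The sketch below queries Ollama's HTTP endpoint `/api/ps`, which lists running models. It assumes the server listens on the default address `http://localhost:11434`; the helper names are hypothetical and are not part of topic_autolabel.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default address


def running_model_names(payload):
    """Extract model names from a parsed /api/ps response."""
    return [entry.get("name", "") for entry in payload.get("models", [])]


def is_model_running(tag, names):
    """Return True if `tag` is among `names`; a bare tag means ':latest'."""
    wanted = tag if ":" in tag else f"{tag}:latest"
    return wanted in names


def fetch_running_models(base_url=OLLAMA_URL, timeout=5.0):
    """Return the names of running models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/ps", timeout=timeout) as resp:
            return running_model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        return None
```

A caller might run `names = fetch_running_models()` and only call process_file when `names is not None and is_model_running("llama3.1", names)`.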
Download files
Source Distribution
topic_autolabel-0.1.6.tar.gz (13.1 kB)
Built Distribution
topic_autolabel-0.1.6-py3-none-any.whl (11.7 kB)
File details
Details for the file topic_autolabel-0.1.6.tar.gz.
File metadata
- Download URL: topic_autolabel-0.1.6.tar.gz
- Upload date:
- Size: 13.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6136974635226ebe0fc4e91d5f8e19877f5fddd023d02418242b0c42226f506c
MD5 | 5d2f3949a8cea5d7713d32c1597fe669
BLAKE2b-256 | 128b02c2d6e4535faf486207d311b369b7adec551288616eaf26de364632d68e
File details
Details for the file topic_autolabel-0.1.6-py3-none-any.whl.
File metadata
- Download URL: topic_autolabel-0.1.6-py3-none-any.whl
- Upload date:
- Size: 11.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5dfab0bff9d83e7da51c401109a7d5cf2c6f1da64e960c015ce09b42b8b633af
MD5 | dadbd0a7ec3e8d155fef245510137a87
BLAKE2b-256 | 11438b5e3fc28db95dfcbb96cf448f51a6da5dca1a374064e3c5e4b0e7ceb9b7
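The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's `hashlib`; the helper names `sha256_of` and `matches_published` are hypothetical, and the SHA256 values are copied from the file listings above.

```python
import hashlib

# SHA256 digests copied from the file listings above.
PUBLISHED_SHA256 = {
    "topic_autolabel-0.1.6.tar.gz":
        "6136974635226ebe0fc4e91d5f8e19877f5fddd023d02418242b0c42226f506c",
    "topic_autolabel-0.1.6-py3-none-any.whl":
        "5dfab0bff9d83e7da51c401109a7d5cf2c6f1da64e960c015ce09b42b8b633af",
}


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, filename):
    """Compare a local file's digest with the published one for `filename`."""
    return sha256_of(path) == PUBLISHED_SHA256[filename]
```

For example, `matches_published("topic_autolabel-0.1.6.tar.gz", "topic_autolabel-0.1.6.tar.gz")` returns True only when the local copy hashes to the listed value.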