
Automatic topic labeling using LLMs

Project description

topic-autolabel

Given text data, topic-autolabel generates labels that classify the data into a set number of topics, with no supervision required.

Example usage:

First, install the package with pip: pip install topic_autolabel

# Labelling with supplied labels
from topic_autolabel import process_file
import pandas as pd

df = pd.read_csv('path/to/file')
candidate_labels = ["positive", "negative"]

# labelling column "review" with "positive" or "negative"
new_df = process_file(
    df=df,
    text_column="review",
    candidate_labels=candidate_labels,
    model_name="meta-llama/Llama-3.1-8B-Instruct",  # default model, pulled from the Hugging Face Hub
)
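
Because loading an 8B model is slow, it can help to sanity-check the input before calling process_file. The following is a minimal sketch using only the standard library; check_text_column is a hypothetical helper, not part of topic_autolabel, and the "review" column name is taken from the example above.

```python
import csv
import io


def check_text_column(csv_text, text_column):
    """Return (row_count, empty_count) for text_column in the given CSV text.

    Hypothetical pre-flight helper (not part of topic_autolabel).
    Raises KeyError if the column is missing from the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if text_column not in (reader.fieldnames or []):
        raise KeyError(f"column {text_column!r} not found in {reader.fieldnames}")
    rows = 0
    empty = 0
    for row in reader:
        rows += 1
        # Count rows whose text is missing or whitespace-only.
        if not (row[text_column] or "").strip():
            empty += 1
    return rows, empty


# Small in-memory sample standing in for 'path/to/file'.
sample = "id,review\n1,Great product\n2,\n3,Would not buy again\n"
rows, empty = check_text_column(sample, "review")
print(f"{rows} rows, {empty} with empty text")
```

How process_file itself treats empty rows is not stated here, so filtering them out beforehand is a cautious choice rather than a requirement.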

Alternatively, omit the candidate_labels argument to label text without supervision; the labels are then open-ended and generated by the model:

from topic_autolabel import process_file
import pandas as pd

df = pd.read_csv('path/to/file')

# labelling column "review" with open-ended labels (best results when dataset talks about many topics)
new_df = process_file(
    df=df,
    text_column="review",
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    num_labels=5 # generate up to 5 labels for each of the rows
)

Ollama integration:

If you have an Ollama server running, you can pass the tag of the model you want to use as model_name to generate labels.

from topic_autolabel import process_file
import pandas as pd

df = pd.read_csv('path/to/file')

# labelling column "review" with open-ended labels, using llama3.1 hosted with ollama
# (llama3.1 must be running; run `ollama ps` to verify)
new_df = process_file(
    df=df,
    text_column="review",
    model_name="llama3.1",
    num_labels=5 # generate up to 5 labels for each of the rows
)
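
The `ollama ps` check mentioned above can also be done from Python. This is a minimal sketch, not part of topic_autolabel: it assumes the Ollama server listens on its default address http://localhost:11434 and that GET /api/ps returns JSON shaped like {"models": [{"name": "llama3.1:latest", ...}]}. The helper names are hypothetical.

```python
import json
import urllib.request


def model_in_payload(payload, tag):
    """Return True if `tag` appears among the model names in an Ollama listing.

    A bare tag such as "llama3.1" also matches "llama3.1:latest".
    """
    if not isinstance(payload, dict):
        return False
    names = [m.get("name", "") for m in payload.get("models", []) if isinstance(m, dict)]
    return any(name == tag or name.split(":")[0] == tag for name in names)


def ollama_model_running(tag, base_url="http://localhost:11434", timeout=2.0):
    """Query the server's running-model list; return False if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/ps", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON response.
        return False
    return model_in_payload(payload, tag)


# Offline demonstration with a hand-written payload (assumed response shape).
example = {"models": [{"name": "llama3.1:latest"}]}
print(model_in_payload(example, "llama3.1"))
```

Calling ollama_model_running("llama3.1") before process_file lets a script stop early with a clear message instead of failing partway through labelling.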
