Text Labeling AI Wizard (tailwiz)

tailwiz is an AI-powered tool for labeling text. It has three main capabilities: classifying text (tailwiz.classify), parsing text given context and prompts (tailwiz.parse), and generating text given prompts (tailwiz.generate).

Quickstart

Install tailwiz by running the following on the command line:

python -m pip install tailwiz

Then run the following in a Python environment for a quick example of text classification:

import tailwiz
import pandas as pd

# Create a pandas DataFrame of pre-labeled text. Notice the 'label'
# column contains 'mean' or 'nice' as labels for each text.
prelabeled_text = pd.DataFrame(
    [
        ['You make me vomit', 'mean'],
        ['Love you lots', 'nice'],
        ['You are the best', 'nice'],
    ],
    columns=['text', 'label'],
)

# Create a pandas DataFrame of text to be labeled. Notice that this
# DataFrame does not have a 'label' column. The labels here will be
# created by tailwiz.
text_to_label = pd.DataFrame(
    ['Have a great day', 'I hate you'],
    columns=['text'],
)

# Classify text_to_label using prelabeled_text as reference data.
results = tailwiz.classify(
    text_to_label=text_to_label,
    prelabeled_text=prelabeled_text,
)

# Note how the results are a copy of text_to_label with a new column
# populated with AI-generated labels.
print(results)

Installation

Install tailwiz through pip:

python -m pip install tailwiz

Usage

In this section, we outline the three main functions of tailwiz and provide examples.

tailwiz.classify(text_to_label, prelabeled_text=None, output_metrics=False)

Given text, classify the text.

Parameters:

  • text_to_label : pandas.DataFrame with a column named 'text' (str). Text to be classified.
  • prelabeled_text : pandas.DataFrame with columns named 'text' (str) and 'label' (str, int), default None. Pre-labeled text to enhance the performance of the classification task. The classified text is in the 'text' column and the text's labels are in the 'label' column.
  • output_metrics : bool, default False. Whether to output performance_estimate together with results in a tuple.

Returns:

  • results : pandas.DataFrame. A copy of text_to_label with a new column, 'label_from_tailwiz', containing classification results.
  • performance_estimate : Dict[str, float]. Dictionary of metric name to metric value mappings. Included together with results in a tuple if output_metrics is True. Uses prelabeled_text to give an estimate of the accuracy of the classification. One vs. all metrics are given for multiclass classification.

Example:

import tailwiz
import pandas as pd

prelabeled_text = pd.DataFrame(
    [
        ['You make me vomit', 'mean'],
        ['Love you lots', 'nice'],
        ['You are the best', 'nice'],
    ],
    columns=['text', 'label'],
)
text_to_label = pd.DataFrame(
    ['Have a great day', 'I hate you'],
    columns=['text'],
)
results = tailwiz.classify(
    text_to_label=text_to_label,
    prelabeled_text=prelabeled_text,
)
print(results)
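
When output_metrics is True, the return value is a tuple rather than a single DataFrame. The sketch below shows one way to unpack it and print the metrics. It assumes the tuple is ordered as (results, performance_estimate), following the order under Returns. label_with_metrics and format_metrics are hypothetical helpers, not part of tailwiz; the tailwiz function is passed in as an argument so the same code could be used with classify, parse, or generate.

```python
def label_with_metrics(label_fn, text_to_label, prelabeled_text):
    """Hypothetical helper: call a tailwiz function with output_metrics=True.

    label_fn is expected to be tailwiz.classify, tailwiz.parse, or
    tailwiz.generate. Assumes the returned tuple is ordered as
    (results, performance_estimate).
    """
    results, performance_estimate = label_fn(
        text_to_label=text_to_label,
        prelabeled_text=prelabeled_text,
        output_metrics=True,
    )
    return results, performance_estimate


def format_metrics(performance_estimate):
    """Render a metric-name -> value dictionary as sorted 'name: value' lines."""
    return [
        f"{name}: {value:.3f}"
        for name, value in sorted(performance_estimate.items())
    ]
```

With the DataFrames from the example above, this could be called as results, metrics = label_with_metrics(tailwiz.classify, text_to_label, prelabeled_text), followed by print("\n".join(format_metrics(metrics))).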

tailwiz.parse(text_to_label, prelabeled_text=None, output_metrics=False)

Given a prompt and a context, parse the answer from the context.

Parameters:

  • text_to_label : pandas.DataFrame with columns named 'context' (str) and 'prompt' (str). Labels will be parsed directly from contexts in 'context' according to the prompts in 'prompt'.
  • prelabeled_text : pandas.DataFrame with columns named 'context' (str), 'prompt' (str), and 'label' (str), default None. Pre-labeled text to enhance the performance of the parsing task. The labels in 'label' must be extracted exactly from the contexts in 'context' (as whole words) according to the prompts in 'prompt'.
  • output_metrics : bool, default False. Whether to output performance_estimate together with results in a tuple.

Returns:

  • results : pandas.DataFrame. A copy of text_to_label with a new column, 'label_from_tailwiz', containing parsed results.
  • performance_estimate : Dict[str, float]. Dictionary of metric name to metric value mappings. Included together with results in a tuple if output_metrics is True. Uses prelabeled_text to give an estimate of the accuracy of the parsing job.

Example:

import tailwiz
import pandas as pd

prelabeled_text = pd.DataFrame(
    [
        ['Extract the money.', 'He owed me $100', '$100'],
        ['Extract the money.', '¥5000 bills are common', '¥5000'],
        ['Extract the money.', 'Eggs rose to €5 this week', '€5'],
    ],
    columns=['prompt', 'context', 'label'],
)
text_to_label = pd.DataFrame(
    [['Extract the money.', 'Try to save at least £10']],
    columns=['prompt', 'context'],
)
results = tailwiz.parse(
    text_to_label=text_to_label,
    prelabeled_text=prelabeled_text,
)
print(results)
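
Because the labels in prelabeled_text must be extracted exactly from their contexts as whole words, it may help to check this before calling tailwiz.parse. The sketch below is one possible check, not part of tailwiz. find_unextractable_labels is a hypothetical helper, and it assumes "whole word" means the label is not directly adjacent to a letter, digit, or underscore in the context; tailwiz's own rule may differ.

```python
import re


def find_unextractable_labels(rows):
    """Return the indices of (context, label) pairs whose label does not
    appear in its context as a whole word.

    Assumption: "whole word" means the label is not immediately preceded
    or followed by a word character (letter, digit, or underscore).
    """
    bad_indices = []
    for index, (context, label) in enumerate(rows):
        # Lookarounds are used instead of \b so that labels starting with
        # a symbol such as '$' or '€' are still matched correctly.
        pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
        if re.search(pattern, context) is None:
            bad_indices.append(index)
    return bad_indices
```

With the DataFrame from the example above, the rows could be passed as zip(prelabeled_text['context'], prelabeled_text['label']); an empty list means every label was found in its context.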

tailwiz.generate(text_to_label, prelabeled_text=None, output_metrics=False)

Given a prompt, generate an answer.

Parameters:

  • text_to_label : pandas.DataFrame with a column named 'prompt' (str). Prompts according to which labels will be generated.
  • prelabeled_text : pandas.DataFrame with columns named 'prompt' (str) and 'label' (str), default None. Pre-labeled text to enhance the performance of the text generation task. The labels in 'label' should be responses to the prompts in 'prompt'.
  • output_metrics : bool, default False. Whether to output performance_estimate together with results in a tuple.

Returns:

  • results : pandas.DataFrame. A copy of text_to_label with a new column, 'label_from_tailwiz', containing generated results.
  • performance_estimate : Dict[str, float]. Dictionary of metric name to metric value mappings. Included together with results in a tuple if output_metrics is True. Uses prelabeled_text to give an estimate of the accuracy of the text generation job.

Example:

import tailwiz
import pandas as pd

prelabeled_text = pd.DataFrame(
    [
        ['Label this sentence as "positive" or "negative": I love puppies!', 'positive'],
        ['Label this sentence as "positive" or "negative": I do not like you at all.', 'negative'],
        ['Label this sentence as "positive" or "negative": Love you lots.', 'positive'],
    ],
    columns=['prompt', 'label']
)
text_to_label = pd.DataFrame(
    ['Label this sentence as "positive" or "negative": I am crying my eyes out.'],
    columns=['prompt']
)
results = tailwiz.generate(
    text_to_label=text_to_label,
    prelabeled_text=prelabeled_text,
)
print(results)
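
The example above repeats the same instruction in every prompt. A small sketch of building such prompts from one template follows, so that pre-labeled and unlabeled rows are worded identically. PROMPT_TEMPLATE and build_prompts are hypothetical names introduced for illustration and are not part of tailwiz.

```python
# Hypothetical template; the wording is copied from the example above.
PROMPT_TEMPLATE = 'Label this sentence as "positive" or "negative": {sentence}'


def build_prompts(sentences, template=PROMPT_TEMPLATE):
    """Return one prompt string per sentence, using the same template."""
    return [template.format(sentence=sentence) for sentence in sentences]
```

The returned list could be passed to pd.DataFrame(..., columns=['prompt']) to create text_to_label, or zipped with labels to create the rows of prelabeled_text.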

Templates (Notebooks)

Use these Jupyter Notebook examples as templates to help load your data and run any of the three tailwiz functions:
