Classifiers

Development Status
- 2 - Pre-Alpha
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
- Python :: 3.7

Project description

tailwiz

tailwiz is an AI-powered tool for analyzing text. It has three main capabilities: classifying text (tailwiz.classify), parsing text given context and prompts (tailwiz.parse), and generating text given prompts (tailwiz.generate).

Quickstart

Install tailwiz by entering the following into the command line:

python -m pip install --upgrade tailwiz

Then run the following in a Python environment for a quick example of text classification:

import tailwiz
import pandas as pd

# Create a pandas DataFrame of labeled text. The 'label'
# column contains 'mean' or 'nice' as labels for each text.
labeled_examples = pd.DataFrame(
    [
        ['You make me vomit', 'mean'],
        ['Love you lots', 'nice'],
        ['You are the best', 'nice'],
    ],
    columns=['text', 'label'],
)

# Create a pandas DataFrame of text to be classified by tailwiz.
# This DataFrame does not have a 'label' column. The labels here
# will be created by tailwiz.
to_classify = pd.DataFrame(
    ['Have a great day', 'I hate you'],
    columns=['text'],
)

# Classify text using labeled_examples as reference data.
results = tailwiz.classify(
    to_classify,
    labeled_examples=labeled_examples,
)

# The results are a copy of to_classify with a new column,
# 'tailwiz_label', populated with AI-generated labels.
print(results)

Installation

Install tailwiz through pip by entering the following into the command line:

python -m pip install --upgrade tailwiz

Usage

In this section, we outline the three main functions of tailwiz and provide examples.

`tailwiz.classify(to_classify, labeled_examples, output_metrics=False, data_split_seed=None)`

Classify each text in to_classify, using labeled_examples as reference data.

Parameters:

to_classify : pandas.DataFrame with a column named 'text' (str). Text to be classified.
labeled_examples : pandas.DataFrame with columns named 'text' (str) and 'label' (str, int). Required (no default). Labeled examples to enhance the performance of the classification task. The 'text' column holds the example texts and the 'label' column holds their labels.
output_metrics : bool, default False. Whether to output performance_estimate together with results in a tuple.
data_split_seed : int, default None. Controls the shuffling of labeled_examples for internal training and evaluation of language models. Setting data_split_seed to be an integer ensures reproducible results.

Any additional keyword arguments will override tailwiz.classify's training arguments, specifically scikit-learn's LogisticRegression parameters.
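Because tailwiz.classify expects specific column names, a quick check before calling it can catch typos early. The sketch below is a hypothetical helper, not part of tailwiz; check_classify_inputs and missing_columns are illustrative names. It accepts a pandas DataFrame or a plain dict mapping column names to lists, since both support `name in df` and `df[name]`.

```python
import numbers


def missing_columns(df, required):
    """Return the names in `required` that are not columns of df."""
    # `name in df` checks column names for a pandas DataFrame and keys for a dict.
    return [name for name in required if name not in df]


def check_classify_inputs(to_classify, labeled_examples):
    """Hypothetical helper (not part of tailwiz): raise ValueError if the
    inputs do not match the schema documented for tailwiz.classify."""
    problems = []
    for name, df, required in (
        ("to_classify", to_classify, ["text"]),
        ("labeled_examples", labeled_examples, ["text", "label"]),
    ):
        missing = missing_columns(df, required)
        if missing:
            problems.append(f"{name} is missing column(s): {', '.join(missing)}")
    if problems:
        raise ValueError("; ".join(problems))
    # Labels are documented as str or int; numbers.Integral accepts both
    # Python ints and NumPy integer types.
    bad_types = {
        type(label).__name__
        for label in labeled_examples["label"]
        if not isinstance(label, (str, numbers.Integral))
    }
    if bad_types:
        raise ValueError(f"unexpected label type(s): {', '.join(sorted(bad_types))}")
```

For example, check_classify_inputs(df_to_classify, df_labeled_examples) passes silently for the DataFrames in the example below, while a misspelled column name raises a ValueError naming the missing column.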

Returns:

results : pandas.DataFrame. A copy of to_classify with a new column, 'tailwiz_label', containing classification results.
performance_estimate : Dict[str, float]. Dictionary of metric name to metric value mappings. Included together with results in a tuple if output_metrics is True. Uses labeled_examples to give an estimate of the accuracy of the classification.
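The parameter list says data_split_seed controls how labeled_examples are shuffled for internal training and evaluation, and performance_estimate is derived from labeled_examples. The sketch below only illustrates why a fixed seed makes such a shuffle-and-split reproducible; seeded_split, the 20% evaluation fraction, and the use of Python's random module are assumptions, not tailwiz's actual internals.

```python
import random


def seeded_split(rows, eval_fraction=0.2, seed=None):
    """Shuffle a copy of rows and split it into (train, eval) lists.

    Illustrative only: the split ratio and shuffling method are assumptions,
    not tailwiz's internal procedure.
    """
    shuffled = list(rows)
    # A fixed integer seed gives the same order on every run;
    # seed=None seeds from system randomness, so the order varies.
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction)) if shuffled else 0
    return shuffled[n_eval:], shuffled[:n_eval]
```

Calling seeded_split twice on the same rows with seed=0 returns identical splits, which is the property an integer data_split_seed is documented to provide.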

Example:

import tailwiz
import pandas as pd

df_to_classify = pd.DataFrame(
    ['Have a great day', 'I hate you'],
    columns=['text'],
)
df_labeled_examples = pd.DataFrame(
    [
        ['You make me vomit', 'mean'],
        ['Love you lots', 'nice'],
        ['You are the best', 'nice'],
    ],
    columns=['text', 'label'],
)
results = tailwiz.classify(
    to_classify=df_to_classify,
    labeled_examples=df_labeled_examples,
)
print(results)

`tailwiz.parse(to_parse, labeled_examples=None, output_metrics=False, data_split_seed=None)`

Given a prompt and a context, parse the answer from the context.

Parameters:

to_parse : pandas.DataFrame with columns named 'context' (str) and 'prompt' (str). Labels will be parsed directly from contexts in 'context' according to the prompts in 'prompt'.
labeled_examples : pandas.DataFrame with columns named 'context' (str), 'prompt' (str), and 'label' (str), default None. Labeled examples to enhance the performance of the parsing task. The labels in 'label' must be extracted exactly from the contexts in 'context' (as whole words) according to the prompts in 'prompt'.
output_metrics : bool, default False. Whether to output performance_estimate together with results in a tuple.
data_split_seed : int, default None. Controls the shuffling of labeled_examples for internal training and evaluation of language models. Setting data_split_seed to be an integer ensures reproducible results.

Any additional keyword arguments will override tailwiz.parse's training arguments, specifically Hugging Face's TrainingArguments parameters.
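The requirement that parse labels be extracted exactly from their contexts, as whole words, is easy to violate with a stray character. The helper below is a hypothetical pre-flight check, not part of tailwiz. It assumes "whole words" means the label is not immediately preceded or followed by a letter, digit, or underscore; tailwiz's own tokenization may differ.

```python
import re


def label_in_context(label, context):
    """True if label occurs verbatim in context as a whole-word span (assumed meaning)."""
    # Lookarounds are used instead of \b so that labels starting or ending
    # with symbols such as '$100' or '€5' still match at word boundaries.
    pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
    return re.search(pattern, context) is not None


def bad_parse_labels(labeled_examples):
    """Return (row_position, label) pairs whose label is not found in its context.

    Works with a pandas DataFrame or a dict of lists with 'context' and 'label'.
    """
    return [
        (i, label)
        for i, (context, label) in enumerate(
            zip(labeled_examples["context"], labeled_examples["label"])
        )
        if not label_in_context(label, context)
    ]
```

Running bad_parse_labels(df_labeled_examples) on the example below returns an empty list; a label such as '$10' for the context 'He owed me $100' would be reported because it stops in the middle of a number.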

Returns:

results : pandas.DataFrame. A copy of to_parse with a new column, 'tailwiz_label', containing parsed results.
performance_estimate : Dict[str, float]. Dictionary of metric name to metric value mappings. Included together with results in a tuple if output_metrics is True. Uses labeled_examples to give an estimate of the accuracy of the parsing job.

Example:

import tailwiz
import pandas as pd

df_to_parse = pd.DataFrame(
    [['Extract the money.', 'Try to save at least £10']],
    columns=['prompt', 'context'],
)
df_labeled_examples = pd.DataFrame(
    [
        ['Extract the money.', 'He owed me $100', '$100'],
        ['Extract the money.', '¥5000 bills are common', '¥5000'],
        ['Extract the money.', 'Eggs rose to €5 this week', '€5'],
    ],
    columns=['prompt', 'context', 'label'],
)
results = tailwiz.parse(
    to_parse=df_to_parse,
    labeled_examples=df_labeled_examples,
)
print(results)

`tailwiz.generate(to_generate, labeled_examples=None, output_metrics=False, data_split_seed=None)`

Given a prompt, generate an answer.

Parameters:

to_generate : pandas.DataFrame with a column named 'prompt' (str). Prompts for which labels will be generated.
labeled_examples : pandas.DataFrame with columns named 'prompt' (str) and 'label' (str), default None. Labeled examples to enhance the performance of the generation task. The labels in 'label' should be responses to the prompts in 'prompt'.
output_metrics : bool, default False. Whether to output performance_estimate together with results in a tuple.
data_split_seed : int, default None. Controls the shuffling of labeled_examples for internal training and evaluation of language models. Setting data_split_seed to be an integer ensures reproducible results.

Any additional keyword arguments will override tailwiz.generate's training arguments, specifically Hugging Face's Seq2SeqTrainingArguments parameters.

Returns:

results : pandas.DataFrame. A copy of to_generate with a new column, 'tailwiz_label', containing generated results.
performance_estimate : Dict[str, float]. Dictionary of metric name to metric value mappings. Included together with results in a tuple if output_metrics is True. Uses labeled_examples to give an estimate of the accuracy of the text generation job.

Example:

import tailwiz
import pandas as pd

df_to_generate = pd.DataFrame(
    ['Label this sentence as "positive" or "negative": I am crying my eyes out.'],
    columns=['prompt']
)
df_labeled_examples = pd.DataFrame(
    [
        ['Label this sentence as "positive" or "negative": I love puppies!', 'positive'],
        ['Label this sentence as "positive" or "negative": I do not like you at all.', 'negative'],
        ['Label this sentence as "positive" or "negative": Love you lots.', 'positive'],
    ],
    columns=['prompt', 'label']
)
results = tailwiz.generate(
    to_generate=df_to_generate,
    labeled_examples=df_labeled_examples,
)
print(results)
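When every prompt shares the same instruction, as in the example above, building the prompt strings programmatically avoids copy-paste drift. make_prompts and make_labeled_examples are hypothetical helpers, and the '<instruction>: <sentence>' format simply mirrors the example; tailwiz does not require it.

```python
INSTRUCTION = 'Label this sentence as "positive" or "negative"'


def make_prompts(instruction, sentences):
    """Return one prompt per sentence, formatted as '<instruction>: <sentence>'."""
    return [f"{instruction}: {sentence}" for sentence in sentences]


def make_labeled_examples(instruction, pairs):
    """Build 'prompt' and 'label' columns from (sentence, label) pairs.

    The returned dict can be passed to pandas.DataFrame(...) to create
    labeled_examples for tailwiz.generate.
    """
    sentences = [sentence for sentence, _ in pairs]
    labels = [label for _, label in pairs]
    return {"prompt": make_prompts(instruction, sentences), "label": labels}
```

pd.DataFrame(make_labeled_examples(INSTRUCTION, pairs)) yields a DataFrame with 'prompt' and 'label' columns, and pd.DataFrame({'prompt': make_prompts(INSTRUCTION, sentences)}) yields one suitable for to_generate.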

Templates (Notebooks)

Use these Jupyter Notebook examples as templates to help load your data and run any of the three tailwiz functions:

For an example of tailwiz.classify, see examples/classify.ipynb
For an example of tailwiz.parse, see examples/parse.ipynb
For an example of tailwiz.generate, see examples/generate.ipynb

Contact

Please contact Daniel Kang (ddkang [at] g.illinois.edu) and Timothy Dai (timdai [at] stanford.edu) if you decide to use tailwiz.


Release history

- 0.0.24 (this version): Jun 24, 2024
- 0.0.23: Jun 21, 2023
- 0.0.22: Jun 18, 2023
- 0.0.21: Jun 18, 2023
- 0.0.20: May 21, 2023
- 0.0.19: May 20, 2023
- 0.0.18: May 17, 2023
- 0.0.17: May 17, 2023
- 0.0.16: Apr 2, 2023
- 0.0.15: Feb 28, 2023
- 0.0.14: Feb 27, 2023
- 0.0.13: Feb 27, 2023
- 0.0.12: Feb 23, 2023
- 0.0.11: Feb 23, 2023
- 0.0.10: Feb 23, 2023
- 0.0.9: Feb 23, 2023
- 0.0.8: Feb 21, 2023
- 0.0.7: Feb 20, 2023
- 0.0.6: Feb 20, 2023
- 0.0.5: Feb 20, 2023
- 0.0.4: Feb 20, 2023
- 0.0.3: Feb 20, 2023
- 0.0.2: Feb 20, 2023
- 0.0.1: Feb 11, 2023

Download files


Source Distribution

tailwiz-0.0.24.tar.gz (12.7 kB)

Uploaded Jun 24, 2024 Source

Built Distribution


tailwiz-0.0.24-py3-none-any.whl (13.8 kB)

Uploaded Jun 24, 2024 Python 3

File details

Details for the file tailwiz-0.0.24.tar.gz.

File metadata

Download URL: tailwiz-0.0.24.tar.gz
Upload date: Jun 24, 2024
Size: 12.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.0 CPython/3.10.12

File hashes

Hashes for tailwiz-0.0.24.tar.gz
Algorithm	Hash digest
SHA256	`85ffb328e188e04077fd9863af0c1f2c1f30c76c60fa0f17594296dd330908de`
MD5	`81b685654a0f08a5e31f2ed8b8193487`
BLAKE2b-256	`5a10d3537e86d56fffd08c90282bccf932860c12573fd9423eb78ee2327ff07d`


File details

Details for the file tailwiz-0.0.24-py3-none-any.whl.

File metadata

Download URL: tailwiz-0.0.24-py3-none-any.whl
Upload date: Jun 24, 2024
Size: 13.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.0 CPython/3.10.12

File hashes

Hashes for tailwiz-0.0.24-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3f4d22cb2d519071f7657d1b0c5e95134309eacd8fde2146b402289019a41e29`
MD5	`c5e4e1c2594ebd135eae9912c8199500`
BLAKE2b-256	`3dc7cc73921ad9fcb51d11bc943c3bfad35209ed29b1d7c9093ef6adf297f174`

