
CAPPr: Completion After Prompt Probability


CAPPr performs text classification. No training. No post-processing.
Just have your LLM pick from a list of choices.
Or compute the probability of a completion given a prompt.
Squeeze more out of open source LLMs.

Usage

Use a GGUF model

The model must be loadable with llama_cpp.Llama.

from llama_cpp import Llama
from cappr.llama_cpp.classify import predict

# Load model
model = Llama("./TinyLLama-v0.Q8_0.gguf", verbose=False)

prompt = """Gary told Spongebob a story:
There once was a man from Peru; who dreamed he was eating his shoe. He
woke with a fright, in the middle of the night, to find that his dream
had come true.

The moral of the story is to"""

completions = (
  "look at the bright side",
  "use your imagination",
  "eat shoes",
)

pred = predict(prompt, completions, model)
print(pred)
# use your imagination

Notice that a completion can contain many tokens. CAPPr is 100% guaranteed to return an output from the list of possible answers.

See this page of the documentation for more info on using GGUF models.

Use a HuggingFace AutoModelForCausalLM

The model must be loadable with transformers.AutoModelForCausalLM.from_pretrained.

from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict

# Load a model and its tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Which planet is closer to the Sun: Mercury or Earth?"
completions = ("Mercury", "Earth")

pred = predict(prompt, completions, model_and_tokenizer=(model, tokenizer))
print(pred)
# Mercury

See this page of the documentation for more info on using transformers models.

Use an AutoGPTQ model

cappr.huggingface is compatible with models loaded via auto_gptq.AutoGPTQForCausalLM.from_quantized. See this notebook for a minimal demo.

Note that for transformers>=4.32.0, you can load GPTQ models using transformers.AutoModelForCausalLM.

See this page of the documentation for more info on using these models.

Use an AutoAWQ model

cappr.huggingface.classify_no_cache is compatible with models loaded via awq.AutoAWQForCausalLM.from_quantized. See this notebook for a minimal demo.

Note that for transformers>=4.35.0, you can load AWQ models using transformers.AutoModelForCausalLM. AWQ models loaded this way are compatible with cappr.huggingface.classify, which is usually faster.

See this page of the documentation for more info on using these models.

Use a model from the OpenAI API

This model must be compatible with the /v1/completions endpoint (excluding gpt-3.5-turbo-instruct).

from cappr.openai.classify import predict

prompt = """
Tweet about a movie: "Oppenheimer was pretty good. But 3 hrs...cmon Nolan."
This tweet contains the following criticism:
""".strip("\n")

completions = ("bad message", "too long", "unfunny")

pred = predict(prompt, completions, model="text-ada-001")
print(pred)
# too long

See this page of the documentation for more info on using OpenAI models.

Extract the final answer from a step-by-step completion

Step-by-step and chain-of-thought prompts are highly effective ways to get an LLM to "reason" about more complex tasks. But if you need a structured output, a step-by-step completion is unwieldy. Use CAPPr to extract the final answer from these types of completions, given a list of possible answers.

See this idea in action here in the documentation.

Run in batches

Let's use a PyTorch transformers model. Also, let's predict probabilities instead of the class.

from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict_proba

# Load a model and its tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompts = [
    "Stephen Curry is a",
    "Martina Navratilova was a",
    "Dexter, from the TV Series Dexter's Laboratory, is a",
    "LeBron James is a",
]

# Each of the prompts could be completed with one of these:
class_names = ("basketball player", "tennis player", "scientist")
prior =       (      1/6,                1/6,            2/3    )
# Say I expect most of my data to be about scientists

# Run CAPPr
pred_probs = predict_proba(
    prompts=prompts,
    completions=class_names,
    model_and_tokenizer=(model, tokenizer),
    batch_size=32,  # whatever fits on your CPU/GPU
    prior=prior,
)

# pred_probs[i,j] = probability that prompts[i] is classified as class_names[j]
print(pred_probs.round(1))
# [[0.5 0.3 0.2]
#  [0.3 0.6 0.2]
#  [0.1 0.1 0.8]
#  [0.8 0.2 0. ]]

# For each prompt, which completion is most likely?
pred_class_idxs = pred_probs.argmax(axis=-1)
preds = [class_names[pred_class_idx] for pred_class_idx in pred_class_idxs]
print(preds)
# ['basketball player',
#  'tennis player',
#  'scientist',
#  'basketball player']

Run in batches, where each prompt has a different set of possible completions

Again, let's use a PyTorch transformers model to predict probabilities.

from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict_proba_examples
from cappr import Example

# Load a model and its tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a sequence of Example objects representing your classification tasks
examples = [
    Example(
        prompt="Jodie Foster played",
        completions=("Clarice Starling", "Trinity in The Matrix"),
    ),
    Example(
        prompt="Batman, from Batman: The Animated Series, was played by",
        completions=("Pete Holmes", "Kevin Conroy", "Spongebob!"),
        prior=      (     1/3      ,      2/3     ,      0      ),
    ),
]

# Run CAPPr
pred_probs = predict_proba_examples(
    examples, model_and_tokenizer=(model, tokenizer)
)

# pred_probs[i][j] = probability that examples[i].prompt is classified as
# examples[i].completions[j]
print([example_pred_probs.round(2) for example_pred_probs in pred_probs])
# [array([0.7, 0.3]),
#  array([0.03, 0.97, 0.  ])]

# For each example, which completion is most likely?
pred_class_idxs = [
    example_pred_probs.argmax() for example_pred_probs in pred_probs
]
preds = [
    example.completions[pred_class_idx]
    for example, pred_class_idx in zip(examples, pred_class_idxs)
]
print(preds)
# ['Clarice Starling',
#  'Kevin Conroy']

See the demos for demonstrations of slightly harder classification tasks.

For CAPPr, GPTQ models are the most computationally performant. These models are compatible with cappr.huggingface.classify. See this page of the documentation for more info on using these models.

Documentation

https://cappr.readthedocs.io

Installation

See this page of the documentation.

Motivation

Reduce engineering complexity.

See this page of the documentation for more info.

Performance

Statistical performance

For open source models, see this page of the documentation for some discussion. In general, you should expect statistical performance that's similar or identical to text generation when every completion is 1 token long.

For OpenAI models (some deprecated), see:

- 2 SuperGLUE datasets
- RAFT zero-shot training sets

Computational performance

See this page of the documentation.

How it works

You input a prompt string, an end_of_prompt string (a single whitespace or the empty string), and a set of candidate completion strings such that the string—

{prompt}{end_of_prompt}{completion}

—is a naturally flowing thought. CAPPr picks the completion which is most likely to follow prompt by computing the:

Completion
After
Prompt
Probability

The method is fleshed out in my question on Cross Validated.

Related work

See this page of the documentation.

Local development

See this page of the documentation.

Todo

I'm dumping todos here:

- Code changes
- Research experiments

Feel free to raise issues ofc
