Get your LLM to pick the right category.

CAPPr: Completion After Prompt Probability

CAPPr performs text classification. No training. No post-processing. Just have your LLM pick from a list of choices. Or compute the probability of a completion given a prompt. Squeeze more out of open source LLMs.

Usage

Use a model from the OpenAI API

Specifically, this model must be compatible with the /v1/completions endpoint.

from cappr.openai.classify import predict

prompt = """
Tweet about a movie: "Oppenheimer was pretty good. But 3 hrs...cmon Nolan."
This tweet contains the following criticism:
""".strip("\n")

completions = ("bad message", "too long", "unfunny")

pred = predict(prompt, completions, model="text-ada-001")
print(pred)
# too long

Notice that a completion can contain many tokens.

See this page of the documentation for more info on using OpenAI models.

Extract the final answer from a step-by-step completion

Step-by-step and chain-of-thought prompts are highly effective ways to get an LLM to "reason" about more complex tasks. But if you need a structured output, a step-by-step completion is unwieldy. Use CAPPr to extract the final answer from these types of completions, given a list of possible answers.

See this idea in action here in the documentation. CAPPr is 100% guaranteed to return an output from the list of answers.
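
For concreteness, here's a minimal sketch of that pattern (the question, reasoning text, and answer choices below are made up for illustration): sample a step-by-step completion however you like, then hand the reasoning and the list of possible answers to CAPPr.

from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict

# Load a model and its corresponding tokenizer
model_name = "gpt2"  # stand-in; use the model which generated the reasoning
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Step-by-step text previously sampled from the LLM (hypothetical)
step_by_step = "3 hours is longer than 2 hours. So the movie is longer."

prompt = (
    "Which is longer: a 3-hour movie or a 2-hour concert?\n"
    f"Reasoning: {step_by_step}\n"
    "So the answer is:"
)
answers = ("the movie", "the concert")

# predict always returns an element of answers
pred = predict(prompt, answers, model_and_tokenizer=(model, tokenizer))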

Use a PyTorch transformers model

Specifically, this model must be loadable with transformers.AutoModelForCausalLM.from_pretrained.

from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict

# Load a model and its corresponding tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Which planet is closer to the Sun: Mercury or Earth?"
completions = ("Mercury", "Earth")

pred = predict(prompt, completions, model_and_tokenizer=(model, tokenizer))
print(pred)
# Mercury

See this page of the documentation for more info on using PyTorch transformers models.

Use a GGUF model

Specifically, this model must be loadable with llama_cpp.Llama.

from llama_cpp import Llama
from cappr.llama_cpp.classify import predict

# Load model. Always set logits_all=True for CAPPr
model = Llama("./TinyLLama-v0.Q8_0.gguf", logits_all=True, verbose=False)

prompt = """Gary told Spongebob a story:
There once was a man from Peru; who dreamed he was eating his shoe. He
woke with a fright, in the middle of the night, to find that his dream
had come true.

The moral of the story is to"""
completions = (
  "look at the bright side",
  "use your imagination",
  "eat shoes",
)

pred = predict(prompt, completions, model)
print(pred)
# use your imagination

See this page of the documentation for more info on using GGUF models.

Use an AutoGPTQ model

cappr.huggingface seems to play nice with models loaded via auto_gptq.AutoGPTQForCausalLM.from_quantized. But I haven't thoroughly tested that. See this notebook for a minimal demo.
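
If you want to try it, a rough, untested sketch might look like the following (the model name is just a placeholder):

from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer
from cappr.huggingface.classify import predict

# Load a quantized model and its corresponding tokenizer
model_name = "TheBloke/Llama-2-7B-GPTQ"  # placeholder
model = AutoGPTQForCausalLM.from_quantized(model_name, device="cuda:0")
tokenizer = AutoTokenizer.from_pretrained(model_name)

pred = predict(
    "Which planet is closer to the Sun: Mercury or Earth?",
    ("Mercury", "Earth"),
    model_and_tokenizer=(model, tokenizer),
)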

Use an AutoAWQ model

cappr.huggingface.classify_no_cache seems to play nice with models loaded via awq.AutoAWQForCausalLM.from_quantized. But I haven't thoroughly tested that. See this notebook for a minimal demo.
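
Again, a rough, untested sketch (placeholder model name, and assuming classify_no_cache mirrors classify's predict interface):

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
from cappr.huggingface.classify_no_cache import predict

# Load a quantized model and its corresponding tokenizer
model_name = "TheBloke/Llama-2-7B-AWQ"  # placeholder
model = AutoAWQForCausalLM.from_quantized(model_name, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

pred = predict(
    "Which planet is closer to the Sun: Mercury or Earth?",
    ("Mercury", "Earth"),
    model_and_tokenizer=(model, tokenizer),
)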

Run in batches

Let's use a PyTorch transformers model. Also, let's predict probabilities instead of the class.

from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict_proba

# Load a model and its corresponding tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompts = [
    "Stephen Curry is a",
    "Martina Navratilova was a",
    "Dexter, from the TV Series Dexter's Laboratory, is a",
    "LeBron James is a",
]

# Each of the prompts could be completed with one of these:
class_names = ("basketball player", "tennis player", "scientist")
prior =       (      1/6,                1/6,            2/3    )
# Say I expect most of my data to be about scientists

# Run CAPPr
pred_probs = predict_proba(
    prompts=prompts,
    completions=class_names,
    model_and_tokenizer=(model, tokenizer),
    batch_size=32,  # whatever fits on your CPU/GPU
    prior=prior,
)

# pred_probs[i,j] = probability that prompts[i] is classified as class_names[j]
print(pred_probs.round(1))
# [[0.5 0.3 0.2]
#  [0.3 0.6 0.2]
#  [0.1 0.1 0.8]
#  [0.8 0.2 0. ]]

# For each prompt, which completion is most likely?
pred_class_idxs = pred_probs.argmax(axis=-1)
preds = [class_names[pred_class_idx] for pred_class_idx in pred_class_idxs]
print(preds)
# ['basketball player',
#  'tennis player',
#  'scientist',
#  'basketball player']

Run in batches, where each prompt has a different set of possible completions

Again, let's use a PyTorch transformers model to predict probabilities.

from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict_proba_examples
from cappr import Example

# Load a model and its corresponding tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a sequence of Example objects representing your classification tasks
examples = [
    Example(
        prompt="Jodie Foster played",
        completions=("Clarice Starling", "Trinity in The Matrix"),
    ),
    Example(
        prompt="Batman, from Batman: The Animated Series, was played by",
        completions=("Pete Holmes", "Kevin Conroy", "Spongebob!"),
        prior=      (     1/3      ,      2/3     ,      0      ),
    ),
]

# Run CAPPr
pred_probs = predict_proba_examples(examples, model_and_tokenizer=(model, tokenizer))

# pred_probs[i][j] = probability that examples[i].prompt is classified as
# examples[i].completions[j]
print([example_pred_probs.round(2) for example_pred_probs in pred_probs])
# [array([0.7, 0.3]),
#  array([0.03, 0.97, 0.  ])]

# For each example, which completion is most likely?
pred_class_idxs = [example_pred_probs.argmax() for example_pred_probs in pred_probs]
preds = [
    example.completions[pred_class_idx]
    for example, pred_class_idx in zip(examples, pred_class_idxs)
]
print(preds)
# ['Clarice Starling',
#  'Kevin Conroy']

See demos/llama_cpp/superglue/copa.ipynb for a demonstration of a slightly harder classification task.

Documentation

https://cappr.readthedocs.io

Installation

See this page of the documentation.

Motivation

Minimize engineering complexity.

See this page of the documentation for more info.

Cool

A handful of experiments suggest that CAPPr squeezes more out of smaller LLMs. See this page of the documentation.

Honest

am bored. am unemployed.

Performance

Statistical performance

I'm still evaluating open source models. For now, see the demos in the repo's demos directory.

For OpenAI models, see the evaluations on:

  • 2 SuperGLUE datasets
  • RAFT zero-shot training sets

TODO: summary tables/spiderwebs

Computational performance

See this page of the documentation.

How it works

You input a prompt string, an end_of_prompt string (a whitespace or the empty string), and a set of candidate completion strings such that the string—

{prompt}{end_of_prompt}{completion}

—is a naturally flowing thought. CAPPr picks the completion which is most likely to follow the prompt by computing the:

Completion
After
Prompt
Probability

The method is fleshed out in my question on Cross Validated.
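
If it helps, here's the gist of the decision rule as simplified code. This is a sketch, not the package's actual implementation, and it assumes you've already collected each completion's token log-probabilities conditioned on the prompt:

import numpy as np

def pick_completion(completions, log_probs, prior=None):
    """Sketch of CAPPr's decision rule. log_probs[j][i] is the model's
    log Pr(token i of completions[j] | prompt + previous tokens)."""
    # Average over tokens so longer completions aren't unfairly penalized
    likelihoods = np.array([np.exp(np.mean(lps)) for lps in log_probs])
    if prior is not None:
        likelihoods = likelihoods * np.array(prior)  # weight by the prior
    return completions[int(likelihoods.argmax())]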

Related work

See this page of the documentation.

Local development

Setup

  1. Create a new Python 3.8+ virtual environment and activate it. I use virtualenvwrapper, but for this example let's create a virtual environment called cappr using Python's native venv:

    cd your/venvs
    
    python3 -m venv cappr
    
    source cappr/bin/activate
    
    python -m pip install --upgrade pip wheel
    
  2. cd to wherever you store projects, and clone the repo (or fork it and clone that) there

    cd your/projects
    
    git clone https://github.com/kddubey/cappr.git
    
  3. cd to the repo and install this package in editable mode, along with development requirements (after ensuring that your venv is activated!)

    cd cappr
    
    python -m pip install -e ".[dev]"
    
  4. Download the tiny GGUF Llama model I uploaded to HF

    huggingface-cli download \
    aladar/TinyLLama-v0-GGUF \
    TinyLLama-v0.Q8_0.gguf \
    --local-dir ./tests/llama_cpp/fixtures/models \
    --local-dir-use-symlinks False
    

VS code extensions for development

  • autoDocstring. Use the numpy format, and check "Start On New Line".
  • Set Python formatting to black.
  • Rewrap. Enable Auto Wrap.

And set the vertical line ruler to 88.

Testing

From the repo home directory cappr:

pytest

Note that a few small transformers models and tokenizers will be downloaded to your computer.

Sometimes I get worried about bigger code changes. So consider additionally testing statistical performance by running an appropriate demo from the demos directory.

When you add a new testing module, add it to the list in pyproject.toml. The list is in order of dependencies: Example and utils must pass for the rest of the modules to pass.

To test a specific module, e.g., huggingface:

pytest -k huggingface

Docs

To test changes to documentation, first locally build them from the repo home directory cappr via

cd docs

make html

and then preview them by opening docs/build/html/index.html in your browser.

After merging code to main, the official docs will be automatically built and published.

Release

Bump the version, and then create a new release on GitHub. A new version of the package will then be automatically published on PyPI.

Todo

I'm dumping TODOs here:

  • Code changes
  • Research experiments
