LLM Sampler

Microlib for sampling from an LLM.

Install with:

pip install llm_sampler


llm_sampler allows you to sample from any LLM. It accepts a forward_func parameter, which can be any Python function that accepts an input_ids tensor and returns a logits tensor.

You can use it with any model from llm_microlibs, as well as with Hugging Face Transformers models, Mistral, or remotely called models.
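
For example, a forward_func wrapping a Hugging Face causal LM could be written roughly like this (a minimal sketch; the model name is only illustrative and not tied to llm_sampler):

import torch
from transformers import AutoModelForCausalLM

# Any causal LM works here; "gpt2" is just an illustrative choice.
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def forward_func(input_ids: torch.Tensor) -> torch.Tensor:
    # input_ids: (batch, seq_len) token IDs -> logits: (batch, seq_len, vocab_size)
    with torch.no_grad():
        return model(input_ids=input_ids).logits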

llm_sampler also allows you to get probability scores for sequences given by the user.

For example, if you supply the input "The sentiment of the sentence 'I loved it' is" with the options:

  • Option 0: positive
  • Option 1: negative

the library returns the probability of each option. In that sense, llm_sampler can be used as a zero-shot classifier.
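
A zero-shot sentiment classifier could then look roughly like this (a minimal sketch: load_model and tokenize_input are placeholders, as in the examples below, and the final softmax step assumes the returned scores are per-token log-probabilities that can be summed per option, which is an interpretation rather than documented behaviour):

import torch
from llm_sampler import score_batch

forward_func = load_model()  # placeholder, same as in the examples below

scores = score_batch(
    forward_func=forward_func,
    input_ids=tokenize_input("The sentiment of the sentence 'I loved it' is "),
    all_continuation_ids=[
        tokenize_input("positive"),  # Option 0
        tokenize_input("negative"),  # Option 1
    ],
)
# Assumption: summing each option's per-token scores and applying a softmax
# yields one probability per option.
option_probs = torch.softmax(scores.float().sum(dim=-1), dim=0)
print(option_probs)  # probabilities for "positive" and "negative"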

Sampling overview

Sample from an LLM with temperature:

import torch
from llm_sampler import sample

# Initialize the forward_func.
# This could be any function that returns logits when given input tokens,
# for example Hugging Face models, LLaMA, Falcon, etc.
forward_func = load_model()
input_ids = tokenize_input("Magnus Carlsen had won the World ") # Tokenize the input
max_new_tokens = 10  # Number of new tokens to generate

generated_tokens = sample(
    forward_func=forward_func,
    input_ids=input_ids,
    max_new_tokens=max_new_tokens, 
    temperature=0.6,
    warp_top_k=10
)
for next_token in generated_tokens:
    print("Next token:", next_token)

Example - Hugging Face pipeline

import torch
import transformers
from transformers import AutoTokenizer
from llm_sampler import sample
from tqdm import tqdm

model = "tiiuae/falcon-7b"

tokenizer = AutoTokenizer.from_pretrained(model)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # device_map="auto",
    device=torch.device("cuda")
)

input_text = "Magnus Carlsen had won the World "
input_ids = pipeline.tokenizer(input_text, padding=False, add_special_tokens=False, return_tensors="pt")
input_ids = input_ids.to(torch.device("cuda"))["input_ids"]

generator = sample(
    forward_func=lambda x: pipeline.model(input_ids=x).logits,
    input_ids=input_ids,
    max_new_tokens=2,
    temperature=0.001
)
result_tokens = []
for token in tqdm(generator):
    int_token = token.cpu().item()
    result_tokens.append(int_token)
decoded = pipeline.tokenizer.decode(result_tokens, skip_special_tokens=True)
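
At this point decoded holds the generated continuation as a string; with the temperature set close to zero, sampling is effectively greedy.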

Example - score batch

Score multiple continuations for a prompt (multiple choice):

from llm_sampler import score_batch

# Initialize the forward_func.
# This could be any function that returns logits when given input tokens,
# for example Hugging Face models, LLaMA, Falcon, etc.
forward_func = load_model()

scores = score_batch(
    forward_func=forward_func,
    input_ids=tokenize_input("The sentiment of the sentence 'I loved it' is '"),
    all_continuation_ids=[
        tokenize_input("positive sentiment"),
        tokenize_input("negative"),
        tokenize_input("neutral"),
    ]
)

# scores is now:
# tensor([[-1.0078, -2.5625],
#         [ 0.6914, -7.0312],
#         [-4.4062, -7.9688]], dtype=torch.bfloat16)
