Microlib for sampling from an LLM
Project description
LLM Sampler
Install with:
pip install llm_sampler
llm_sampler allows you to sample from any LLM.
It accepts a forward_func parameter, which can be any Python function that accepts an input_ids tensor and returns a logits tensor.
You can use it with any model from llm_microlibs, with Huggingface Transformers models, with Mistral, or even with remotely called models.
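As a minimal sketch of that contract (load_model and the exact tensor shapes are illustrative assumptions, not part of llm_sampler's API), a forward_func just maps a tensor of token ids to the model's logits:

import torch

my_model = load_model()  # placeholder: any model object that produces logits, as in the examples below

def forward_func(input_ids: torch.Tensor) -> torch.Tensor:
    # input_ids: integer tensor of token ids, typically of shape (batch, sequence_length)
    # returns: logits tensor, typically of shape (batch, sequence_length, vocabulary_size)
    return my_model(input_ids=input_ids).logits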
It also allows you to get probability scores for sequences supplied by the user.
For example, if you supply the input:
Input: The sentiment of the sentence 'I loved it' is
- Option 0: positive
- Option 1: negative
the library will return the probabilities for the options. In that sense, llm_sampler can be used as a zero-shot classifier (see the score_batch example below).
Sampling overview
Sample from an LLM with temperature:
import torch
from llm_sampler import sample

# Initializes the forward_func.
# This could be any function that returns logits when given input tokens,
# for example Huggingface models, LLaMa, Falcon, etc.
forward_func = load_model()

input_ids = tokenize_input("Magnus Carlsen had won the World ")  # Tokenize the input
max_new_tokens = 10  # Number of new tokens to generate

generated_tokens = sample(
    forward_func=forward_func,
    input_ids=input_ids,
    max_new_tokens=max_new_tokens,
    temperature=0.6,
    warp_top_k=10
)
for next_token in generated_tokens:
    print("Next token:", next_token)
Example - Huggingface pipeline
import torch
import transformers
from transformers import AutoTokenizer
from llm_sampler import sample
from tqdm import tqdm

model = "tiiuae/falcon-7b"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # device_map="auto",
    device=torch.device("cuda")
)

input_text = "Magnus Carlsen had won the World "
input_ids = pipeline.tokenizer(input_text, padding=False, add_special_tokens=False, return_tensors="pt")
input_ids = input_ids.to(torch.device("cuda"))["input_ids"]

generator = sample(
    forward_func=lambda x: pipeline.model(input_ids=x).logits,
    input_ids=input_ids,
    max_new_tokens=2,
    temperature=0.001
)
result_tokens = []
for token in tqdm(generator):
    int_token = token.cpu().item()
    result_tokens.append(int_token)
decoded = pipeline.tokenizer.decode(result_tokens, skip_special_tokens=True)
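You can then inspect the decoded continuation, for example:

print("Generated continuation:", decoded)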
Example - score batch
Score a batch of continuations (multiple-choice options) for a given prompt:
from llm_sampler import score_batch

# Initializes the forward_func.
# This could be any function that returns logits when given input tokens,
# for example Huggingface models, LLaMa, Falcon, etc.
forward_func = load_model()

scores = score_batch(
    forward_func=forward_func,
    input_ids=tokenize_input("The sentiment of the sentence 'I loved it' is '"),
    all_continuation_ids=[
        tokenize_input("positive sentiment"),
        tokenize_input("negative"),
        tokenize_input("neutral"),
    ]
)
# scores is now:
# tensor([[-1.0078, -2.5625],
#         [ 0.6914, -7.0312],
#         [-4.4062, -7.9688]], dtype=torch.bfloat16)
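To use these scores for zero-shot classification, one option is to aggregate them per option and normalize with a softmax. This is only a sketch, assuming each row of scores corresponds to one continuation and that higher values mean a more likely continuation:

import torch

# Assumption: one row per continuation ("positive sentiment", "negative", "neutral"),
# with higher scores meaning a more likely continuation.
option_scores = scores.float().sum(dim=-1)          # aggregate score per option
option_probs = torch.softmax(option_scores, dim=0)  # probability distribution over options
for index, probability in enumerate(option_probs.tolist()):
    print(f"Option {index}: {probability:.3f}")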