LLM Falcon model
Microlib for the Falcon LLM.
Install with:
pip install llm_falcon_model
Quick example
import torch
from llm_falcon_model import init_part, load_tokenizer

tokenizer = load_tokenizer()
separated_weights_path = '<PATH TO SEPARATED WEIGHTS>'

# Load only the first layers of Falcon 40B as a standalone module.
model = init_part(
    model_name='40b',
    start_layer=0,
    end_layer=12,
    separated_weights_path=separated_weights_path,
    device='cuda:0'
)

input_text = "The world chess champion Magnus Carlsen"
input_ids = tokenizer.encode(input_text).ids   # tokenizers returns an Encoding; .ids is the token list
batch = torch.tensor(input_ids).unsqueeze(0)   # shape [1, seq_len]
x = model(batch)
# x is the hidden state after the part's last layer, shaped
# [batch, seq_len, hidden_size]: torch.Size([1, 7, 8192])
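Here 7 is the number of tokens the input text encodes to, and 8192 is Falcon 40B's hidden dimension.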
What is it
This microlib allows you to run part of a Falcon model as a standalone PyTorch module. This enables distributed inference, even on older GPUs with limited memory; a sketch of this follows below.
It contains only the code needed for inference.
The only dependencies are torch, tokenizers, einops and llm_weights_mmap.
The original implementation is available here.
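To illustrate the distributed use, here is a minimal sketch that chains two consecutive parts on separate GPUs. It reuses the API from the quick example; the split at layer 30, the total of 60 layers for Falcon 40B, and the explicit device move are assumptions, not guaranteed library behavior.

import torch
from llm_falcon_model import init_part, load_tokenizer

tokenizer = load_tokenizer()
separated_weights_path = '<PATH TO SEPARATED WEIGHTS>'

# First half of the network on one GPU, second half on another
# (assumed split point; Falcon 40B has 60 decoder layers).
part_a = init_part(model_name='40b', start_layer=0, end_layer=30,
                   separated_weights_path=separated_weights_path, device='cuda:0')
part_b = init_part(model_name='40b', start_layer=30, end_layer=60,
                   separated_weights_path=separated_weights_path, device='cuda:1')

batch = torch.tensor(tokenizer.encode("Magnus Carlsen").ids).unsqueeze(0)
with torch.no_grad():
    hidden = part_a(batch)             # hidden states after the first part
    out = part_b(hidden.to('cuda:1'))  # output of the remaining layers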
When to use it
Use it when you cannot fit the whole Falcon model into memory. If you have multiple older GPUs, each with limited memory, you can run a different part of the Falcon model on each of them; once you make them communicate (using, for example, llm_partial_run), you can run the full model across multiple heterogeneous hosts. For example, with four old gaming PCs, each with a 3090 card (~$6,000 in total), you can run Falcon 40B in real time (5-6 tokens/s).
You can also use it when you want to run Falcon on a large number of inputs but don't have enough memory for the whole model: run the first layers over all inputs, serialize the intermediary results, and then continue with the next layers (see the sketch below).
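A minimal sketch of that workflow, assuming a part with a non-zero start_layer accepts the previous part's hidden states as input (the file names and the torch.save round-trip are illustrative, not part of the library):

import torch
from llm_falcon_model import init_part, load_tokenizer

tokenizer = load_tokenizer()
separated_weights_path = '<PATH TO SEPARATED WEIGHTS>'

# Pass 1: run the first layers over every input, serializing the activations.
first_part = init_part(model_name='40b', start_layer=0, end_layer=12,
                       separated_weights_path=separated_weights_path, device='cuda:0')
texts = ["The world chess champion Magnus Carlsen", "The capital of France is"]
for i, text in enumerate(texts):
    batch = torch.tensor(tokenizer.encode(text).ids).unsqueeze(0)
    with torch.no_grad():
        hidden = first_part(batch)
    torch.save(hidden.cpu(), f'hidden_{i}.pt')  # intermediary result on disk

# Pass 2: free the first part, load the next layers, resume from the files.
del first_part
torch.cuda.empty_cache()
next_part = init_part(model_name='40b', start_layer=12, end_layer=24,
                      separated_weights_path=separated_weights_path, device='cuda:0')
for i in range(len(texts)):
    hidden = torch.load(f'hidden_{i}.pt').to('cuda:0')
    with torch.no_grad():
        hidden = next_part(hidden)
    torch.save(hidden.cpu(), f'hidden_{i}.pt')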
When not to use it
Don't use this library if you want to train or fine-tune a model; it is for inference only.